--- title: "Area of Applicability and Extrapolation Detection" author: "Gilles Colling" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Area of Applicability and Extrapolation Detection} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, dev = "svglite", fig.ext = "svg" ) library(BORG) library(ggplot2) theme_borg <- theme_minimal(base_size = 12) + theme( panel.background = element_rect(fill = "transparent", colour = NA), plot.background = element_rect(fill = "transparent", colour = NA), legend.background = element_rect(fill = "transparent", colour = NA) ) theme_set(theme_borg) ``` # Introduction A model that performs well on the test set still tells you nothing about *where* it can be trusted. If training data comes from lowland forests and the model is asked to predict in alpine meadows, the accuracy numbers from cross-validation do not apply. The model has never seen those environmental conditions, and its predictions there are extrapolations masquerading as interpolations. This problem is pervasive in spatial ecology but not unique to it. Any prediction task where the deployment domain differs from the training domain faces the same risk: species distribution models projected to new continents, crop yield forecasts extrapolated to future climate conditions, land cover classifiers applied to satellite imagery from regions without ground truth. The cross-validation error, however carefully estimated, applies only to the conditions represented in the training data. Roberts et al. (2017) showed that random cross-validation inflates performance when data are spatially autocorrelated, because nearby points that share information end up in both training and test sets. Meyer and Pebesma (2021) addressed a complementary problem: even with spatially blocked CV, the reported accuracy applies only within the environmental space covered by training data. Blocked CV does not solve everything. They formalized the concept of an *area of applicability* (AOA), the region in feature space where a model can make reliable predictions because training data covers that region sufficiently. Their dissimilarity index (DI) measures how far each prediction point is from the nearest training observation in a scaled, weighted feature space. Points with DI above a threshold derived from the training data's own internal distances fall outside the AOA. Predictions at those points should be excluded from downstream analysis or at least flagged as unreliable. BORG implements the full Meyer and Pebesma framework, plus additional tools for extrapolation detection (univariate range checks and Mahalanobis distance), geographic transferability assessment (region-to-region performance decay), prediction uncertainty mapping, and stability analysis. Together, these functions answer the question that cross-validation metrics alone cannot: *where* does the model work, and where does it break? This vignette walks through each tool in detail, using synthetic spatial data where the ground truth is known. We start with the dissimilarity index, build up to the full AOA, then move to extrapolation detection, transferability assessment, prediction and stability maps, and finally show how to combine everything with blocked cross-validation for a complete spatial modelling pipeline. # The Dissimilarity Index ## What it measures The dissimilarity index quantifies how different a new observation is from the training data in feature space. For each prediction point, `borg_di()` computes the weighted Euclidean distance to the nearest training observation after standardizing all predictors to zero mean and unit variance. If variable importance weights are provided (from `borg_importance()`), predictors that matter more for the model receive higher weight in the distance calculation, so the DI reflects *model-relevant* dissimilarity rather than raw distance across all variables equally. The function also computes a threshold: the mean plus one standard deviation of the leave-one-out DI values within the training data. This threshold represents the upper bound of "normal" dissimilarity that the training data exhibits internally. Any new point with DI above this threshold is more dissimilar to the training data than most training points are to each other, and predictions there should be treated with caution. ## Basic usage We start by generating spatial data with known autocorrelation. ```{r simulate-data} sim <- borg_simulate( n = 400, type = "spatial", n_predictors = 5, signal_strength = 0.3, autocorrelation = 0.6, seed = 42 ) dat <- sim$data predictors <- paste0("x", 1:5) ``` The spatial simulation places observations on a grid with Gaussian spatial autocorrelation. Coordinates live in columns `x` and `y`, predictors in `x1` through `x5`, and the response in `y.1` (renamed by R because `y` is already taken by the coordinate column). Now suppose we want to predict at new locations that extend beyond the training domain. We create a prediction grid that covers a wider spatial extent. ```{r di-compute} set.seed(99) new_data <- data.frame( x = runif(300, -0.5, 1.5), y = runif(300, -0.5, 1.5) ) # Generate predictor values with some shift to simulate environmental novelty for (p in predictors) { new_data[[p]] <- rnorm(300, mean = 0.5, sd = 1.5) } di <- borg_di(train = dat, new = new_data, predictors = predictors) summary(as.numeric(di)) cat("Threshold:", attr(di, "threshold"), "\n") cat("Points above threshold:", sum(di > attr(di, "threshold")), "of", length(di), "\n") ``` The threshold is derived entirely from the training data's internal structure and does not depend on the prediction locations. It remains stable regardless of where you plan to predict. Points with DI above the threshold are in parts of feature space that the training data does not cover well. ## Variable importance weighting When predictors are weighted by their importance to the model, the DI focuses on the dimensions that actually matter for prediction. A point might be far from the training data in an irrelevant predictor but close in the predictors that drive the model, and without weighting, the DI would flag it as dissimilar even though the model would interpolate confidently. ```{r di-weighted} # Fit a model and compute importance model <- lm(y.1 ~ x1 + x2 + x3 + x4 + x5, data = dat) imp <- borg_importance(model, dat, target = "y.1", coords = c("x", "y"), predictors = predictors, n_rep = 5) # Weighted DI di_weighted <- borg_di( train = dat, new = new_data, predictors = predictors, weights = imp$importance ) cat("Unweighted threshold:", round(attr(di, "threshold"), 3), "\n") cat("Weighted threshold: ", round(attr(di_weighted, "threshold"), 3), "\n") ``` Weighting changes both the DI values and the threshold. If the model relies heavily on a few predictors, the weighted DI becomes more permissive for points that differ only in unimportant variables. Points that differ in the variables that drive the model will receive higher DI values and face a tighter threshold. # Area of Applicability ## From DI to a binary mask The dissimilarity index is a continuous measure. The area of applicability converts it into a binary decision: inside or outside. `borg_aoa()` wraps `borg_di()` and applies the threshold to produce a data frame with a logical `aoa` column. It also computes the local point density (LPD), which counts how many training observations fall within the threshold distance of each prediction point in feature space. Points inside the AOA but with low LPD sit near the boundary, and predictions there may still be less reliable than at points with high LPD. ## Full worked example ```{r aoa-compute} aoa <- borg_aoa( train = dat, new = new_data, predictors = predictors, coords = c("x", "y") ) aoa ``` The print method reports the fraction of prediction points inside the AOA, the DI threshold, and the DI range. Points outside the AOA have `aoa = FALSE` and should be excluded from accuracy reporting and, ideally, from any downstream analysis that assumes reliable predictions. ```{r aoa-map, fig.width=7, fig.height=5.5} autoplot(aoa) ``` The spatial map colours points by their dissimilarity index and marks those outside the AOA with a different symbol. In this example, points at the edges of the prediction grid, far from the training data's spatial extent, tend to fall outside the AOA because the environmental conditions there were not sampled during training. ## Cross-validated threshold The default threshold (mean + SD of training LOO-DI) is a reasonable starting point, but it ignores the specific CV strategy used for model evaluation. When you already have spatial CV folds, you can derive the threshold from the DI values of cross-validated test sets, which calibrates it to the same degree of spatial separation that the CV folds impose. ```{r aoa-cv-threshold} cv <- borg_cv(dat, coords = c("x", "y"), target = "y.1", v = 5) aoa_cv <- borg_aoa( train = dat, new = new_data, predictors = predictors, coords = c("x", "y"), folds = cv ) cat("Default threshold: ", round(attr(aoa, "threshold"), 3), "\n") cat("CV-based threshold:", round(attr(aoa_cv, "threshold"), 3), "\n") cat("AOA coverage (default): ", sprintf("%.1f%%", 100 * mean(aoa$aoa)), "\n") cat("AOA coverage (CV-based):", sprintf("%.1f%%", 100 * mean(aoa_cv$aoa)), "\n") ``` When folds are provided, the threshold is set to the 95th percentile of the cross-validated DI distribution. This is typically more conservative than the default because spatially blocked folds impose larger train-test separation than leave-one-out distances within the training data. Using the CV-based threshold aligns the AOA with the conditions under which the model's accuracy was actually estimated. ## Manual threshold You can also set the threshold manually. This is useful when you want to explore how sensitive the AOA boundary is to the threshold value, or when domain knowledge suggests a specific cutoff. ```{r aoa-manual} # More conservative: lower threshold excludes more points aoa_strict <- borg_aoa( train = dat, new = new_data, predictors = predictors, coords = c("x", "y"), threshold = attr(aoa, "threshold") * 0.8 ) # More permissive: higher threshold includes more points aoa_lenient <- borg_aoa( train = dat, new = new_data, predictors = predictors, coords = c("x", "y"), threshold = attr(aoa, "threshold") * 1.5 ) cat("Strict AOA: ", sprintf("%.1f%%", 100 * mean(aoa_strict$aoa)), "inside\n") cat("Default AOA: ", sprintf("%.1f%%", 100 * mean(aoa$aoa)), "inside\n") cat("Lenient AOA: ", sprintf("%.1f%%", 100 * mean(aoa_lenient$aoa)), "inside\n") ``` The sensitivity of the AOA boundary to the threshold tells you something about the data. If a small threshold change flips many points between inside and outside, the prediction region sits near the boundary of the training distribution. If the AOA is stable across thresholds, either the training data covers the prediction domain well or the prediction points are far enough away that no reasonable threshold would include them. # Extrapolation Detection ## Univariate and multivariate checks While the AOA uses a weighted nearest-neighbour distance, `borg_extrapolation()` takes a different approach: it checks whether each prediction point falls outside the training data's environmental envelope. The check works at two levels. The **univariate** check tests whether each predictor value at the prediction point falls outside the observed range in the training data. If temperature in the training data ranges from 5 to 25 degrees and the model is asked to predict at a location where temperature is 30 degrees, that predictor is flagged as out of range. The function counts how many predictors are out of range and lists which ones, making it easy to identify the specific environmental axes where extrapolation occurs. The **multivariate** check uses the Mahalanobis distance from each prediction point to the centroid of the training data in feature space. This distance accounts for the correlation structure among predictors, so a point can be within the univariate range of every individual variable yet still be an outlier in the joint distribution. Consider a training set where temperature and precipitation are positively correlated: hot, wet sites and cool, dry sites are well represented, but hot, dry combinations never occur. A prediction at a hot, dry location would pass all univariate range checks but would have a large Mahalanobis distance, correctly flagging it as novel. ## Worked example ```{r extrapolation-compute} ext <- borg_extrapolation( train = dat, new = new_data, predictors = predictors, coords = c("x", "y") ) ext ``` The print method summarizes how many prediction points are extrapolating and identifies the most frequently out-of-range variables. This tells you which environmental gradients your training data fails to cover. ```{r extrapolation-map, fig.width=7, fig.height=5.5} autoplot(ext) ``` The map colours points by the number of variables outside the training range and marks extrapolating points with a cross. Points where multiple variables are simultaneously out of range are the most dangerous: the model is not just stretching slightly beyond its training domain but operating in a region of feature space that it has no information about. ## Comparing extrapolation and AOA Extrapolation detection and the AOA answer related but distinct questions. Extrapolation checks whether a point falls outside the *convex hull* of the training data in each dimension separately (univariate) or jointly (Mahalanobis). The AOA checks whether a point is close enough to the *nearest training observation* in a weighted feature space. A point can be inside the convex hull of the training data yet far from any actual training point, because the interior of the hull might be sparsely sampled. In that case, the extrapolation check would say "safe" while the AOA would correctly identify it as unreliable. ```{r compare-ext-aoa} both <- data.frame( extrapolating = ext$extrapolating, outside_aoa = !aoa$aoa ) cat("Both flagged: ", sum(both$extrapolating & both$outside_aoa), "\n") cat("Extrapolating only: ", sum(both$extrapolating & !both$outside_aoa), "\n") cat("Outside AOA only: ", sum(!both$extrapolating & both$outside_aoa), "\n") cat("Neither (safe): ", sum(!both$extrapolating & !both$outside_aoa), "\n") ``` In practice, points outside the AOA are a superset of extrapolating points: they include all points beyond the training envelope *plus* points that are within the envelope but in sparsely sampled regions. Using both checks together gives the most complete picture. If a point is extrapolating, you know the model is being asked to predict conditions it has literally never seen. If a point is inside the training envelope but outside the AOA, the model has seen similar conditions somewhere in the training data, but those training points are far away, and the local density is too low for reliable interpolation. # Transferability Assessment ## Quantifying performance decay with distance Cross-validation tells you how well a model performs on held-out data from the same spatial domain as the training data, but says nothing about performance in a different geographic region. A model trained on forest plots in the Swiss Alps might work well for held-out Swiss plots but fail entirely when transferred to the Scandinavian Alps, even if both regions contain similar forest types. `borg_transferability()` measures this decay explicitly. It splits the data into geographic regions using k-means clustering on coordinates, trains a model on each region, and tests it on every other region. The result is a performance matrix showing within-region accuracy (the diagonal) and cross-region accuracy (the off-diagonal elements), plus a distance-performance decay curve showing how accuracy degrades as the geographic distance between training and prediction regions increases. ## Worked example ```{r transferability-compute} tf <- borg_transferability( data = dat, formula = y.1 ~ x1 + x2 + x3 + x4 + x5, coords = c("x", "y"), n_regions = 4, metric = "rmse" ) tf ``` The print method shows the mean within-region and cross-region performance, plus the transfer ratio (cross-region / within-region RMSE). A transfer ratio close to 1.0 means the model transfers well across regions. Values much larger than 1.0 indicate that the model loses accuracy when applied outside its training region. ```{r transferability-decay, fig.width=7, fig.height=5} autoplot(tf, type = "decay") ``` The decay plot shows error as a function of distance between region centroids. Each point is one train-region / test-region pair. Filled circles are within- region evaluations (distance = 0 or near-zero), open circles are cross-region transfers. The linear trend line gives a rough estimate of the rate at which performance degrades with distance. ```{r transferability-matrix, fig.width=6, fig.height=5} autoplot(tf, type = "matrix") ``` The heatmap shows the full performance matrix. Rows indicate which region was used for training; columns indicate which region was used for testing. The diagonal (within-region) values are typically the best. Asymmetries in the off-diagonal entries reveal directional transfer effects: training on region A and testing on region B might give different results than training on B and testing on A. These asymmetries often reflect differences in environmental coverage between regions. ## Interpreting transfer ratios The transfer ratio is the mean cross-region error divided by the mean within-region error. For RMSE: - **Ratio near 1.0**: The model generalizes well. The spatial structure is weak or the model has learned the underlying process rather than local patterns. - **Ratio 1.5 to 2.0**: Moderate transfer decay. The model captures some generalizable signal but also fits region-specific patterns. Consider whether additional predictors could explain the regional differences. - **Ratio above 2.0**: Poor transfer. Within-region accuracy is misleading as an estimate of real-world performance. Spatial blocking in CV is essential, and the AOA should be used to mask predictions in under-sampled regions. The `n_regions` parameter controls the granularity of the assessment. Fewer regions mean larger training sets and higher within-region performance, but coarser geographic resolution in the decay curve. More regions give finer spatial resolution at the cost of smaller training sets and potentially unstable within-region estimates. Four to six regions is a reasonable default. For large datasets with clear geographic gradients, eight or more can reveal finer-scale transfer patterns. The transfer assessment uses a simple model (by default `lm()`) to keep computation fast, but the `fit_fun` argument lets you substitute any model-fitting function that accepts a formula and data argument and has a `predict()` method. Pass `randomForest::randomForest` or a wrapper around any other learner to assess transferability under a more realistic model. # Prediction Maps ## Spatial uncertainty from cross-validation `borg_prediction_map()` collects the predictions that each observation receives when it appears in a test fold. Each point is tested against a model trained on a different subset of the data, so the predictions vary across folds. The standard deviation at each location provides a spatial map of prediction uncertainty, showing where predictions are stable regardless of the held-out data and where they shift depending on the training set composition. ```{r prediction-map-compute} cv_folds <- borg_cv(dat, coords = c("x", "y"), target = "y.1", v = 5) pm <- borg_prediction_map( data = dat, folds = cv_folds, formula = y.1 ~ x1 + x2 + x3 + x4 + x5, coords = c("x", "y") ) head(pm) ``` ```{r prediction-map-uncertainty, fig.width=7, fig.height=5.5} autoplot(pm, type = "uncertainty") ``` Locations with high prediction SD are points where the model's output depends heavily on which observations were included in training, usually because the local signal is weak, the point sits at the edge of the training distribution, or nearby observations have high influence. Comparing the uncertainty map with the AOA map often reveals that points near the AOA boundary have higher prediction variability. ```{r prediction-map-residual, fig.width=7, fig.height=5.5} autoplot(pm, type = "residual") ``` The residual map shows where the model systematically over- or underpredicts. Spatial structure in the residuals (clusters of positive or negative errors) suggests that the model is missing a spatially varying process, and either a spatially varying coefficient model or additional spatial predictors should be considered. # Stability Maps ## Where predictions change with the training set While `borg_prediction_map()` only produces predictions for points when they appear in a test fold, `borg_stability_map()` takes a different approach: it refits the model on each fold's training set and predicts *all* locations (not just the test set). This gives every location a prediction from every fold, enabling a finer-grained view of how sensitive the predictions are to the composition of the training data. ```{r stability-map-compute} sm <- borg_stability_map( model = model, data = dat, target = "y.1", coords = c("x", "y"), folds = cv_folds, v = 5 ) sm ``` The print method summarizes prediction variability across all locations, highlighting the most unstable ones. ```{r stability-map-plot, fig.width=7, fig.height=5.5} autoplot(sm, metric = "pred_sd") ``` High instability at a location can indicate several things: the location sits in a data-sparse region where removing a few nearby training points changes the prediction substantially; the model is overfitting to specific training observations; or the underlying relationship is non-stationary, so models trained on different spatial subsets learn different local patterns. The stability map can also display the coefficient of variation (`pred_cv`) or the prediction range (`pred_range`), each emphasizing different aspects of instability. ```{r stability-map-range, fig.width=7, fig.height=5.5} autoplot(sm, metric = "pred_range") ``` Locations with large prediction ranges are particularly concerning because the model could give very different answers depending on which data happens to be available. These are exactly the locations where confidence intervals from standard regression theory would be too narrow, because they assume a fixed model that does not change with the training data. ## Stability maps versus prediction maps Both `borg_prediction_map()` and `borg_stability_map()` produce per-location summaries of prediction variability, but they differ in an important way. The prediction map only records a prediction for each observation when that observation falls in a test fold. If 5-fold CV is used, each point gets exactly one prediction. In contrast, the stability map refits the model on each fold's training set and predicts *every* location from every fold. This means each point receives as many predictions as there are folds, giving a richer picture of variability. The prediction map is the right tool for assessing how well the CV predictions match the actual values at each location (via residuals). The stability map is the right tool for assessing how sensitive the model's output is to the training set composition. Both are informative, and together with the AOA, they provide three complementary views of model reliability: the AOA says "is there enough training data nearby in feature space?"; the prediction map says "how accurate are the predictions here?"; and the stability map says "how much would the predictions change if we retrained on different data?". # Combining AOA with Blocked Cross-Validation ## The full pipeline In a real spatial modelling project, the tools above form a pipeline: diagnose the data structure, generate appropriate CV folds, fit the model, evaluate it honestly, compute variable importance, determine where the model is applicable, and flag predictions that fall outside that region. ### Diagnose dependencies ```{r pipeline-diagnose} diag <- borg_diagnose(dat, coords = c("x", "y"), target = "y.1") cat("Dependency type:", diag@dependency_type, "\n") cat("Recommended CV: ", diag@recommended_cv, "\n") ``` BORG detects spatial autocorrelation and recommends spatial blocking. ### Generate blocked CV folds ```{r pipeline-cv} cv <- borg_cv( dat, coords = c("x", "y"), target = "y.1", v = 5 ) cat("CV strategy:", cv$strategy, "\n") cat("Folds:", length(cv$folds), "\n") ``` ### Fit and evaluate with honest folds ```{r pipeline-fit} metrics <- numeric(length(cv$folds)) for (i in seq_along(cv$folds)) { fold <- cv$folds[[i]] m <- lm(y.1 ~ x1 + x2 + x3 + x4 + x5, data = dat[fold$train, ]) preds <- predict(m, newdata = dat[fold$test, ]) metrics[i] <- sqrt(mean((dat$y.1[fold$test] - preds)^2)) } cat("Blocked CV RMSE:", round(mean(metrics), 4), "\n") ``` ### Compute spatially-aware importance ```{r pipeline-importance} full_model <- lm(y.1 ~ x1 + x2 + x3 + x4 + x5, data = dat) imp <- borg_importance( full_model, dat, target = "y.1", coords = c("x", "y"), predictors = predictors, n_rep = 5 ) imp ``` ### Determine the area of applicability ```{r pipeline-aoa} # Prediction grid covering a wider extent set.seed(123) pred_grid <- data.frame( x = runif(500, -0.3, 1.3), y = runif(500, -0.3, 1.3) ) for (p in predictors) { pred_grid[[p]] <- rnorm(500, mean = mean(dat[[p]]), sd = sd(dat[[p]]) * 1.2) } aoa_final <- borg_aoa( train = dat, new = pred_grid, predictors = predictors, coords = c("x", "y"), weights = imp$importance, folds = cv ) aoa_final ``` ### Generate predictions with AOA mask ```{r pipeline-predict} pred_grid$prediction <- predict(full_model, newdata = pred_grid) pred_grid$inside_aoa <- aoa_final$aoa pred_grid$di <- aoa_final$di # Only report predictions inside the AOA reliable <- pred_grid[pred_grid$inside_aoa, ] cat("Predictions generated:", nrow(pred_grid), "\n") cat("Reliable (inside AOA):", nrow(reliable), "\n") cat("Excluded (outside AOA):", sum(!pred_grid$inside_aoa), "\n") ``` ```{r pipeline-map, fig.width=7, fig.height=5.5} ggplot(pred_grid, aes(x = x, y = y)) + geom_point(aes(color = prediction, shape = inside_aoa), size = 1.5, alpha = 0.8) + scale_color_viridis_c(option = "viridis") + scale_shape_manual( values = c("TRUE" = 16, "FALSE" = 4), labels = c("TRUE" = "Inside AOA", "FALSE" = "Outside AOA") ) + coord_equal() + labs(title = "Predictions masked by Area of Applicability", subtitle = sprintf("%.1f%% of prediction grid inside AOA", 100 * mean(pred_grid$inside_aoa)), x = "x", y = "y", color = "Predicted\nvalue", shape = NULL) ``` The final map shows predictions coloured by their value, with crosses marking locations outside the AOA. These excluded locations are not necessarily wrong, but they carry no statistical guarantee from the cross-validation, and reporting them without qualification would misrepresent the model's evaluated performance. ### Check for extrapolation in the reliable predictions ```{r pipeline-extrapolation} ext_check <- borg_extrapolation( train = dat, new = reliable, predictors = predictors, coords = c("x", "y") ) cat("Even within AOA, extrapolating:", sum(ext_check$extrapolating), "of", nrow(reliable), "\n") ``` This final check confirms that points inside the AOA are also within the training envelope. In most cases, the AOA is the tighter constraint and catches everything the extrapolation check would flag, but checking both is good defensive practice when the AOA threshold was set manually or when importance weights heavily downweight certain predictors. # Practical Guidance ## When to use the AOA The area of applicability is most valuable when the prediction domain differs from the training domain. This is the norm in spatial ecology: models are trained on sampled locations and then projected across a study area, a region, or a continent. It also occurs in temporal forecasting (training on historical data, predicting the future) and in transfer learning (training on one study system, applying to another). Use the AOA whenever: - **The prediction grid extends beyond the sampled area.** Even a modest extension can introduce environmental conditions that were not represented in training. - **The sampling design is clustered.** Opportunistic or road-biased sampling leaves large gaps in environmental space. The AOA identifies predictions that fall into those gaps. - **The model will be used by others.** A prediction map without an AOA mask invites misuse. Users will extract values at arbitrary locations without knowing whether the model was evaluated under those conditions. - **Accuracy reporting requires geographic qualification.** Many journals now require that spatial predictions include a statement about where the model is applicable. The AOA provides that statement with a formal basis. ## When the AOA is less useful The AOA assumes that model reliability is a function of proximity in feature space. This assumption breaks when: - **The model is fundamentally wrong.** A model with the wrong functional form can fail everywhere, including inside the AOA. The AOA tells you where the model has enough data to make predictions, not whether those predictions are correct. - **Non-stationarity is the dominant issue.** If the relationship between predictors and response changes across space (different slopes in different regions), the model might perform poorly even at locations surrounded by training data. The transferability analysis and stability map are better diagnostics for non-stationarity than the AOA. - **The training data itself is biased.** If the training labels are systematically wrong in certain regions (e.g., different measurement protocols at different sites), proximity in feature space does not guarantee reliable predictions. ## Interpretation guidelines **DI values.** The raw DI number has no intrinsic meaning; it depends on the number and scale of predictors and on the importance weights. Compare DI values within a single analysis, not across different models or datasets. What matters is the relationship between the DI and the threshold: points well below the threshold are solidly inside the AOA, points near the threshold are borderline, and points far above it are in uncharted territory. **Local point density.** The `lpd` column in the `borg_aoa()` output counts training points within the threshold distance. Two points can both be inside the AOA but have very different LPD values. A point with LPD = 50 is surrounded by training data in feature space; a point with LPD = 1 is barely inside the AOA with a single training neighbour. When reporting predictions, consider using LPD as a continuous confidence measure rather than relying solely on the binary inside/outside classification. **Threshold sensitivity.** Always test how your results change with the threshold. If the scientific conclusions depend on whether the threshold is set at the default or 10% higher, the conclusions are fragile. Plot the fraction of prediction points inside the AOA as a function of the threshold multiplier: ```{r threshold-sensitivity, fig.width=7, fig.height=4} base_threshold <- attr(aoa, "threshold") multipliers <- seq(0.5, 2.0, by = 0.1) coverage <- vapply(multipliers, function(m) { mean(aoa$di <= base_threshold * m) }, numeric(1)) ggplot(data.frame(multiplier = multipliers, coverage = coverage), aes(x = multiplier, y = coverage)) + geom_line(linewidth = 0.8, color = "#2C3E50") + geom_point(size = 2, color = "#2C3E50") + geom_vline(xintercept = 1.0, linetype = "dashed", color = "#C0392B") + scale_y_continuous(labels = scales::percent_format()) + labs(title = "AOA Coverage Sensitivity", subtitle = "Fraction of prediction points inside AOA vs threshold multiplier", x = "Threshold multiplier", y = "Coverage") ``` A steep drop-off near multiplier = 1.0 means many prediction points sit right at the boundary. A gradual decline means the training data coverage tapers smoothly. The shape of this curve should inform how you communicate uncertainty: a sharp cliff warrants a binary "inside / outside" classification, while a gentle slope suggests a continuous confidence gradient is more appropriate. ## Mahalanobis distance and its limitations The Mahalanobis distance in `borg_extrapolation()` is sensitive to the covariance structure of the training data. When the covariance matrix is near-singular (because predictors are highly correlated or because the number of predictors approaches the number of training observations), the inverse covariance matrix amplifies noise. The function regularizes the covariance matrix by adding a small ridge term to the diagonal, but with many correlated predictors, the Mahalanobis distance can become unstable. In such cases, the per-variable range check (the `n_vars_outside` column) is more reliable than the Mahalanobis distance, and the AOA (which uses nearest-neighbour distances rather than covariance inversion) is a better overall diagnostic. As a rule of thumb, the Mahalanobis distance works well when the number of predictors is less than about one-third the number of training observations and the predictors are not too collinear. With high-dimensional or collinear predictors, consider reducing dimensionality (via PCA or by removing redundant variables) before running `borg_extrapolation()`, or rely on the AOA instead. ## Reporting AOA results in publications When including AOA results in a manuscript, report at minimum: 1. The number and identity of predictors used. 2. Whether variable importance weights were applied and how they were derived. 3. The threshold method (default, CV-based, or manual) and the threshold value. 4. What fraction of the prediction area falls inside the AOA. 5. How predictions outside the AOA were handled in the accuracy assessment (excluded, reported separately, or flagged). If possible, include the AOA map as a figure panel alongside the prediction map so that readers can see which parts of the prediction surface are supported by training data. This lets them judge whether the model's conclusions rest on predictions in well-covered regions or in areas where the training data is thin. ## Connecting to the broader BORG workflow The functions in this vignette sit downstream of the core BORG diagnostics. A typical analysis uses BORG in three phases: 1. **Before modelling**: `borg_diagnose()` identifies dependencies. `borg_cv()` generates valid folds. `borg_extrapolation()` checks whether the prediction domain is within the training envelope. 2. **During modelling**: `borg_workflow()` fits the model inside blocked folds, collecting honest performance metrics. `borg_importance()` computes spatially-aware variable importance. 3. **After modelling**: `borg_aoa()` determines where the model can be trusted. `borg_prediction_map()` and `borg_stability_map()` show prediction uncertainty and stability across space. `borg_transferability()` quantifies how well the model generalizes to new regions. Each tool answers a different question, but they share a common philosophy: a model evaluation is only meaningful if it accounts for the data's dependency structure and is honest about where the model has not been tested. # References Meyer, H., & Pebesma, E. (2021). Predicting into unknown space? Estimating the area of applicability of spatial prediction models. *Methods in Ecology and Evolution*, 12(9), 1620--1633. doi:10.1111/2041-210X.13650 Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., ... & Dormann, C. F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. *Ecography*, 40(8), 913--929. doi:10.1111/ecog.02881