---
title: "Framework Integration"
author: "Gilles Colling"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Framework Integration}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "svglite",
  fig.ext = "svg",
  fig.width = 7,
  fig.height = 5
)
library(BORG)

transparent_theme <- ggplot2::theme(
  plot.background = ggplot2::element_rect(fill = "transparent", colour = NA),
  panel.background = ggplot2::element_rect(fill = "transparent", colour = NA),
  legend.background = ggplot2::element_rect(fill = "transparent", colour = NA)
)

# Check package availability
has_caret <- requireNamespace("caret", quietly = TRUE)
has_recipes <- requireNamespace("recipes", quietly = TRUE)
has_rsample <- requireNamespace("rsample", quietly = TRUE)
has_mlr3 <- requireNamespace("mlr3", quietly = TRUE)
```

## Why This Guide Exists

Most R modelling frameworks provide their own cross-validation utilities: `caret::trainControl()`, `rsample::vfold_cv()`, `mlr3::rsmp("cv")`. These work well when observations are independent. When they are not (spatially autocorrelated, temporally ordered, or clustered by group), random CV inflates performance estimates because nearby or related points leak information between the training and test folds. The inflation can be large enough to reverse conclusions about model quality.

BORG addresses this problem at two levels. First, it can **inspect** existing framework objects (recipes, resampling schemes, fitted models) and report whether information has leaked. Second, it provides **guarded wrappers** that replace the standard CV constructors. These wrappers diagnose the data structure before splitting, block random CV when it would be invalid, and optionally switch to an appropriate blocking strategy. The result is a CV object that plugs directly back into the framework you are already using.

This guide walks through integration with each framework, starting from base R and progressing through caret, tidymodels, mlr3, and the two main species distribution modelling (SDM) frameworks: ENMeval and biomod2. Each section includes a comparison of the standard approach versus the BORG-guarded alternative, worked examples with runnable code, and advice on when each integration point matters most.

## Quick Reference

The table below maps each framework's native CV function to its BORG equivalent. "Validate" means BORG inspects an object that already exists; "Generate" means BORG creates the CV folds itself.

| Framework | Native Function | BORG Validate | BORG Generate |
|-----------|----------------|---------------|---------------|
| Base R | manual indices | `borg()` | `borg_cv()` |
| caret | `trainControl()` | `borg_inspect(preProcess)` | `borg_trainControl()` |
| tidymodels | `vfold_cv()` | `borg_inspect(recipe)` | `borg_vfold_cv()` |
| tidymodels | `group_vfold_cv()` | `borg_inspect(rset)` | `borg_group_vfold_cv()` |
| tidymodels | `initial_split()` | `borg_inspect(rsplit)` | `borg_initial_split()` |
| tidymodels | (rset output) | -- | `borg_rset()` |
| mlr3 | `rsmp("cv")` | `borg_inspect(task)` | `borg_to_mlr3()` |
| ENMeval | `ENMevaluate()` | -- | `borg_to_enmeval()` |
| biomod2 | `BIOMOD_Modeling()` | -- | `borg_to_biomod2()` |
| Any | pipeline object | `borg_pipeline()` | -- |

## Base R

The simplest integration uses manual index vectors. This is also the lowest level of BORG's API, and every framework integration ultimately reduces to this pattern: a data frame, a vector of training indices, and a vector of test indices.

```{r base-r}
data <- iris
set.seed(42)
n <- nrow(data)
train_idx <- sample(n, 0.7 * n)
test_idx <- setdiff(1:n, train_idx)

borg(data, train_idx = train_idx, test_idx = test_idx)
```

### Safe Preprocessing Pattern

A common source of leakage in base R workflows is fitting preprocessing parameters (means, standard deviations, PCA loadings) on the full dataset instead of the training set alone. When test observations contribute to the centering or scaling, the model has indirect access to test-set information during training. The correct pattern is to compute statistics from training data and apply them to both sets:

```{r preprocessing-pattern}
train_data <- data[train_idx, ]
train_means <- colMeans(train_data[, 1:4])
train_sds <- apply(train_data[, 1:4], 2, sd)

scaled_train <- scale(data[train_idx, 1:4], center = train_means, scale = train_sds)
scaled_test <- scale(data[test_idx, 1:4], center = train_means, scale = train_sds)
```

This same principle applies to any learned transformation: imputation values, Box-Cox parameters, feature selection thresholds. If the transformation is estimated from test-set rows, whether from their predictors or their outcome `y`, the evaluation is contaminated.

## caret

### When to Use BORG with caret

caret's `trainControl()` provides `method = "cv"` and `method = "repeatedcv"` for standard cross-validation, plus `method = "LGOCV"` for leave-group-out, which despite its name draws repeated random train/test splits rather than splitting on a grouping variable. None of these account for spatial or temporal structure unless you manually supply fold indices via the `index` argument. If your data has autocorrelation and you use `method = "cv"` without custom indices, nearby observations will appear in both train and test folds, inflating metrics.

BORG's `borg_trainControl()` intercepts this by running `borg_diagnose()` on your data before building the `trainControl` object. If it detects spatial, temporal, or group dependencies, it blocks random CV and tells you to generate proper folds with `borg_cv()`.

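
To make the `index` mechanism concrete, the following base R sketch builds grouped fold indices in the list shape that `caret::trainControl()` accepts for `index` and `indexOut` (one integer vector of row numbers per resample). The `site` column and all `demo_` names are hypothetical, and the fold assignment is a plain group-wise allocation for illustration, not the blocking that `borg_cv()` performs.

```{r caret-index-sketch}
# Hypothetical grouped data: 12 sites with 5 observations each
demo_sites <- data.frame(
  site = rep(sprintf("s%02d", 1:12), each = 5),
  y    = seq_len(60)
)

# Assign whole sites (not individual rows) to one of 4 folds
demo_k    <- 4
site_ids  <- unique(demo_sites$site)
site_fold <- rep_len(seq_len(demo_k), length(site_ids))
row_fold  <- site_fold[match(demo_sites$site, site_ids)]

# `index` holds the training rows of each resample;
# `indexOut` holds the corresponding held-out rows
demo_index     <- lapply(seq_len(demo_k), function(i) which(row_fold != i))
demo_index_out <- lapply(seq_len(demo_k), function(i) which(row_fold == i))
names(demo_index) <- names(demo_index_out) <- paste0("Fold", seq_len(demo_k))

# Number of sites that appear on both sides of each fold (should be zero)
shared_sites <- vapply(seq_len(demo_k), function(i) {
  length(intersect(demo_sites$site[demo_index[[i]]],
                   demo_sites$site[demo_index_out[[i]]]))
}, integer(1))
shared_sites
```

The two lists could then be supplied as `caret::trainControl(method = "cv", index = demo_index, indexOut = demo_index_out)`, which is the same hand-off the BORG-generated folds use later in this section.
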
### Validating preProcess Objects

caret's `preProcess()` object stores the training-set statistics it learned. `borg_inspect()` checks whether those statistics came from the full dataset or only from the training partition.

```{r caret-preprocess, eval = has_caret}
library(caret)

data(mtcars)
train_idx <- 1:25
test_idx <- 26:32

# BAD: preProcess on full data (leaks test-set statistics)
pp_bad <- preProcess(mtcars[, -1], method = c("center", "scale"))
borg_inspect(pp_bad, train_idx, test_idx, data = mtcars)

# GOOD: preProcess on training data only
pp_good <- preProcess(mtcars[train_idx, -1], method = c("center", "scale"))
borg_inspect(pp_good, train_idx, test_idx, data = mtcars)
```

### Guarded trainControl

`borg_trainControl()` accepts the same `method` and `number` arguments as `caret::trainControl()`, plus optional `coords`, `time`, and `groups` arguments that trigger dependency checking. When dependencies are found and `allow_override = FALSE` (the default), the function stops with an actionable error message explaining the detected dependency type and estimated metric inflation.

```{r caret-traincontrol-example, eval = FALSE}
sim <- borg_simulate(n = 300, type = "spatial", seed = 10)
d <- sim$data

# This will error because spatial autocorrelation is present
ctrl <- borg_trainControl(
  data = d,
  method = "cv",
  number = 5,
  coords = c("x", "y"),
  target = "y.1"
)
```

The error message suggests generating blocked folds via `borg_cv()` and passing them to `trainControl(method = "cv", index = folds)`.

Here is the complete corrected workflow:

```{r caret-corrected, eval = FALSE}
# Generate spatially-blocked folds
cv <- borg_cv(d, coords = c("x", "y"), target = "y.1", v = 5, output = "caret")

# Use the BORG folds inside caret's trainControl
ctrl <- caret::trainControl(
  method = "cv",
  index = cv$index,
  indexOut = cv$indexOut
)

# Train as usual
model <- caret::train(y.1 ~ x1 + x2 + x3 + x4 + x5,
                      data = d,
                      method = "lm",
                      trControl = ctrl)
```

If you want caret to proceed anyway (for example, to compare random CV with blocked CV), set `allow_override = TRUE`. BORG will emit a warning but return a standard `trainControl` object.

### Pipeline-Level Validation with borg_pipeline()

For fitted `caret::train` objects, `borg_pipeline()` decomposes the pipeline into stages (preprocessing, resampling, tuning, model fitting) and inspects each for leakage independently.

```{r caret-pipeline, eval = has_caret}
data(mtcars)
ctrl <- caret::trainControl(method = "cv", number = 5)
model <- caret::train(mpg ~ .,
                      data = mtcars[1:25, ],
                      method = "lm",
                      trControl = ctrl,
                      preProcess = c("center", "scale"))

result <- borg_pipeline(model, train_idx = 1:25, test_idx = 26:32, data = mtcars)
result
```

The output lists each stage with its status (OK or LEAK) and issue count. The `leaking_stages` field names the stages that need attention, so you can fix them individually without re-inspecting the entire pipeline.

## tidymodels (rsample + recipes)

### When to Use BORG with tidymodels

tidymodels separates preprocessing (recipes), resampling (rsample), model specification (parsnip), and tuning (tune) into distinct packages. This modularity is a strength, but it also means leakage can appear at multiple points: a recipe prepped on the full dataset, an `initial_split()` that ignores temporal order, or `vfold_cv()` folds that mix spatially adjacent observations. BORG integrates at each of these points.

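
As a small illustration of the second failure mode, the base R sketch below counts how many test rows fall on or before the end of the training period under a shuffled split and under a chronological one. The `demo_ts` data frame and the helper `count_out_of_order()` are made up for this example; no rsample or BORG function is involved.

```{r split-order-sketch}
# Hypothetical daily series with 100 observations
demo_ts <- data.frame(
  date  = seq(as.Date("2021-01-01"), by = "day", length.out = 100),
  value = seq_len(100)
)

# Count test rows dated on or before the last training date
count_out_of_order <- function(dates, train_rows, test_rows) {
  sum(dates[test_rows] <= max(dates[train_rows]))
}

# Shuffled 80/20 split: ignores the time ordering
set.seed(1)
shuffled_train <- sort(sample(nrow(demo_ts), 80))
shuffled_test  <- setdiff(seq_len(nrow(demo_ts)), shuffled_train)

# Chronological 80/20 split: earliest 80 rows train, latest 20 rows test
chrono_train <- 1:80
chrono_test  <- 81:100

count_out_of_order(demo_ts$date, shuffled_train, shuffled_test)
count_out_of_order(demo_ts$date, chrono_train, chrono_test)
```

The chronological split returns zero by construction. Every row counted for the shuffled split is a test observation that the model is evaluated on after training on later data.
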
### Validating Recipe Objects

`borg_inspect()` accepts prepped recipe objects and checks whether the training set used for `prep()` matches the training partition. If the recipe was prepped on the full dataset, BORG flags it as a hard violation.

```{r tidymodels-recipes, eval = has_recipes && has_rsample}
library(recipes)
library(rsample)

data(mtcars)
set.seed(123)
split <- initial_split(mtcars, prop = 0.8)
train_idx <- split$in_id
test_idx <- setdiff(seq_len(nrow(mtcars)), train_idx)

# BAD: Recipe prepped on full data
rec_bad <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  prep()
borg_inspect(rec_bad, train_idx, test_idx, data = mtcars)

# GOOD: Recipe prepped on training only
rec_good <- recipe(mpg ~ ., data = training(split)) |>
  step_normalize(all_numeric_predictors()) |>
  prep()
borg_inspect(rec_good, train_idx, test_idx, data = mtcars)
```

### Guarded rsample Functions

BORG provides drop-in replacements for three rsample constructors. Each accepts the same core arguments as the original, plus optional `coords`, `time`, `groups`, and `target` arguments for dependency detection.

**`borg_vfold_cv()`** replaces `rsample::vfold_cv()`. When no structure hints are provided, it passes through to the standard rsample function. When structure hints are present, it runs `borg_diagnose()` and takes one of three actions depending on configuration:

1. **Block** (default): stop with an error describing the dependency.
2. **Auto-block** (`auto_block = TRUE`): silently switch to blocked CV.
3. **Override** (`allow_override = TRUE`): warn but return random folds.

```{r borg-vfold-example, eval = FALSE}
sim <- borg_simulate(n = 300, type = "spatial", seed = 10)
d <- sim$data

# Will block: spatial autocorrelation detected
folds <- borg_vfold_cv(d, coords = c("x", "y"), target = "y.1", v = 5)

# Auto-block: switches to spatial blocking and returns an rset
folds <- borg_vfold_cv(d, coords = c("x", "y"), target = "y.1",
                       v = 5, auto_block = TRUE)
```

**`borg_group_vfold_cv()`** replaces `rsample::group_vfold_cv()`. Group-based CV is generally appropriate for clustered data, but when spatial or temporal structure also exists, group-level folds may not be sufficient. BORG warns if additional dependencies are found beyond the grouping variable.

```{r borg-group-vfold-example, eval = FALSE}
# Ecological monitoring: sites with repeated visits
site_data <- data.frame(
  site_id = rep(1:30, each = 10),
  lon = rep(runif(30, -10, 10), each = 10),
  lat = rep(runif(30, -10, 10), each = 10),
  abundance = rpois(300, lambda = 5)
)

folds <- borg_group_vfold_cv(
  data = site_data,
  group = "site_id",
  v = 5,
  coords = c("lon", "lat"),
  target = "abundance"
)
```

If the sites are spatially clustered (several nearby sites forming a regional group), BORG will warn that group CV alone may not prevent spatial leakage. In that case, switch to `borg_cv()` with `strategy = "spatial_block"`.

**`borg_initial_split()`** replaces `rsample::initial_split()`. When a `time` column is specified, it sorts by time and splits chronologically, ensuring that all training observations precede all test observations. For spatial data, it warns if random splitting would cause leakage.

```{r borg-initial-split-example, eval = FALSE}
sim <- borg_simulate(n = 300, type = "temporal", seed = 7)
d <- sim$data

# Chronological split: 80% earliest for training, 20% latest for test
split <- borg_initial_split(d, prop = 0.8, time = "time")
```

### Converting BORG Folds to rset Objects

When you generate folds with `borg_cv()` and want to use them with `tune::tune_grid()` or `tune::fit_resamples()`, convert them to an `rset` object using `borg_rset()`. The returned object has class `c("borg_rset", "rset", "tbl_df", ...)` and works directly with the tidymodels tuning infrastructure.

```{r borg-rset-example, eval = has_rsample}
set.seed(42)
sim <- borg_simulate(n = 200, type = "spatial", seed = 42)
d <- sim$data

cv <- borg_cv(d, coords = c("x", "y"), target = "y.1", v = 5)
rset <- borg_rset(data = d, cv_obj = cv)

class(rset)
nrow(rset)
```

You can also pass raw fold lists (any list of lists with `$train` and `$test` integer vectors) via the `folds` argument, which is useful if you built custom folds outside of `borg_cv()`.

### Convenience Constructors: borg_spatial_cv() and borg_temporal_cv()

For the common case where you know your data is spatial or temporal and just want properly blocked folds in rsample format, BORG provides two convenience functions. These call `borg_cv()` internally with `output = "rsample"` and the appropriate strategy.

```{r convenience-constructors, eval = has_rsample}
set.seed(42)
sim <- borg_simulate(n = 200, type = "spatial", seed = 42)
d <- sim$data

# Spatial blocking, returned as rset
spatial_folds <- borg_spatial_cv(d, coords = c("x", "y"), target = "y.1", v = 5)
class(spatial_folds)
```

`borg_temporal_cv()` works the same way for time-series data, accepting a `time` column name and an optional `embargo` parameter that introduces a gap between training and test windows.

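
To show what an embargo gap means in practice, here is a base R sketch of a single chronological split in which a fixed number of rows between the training and test windows is discarded. The `demo_` names and the reading of the embargo as a row count are assumptions made for this illustration; check the `borg_temporal_cv()` documentation for the units BORG actually uses.

```{r embargo-sketch}
# Hypothetical daily series, already sorted by time
demo_days <- data.frame(
  time  = seq(as.Date("2022-01-01"), by = "day", length.out = 120),
  value = seq_len(120)
)

demo_embargo <- 10   # rows dropped between the windows (assumed unit: rows)
demo_n_train <- 80

embargo_train <- seq_len(demo_n_train)
embargo_gap   <- demo_n_train + seq_len(demo_embargo)   # discarded rows
embargo_test  <- seq(demo_n_train + demo_embargo + 1, nrow(demo_days))

# Days separating the last training row from the first test row
embargo_days <- as.numeric(
  min(demo_days$time[embargo_test]) - max(demo_days$time[embargo_train])
)
embargo_days
```

With daily data the separation is `demo_embargo + 1` days, because ten whole days are skipped between the two windows. The gap keeps observations that are close in time to the test window out of the training set.
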
### Validating Existing rsample Objects

If you have an rsample object from another source (for example, `rsample::rolling_origin()` or `spatialsample::spatial_block_cv()`), you can validate it with `borg_inspect()`:

```{r rsample-validation, eval = has_rsample}
ts_data <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 200),
  value = cumsum(rnorm(200))
)

rolling <- rolling_origin(
  data = ts_data,
  initial = 100,
  assess = 20,
  cumulative = FALSE
)

borg_inspect(rolling, train_idx = NULL, test_idx = NULL)
```

## mlr3

### When to Use BORG with mlr3

mlr3 provides built-in resamplings through `rsmp()`, including `"cv"`, `"repeated_cv"`, and `"holdout"`. For spatial or temporal data, the `mlr3spatiotempcv` extension package adds specialized resamplings. If you are already using `mlr3spatiotempcv`, BORG's main value is validation rather than generation. If you are not using that extension, BORG can generate properly blocked folds and inject them into mlr3 via `borg_to_mlr3()`.

### Validating mlr3 Tasks and Resamplings

Pass an mlr3 task and the train/test indices from one fold to `borg_inspect()`:

```{r mlr3-example, eval = has_mlr3}
library(mlr3)

task <- TaskClassif$new("iris", iris, target = "Species")
resampling <- rsmp("cv", folds = 5)
resampling$instantiate(task)

train_idx <- resampling$train_set(1)
test_idx <- resampling$test_set(1)

borg_inspect(task, train_idx, test_idx)
```

For spatial tasks, this check catches cases where random CV was used on autocorrelated data. You can iterate over all folds and aggregate the results if needed.

### Generating mlr3 Resamplings from BORG Folds

`borg_to_mlr3()` converts BORG CV folds into a native mlr3 `Resampling` object. Internally, it uses `mlr3::rsmp("custom")` and pre-instantiates it with the BORG fold indices, so calling `$instantiate()` is not needed (and will not overwrite the existing splits).

```{r mlr3-generate, eval = has_mlr3}
set.seed(42)
sim <- borg_simulate(n = 200, type = "spatial", seed = 42)
d <- sim$data

cv <- borg_cv(d, coords = c("x", "y"), target = "y.1", v = 5)
resampling <- borg_to_mlr3(cv_obj = cv, data = d)
resampling$iters
```

The returned resampling can be used directly with `mlr3::resample()` and `mlr3::benchmark()`:

```{r mlr3-benchmark, eval = FALSE}
task <- mlr3::TaskRegr$new("spatial", backend = d, target = "y.1")
learner <- mlr3::lrn("regr.rpart")

rr <- mlr3::resample(task, learner, resampling)
rr$aggregate(mlr3::msr("regr.rmse"))
```

Note that `borg_to_mlr3()` accepts either a `cv_obj` (a `borg_cv` object) or raw `folds` and `data` arguments. When using `cv_obj`, the data argument is still required because `borg_cv` objects do not store the original data frame.

## Species Distribution Models

Species distribution models (SDMs) are among the most common spatial modelling workflows in ecology, and they are particularly susceptible to spatial CV inflation. Two widely used R frameworks, ENMeval and biomod2, have their own partition formats. BORG can generate spatially blocked folds and convert them to either format.

### ENMeval

ENMeval expects partitions as a named list with two elements: `occs.grp` (an integer vector assigning each occurrence point to a fold) and `bg.grp` (the same for background points). `borg_to_enmeval()` converts a `borg_cv` object to this format.

```{r enmeval-example}
set.seed(42)
sim <- borg_simulate(n = 200, type = "spatial", seed = 42)
d <- sim$data

cv <- borg_cv(d, coords = c("x", "y"), target = "y.1", v = 4)
parts <- borg_to_enmeval(cv)
table(parts$occs.grp)
```

The `bg.grp` vector is set to all zeros by default. If your ENMeval workflow uses background points drawn from a separate data frame, generate a second `borg_cv` object for those points and use its fold assignments for `bg.grp`.

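
The partition format itself is simple to picture. The base R sketch below turns a hypothetical list of folds (each with `$train` and `$test` row indices) into a fold-membership vector of the kind stored in `occs.grp`. It only illustrates the data layout and is not BORG's implementation of `borg_to_enmeval()`.

```{r partition-vector-sketch}
# Hypothetical folds over 12 occurrence points: 3 blocks of 4 test points each
demo_n <- 12
demo_folds <- lapply(1:3, function(i) {
  test <- ((i - 1) * 4 + 1):(i * 4)
  list(train = setdiff(seq_len(demo_n), test), test = test)
})

# Each point receives the number of the fold in which it is held out
demo_grp <- integer(demo_n)
for (i in seq_along(demo_folds)) {
  demo_grp[demo_folds[[i]]$test] <- i
}

demo_grp
table(demo_grp)
```

A `bg.grp` vector for background points has the same layout, with one entry per background point.
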
In a full ENMeval workflow, you would pass the partitions like this:

```{r enmeval-workflow, eval = FALSE}
library(ENMeval)

# Assuming occs and bg are your occurrence and background data frames,
# and both have lon/lat columns.
cv_occs <- borg_cv(occs, coords = c("lon", "lat"), v = 4)
parts <- borg_to_enmeval(cv_occs)

results <- ENMevaluate(
  occs = occs[, c("lon", "lat")],
  envs = env_rasters,
  bg = bg[, c("lon", "lat")],
  algorithm = "maxent.jar",
  partitions = "user",
  user.grp = parts
)
```

The key advantage over ENMeval's built-in partitioning methods (block, checkerboard) is that BORG uses autocorrelation-aware block sizes derived from `borg_diagnose()`. This means the block size adapts to the actual spatial structure in your data rather than relying on a fixed geometric partition.

### biomod2

biomod2 expects a data split table (`DataSplitTable` in older releases), a logical matrix where each column is a CV run and each row is an observation. Values are `TRUE` for calibration (training) and `FALSE` for validation (testing). `borg_to_biomod2()` produces exactly this format.

```{r biomod2-example}
set.seed(42)
sim <- borg_simulate(n = 200, type = "spatial", seed = 42)
d <- sim$data

cv <- borg_cv(d, coords = c("x", "y"), target = "y.1", v = 4)
split_table <- borg_to_biomod2(cv)

dim(split_table)
head(split_table)
```

In a biomod2 workflow, pass this table to `BIOMOD_Modeling()` via the `data.split.table` argument. The argument name has changed between biomod2 releases, so confirm it against `?BIOMOD_Modeling` for your installed version:

```{r biomod2-workflow, eval = FALSE}
library(biomod2)

bm_data <- BIOMOD_FormatingData(
  resp.var = presence_vector,
  expl.var = env_stack,
  resp.xy = coords_matrix,
  resp.name = "species"
)

bm_options <- BIOMOD_ModelingOptions()

bm_models <- BIOMOD_Modeling(
  bm.format = bm_data,
  modeling.id = "borg_blocked",
  models = c("GLM", "RF", "MAXENT"),
  bm.options = bm_options,
  data.split.table = split_table,
  metric.eval = c("TSS", "ROC")
)
```

## Temporal Data Workflows

For time-series and panel data, the order of observations carries information.
A model trained on future data and tested on past data will appear to perform well but fail in production. BORG enforces chronological ordering in several ways.

### Basic Temporal Validation

When you provide a `time` column to `borg()`, it checks that no test observation precedes any training observation. If the split violates temporal order, BORG flags it as a hard violation.

```{r temporal-basic}
set.seed(123)
n <- 365
ts_data <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = n),
  value = cumsum(rnorm(n)),
  feature = rnorm(n)
)

train_idx <- 1:252
test_idx <- 253:365

result <- borg(ts_data, train_idx = train_idx, test_idx = test_idx, time = "date")
result
```

### Rolling Origin with rsample

`rsample::rolling_origin()` creates expanding or sliding window splits that respect temporal order by construction. You can still validate these with `borg_inspect()` to confirm no leakage:

```{r rolling-origin, eval = has_rsample}
rolling <- rolling_origin(
  data = ts_data,
  initial = 200,
  assess = 30,
  cumulative = FALSE
)

borg_inspect(rolling, train_idx = NULL, test_idx = NULL)
```

## Spatial Data Workflows

Spatial autocorrelation means nearby observations share information not captured by the predictors. When train and test folds contain neighboring points, the model can "cheat" by memorizing local patterns rather than learning generalizable relationships.

### Spatial Block Validation

```{r spatial-basic}
set.seed(456)
n <- 200
spatial_data <- data.frame(
  lon = runif(n, -10, 10),
  lat = runif(n, -10, 10),
  response = rnorm(n),
  predictor = rnorm(n)
)

train_idx <- which(spatial_data$lon < 0)
test_idx <- which(spatial_data$lon >= 0)

result <- borg(spatial_data,
               train_idx = train_idx,
               test_idx = test_idx,
               coords = c("lon", "lat"))
result
```

### Automatic Spatial CV Generation

When you omit `train_idx` and `test_idx`, `borg()` generates spatially blocked folds automatically.
The block size is calibrated to the range of spatial autocorrelation estimated by `borg_diagnose()`.

```{r spatial-auto}
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response", v = 5)
result$diagnosis@recommended_cv
length(result$folds)
```

## Grouped Data Workflows

For hierarchical data (patients, sites, species), observations within the same group are not independent. Splitting a patient's visits across train and test sets lets the model learn patient-specific patterns during training and exploit them during testing. BORG's group-aware CV keeps all observations from each group in the same fold.

```{r grouped-workflow}
clinical_data <- data.frame(
  patient_id = rep(1:50, each = 4),
  visit = rep(1:4, times = 50),
  outcome = rnorm(200)
)

result <- borg(clinical_data, groups = "patient_id", target = "outcome", v = 5)
result$diagnosis@recommended_cv

fold1 <- result$folds[[1]]
train_patients <- unique(clinical_data$patient_id[fold1$train])
test_patients <- unique(clinical_data$patient_id[fold1$test])
length(intersect(train_patients, test_patients))
```

The intersection has length zero, confirming that no patient appears on both sides of the fold.

## Complete Pipeline Validation

### borg_validate(): Workflow-Level Checks

`borg_validate()` accepts a list with `data`, `train_idx`, and `test_idx` components and checks for index overlap, duplicate rows, and other structural problems. It is framework-agnostic and works with any workflow that can be described as a data frame plus index vectors.

```{r pipeline-validation}
data <- iris
set.seed(789)
n <- nrow(data)
train_idx <- sample(n, 0.7 * n)
test_idx <- setdiff(1:n, train_idx)

result <- borg_validate(list(
  data = data,
  train_idx = train_idx,
  test_idx = test_idx
))
result
```

### Catching Overlap

A common mistake is constructing indices that share rows between train and test sets. This happens, for example, when the test indices are drawn with a second `sample()` call instead of `setdiff()`, or when off-by-one errors corrupt index boundaries.

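
Independently of BORG, overlap is easy to check by hand. This base R sketch compares a split built with two separate `sample()` calls against one built with `setdiff()`; the `demo_`, `faulty_`, and `safe_` names are illustrative only.

```{r overlap-sketch}
demo_rows <- 150

# Faulty: test rows are drawn independently of the training rows
set.seed(99)
faulty_train <- sample(demo_rows, 105)
faulty_test  <- sample(demo_rows, 45)

# Safe: test rows are whatever the training draw left over
safe_train <- faulty_train
safe_test  <- setdiff(seq_len(demo_rows), safe_train)

# Number of rows shared between train and test in each split
length(intersect(faulty_train, faulty_test))
length(intersect(safe_train, safe_test))
```

Only the `setdiff()` construction guarantees an empty intersection. The same kind of overlap, built deterministically, is what `borg_validate()` is asked to detect next.
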
```{r pipeline-bad}
bad_workflow <- list(
  data = iris,
  train_idx = 1:100,
  test_idx = 51:150
)

result <- borg_validate(bad_workflow)
result
```

### borg_pipeline(): Stage-by-Stage Decomposition

For richer pipelines (a fitted `caret::train` object, a tidymodels `workflow`, or a list of named components), `borg_pipeline()` decomposes the pipeline into stages and inspects each:

1. **Preprocessing**: recipe steps, `preProcess`, PCA, scaling
2. **Feature selection**: variable importance filtering
3. **Hyperparameter tuning**: inner CV resamples
4. **Model fitting**: training data scope and row counts
5. **Post-processing**: threshold optimization, calibration

Each stage receives its own BorgRisk assessment. The overall result aggregates all risks across stages, and the `leaking_stages` field lists the names of stages with hard violations.

## Automatic Repair with borg_assimilate()

`borg_assimilate()` attempts to fix detected leakage automatically. It can handle cases like recipes prepped on the full data (by re-prepping on training data only) or preprocessing objects fitted on too many rows. Some problems, like index overlap, require a decision about which indices to keep and cannot be fixed automatically.

```{r assimilate}
workflow <- list(
  data = iris,
  train_idx = 1:100,
  test_idx = 51:150
)

fixed <- borg_assimilate(workflow)

if (length(fixed$unfixable) > 0) {
  cat("Partial assimilation:", length(fixed$unfixable),
      "risk(s) require manual fix:",
      paste(fixed$unfixable, collapse = ", "), "\n")
} else {
  cat("Assimilation complete:", length(fixed$fixed), "risk(s) corrected\n")
}
```

Index overlap requires choosing a new split strategy and cannot be resolved without user input.

## Migration Checklist

When migrating an existing modelling pipeline to use BORG-guarded CV, follow these steps:

1. **Identify the CV step.** Find where your code calls `vfold_cv()`, `trainControl()`, `rsmp("cv")`, or builds manual index vectors.
2. **Add structure hints.** Determine which columns carry spatial coordinates, time stamps, or group identifiers, and pass them to the corresponding BORG wrapper.
3. **Replace the CV constructor.** Swap `vfold_cv()` for `borg_vfold_cv()`, `trainControl()` for `borg_trainControl()`, or generate folds with `borg_cv()` and inject them via `output = "rsample"`, `"caret"`, or `"mlr3"`.
4. **Validate preprocessing.** Run `borg_inspect()` on any prepped recipe or `preProcess` object to confirm it was fitted on training data only.
5. **Run borg_pipeline() on the fitted model.** This catches leakage in stages you may have overlooked (nested CV, threshold selection, post-processing).
6. **Compare metrics.** If your blocked CV metrics are substantially lower than the random CV metrics, the difference is the inflation that was hiding in your original evaluation. The blocked CV estimate is the more realistic measure of how your model performs on new data.

## See Also

- `vignette("quickstart")` for basic usage and concepts
- `vignette("risk-taxonomy")` for the complete catalog of detectable risks