Runtime Contracts for R Functions

library(restrictR)

Overview

restrictR lets you define reusable input contracts from small building blocks using the base pipe |>. A contract is defined once and called like a function to validate data at runtime. Validators are immutable: each |> returns a new validator, so you can safely branch from a shared base without side effects.

Section	What you’ll learn
Reusable schemas	Define and reuse data.frame contracts
Dependent validation	Constraints that reference other arguments
Enum arguments	Restrict string arguments to a fixed set
Arbitrary classes	Validate factors, dates, and model objects
Data frame with mixed constraints	Columns + enums + ranges in one contract
Checking without stopping	Collect all failures; test without throwing
Custom steps	Domain-specific invariants
Self-documentation	Print, as_contract_text(), as_contract_block()
Using contracts in packages	The recommended pattern for R packages

Reusable Schemas

The most common use case: validating a newdata argument in a predict-like function. Instead of scattering if/stop() blocks, define the contract once:

require_newdata <- restrict("newdata") |>
  require_df() |>
  require_has_cols(c("x1", "x2")) |>
  require_col_numeric("x1", no_na = TRUE, finite = TRUE) |>
  require_col_numeric("x2", no_na = TRUE, finite = TRUE) |>
  require_nrow_min(1L)

The result is a callable function. Valid input passes silently:

good <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6))
require_newdata(good)

Invalid input produces a structured error with the exact path and position:

require_newdata(42)
#> Error:
#> ! newdata: must be a data.frame, got numeric

require_newdata(data.frame(x1 = c(1, NA), x2 = c(3, 4)))
#> Error:
#> ! newdata$x1: must not contain NA
#>   At: 2

require_newdata(data.frame(x1 = c(1, 2), x2 = c("a", "b")))
#> Error:
#> ! newdata$x2: must be numeric, got character

Every error follows the same format: path: message, optionally followed by Found: and At: lines. This makes errors instantly recognizable and grep-friendly.
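
For comparison, the scattered if/stop() checks mentioned at the start of this section might look like the following in base R. This is only a sketch: check_newdata_manual is a hypothetical name, and its messages merely imitate the path: message format rather than reproduce restrictR's output.

```r
# Hypothetical hand-written validator; not part of restrictR.
check_newdata_manual <- function(newdata) {
  if (!is.data.frame(newdata)) {
    stop("newdata: must be a data.frame, got ", class(newdata)[1], call. = FALSE)
  }
  missing_cols <- setdiff(c("x1", "x2"), names(newdata))
  if (length(missing_cols) > 0L) {
    stop("newdata: missing columns: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  # Each numeric column needs the same three checks, repeated by hand.
  for (col in c("x1", "x2")) {
    values <- newdata[[col]]
    if (!is.numeric(values)) {
      stop("newdata$", col, ": must be numeric, got ", class(values)[1],
           call. = FALSE)
    }
    if (anyNA(values)) {
      stop("newdata$", col, ": must not contain NA", call. = FALSE)
    }
    if (!all(is.finite(values))) {
      stop("newdata$", col, ": must be finite", call. = FALSE)
    }
  }
  if (nrow(newdata) < 1L) {
    stop("newdata: must have at least 1 row", call. = FALSE)
  }
  invisible(newdata)
}

check_newdata_manual(data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6)))
```

Every function that accepts newdata would need its own copy of this logic, which is the duplication a shared contract avoids.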

Dependent Validation

Some contracts depend on context. A prediction vector must have exactly one element per row of newdata:

require_pred <- restrict("pred") |>
  require_numeric(no_na = TRUE, finite = TRUE) |>
  require_length_matches(~ nrow(newdata))

The formula ~ nrow(newdata) declares a dependency on newdata. Pass it explicitly when calling the validator:

newdata <- data.frame(x1 = 1:5, x2 = 6:10)
require_pred(c(0.1, 0.2, 0.3, 0.4, 0.5), newdata = newdata)
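
To see how a one-sided formula can carry a dependency, the base R sketch below pulls the variable names out of the formula and evaluates its right-hand side against a context list. It uses only base R; whether restrictR resolves dependencies this way internally is an assumption, and the variable names are illustrative.

```r
dependency <- ~ nrow(newdata)

# all.vars() lists the variables the formula refers to.
required_names <- all.vars(dependency)

# A context list plays the role of the named arguments passed to the validator.
ctx <- list(newdata = data.frame(x1 = 1:5, x2 = 6:10))

# Anything required but absent from the context can be reported up front.
missing_names <- setdiff(required_names, names(ctx))

# For a one-sided formula, element 2 is the right-hand side expression.
expected_length <- eval(dependency[[2]], ctx)

required_names
missing_names
expected_length
```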

Mismatched lengths produce a precise diagnostic:

require_pred(c(0.1, 0.2, 0.3), newdata = newdata)
#> Error:
#> ! pred: length must match nrow(newdata) (5)
#>   Found: length 3

Missing context is caught before any checks run:

require_pred(c(0.1, 0.2, 0.3))
#> Error:
#> ! `pred` depends on: newdata. Pass newdata = ... when calling the validator.

Context can also be passed as a named list via .ctx:

require_pred(1:5, .ctx = list(newdata = newdata))

Enum Arguments

For string arguments that must be one of a fixed set:

require_method <- restrict("method") |>
  require_character(no_na = TRUE) |>
  require_length(1L) |>
  require_one_of(c("euclidean", "manhattan", "cosine"))

require_method("euclidean")

require_method("chebyshev")
#> Error:
#> ! method: must be one of ["euclidean", "manhattan", "cosine"]
#>   Found: "chebyshev"
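
A sketch of how such a validator might guard a function argument before dispatching on it. The function vector_distance and its formulas are hypothetical illustrations; only the restrictR calls already shown above are used.

```r
library(restrictR)

require_method <- restrict("method") |>
  require_character(no_na = TRUE) |>
  require_length(1L) |>
  require_one_of(c("euclidean", "manhattan", "cosine"))

# Hypothetical distance function, for illustration only.
vector_distance <- function(a, b, method = "euclidean") {
  require_method(method)  # reject unsupported values before any computation
  switch(method,
    euclidean = sqrt(sum((a - b)^2)),
    manhattan = sum(abs(a - b)),
    cosine    = 1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2))
  )
}

vector_distance(c(0, 0), c(3, 4))
vector_distance(c(0, 0), c(3, 4), method = "manhattan")
```

Because the contract runs first, switch() never sees a value outside the three supported methods and needs no fallback branch.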

Arbitrary Classes

require_class() covers types without a dedicated check, such as factors, dates, and fitted-model objects. By default it tests inheritance, so an object of a subclass passes; set exact = TRUE to require that the object's first class is exactly the one named.

require_event <- restrict("event") |>
  require_class("Date")

require_event(as.Date("2026-01-01"))

require_event("2026-01-01")
#> Error:
#> ! event: must be of class "Date", got character
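
The difference between an inheritance test and an exact test can be illustrated in base R. This sketch assumes the two modes correspond to inherits() and a comparison against the first element of class(); it is not taken from restrictR's source.

```r
stamp <- as.POSIXct("2026-01-01 12:00:00", tz = "UTC")

# A POSIXct object carries two classes, most specific first.
class(stamp)

# Inheritance test: true for any class in the vector.
inherits_posixt <- inherits(stamp, "POSIXt")

# Exact test: only the first class counts.
exact_posixt  <- identical(class(stamp)[1], "POSIXt")
exact_posixct <- identical(class(stamp)[1], "POSIXct")

c(inherits = inherits_posixt, exact_POSIXt = exact_posixt, exact_POSIXct = exact_posixct)
```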

Data Frame with Mixed Constraints

Contracts work well for functions that accept a data frame with typed columns, value ranges, and categorical fields in one go:

require_survey <- restrict("survey") |>
  require_df() |>
  require_has_cols(c("age", "income", "status")) |>
  require_col_numeric("age", no_na = TRUE) |>
  require_col_between("age", lower = 0, upper = 150) |>
  require_col_numeric("income", no_na = TRUE, finite = TRUE) |>
  require_col_one_of("status", c("active", "inactive", "pending"))

good_survey <- data.frame(
  age = c(25, 40, 33),
  income = c(35000, 60000, 45000),
  status = c("active", "inactive", "active")
)
require_survey(good_survey)

bad_survey <- data.frame(
  age = c(25, -5, 200),
  income = c(35000, 60000, 45000),
  status = c("active", "inactive", "active")
)
require_survey(bad_survey)
#> Error:
#> ! survey$age: must be >= 0 and <= 150
#>   Found: -5
#>   At: 2, 3

Checking Without Stopping

By default a validator stops at the first failing step. Pass .on_fail = "all" to run every step and collect all violations in one report:

messy_survey <- data.frame(
  age = c(25, -5, 200),
  income = c(35000, NA, 45000),
  status = c("active", "banned", "active")
)
require_survey(messy_survey, .on_fail = "all")
#> Error:
#> ! 3 validation failures:
#> survey$age: must be >= 0 and <= 150
#>   Found: -5
#>   At: 2, 3
#> survey$income: must not contain NA
#>   At: 2
#> survey$status: must be one of ["active", "inactive", "pending"]
#>   Found: "banned"
#>   At: 2

To branch on validity in code, is_valid() returns a logical and validation_errors() returns the messages as a character vector, empty when the value passes:

is_valid(require_survey, good_survey)
#> [1] TRUE
validation_errors(require_survey, messy_survey)
#> [1] "survey$age: must be >= 0 and <= 150\n  Found: -5\n  At: 2, 3"                                       
#> [2] "survey$income: must not contain NA\n  At: 2"                                                        
#> [3] "survey$status: must be one of [\"active\", \"inactive\", \"pending\"]\n  Found: \"banned\"\n  At: 2"
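
A sketch of branching on validity without throwing. The helper summarise_survey is hypothetical; it relies only on is_valid() and validation_errors() as called above, and the contract is repeated so the block runs on its own.

```r
library(restrictR)

require_survey <- restrict("survey") |>
  require_df() |>
  require_has_cols(c("age", "income", "status")) |>
  require_col_numeric("age", no_na = TRUE) |>
  require_col_between("age", lower = 0, upper = 150) |>
  require_col_numeric("income", no_na = TRUE, finite = TRUE) |>
  require_col_one_of("status", c("active", "inactive", "pending"))

# Hypothetical helper: summarise a valid survey, otherwise warn and return NULL.
summarise_survey <- function(survey) {
  if (!is_valid(require_survey, survey)) {
    problems <- validation_errors(require_survey, survey)
    warning("survey rejected:\n", paste(problems, collapse = "\n"), call. = FALSE)
    return(invisible(NULL))
  }
  c(mean_age = mean(survey$age), mean_income = mean(survey$income))
}

good_survey <- data.frame(
  age = c(25, 40, 33),
  income = c(35000, 60000, 45000),
  status = c("active", "inactive", "active")
)
summarise_survey(good_survey)
```

This pattern suits batch jobs where one bad input should be logged and skipped rather than abort the whole run.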

Custom Steps

For domain-specific invariants that don’t belong in the built-in set, use require_custom(). The step function receives (value, name, ctx) and should call fail() on failure to produce the same structured errors as built-in steps:

require_weights <- restrict("weights") |>
  require_numeric(no_na = TRUE) |>
  require_between(lower = 0, upper = 1) |>
  require_custom(
    label = "must sum to 1",
    fn = function(value, name, ctx) {
      if (abs(sum(value) - 1) > 1e-8) {
        fail(name, "must sum to 1",
             found = sprintf("sum = %g", sum(value)))
      }
    }
  )

require_weights(c(0.5, 0.3, 0.2))

require_weights(c(0.5, 0.5, 0.5))
#> Error:
#> ! weights: must sum to 1
#>   Found: sum = 1.5
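
Because a step is an ordinary function of (value, name, ctx), it can be produced by a small factory and reused with different parameters. The sketch below assumes nothing beyond the require_custom() and fail() calls shown above; sums_to is a hypothetical helper name.

```r
library(restrictR)

# Hypothetical factory: returns a step function checking the sum against a target.
sums_to <- function(target, tol = 1e-8) {
  function(value, name, ctx) {
    if (abs(sum(value) - target) > tol) {
      fail(name, sprintf("must sum to %g", target),
           found = sprintf("sum = %g", sum(value)))
    }
  }
}

require_shares <- restrict("shares") |>
  require_numeric(no_na = TRUE) |>
  require_custom(label = "must sum to 1", fn = sums_to(1))

require_percentages <- restrict("percentages") |>
  require_numeric(no_na = TRUE) |>
  require_custom(label = "must sum to 100", fn = sums_to(100))

require_shares(c(0.5, 0.3, 0.2))
require_percentages(c(50, 30, 20))
```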

Custom steps can also declare dependencies:

require_probs <- restrict("probs") |>
  require_numeric(no_na = TRUE) |>
  require_custom(
    label = "length must match number of classes",
    deps = "n_classes",
    fn = function(value, name, ctx) {
      if (length(value) != ctx$n_classes) {
        fail(name, sprintf("expected %d probabilities", ctx$n_classes),
             found = sprintf("length %d", length(value)))
      }
    }
  )

require_probs(c(0.3, 0.7), n_classes = 2L)

Self-Documentation

Print a validator to see its full contract:

require_newdata
#> <restriction newdata>
#>   1. must be a data.frame
#>   2. must have columns: "x1", "x2"
#>   3. $x1 must be numeric (no NA, finite)
#>   4. $x2 must be numeric (no NA, finite)
#>   5. must have at least 1 row

Use as_contract_text() to generate a one-line summary for roxygen @param:

as_contract_text(require_newdata)
#> [1] "Must be a data.frame. Must have columns: \"x1\", \"x2\". $x1 must be numeric (no NA, finite). $x2 must be numeric (no NA, finite). Must have at least 1 row."

Use as_contract_block() for multi-line output suitable for @details:

cat(as_contract_block(require_newdata))
#> - must be a data.frame
#> - must have columns: "x1", "x2"
#> - $x1 must be numeric (no NA, finite)
#> - $x2 must be numeric (no NA, finite)
#> - must have at least 1 row
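
One way to turn these strings into roxygen comment lines is to paste the comment prefix onto them. This is a sketch of a possible workflow, not a feature of restrictR; the contract is repeated so the block runs on its own, and the block output is split on newlines defensively because its exact return shape is an assumption.

```r
library(restrictR)

require_newdata <- restrict("newdata") |>
  require_df() |>
  require_has_cols(c("x1", "x2")) |>
  require_col_numeric("x1", no_na = TRUE, finite = TRUE) |>
  require_col_numeric("x2", no_na = TRUE, finite = TRUE) |>
  require_nrow_min(1L)

# One-line summary for @param.
param_line <- paste("#' @param newdata", as_contract_text(require_newdata))

# Multi-line block for @details, one roxygen comment line per contract step.
block <- as.character(as_contract_block(require_newdata))
block_lines <- unlist(strsplit(block, "\n", fixed = TRUE))
details_lines <- paste("#'", block_lines)

writeLines(c(param_line, "#' @details", details_lines))
```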

Using Contracts in Packages

The recommended pattern: define contracts near the top of the file that uses them, or in a dedicated R/contracts.R if several files share the same validators. Call them at the top of exported functions.

# R/contracts.R
require_newdata <- restrict("newdata") |>
  require_df() |>
  require_has_cols(c("x1", "x2")) |>
  require_col_numeric("x1", no_na = TRUE, finite = TRUE) |>
  require_col_numeric("x2", no_na = TRUE, finite = TRUE) |>
  require_nrow_min(1L)

require_pred <- restrict("pred") |>
  require_numeric(no_na = TRUE, finite = TRUE) |>
  require_length_matches(~ nrow(newdata))

# R/predict.R

#' Predict from a fitted model
#'
#' @param object A fitted model object.
#' @param newdata Must be a data.frame. Must have columns: "x1", "x2". $x1 must be numeric (no NA, finite). $x2 must be numeric (no NA, finite). Must have at least 1 row.
#' @param ... additional arguments passed to the underlying model.
#'
#' @export
my_predict <- function(object, newdata, ...) {
  require_newdata(newdata)
  pred <- do_prediction(object, newdata)
  require_pred(pred, newdata = newdata)
  pred
}
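
To run the pattern end to end, the sketch below supplies a stand-in for do_prediction() based on lm(). The stand-in, the training data, and the model are all hypothetical; a real package would substitute its own prediction routine.

```r
library(restrictR)

require_newdata <- restrict("newdata") |>
  require_df() |>
  require_has_cols(c("x1", "x2")) |>
  require_col_numeric("x1", no_na = TRUE, finite = TRUE) |>
  require_col_numeric("x2", no_na = TRUE, finite = TRUE)

require_pred <- restrict("pred") |>
  require_numeric(no_na = TRUE, finite = TRUE) |>
  require_length_matches(~ nrow(newdata))

# Hypothetical stand-in for the package's internal prediction routine.
do_prediction <- function(object, newdata) {
  unname(stats::predict(object, newdata = newdata))
}

my_predict <- function(object, newdata, ...) {
  require_newdata(newdata)               # validate the input
  pred <- do_prediction(object, newdata)
  require_pred(pred, newdata = newdata)  # validate the output against the input
  pred
}

train <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(2, 1, 4, 3, 6),
  y  = c(1.1, 1.9, 3.2, 3.8, 5.1)
)
fit <- lm(y ~ x1 + x2, data = train)

my_predict(fit, data.frame(x1 = c(1.5, 2.5), x2 = c(3, 4)))
```

Validating the return value as well as the argument catches bugs in the prediction routine itself, not only misuse by callers.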

Contracts compose naturally with the pipe and branch safely (each |> creates a new validator):

base <- restrict("x") |> require_numeric()
v1 <- base |> require_length(1L)
v2 <- base |> require_between(lower = 0)

# base is unchanged
length(environment(base)$steps)
#> [1] 1
length(environment(v1)$steps)
#> [1] 2
length(environment(v2)$steps)
#> [1] 2
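
The branching behaviour can be modelled with plain closures. The toy below is not restrictR's implementation; new_validator and add_step are hypothetical names chosen to show why appending a step to a copy leaves the original untouched.

```r
# Toy model: a validator is a closure whose environment holds a list of steps.
new_validator <- function(steps = list()) {
  force(steps)
  function(value) {
    for (step in steps) step(value)
    invisible(value)
  }
}

# Adding a step builds a fresh closure from a copied, extended list.
add_step <- function(validator, step) {
  new_validator(c(environment(validator)$steps, list(step)))
}

toy_base <- new_validator() |>
  add_step(function(v) stopifnot(is.numeric(v)))
toy_v1 <- toy_base |> add_step(function(v) stopifnot(length(v) == 1L))
toy_v2 <- toy_base |> add_step(function(v) stopifnot(all(v >= 0)))

# The shared base still has one step; each branch has two.
c(base = length(environment(toy_base)$steps),
  v1   = length(environment(toy_v1)$steps),
  v2   = length(environment(toy_v2)$steps))
```

R lists have copy semantics, so c() on the stored steps never modifies the list held by the original closure.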

Relation to checkmate

The checkmate package covers similar ground with a different emphasis. Its assert*/check*/test* families provide fast, C-backed checks that you call inline, one per argument, where the check is needed. restrictR instead lets you name a contract once as a |> chain and reuse that callable validator across functions, compose and branch it immutably, and have it print and document itself via as_contract_text(). Use checkmate when you want quick inline assertions; use restrictR when the same contract recurs across functions and you want a single definition that also serves as documentation. The two interoperate: a checkmate assertion can live inside a require_custom() step.
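
A sketch of the interoperation described above, assuming the checkmate package is installed. checkmate's check_*() functions return TRUE on success and an error string otherwise, which can be forwarded to fail(); the contract name require_rate is hypothetical.

```r
library(restrictR)

require_rate <- restrict("rate") |>
  require_custom(
    label = "must be a single number in [0, 1]",
    fn = function(value, name, ctx) {
      # check_number() yields TRUE or a character description of the problem.
      result <- checkmate::check_number(value, lower = 0, upper = 1)
      if (!isTRUE(result)) {
        fail(name, "must be a single number in [0, 1]", found = result)
      }
    }
  )

require_rate(0.25)
```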

sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 26.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] restrictR_0.2.0 rmarkdown_2.31 
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     R6_2.6.1          fastmap_1.2.0     xfun_0.59        
#>  [5] maketools_1.3.2   cachem_1.1.0      knitr_1.51        htmltools_0.5.9  
#>  [9] buildtools_1.0.0  lifecycle_1.0.5   cli_3.6.6         svglite_2.2.2    
#> [13] sass_0.4.10       textshaping_1.0.5 jquerylib_0.1.4   systemfonts_1.3.2
#> [17] compiler_4.6.1    sys_3.4.3         tools_4.6.1       evaluate_1.0.5   
#> [21] bslib_0.11.0      yaml_2.3.12       otel_0.2.0        jsonlite_2.0.0   
#> [25] rlang_1.2.0