| Title: | Optimal Pairing and Matching via Linear Assignment |
|---|---|
| Description: | Solves optimal pairing and matching problems using linear assignment algorithms. Provides implementations of the Hungarian method (Kuhn 1955) <doi:10.1002/nav.3800020109>, Jonker-Volgenant shortest path algorithm (Jonker and Volgenant 1987) <doi:10.1007/BF02278710>, Auction algorithm (Bertsekas 1988) <doi:10.1007/BF02186476>, cost-scaling (Goldberg and Kennedy 1995) <doi:10.1007/BF01585996>, scaling algorithms (Gabow and Tarjan 1989) <doi:10.1137/0218069>, push-relabel (Goldberg and Tarjan 1988) <doi:10.1145/48014.61051>, and Sinkhorn entropy-regularized transport (Cuturi 2013) <doi:10.48550/arxiv.1306.0895>. Designed for matching plots, sites, samples, or any pairwise optimization problem. Supports rectangular matrices, forbidden assignments, data frame inputs, batch solving, k-best solutions, and pixel-level image morphing for visualization. Includes automatic preprocessing with variable health checks, multiple scaling methods (standardized, range, robust), greedy matching algorithms, and comprehensive balance diagnostics for assessing match quality using standardized differences and distribution comparisons. |
| Authors: | Gilles Colling [aut, cre, cph] (ORCID: <https://orcid.org/0000-0003-3070-6066>) |
| Maintainer: | Gilles Colling <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.4.1 |
| Built: | 2026-05-29 23:14:57 UTC |
| Source: | https://github.com/gcol33/couplr |
Returns the character vector of method strings for which a trace function
has been registered. Use any of these with lap_animate().
animated_methods()animated_methods()
Character vector of registered method names.
Turns a tidy assignment result back into a 0/1 assignment matrix.
as_assignment_matrix(x, n_sources = NULL, n_targets = NULL)as_assignment_matrix(x, n_sources = NULL, n_targets = NULL)
x |
An assignment result object of class |
n_sources |
Number of source nodes, optional |
n_targets |
Number of target nodes, optional |
Integer matrix with 0 and 1 entries
Constructs a matchit-class S3 object from a couplr result, enabling
use with any function that accepts MatchIt objects (e.g.,
cobalt, marginaleffects).
as_matchit( result, left, right, formula = NULL, left_id = "id", right_id = "id", ... )as_matchit( result, left, right, formula = NULL, left_id = "id", right_id = "id", ... )
result |
A couplr result object (matching_result, full_matching_result, cem_result, or subclass_result) |
left |
Data frame of left (treated) units |
right |
Data frame of right (control) units |
formula |
Optional formula used for matching. If not provided, a
default formula is constructed from |
left_id |
Name of ID column in left (default: |
right_id |
Name of ID column in right (default: |
... |
Additional arguments (ignored) |
An S3 object of class "matchit" with fields:
Match matrix (treated x controls)
Named treatment vector (1/0)
Matching weights
Covariate matrix
Original call
Metadata from couplr
## Not run: left <- data.frame(id = 1:5, age = c(25, 35, 45, 55, 65)) right <- data.frame(id = 6:15, age = runif(10, 20, 70)) result <- match_couples(left, right, vars = "age") mi <- as_matchit(result, left, right) # Now use with cobalt: cobalt::bal.tab(mi) ## End(Not run)## Not run: left <- data.frame(id = 1:5, age = c(25, 35, 45, 55, 65)) right <- data.frame(id = 6:15, age = runif(10, 20, 70)) result <- match_couples(left, right, vars = "age") mi <- as_matchit(result, left, right) # Now use with cobalt: cobalt::bal.tab(mi) ## End(Not run)
Solve the linear assignment problem (minimum- or maximum-cost matching)
using several algorithms. Forbidden edges can be marked as NA or Inf.
assignment( cost, maximize = FALSE, method = c("auto", "jv", "hungarian", "munkres", "auction", "auction_gs", "auction_scaled", "sap", "ssp", "csflow", "hk01", "bruteforce", "ssap_bucket", "cycle_cancel", "gabow_tarjan", "lapmod", "csa", "ramshaw_tarjan", "push_relabel", "orlin", "network_simplex"), auction_eps = NULL, eps = NULL )assignment( cost, maximize = FALSE, method = c("auto", "jv", "hungarian", "munkres", "auction", "auction_gs", "auction_scaled", "sap", "ssp", "csflow", "hk01", "bruteforce", "ssap_bucket", "cycle_cancel", "gabow_tarjan", "lapmod", "csa", "ramshaw_tarjan", "push_relabel", "orlin", "network_simplex"), auction_eps = NULL, eps = NULL )
cost |
Numeric matrix; rows = tasks, columns = agents. |
maximize |
Logical; if |
method |
Character string indicating the algorithm to use. Options: General-purpose solvers:
Auction-based solvers:
Specialized solvers:
Advanced solvers:
|
auction_eps |
Optional numeric epsilon for the 'Auction'/'Auction-GS' methods.
If |
eps |
Deprecated. Use |
method = "auto" selects an algorithm based on problem size/shape and data
characteristics:
Very small (n <= 8): "bruteforce" — exact enumeration
Binary/constant costs: "hk01" — specialized for 0/1 costs
Large sparse (n>100, >50\
Sparse or very rectangular: "sap" — handles sparsity well
Small-medium (8 < n <= 50): "hungarian" — provides exact dual solutions
Medium (50 < n <= 75): "jv" — fast general-purpose solver
Large (n>75): "auction_scaled" — fastest for large dense problems
Benchmarks show 'Auction-scaled' and 'JV' are 100-1500x faster than 'Hungarian' at n=500.
An object of class lap_solve_result, a list with elements:
match — integer vector of length min(nrow(cost), ncol(cost))
giving the assigned column for each row (0 if unassigned).
total_cost — numeric scalar, the objective value.
status — character scalar, e.g. "optimal".
method_used — character scalar, the algorithm actually used.
lap_solve() — Tidy interface returning tibbles
lap_solve_kbest() — Find k-best assignments ('Murty' algorithm)
assignment_duals() — Extract dual variables for sensitivity analysis
bottleneck_assignment() — Minimize maximum edge cost (minimax)
sinkhorn() — Entropy-regularized optimal transport
cost <- matrix(c(4,2,5, 3,3,6, 7,5,4), nrow = 3, byrow = TRUE) res <- assignment(cost) res$match; res$total_costcost <- matrix(c(4,2,5, 3,3,6, 7,5,4), nrow = 3, byrow = TRUE) res <- assignment(cost) res$match; res$total_cost
Solves the linear assignment problem and returns dual potentials (u, v) in addition to the optimal matching. The dual variables provide an optimality certificate and enable sensitivity analysis.
assignment_duals(cost, maximize = FALSE)assignment_duals(cost, maximize = FALSE)
cost |
Numeric matrix; rows = tasks, columns = agents. |
maximize |
Logical; if |
The dual variables satisfy the complementary slackness conditions:
For minimization: u[i] + v[j] <= cost[i,j] for all (i,j)
For any assigned pair (i,j): u[i] + v[j] = cost[i,j]
This implies that sum(u) + sum(v) = total_cost (strong duality).
Applications of dual variables:
Optimality verification: Check that duals satisfy constraints
Sensitivity analysis: Reduced cost c[i,j] - u[i] - v[j] shows
how much an edge cost must decrease before it enters the solution
Pricing in column generation: Use duals to price new columns
Warm starting: Reuse duals when costs change slightly
A list with class "assignment_duals_result" containing:
match - integer vector of column assignments (1-based)
total_cost - optimal objective value
u - numeric vector of row dual variables (length n)
v - numeric vector of column dual variables (length m)
status - character, e.g. "optimal"
assignment() for standard assignment without duals
cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3, byrow = TRUE) result <- assignment_duals(cost) # Check optimality: u + v should equal cost for assigned pairs for (i in 1:3) { j <- result$match[i] cat(sprintf("Row %d -> Col %d: u + v = %.2f, cost = %.2f\n", i, j, result$u[i] + result$v[j], cost[i, j])) } # Verify strong duality cat("sum(u) + sum(v) =", sum(result$u) + sum(result$v), "\n") cat("total_cost =", result$total_cost, "\n") # Reduced costs (how much must cost decrease to enter solution) reduced <- outer(result$u, result$v, "+") reduced_cost <- cost - reduced print(round(reduced_cost, 2))cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3, byrow = TRUE) result <- assignment_duals(cost) # Check optimality: u + v should equal cost for assigned pairs for (i in 1:3) { j <- result$match[i] cat(sprintf("Row %d -> Col %d: u + v = %.2f, cost = %.2f\n", i, j, result$u[i] + result$v[j], cost[i, j])) } # Verify strong duality cat("sum(u) + sum(v) =", sum(result$u) + sum(result$v), "\n") cat("total_cost =", result$total_cost, "\n") # Reduced costs (how much must cost decrease to enter solution) reduced <- outer(result$u, result$v, "+") reduced_cost <- cost - reduced print(round(reduced_cost, 2))
S3 generic for augmenting model results with original data.
augment(x, ...)augment(x, ...)
x |
An object to augment |
... |
Additional arguments passed to methods |
Augmented data (depends on method)
S3 method for augmenting matching results following the broom package
conventions. This is a thin wrapper around join_matched() with
sensible defaults for quick exploration.
## S3 method for class 'matching_result' augment(x, left, right, ...)## S3 method for class 'matching_result' augment(x, left, right, ...)
x |
A matching_result object |
left |
The original left dataset |
right |
The original right dataset |
... |
Additional arguments passed to |
This method follows the augment() convention from the broom package,
making it easy to integrate couplr into tidymodels workflows. It's
equivalent to calling join_matched() with default parameters.
If the broom package is not loaded, you can use couplr::augment()
to access this function.
A tibble with matched pairs and original data (see join_matched())
left <- data.frame( id = 1:5, treatment = 1, age = c(25, 30, 35, 40, 45) ) right <- data.frame( id = 6:10, treatment = 0, age = c(24, 29, 36, 41, 44) ) result <- match_couples(left, right, vars = "age") couplr::augment(result, left, right)left <- data.frame( id = 1:5, treatment = 1, age = c(25, 30, 35, 40, 45) ) right <- data.frame( id = 6:10, treatment = 0, age = c(24, 29, 36, 41, 44) ) result <- match_couples(left, right, vars = "age") couplr::augment(result, left, right)
Produces ggplot2-based balance assessment plots. Returns a ggplot object.
## S3 method for class 'balance_diagnostics' autoplot( object, type = c("love", "histogram", "variance"), threshold = 0.1, ... )## S3 method for class 'balance_diagnostics' autoplot( object, type = c("love", "histogram", "variance"), threshold = 0.1, ... )
object |
A balance_diagnostics object |
type |
Type of plot: "love" (default), "histogram", or "variance" |
threshold |
Threshold for standardized differences (default: 0.1) |
... |
Additional arguments (ignored) |
A ggplot object
if (requireNamespace("ggplot2", quietly = TRUE)) { set.seed(42) left <- data.frame(id = 1:10, age = rnorm(10, 45, 10), income = rnorm(10, 50000, 15000)) right <- data.frame(id = 11:30, age = rnorm(20, 47, 10), income = rnorm(20, 52000, 15000)) result <- match_couples(left, right, vars = c("age", "income")) bal <- balance_diagnostics(result, left, right, vars = c("age", "income")) ggplot2::autoplot(bal) }if (requireNamespace("ggplot2", quietly = TRUE)) { set.seed(42) left <- data.frame(id = 1:10, age = rnorm(10, 45, 10), income = rnorm(10, 50000, 15000)) right <- data.frame(id = 11:30, age = rnorm(20, 47, 10), income = rnorm(20, 52000, 15000)) result <- match_couples(left, right, vars = c("age", "income")) bal <- balance_diagnostics(result, left, right, vars = c("age", "income")) ggplot2::autoplot(bal) }
Produces ggplot2-based visualizations of matching distance distributions. Returns a ggplot object that can be further customized.
## S3 method for class 'matching_result' autoplot(object, type = c("histogram", "density", "ecdf"), ...)## S3 method for class 'matching_result' autoplot(object, type = c("histogram", "density", "ecdf"), ...)
object |
A matching_result object |
type |
Type of plot: "histogram" (default), "density", or "ecdf" |
... |
Additional arguments (ignored) |
Use plot() for base graphics or autoplot() for ggplot2 output.
The ggplot2 package must be installed.
A ggplot object
if (requireNamespace("ggplot2", quietly = TRUE)) { left <- data.frame(id = 1:5, x = c(1, 2, 3, 4, 5)) right <- data.frame(id = 6:10, x = c(1.1, 2.2, 3.1, 4.2, 5.1)) result <- match_couples(left, right, vars = "x") ggplot2::autoplot(result) ggplot2::autoplot(result, type = "density") }if (requireNamespace("ggplot2", quietly = TRUE)) { left <- data.frame(id = 1:5, x = c(1, 2, 3, 4, 5)) right <- data.frame(id = 6:10, x = c(1.1, 2.2, 3.1, 4.2, 5.1)) result <- match_couples(left, right, vars = "x") ggplot2::autoplot(result) ggplot2::autoplot(result, type = "density") }
Plots p-value upper bounds against sensitivity parameter Gamma.
## S3 method for class 'sensitivity_analysis' autoplot(object, alpha = 0.05, ...)## S3 method for class 'sensitivity_analysis' autoplot(object, alpha = 0.05, ...)
object |
A sensitivity_analysis object |
alpha |
Significance level (default: 0.05) |
... |
Additional arguments (ignored) |
A ggplot object
S3 method enabling cobalt::bal.tab() on couplr result objects.
Requires the cobalt package to be installed.
bal.tab.matching_result(x, left, right, ...) bal.tab.full_matching_result(x, left, right, ...) bal.tab.cem_result(x, left, right, ...) bal.tab.subclass_result(x, data = NULL, ...)bal.tab.matching_result(x, left, right, ...) bal.tab.full_matching_result(x, left, right, ...) bal.tab.cem_result(x, left, right, ...) bal.tab.subclass_result(x, data = NULL, ...)
x |
A couplr result object |
left |
Data frame of left (treated) units |
right |
Data frame of right (control) units |
... |
Additional arguments passed to |
data |
Data frame used for subclassification (for subclass_result only) |
These methods convert couplr results to the format cobalt expects
(a matchit-class object) and then delegate to cobalt's own
bal.tab.matchit() method. The cobalt package must be
installed but is not required for couplr to function.
A cobalt balance table object
Computes comprehensive balance statistics comparing the distribution of matching variables between left and right units in the matched sample.
balance_diagnostics(result, ...) ## S3 method for class 'matching_result' balance_diagnostics( result, left, right, vars = NULL, left_id = "id", right_id = "id", ... ) ## S3 method for class 'full_matching_result' balance_diagnostics( result, left, right, vars = NULL, left_id = "id", right_id = "id", ... ) ## S3 method for class 'cem_result' balance_diagnostics( result, left, right, vars = NULL, left_id = "id", right_id = "id", ... ) ## S3 method for class 'subclass_result' balance_diagnostics(result, data = NULL, vars = NULL, ...)balance_diagnostics(result, ...) ## S3 method for class 'matching_result' balance_diagnostics( result, left, right, vars = NULL, left_id = "id", right_id = "id", ... ) ## S3 method for class 'full_matching_result' balance_diagnostics( result, left, right, vars = NULL, left_id = "id", right_id = "id", ... ) ## S3 method for class 'cem_result' balance_diagnostics( result, left, right, vars = NULL, left_id = "id", right_id = "id", ... ) ## S3 method for class 'subclass_result' balance_diagnostics(result, data = NULL, vars = NULL, ...)
result |
A matching result object from |
... |
Additional arguments passed to methods |
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names to check balance for. Defaults to the variables used in matching (if available in result). |
left_id |
Character, name of ID column in left data (default: "id") |
right_id |
Character, name of ID column in right data (default: "id") |
data |
Data frame used for subclassification (when |
This function computes several balance metrics:
Standardized Difference: The difference in means divided by the pooled standard deviation. Values less than 0.1 indicate excellent balance, 0.1-0.25 good balance.
Variance Ratio: The ratio of standard deviations (left/right). Values close to 1 are ideal.
KS Statistic: Kolmogorov-Smirnov test statistic comparing distributions. Lower values indicate more similar distributions.
Overall Metrics include mean absolute standardized difference across all variables, proportion of variables with large imbalance (|std diff| > 0.25), and maximum standardized difference.
An S3 object of class balance_diagnostics containing:
Tibble with per-variable balance statistics
List with overall balance metrics
Tibble of matched pairs with variables
Number of matched pairs
Number of unmatched left units
Number of unmatched right units
Matching method used
Whether blocking was used
Per-block statistics (if blocking used)
# Create sample data set.seed(123) left <- data.frame( id = 1:10, age = rnorm(10, 45, 10), income = rnorm(10, 50000, 15000) ) right <- data.frame( id = 11:30, age = rnorm(20, 47, 10), income = rnorm(20, 52000, 15000) ) # Match result <- match_couples(left, right, vars = c("age", "income")) # Get balance diagnostics balance <- balance_diagnostics(result, left, right, vars = c("age", "income")) print(balance) # Get balance table balance_table(balance)# Create sample data set.seed(123) left <- data.frame( id = 1:10, age = rnorm(10, 45, 10), income = rnorm(10, 50000, 15000) ) right <- data.frame( id = 11:30, age = rnorm(20, 47, 10), income = rnorm(20, 52000, 15000) ) # Match result <- match_couples(left, right, vars = c("age", "income")) # Get balance diagnostics balance <- balance_diagnostics(result, left, right, vars = c("age", "income")) print(balance) # Get balance table balance_table(balance)
Formats balance diagnostics into a clean table for display or export.
balance_table(balance, digits = 3)balance_table(balance, digits = 3)
balance |
A balance_diagnostics object from |
digits |
Number of decimal places for rounding (default: 3) |
A tibble with formatted balance statistics
Finds an assignment that minimizes (or maximizes) the maximum edge cost in a perfect matching. Unlike standard LAP which minimizes the sum of costs, BAP minimizes the maximum (bottleneck) cost.
bottleneck_assignment(cost, maximize = FALSE)bottleneck_assignment(cost, maximize = FALSE)
cost |
Numeric matrix; rows = tasks, columns = agents. |
maximize |
Logical; if |
The Bottleneck Assignment Problem (BAP) is a variant of the Linear Assignment Problem where instead of minimizing the sum of assignment costs, we minimize the maximum cost among all assignments (minimax objective).
Algorithm: Uses binary search on the sorted unique costs combined with 'Hopcroft-Karp' bipartite matching to find the minimum threshold that allows a perfect matching.
Complexity: O(E * sqrt(V) * log(unique costs)) where E = edges, V = vertices.
Applications:
Task scheduling with deadline constraints (minimize latest completion)
Resource allocation (minimize maximum load/distance)
Network routing (minimize maximum link utilization)
Fair division problems (minimize maximum disparity)
A list with class "bottleneck_result" containing:
match - integer vector of length nrow(cost) giving the
assigned column for each row (1-based indexing)
bottleneck - numeric scalar, the bottleneck (max/min edge) value
status - character scalar, e.g. "optimal"
assignment() for standard LAP (sum objective), lap_solve() for
tidy LAP interface
# Simple example: minimize max cost cost <- matrix(c(1, 5, 3, 2, 4, 6, 7, 1, 2), nrow = 3, byrow = TRUE) result <- bottleneck_assignment(cost) result$bottleneck # Maximum edge cost in optimal assignment # Maximize minimum (fair allocation) profits <- matrix(c(10, 5, 8, 6, 12, 4, 3, 7, 11), nrow = 3, byrow = TRUE) result <- bottleneck_assignment(profits, maximize = TRUE) result$bottleneck # Minimum profit among all assignments # With forbidden assignments cost <- matrix(c(1, NA, 3, 2, 4, Inf, 5, 1, 2), nrow = 3, byrow = TRUE) result <- bottleneck_assignment(cost)# Simple example: minimize max cost cost <- matrix(c(1, 5, 3, 2, 4, 6, 7, 1, 2), nrow = 3, byrow = TRUE) result <- bottleneck_assignment(cost) result$bottleneck # Maximum edge cost in optimal assignment # Maximize minimum (fair allocation) profits <- matrix(c(10, 5, 8, 6, 12, 4, 3, 7, 11), nrow = 3, byrow = TRUE) result <- bottleneck_assignment(profits, maximize = TRUE) result$bottleneck # Minimum profit among all assignments # With forbidden assignments cost <- matrix(c(1, NA, 3, 2, 4, Inf, 5, 1, 2), nrow = 3, byrow = TRUE) result <- bottleneck_assignment(cost)
Maximizes the number of matched pairs subject to balance constraints. Uses iterative pruning: starts with a full match, then removes pairs that contribute most to imbalance until all variables satisfy the standardized difference threshold.
cardinality_match( left, right, vars, max_std_diff = 0.1, distance = "euclidean", weights = NULL, scale = FALSE, auto_scale = FALSE, method = "auto", max_iter = 100L, batch_fraction = 0.1 )cardinality_match( left, right, vars, max_std_diff = 0.1, distance = "euclidean", weights = NULL, scale = FALSE, auto_scale = FALSE, method = "auto", max_iter = 100L, batch_fraction = 0.1 )
left |
Data frame of "left" units |
right |
Data frame of "right" units |
vars |
Character vector of matching variable names |
max_std_diff |
Maximum allowed absolute standardized difference (default: 0.1, corresponding to "excellent" balance) |
distance |
Distance metric (default: "euclidean") |
weights |
Optional named vector of variable weights |
scale |
Scaling method (default: FALSE) |
auto_scale |
If TRUE, automatically select scaling (default: FALSE) |
method |
LAP solver method (default: "auto") |
max_iter |
Maximum pruning iterations (default: 100) |
batch_fraction |
Fraction of worst pairs to remove per iteration (default: 0.1). Larger values speed up convergence but may over-prune. |
Cardinality matching (Zubizarreta 2014) finds the largest matched sample that satisfies pre-specified balance constraints. This implementation uses an iterative pruning heuristic:
Run full optimal matching via match_couples()
Compute balance diagnostics
While any |std_diff| exceeds max_std_diff:
Identify the variable with worst balance
Remove the batch of pairs contributing most to that imbalance
Recompute balance
A matching_result object with additional cardinality info:
result$info$pruning_iterations, result$info$pairs_removed,
result$info$final_balance.
set.seed(42) left <- data.frame(id = 1:20, x = rnorm(20, 0, 1), y = rnorm(20, 0, 1)) right <- data.frame(id = 21:50, x = rnorm(30, 0.5, 1), y = rnorm(30, 0.3, 1)) result <- cardinality_match(left, right, vars = c("x", "y"), max_std_diff = 0.2) print(result)set.seed(42) left <- data.frame(id = 1:20, x = rnorm(20, 0, 1), y = rnorm(20, 0, 1)) right <- data.frame(id = 21:50, x = rnorm(30, 0.5, 1), y = rnorm(30, 0.3, 1)) result <- cardinality_match(left, right, vars = c("x", "y"), max_std_diff = 0.2) print(result)
Coarsens continuous variables into bins, then performs exact matching on the coarsened values. Units in strata containing both left and right units are kept; others are pruned. Matched units receive weights inversely proportional to stratum sizes to maintain balance.
cem_match( left, right, vars, cutpoints = NULL, n_bins = "sturges", grouping = NULL, keep = "all", left_id = "id", right_id = "id" )cem_match( left, right, vars, cutpoints = NULL, n_bins = "sturges", grouping = NULL, keep = "all", left_id = "id", right_id = "id" )
left |
Data frame of left (treated) units |
right |
Data frame of right (control) units |
vars |
Character vector of variable names to coarsen and match on |
cutpoints |
Named list of break vectors per variable. If NULL, automatic binning is used. |
n_bins |
Binning method when |
grouping |
Character vector of variable names to match exactly (without coarsening). These are typically categorical variables. |
keep |
Which units to return: |
left_id |
Name of ID column in left (default: |
right_id |
Name of ID column in right (default: |
CEM algorithm:
Coarsen each numeric variable using cut with
either user-specified breakpoints or automatic binning (Sturges, FD, or
Scott rule)
Categorical variables in grouping are kept as-is
Create strata by concatenating all coarsened values
Drop strata with 0 left or 0 right units
Compute CEM weights: left units get weight 1, right units get
weight n_left_in_stratum / n_right_in_stratum so that the total
weight of right units in each stratum equals the number of left units
An S3 object of class c("cem_result", "couplr_result")
containing:
Tibble with columns id, side, stratum,
weight
Tibble with per-stratum counts
List with n_strata, n_matched_left,
n_matched_right, n_pruned_left, n_pruned_right,
method, vars
set.seed(42) left <- data.frame( id = 1:20, age = rnorm(20, 40, 10), income = rnorm(20, 50000, 10000) ) right <- data.frame( id = 21:60, age = rnorm(40, 42, 10), income = rnorm(40, 52000, 10000) ) result <- cem_match(left, right, vars = c("age", "income")) print(result)set.seed(42) left <- data.frame( id = 1:20, age = rnorm(20, 40, 10), income = rnorm(20, 50000, 10000) ) right <- data.frame( id = 21:60, age = rnorm(40, 42, 10), income = rnorm(40, 52000, 10000) ) result <- cem_match(left, right, vars = c("age", "income")) print(result)
Precomputes a distance matrix between left and right datasets, allowing it to be reused across multiple matching operations with different constraints. This is particularly useful when exploring different matching parameters (max_distance, calipers, methods) without recomputing distances.
compute_distances( left, right, vars, distance = "euclidean", weights = NULL, scale = FALSE, auto_scale = FALSE, left_id = "id", right_id = "id", block_id = NULL )compute_distances( left, right, vars, distance = "euclidean", weights = NULL, scale = FALSE, auto_scale = FALSE, left_id = "id", right_id = "id", block_id = NULL )
left |
Left dataset (data frame) |
right |
Right dataset (data frame) |
vars |
Character vector of variable names to use for distance computation |
distance |
Distance metric (default: "euclidean") |
weights |
Optional numeric vector of variable weights |
scale |
Scaling method: FALSE, "standardize", "range", or "robust" |
auto_scale |
Apply automatic preprocessing (default: FALSE) |
left_id |
Name of ID column in left (default: "id") |
right_id |
Name of ID column in right (default: "id") |
block_id |
Optional block ID column name for blocked matching |
This function computes distances once and stores them in a reusable object.
The resulting distance_object can be passed to match_couples() or
greedy_couples() instead of providing datasets and variables.
Benefits:
Performance: Avoid recomputing distances when trying different constraints
Exploration: Quickly test max_distance, calipers, or methods
Consistency: Ensures same distances used across comparisons
Memory efficient: Can use sparse matrices when many pairs are forbidden
The distance_object stores the original datasets, allowing downstream
functions like join_matched() to work seamlessly.
An S3 object of class "distance_object" containing:
cost_matrix: Numeric matrix of distances
left_ids: Character vector of left IDs
right_ids: Character vector of right IDs
block_id: Block ID column name (if specified)
metadata: List with computation details (vars, distance, scale, etc.)
original_left: Original left dataset (for later joining)
original_right: Original right dataset (for later joining)
# Compute distances once left <- data.frame(id = 1:5, age = c(25, 30, 35, 40, 45), income = c(45, 52, 48, 61, 55) * 1000) right <- data.frame(id = 6:10, age = c(24, 29, 36, 41, 44), income = c(46, 51, 47, 60, 54) * 1000) dist_obj <- compute_distances( left, right, vars = c("age", "income"), scale = "standardize" ) # Reuse for different matching strategies result1 <- match_couples(dist_obj, max_distance = 0.5) result2 <- match_couples(dist_obj, max_distance = 1.0) result3 <- greedy_couples(dist_obj, strategy = "sorted") # All use the same precomputed distances# Compute distances once left <- data.frame(id = 1:5, age = c(25, 30, 35, 40, 45), income = c(45, 52, 48, 61, 55) * 1000) right <- data.frame(id = 6:10, age = c(24, 29, 36, 41, 44), income = c(46, 51, 47, 60, 54) * 1000) dist_obj <- compute_distances( left, right, vars = c("age", "income"), scale = "standardize" ) # Reuse for different matching strategies result1 <- match_couples(dist_obj, max_distance = 0.5) result2 <- match_couples(dist_obj, max_distance = 1.0) result3 <- greedy_couples(dist_obj, strategy = "sorted") # All use the same precomputed distances
Comprehensive diagnostics for a distance matrix with actionable suggestions.
diagnose_distance_matrix( cost_matrix, left = NULL, right = NULL, vars = NULL, warn = TRUE )diagnose_distance_matrix( cost_matrix, left = NULL, right = NULL, vars = NULL, warn = TRUE )
cost_matrix |
Numeric matrix of distances |
left |
Left dataset (for variable checking) |
right |
Right dataset (for variable checking) |
vars |
Variables used for matching |
warn |
If TRUE, issue warnings |
List with diagnostic results and suggestions
Small example datasets for demonstrating couplr functionality across different assignment problem types: square, rectangular, sparse, and binary.
example_costsexample_costs
A list containing four example cost matrices:
A 3x3 cost matrix with costs ranging from 2-7. Optimal assignment: row 1 -> col 2 (cost 2), row 2 -> col 1 (cost 3), row 3 -> col 3 (cost 4). Total optimal cost: 9.
A 3x5 rectangular cost matrix demonstrating assignment when rows < columns. Each of 3 rows is assigned to one of 5 columns; 2 columns remain unassigned. Costs range 1-6.
A 3x3 matrix with NA values indicating forbidden assignments. Use this to test algorithms' handling of constraints. Position (1,3), (2,2), and (3,1) are forbidden.
A 3x3 matrix with binary (0/1) costs, suitable for testing the HK01 algorithm. Diagonal entries are 0 (preferred), off-diagonal entries are 1 (penalty).
These matrices are designed to test different aspects of LAP solvers:
simple_3x3: Basic functionality test. Any correct solver should find total cost = 9.
rectangular_3x5: Tests handling of non-square problems. The optimal solution assigns all 3 rows with minimum total cost.
sparse_with_na: Tests constraint handling. Algorithms must avoid NA positions while finding an optimal assignment among valid entries.
binary_costs: Tests specialized binary cost algorithms. The optimal assignment uses all diagonal entries (total cost = 0).
# Simple 3x3 assignment result <- lap_solve(example_costs$simple_3x3) print(result) # Optimal: sources 1,2,3 -> targets 2,1,3 with cost 9 # Rectangular problem (3 sources, 5 targets) result <- lap_solve(example_costs$rectangular_3x5) print(result) # All 3 sources assigned; 2 targets unassigned # Sparse problem with forbidden assignments result <- lap_solve(example_costs$sparse_with_na) print(result) # Avoids NA positions # Binary costs - test HK01 algorithm result <- lap_solve(example_costs$binary_costs, method = "hk01") print(result) # Finds diagonal assignment (cost = 0)# Simple 3x3 assignment result <- lap_solve(example_costs$simple_3x3) print(result) # Optimal: sources 1,2,3 -> targets 2,1,3 with cost 9 # Rectangular problem (3 sources, 5 targets) result <- lap_solve(example_costs$rectangular_3x5) print(result) # All 3 sources assigned; 2 targets unassigned # Sparse problem with forbidden assignments result <- lap_solve(example_costs$sparse_with_na) print(result) # Avoids NA positions # Binary costs - test HK01 algorithm result <- lap_solve(example_costs$binary_costs, method = "hk01") print(result) # Finds diagonal assignment (cost = 0)
A tidy data frame representation of assignment problems, suitable for use with grouped workflows and batch solving. Contains two independent 3x3 assignment problems in long format.
example_dfexample_df
A tibble with 18 rows and 4 columns:
Simulation/problem identifier. Integer with values 1 or 2,
distinguishing two independent assignment problems. Use with
group_by(sim) for grouped solving.
Source node index. Integer 1-3 representing the row (source) in each 3x3 cost matrix.
Target node index. Integer 1-3 representing the column (target) in each 3x3 cost matrix.
Cost of assigning source to target. Numeric values ranging from 1-7. Each source-target pair has exactly one cost entry.
This dataset demonstrates couplr's data frame interface for LAP solving. The long format (one row per source-target pair) is converted internally to a cost matrix for solving.
Simulation 1: Costs from example_costs$simple_3x3
Optimal assignment: (1->2, 2->1, 3->3)
Total cost: 9
Simulation 2: Different cost structure
Optimal assignment: (1->1, 2->3, 3->3) or equivalent
Total cost: 4
lap_solve, lap_solve_batch,
example_costs
library(dplyr) # Solve both problems with grouped workflow example_df |> group_by(sim) |> lap_solve(source, target, cost) # Batch solving for efficiency example_df |> group_by(sim) |> lap_solve_batch(source, target, cost) # Inspect the data structure example_df |> group_by(sim) |> summarise( n_pairs = n(), min_cost = min(cost), max_cost = max(cost) )library(dplyr) # Solve both problems with grouped workflow example_df |> group_by(sim) |> lap_solve(source, target, cost) # Batch solving for efficiency example_df |> group_by(sim) |> lap_solve_batch(source, target, cost) # Inspect the data structure example_df |> group_by(sim) |> summarise( n_pairs = n(), min_cost = min(cost), max_cost = max(cost) )
Assigns every unit (left and right) to a matched group with variable ratios (1:k or k:1). Unlike 1:1 matching, full matching does not discard units, producing matched groups where each group contains at least one left and one right unit.
full_match( left, right, vars, distance = "euclidean", min_controls = 1, max_controls = Inf, caliper = NULL, caliper_sd = NULL, weights = NULL, scale = FALSE, auto_scale = FALSE, sigma = NULL, left_id = "id", right_id = "id", method = "optimal" )full_match( left, right, vars, distance = "euclidean", min_controls = 1, max_controls = Inf, caliper = NULL, caliper_sd = NULL, weights = NULL, scale = FALSE, auto_scale = FALSE, sigma = NULL, left_id = "id", right_id = "id", method = "optimal" )
left |
Data frame of left (treated) units |
right |
Data frame of right (control) units |
vars |
Character vector of variable names to match on |
distance |
Distance metric: |
min_controls |
Minimum number of right units per group (default: 1) |
max_controls |
Maximum number of right units per group (default: Inf) |
caliper |
Maximum allowable distance for a match. Units with no eligible partner within the caliper are left unmatched. |
caliper_sd |
If not NULL, caliper is expressed in standard deviations of the pooled distance distribution rather than absolute units. |
weights |
Named numeric vector of variable weights |
scale |
Scaling method: |
auto_scale |
If TRUE, automatically preprocess and scale variables |
sigma |
Optional covariance matrix for Mahalanobis distance |
left_id |
Name of ID column in left (default: |
right_id |
Name of ID column in right (default: |
method |
Matching algorithm: |
Full matching creates matched groups of variable size. Two algorithms are available:
Optimal (method = "optimal", default): Solves a min-cost
max-flow problem that minimizes total distance across all group assignments
simultaneously. Each left unit becomes a group center absorbing 1 to
max_controls right units, with the globally optimal assignment found
via Dijkstra's algorithm with Johnson potentials. When n_left > n_right,
roles are transposed automatically.
Greedy (method = "greedy"): A fast two-pass heuristic:
Each left unit picks its nearest eligible right unit
Remaining right units are assigned to their nearest already-matched
left unit, respecting max_controls
This is faster but does not guarantee globally optimal results.
Weights are computed so that within each group, the total weight of right units equals the total weight of left units (which is 1). For a group with 1 left and k right units, each right unit receives weight 1/k.
An S3 object of class c("full_matching_result", "couplr_result")
containing:
Tibble with columns group_id, id, side
("left"/"right"), and weight
List with n_groups, n_left, n_right,
n_unmatched_left, n_unmatched_right, method,
vars
List of unmatched left and right IDs (if caliper excludes units)
set.seed(42) left <- data.frame(id = 1:5, age = c(25, 35, 45, 55, 65)) right <- data.frame(id = 6:20, age = runif(15, 20, 70)) result <- full_match(left, right, vars = "age") print(result)set.seed(42) left <- data.frame(id = 1:5, age = c(25, 35, 45, 55, 65)) right <- data.frame(id = 6:20, age = runif(15, 20, 70)) result <- full_match(left, right, vars = "age") print(result)
Extract method used from assignment result
get_method_used(x)get_method_used(x)
x |
An assignment result object |
Character string indicating method used
Extract total cost from assignment result
get_total_cost(x)get_total_cost(x)
x |
An assignment result object |
Numeric total cost
Performs fast one-to-one matching using greedy strategies. Does not guarantee
optimal total distance but is much faster than match_couples() for large
datasets. Supports blocking, distance constraints, and various distance metrics.
greedy_couples( left, right = NULL, vars = NULL, distance = "euclidean", weights = NULL, scale = FALSE, auto_scale = FALSE, max_distance = Inf, calipers = NULL, block_id = NULL, ignore_blocks = FALSE, require_full_matching = FALSE, strategy = c("row_best", "sorted", "pq"), return_unmatched = TRUE, return_diagnostics = FALSE, parallel = FALSE, replace = FALSE, ratio = 1L, check_costs = TRUE, sigma = NULL )greedy_couples( left, right = NULL, vars = NULL, distance = "euclidean", weights = NULL, scale = FALSE, auto_scale = FALSE, max_distance = Inf, calipers = NULL, block_id = NULL, ignore_blocks = FALSE, require_full_matching = FALSE, strategy = c("row_best", "sorted", "pq"), return_unmatched = TRUE, return_diagnostics = FALSE, parallel = FALSE, replace = FALSE, ratio = 1L, check_costs = TRUE, sigma = NULL )
left |
Data frame of "left" units (e.g., treated, cases) |
right |
Data frame of "right" units (e.g., control, controls) |
vars |
Variable names to use for distance computation |
distance |
Distance metric: "euclidean", "manhattan", "mahalanobis", or a custom function |
weights |
Optional named vector of variable weights |
scale |
Scaling method: FALSE (none), "standardize", "range", or "robust" |
auto_scale |
If TRUE, automatically check variable health and select scaling method (default: FALSE) |
max_distance |
Maximum allowed distance (pairs exceeding this are forbidden) |
calipers |
Named list of per-variable maximum absolute differences |
block_id |
Column name containing block IDs (for stratified matching) |
ignore_blocks |
If TRUE, ignore block_id even if present |
require_full_matching |
If TRUE, error if any units remain unmatched |
strategy |
Greedy strategy:
|
return_unmatched |
Include unmatched units in output |
return_diagnostics |
Include detailed diagnostics in output |
parallel |
Enable parallel processing for blocked matching. Requires 'future' and 'future.apply' packages. Can be:
|
replace |
If TRUE, allow matching with replacement (same right unit can be matched to multiple left units). Default: FALSE. |
ratio |
Integer, number of right units to match per left unit. Default: 1 (one-to-one matching). For k:1 matching, set ratio = k. |
check_costs |
If TRUE, check distance distribution for potential problems and provide helpful warnings before matching (default: TRUE) |
sigma |
Optional covariance matrix for Mahalanobis distance. If NULL
(default), the pooled sample covariance is used. Only relevant when
|
Greedy strategies do not guarantee optimal total distance but are much faster:
"row_best": O(n*m) time, simple and often produces good results
"sorted": O(nmlog(n*m)) time, better quality but slower
"pq": O(nmlog(n*m)) time, memory-efficient for large problems
Use greedy_couples when:
Dataset is very large (> 10,000 x 10,000)
Approximate solution is acceptable
Speed is more important than optimality
A list with class "matching_result" (same structure as match_couples)
# Basic greedy matching left <- data.frame(id = 1:100, x = rnorm(100)) right <- data.frame(id = 101:200, x = rnorm(100)) result <- greedy_couples(left, right, vars = "x") # Compare to optimal result_opt <- match_couples(left, right, vars = "x") result_greedy <- greedy_couples(left, right, vars = "x") result_greedy$info$total_distance / result_opt$info$total_distance # Quality ratio# Basic greedy matching left <- data.frame(id = 1:100, x = rnorm(100)) right <- data.frame(id = 101:200, x = rnorm(100)) result <- greedy_couples(left, right, vars = "x") # Compare to optimal result_opt <- match_couples(left, right, vars = "x") result_greedy <- greedy_couples(left, right, vars = "x") result_greedy$info$total_distance / result_opt$info$total_distance # Quality ratio
A comprehensive example dataset for demonstrating couplr functionality across vignettes. Contains hospital staff scheduling data with nurses, shifts, costs, and preference scores suitable for assignment problems, as well as nurse characteristics for matching workflows.
hospital_staffhospital_staff
A list containing eight related datasets:
A 10x10 numeric cost matrix for assigning 10 nurses
to 10 shifts. Values range from approximately 1-15, where lower values
indicate better fit (less overtime, matches skills, respects preferences).
Use with lap_solve() for basic assignment.
A 10x10 numeric preference matrix on a 0-10 scale,
where higher values indicate stronger nurse preference for a shift.
Use with lap_solve(..., maximize = TRUE) to optimize preferences
rather than minimize costs.
A tibble with 100 rows (10 nurses x 10 shifts) in long format for data frame workflows:
Integer 1-10. Unique identifier for each nurse.
Integer 1-10. Unique identifier for each shift.
Numeric. Assignment cost (same values as basic_costs).
Numeric 0-10. Nurse preference score.
Integer 0/1. Binary indicator: 1 if nurse skills match shift requirements, 0 otherwise.
A tibble with 10 rows describing nurse characteristics:
Integer 1-10. Links to schedule_df and basic_costs rows.
Numeric 1-20. Years of nursing experience.
Character. Primary department: "ICU", "ER", "General", or "Pediatrics".
Character. Preferred shift type: "day", "evening", or "night".
Integer 1-3. Certification level where 3 is highest (e.g., 1=RN, 2=BSN, 3=MSN).
A tibble with 10 rows describing shift requirements:
Integer 1-10. Links to schedule_df and basic_costs cols.
Character. Department needing coverage.
Character. Shift type: "day", "evening", or "night".
Numeric. Minimum years of experience required.
Integer 1-3. Minimum certification level.
A tibble for batch solving with 500 rows (5 days x 10 nurses x 10 shifts):
Character. Day of week: "Mon", "Tue", "Wed", "Thu", "Fri".
Integer 1-10. Nurse identifier.
Integer 1-10. Shift identifier.
Numeric. Daily assignment cost (varies by day).
Numeric 0-10. Daily preference score.
Use with group_by(day) for solving each day's schedule.
A tibble with 200 nurses for matching examples, representing a treatment group (e.g., full-time nurses):
Integer 1-200. Unique identifier.
Numeric 22-65. Nurse age in years.
Numeric 0-40. Years of nursing experience.
Numeric 25-75. Hourly wage in dollars.
Character. Primary department assignment.
Integer 1-3. Certification level.
Logical. TRUE for full-time status.
A tibble with 300 potential control nurses (e.g., part-time or registry nurses) for matching. Same structure as nurses_extended. Designed to have systematic differences from nurses_extended (older, less experience on average) to demonstrate matching's ability to create comparable groups.
This dataset is used throughout the couplr documentation to provide a consistent, realistic example that evolves in complexity. It supports three use cases: (1) basic LAP solving with cost matrices, (2) batch solving across multiple days, and (3) matching workflows comparing nurse groups.
The dataset is designed to demonstrate progressively complex scenarios:
Basic LAP (vignette("getting-started")):
basic_costs: Simple 10x10 assignment
preferences: Maximization problem
schedule_df: Data frame input, grouped workflows
weekly_df: Batch solving across days
Algorithm comparison (vignette("algorithms")):
Use basic_costs to compare algorithm behavior
Modify with NA values for sparse scenarios
Matching workflows (vignette("matching-workflows")):
nurses_extended: Treatment group (full-time nurses)
controls_extended: Control pool (part-time/registry nurses)
Match on age, experience, department for causal analysis
lap_solve for basic assignment solving,
lap_solve_batch for batch solving,
match_couples for matching workflows,
vignette("getting-started") for introductory tutorial
# Basic assignment: assign nurses to shifts minimizing cost lap_solve(hospital_staff$basic_costs) # Maximize preferences instead lap_solve(hospital_staff$preferences, maximize = TRUE) # Data frame workflow library(dplyr) hospital_staff$schedule_df |> lap_solve(nurse_id, shift_id, cost) # Batch solve weekly schedule hospital_staff$weekly_df |> group_by(day) |> lap_solve(nurse_id, shift_id, cost) # Matching workflow: match full-time to part-time nurses match_couples( left = hospital_staff$nurses_extended, right = hospital_staff$controls_extended, vars = c("age", "experience_years", "certification_level"), auto_scale = TRUE )# Basic assignment: assign nurses to shifts minimizing cost lap_solve(hospital_staff$basic_costs) # Maximize preferences instead lap_solve(hospital_staff$preferences, maximize = TRUE) # Data frame workflow library(dplyr) hospital_staff$schedule_df |> lap_solve(nurse_id, shift_id, cost) # Batch solve weekly schedule hospital_staff$weekly_df |> group_by(day) |> lap_solve(nurse_id, shift_id, cost) # Matching workflow: match full-time to part-time nurses match_couples( left = hospital_staff$nurses_extended, right = hospital_staff$controls_extended, vars = c("age", "experience_years", "certification_level"), auto_scale = TRUE )
Check if Object is a Distance Object
is_distance_object(x)is_distance_object(x)
x |
Object to check |
Logical: TRUE if x is a distance_object
left <- data.frame(id = 1:3, x = c(1, 2, 3)) right <- data.frame(id = 4:6, x = c(1.1, 2.1, 3.1)) dist_obj <- compute_distances(left, right, vars = "x") is_distance_object(dist_obj) # TRUE is_distance_object(list()) # FALSEleft <- data.frame(id = 1:3, x = c(1, 2, 3)) right <- data.frame(id = 4:6, x = c(1.1, 2.1, 3.1)) dist_obj <- compute_distances(left, right, vars = "x") is_distance_object(dist_obj) # TRUE is_distance_object(list()) # FALSE
Check if object is a batch assignment result
is_lap_solve_batch_result(x)is_lap_solve_batch_result(x)
x |
Object to test |
Logical indicating if x is a batch assignment result
Check if object is a k-best assignment result
is_lap_solve_kbest_result(x)is_lap_solve_kbest_result(x)
x |
Object to test |
Logical indicating if x is a k-best assignment result
Check if object is an assignment result
is_lap_solve_result(x)is_lap_solve_result(x)
x |
Object to test |
Logical indicating if x is an assignment result
Creates an analysis-ready dataset by joining matched pairs with variables from the original left and right datasets. This eliminates the need for manual joins and provides a convenient format for downstream analysis.
join_matched(result, ...) ## S3 method for class 'matching_result' join_matched( result, left, right, left_vars = NULL, right_vars = NULL, left_id = "id", right_id = "id", suffix = c("_left", "_right"), include_distance = TRUE, include_pair_id = TRUE, include_block_id = TRUE, ... ) ## S3 method for class 'full_matching_result' join_matched(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'cem_result' join_matched(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'subclass_result' join_matched(result, data = NULL, ...)join_matched(result, ...) ## S3 method for class 'matching_result' join_matched( result, left, right, left_vars = NULL, right_vars = NULL, left_id = "id", right_id = "id", suffix = c("_left", "_right"), include_distance = TRUE, include_pair_id = TRUE, include_block_id = TRUE, ... ) ## S3 method for class 'full_matching_result' join_matched(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'cem_result' join_matched(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'subclass_result' join_matched(result, data = NULL, ...)
result |
A result object from |
... |
Additional arguments passed to methods |
left |
The original left dataset |
right |
The original right dataset |
left_vars |
Character vector of variable names to include from left. If NULL (default), includes all variables except the ID column. |
right_vars |
Character vector of variable names to include from right. If NULL (default), includes all variables except the ID column. |
left_id |
Name of the ID column in left dataset (default: "id") |
right_id |
Name of the ID column in right dataset (default: "id") |
suffix |
Character vector of length 2 specifying suffixes for left and right variables (default: c("_left", "_right")) |
include_distance |
Include the matching distance in output (default: TRUE) |
include_pair_id |
Include pair_id column (default: TRUE) |
include_block_id |
Include block_id if blocking was used (default: TRUE) |
data |
Data frame used for subclassification |
This function simplifies the common workflow of joining matched pairs
with original data. Instead of manually merging result$pairs with left
and right datasets, join_matched() handles the joins automatically
and applies consistent naming conventions.
When variables appear in both left and right datasets, suffixes are appended to distinguish them (e.g., "age_left" and "age_right"). This makes it easy to compute differences or use both values in models.
A tibble with one row per matched pair, containing:
pair_id: Sequential pair identifier (if include_pair_id = TRUE)
left_id: ID from left dataset
right_id: ID from right dataset
distance: Matching distance (if include_distance = TRUE)
block_id: Block identifier (if blocking used and include_block_id = TRUE)
Variables from left dataset (with left suffix)
Variables from right dataset (with right suffix)
# Basic usage left <- data.frame( id = 1:5, treatment = 1, age = c(25, 30, 35, 40, 45), income = c(45000, 52000, 48000, 61000, 55000) ) right <- data.frame( id = 6:10, treatment = 0, age = c(24, 29, 36, 41, 44), income = c(46000, 51500, 47500, 60000, 54000) ) result <- match_couples(left, right, vars = c("age", "income")) matched_data <- join_matched(result, left, right) head(matched_data) # Specify which variables to include matched_data <- join_matched( result, left, right, left_vars = c("treatment", "age", "income"), right_vars = c("age", "income"), suffix = c("_treated", "_control") ) # Without distance or pair_id matched_data <- join_matched( result, left, right, include_distance = FALSE, include_pair_id = FALSE )# Basic usage left <- data.frame( id = 1:5, treatment = 1, age = c(25, 30, 35, 40, 45), income = c(45000, 52000, 48000, 61000, 55000) ) right <- data.frame( id = 6:10, treatment = 0, age = c(24, 29, 36, 41, 44), income = c(46000, 51500, 47500, 60000, 54000) ) result <- match_couples(left, right, vars = c("age", "income")) matched_data <- join_matched(result, left, right) head(matched_data) # Specify which variables to include matched_data <- join_matched( result, left, right, left_vars = c("treatment", "age", "income"), right_vars = c("age", "income"), suffix = c("_treated", "_control") ) # Without distance or pair_id matched_data <- join_matched( result, left, right, include_distance = FALSE, include_pair_id = FALSE )
Produce an interactive bipartite-graph animation showing how a chosen linear-assignment algorithm transforms the matching over time. The result is an htmlwidget suitable for use in R Markdown, Quarto, pkgdown vignettes, Shiny apps, or standalone HTML output.
lap_animate( cost, method = "hungarian", maximize = FALSE, width = NULL, height = NULL, elementId = NULL, ... )lap_animate( cost, method = "hungarian", maximize = FALSE, width = NULL, height = NULL, elementId = NULL, ... )
cost |
Numeric cost matrix. Rows = workers/sources, columns = jobs/targets.
|
method |
Character; the algorithm to animate. Must match one of the
methods registered for animation (see |
maximize |
Logical; if |
width, height
|
Optional explicit widget dimensions (pixels or CSS units). |
elementId |
Optional DOM id for the widget container. |
... |
Algorithm-specific extra arguments forwarded to the trace
function (e.g. |
This is a teaching interface. It runs a slower R reference implementation
that emits a state trace at every step, then plays it back in the browser.
For production solving, use assignment() or lap_solve() which call the
fast C++ backends.
An htmlwidget object.
Animation support is added incrementally. Call animated_methods() to see
which method strings currently have a registered trace.
assignment() for production solving, lap_solve() for the tidy
interface.
## Not run: cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3, byrow = TRUE) lap_animate(cost, method = "hungarian") ## End(Not run)## Not run: cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3, byrow = TRUE) lap_animate(cost, method = "hungarian") ## End(Not run)
Provides a tidy interface for solving the linear assignment problem using 'Hungarian' or 'Jonker-Volgenant' algorithms. Supports rectangular matrices, NA/Inf masking, and data frame inputs.
lap_solve( x, source = NULL, target = NULL, cost = NULL, maximize = FALSE, method = "auto", forbidden = NA )lap_solve( x, source = NULL, target = NULL, cost = NULL, maximize = FALSE, method = "auto", forbidden = NA )
x |
Cost matrix, data frame, or tibble. If a data frame/tibble,
must include columns specified by |
source |
Column name for source/row indices (if |
target |
Column name for target/column indices (if |
cost |
Column name for costs (if |
maximize |
Logical; if TRUE, maximizes total cost instead of minimizing (default: FALSE) |
method |
Algorithm to use. One of:
|
forbidden |
Value to mark forbidden assignments (default: NA). Can also use Inf. |
A tibble with columns:
source: row/source indices
target: column/target indices
cost: cost of each assignment
total_cost: total cost (attribute)
# Matrix input cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3) lap_solve(cost) # Data frame input library(dplyr) df <- tibble( source = rep(1:3, each = 3), target = rep(1:3, times = 3), cost = c(4, 2, 5, 3, 3, 6, 7, 5, 4) ) lap_solve(df, source, target, cost) # With NA masking (forbidden assignments) cost[1, 3] <- NA lap_solve(cost) # Grouped data frames df <- tibble( sim = rep(1:2, each = 9), source = rep(1:3, times = 6), target = rep(1:3, each = 3, times = 2), cost = runif(18, 1, 10) ) df |> group_by(sim) |> lap_solve(source, target, cost)# Matrix input cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3) lap_solve(cost) # Data frame input library(dplyr) df <- tibble( source = rep(1:3, each = 3), target = rep(1:3, times = 3), cost = c(4, 2, 5, 3, 3, 6, 7, 5, 4) ) lap_solve(df, source, target, cost) # With NA masking (forbidden assignments) cost[1, 3] <- NA lap_solve(cost) # Grouped data frames df <- tibble( sim = rep(1:2, each = 9), source = rep(1:3, times = 6), target = rep(1:3, each = 3, times = 2), cost = runif(18, 1, 10) ) df |> group_by(sim) |> lap_solve(source, target, cost)
Solve many independent assignment problems at once. Supports lists of matrices,
3D arrays, or grouped data frames. Optional parallel execution via n_threads.
lap_solve_batch( x, source = NULL, target = NULL, cost = NULL, maximize = FALSE, method = "auto", n_threads = 1, forbidden = NA )lap_solve_batch( x, source = NULL, target = NULL, cost = NULL, maximize = FALSE, method = "auto", n_threads = 1, forbidden = NA )
x |
One of: List of cost matrices, 3D array, or grouped data frame |
source |
Column name for source indices (if |
target |
Column name for target indices (if |
cost |
Column name for costs (if |
maximize |
Logical; if TRUE, maximizes total cost (default: FALSE) |
method |
Algorithm to use (default: "auto"). See |
n_threads |
Number of threads for parallel execution (default: 1). Set to NULL to use all available cores. |
forbidden |
Value to mark forbidden assignments (default: NA) |
A tibble with columns:
problem_id: identifier for each problem
source: source indices for assignments
target: target indices for assignments
cost: cost of each assignment
total_cost: total cost for each problem
method_used: algorithm used for each problem
# List of matrices costs <- list( matrix(c(1, 2, 3, 4), 2, 2), matrix(c(5, 6, 7, 8), 2, 2) ) lap_solve_batch(costs) # 3D array arr <- array(runif(2 * 2 * 10), dim = c(2, 2, 10)) lap_solve_batch(arr) # Grouped data frame library(dplyr) df <- tibble( sim = rep(1:5, each = 9), source = rep(1:3, times = 15), target = rep(1:3, each = 3, times = 5), cost = runif(45, 1, 10) ) df |> group_by(sim) |> lap_solve_batch(source, target, cost) # Parallel execution (requires n_threads > 1) lap_solve_batch(costs, n_threads = 2)# List of matrices costs <- list( matrix(c(1, 2, 3, 4), 2, 2), matrix(c(5, 6, 7, 8), 2, 2) ) lap_solve_batch(costs) # 3D array arr <- array(runif(2 * 2 * 10), dim = c(2, 2, 10)) lap_solve_batch(arr) # Grouped data frame library(dplyr) df <- tibble( sim = rep(1:5, each = 9), source = rep(1:3, times = 15), target = rep(1:3, each = 3, times = 5), cost = runif(45, 1, 10) ) df |> group_by(sim) |> lap_solve_batch(source, target, cost) # Parallel execution (requires n_threads > 1) lap_solve_batch(costs, n_threads = 2)
Returns the top k optimal (or near-optimal) assignments using 'Murty' algorithm. Useful for exploring alternative optimal solutions or finding robust assignments.
lap_solve_kbest( x, k = 3, source = NULL, target = NULL, cost = NULL, maximize = FALSE, method = "murty", single_method = "jv", forbidden = NA )lap_solve_kbest( x, k = 3, source = NULL, target = NULL, cost = NULL, maximize = FALSE, method = "murty", single_method = "jv", forbidden = NA )
x |
Cost matrix, data frame, or tibble. If a data frame/tibble,
must include columns specified by |
k |
Number of best solutions to return (default: 3) |
source |
Column name for source/row indices (if |
target |
Column name for target/column indices (if |
cost |
Column name for costs (if |
maximize |
Logical; if TRUE, finds k-best maximizing assignments (default: FALSE) |
method |
Algorithm for each sub-problem (default: "murty"). Future versions may support additional methods. |
single_method |
Algorithm used for solving each node in the search tree (default: "jv") |
forbidden |
Value to mark forbidden assignments (default: NA) |
A tibble with columns:
rank: ranking of solutions (1 = best, 2 = second best, etc.)
solution_id: unique identifier for each solution
source: source indices
target: target indices
cost: cost of each edge in the assignment
total_cost: total cost of the complete solution
# Matrix input - find 5 best solutions cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3) lap_solve_kbest(cost, k = 5) # Data frame input library(dplyr) df <- tibble( source = rep(1:3, each = 3), target = rep(1:3, times = 3), cost = c(4, 2, 5, 3, 3, 6, 7, 5, 4) ) lap_solve_kbest(df, k = 3, source, target, cost) # With maximization lap_solve_kbest(cost, k = 3, maximize = TRUE)# Matrix input - find 5 best solutions cost <- matrix(c(4, 2, 5, 3, 3, 6, 7, 5, 4), nrow = 3) lap_solve_kbest(cost, k = 5) # Data frame input library(dplyr) df <- tibble( source = rep(1:3, each = 3), target = rep(1:3, times = 3), cost = c(4, 2, 5, 3, 3, 6, 7, 5, 4) ) lap_solve_kbest(df, k = 3, source, target, cost) # With maximization lap_solve_kbest(cost, k = 3, maximize = TRUE)
Solves the linear assignment problem when both sources and targets are ordered points on a line. Uses efficient O(n*m) dynamic programming for rectangular problems and O(n) sorting for square problems.
lap_solve_line_metric(x, y, cost = "L1", maximize = FALSE)lap_solve_line_metric(x, y, cost = "L1", maximize = FALSE)
x |
Numeric vector of source positions (will be sorted internally) |
y |
Numeric vector of target positions (will be sorted internally) |
cost |
Cost function for distance. Either:
|
maximize |
Logical; if TRUE, maximizes total cost instead of minimizing (default: FALSE) |
This is a specialized solver that exploits the structure of 1-dimensional assignment problems where costs depend only on the distance between points on a line. It is much faster than general LAP solvers for this special case.
The algorithm works as follows:
Square case (n == m):
Both vectors are sorted and matched in order: x[1] -> y[1], x[2] -> y[2], etc.
This is optimal for any metric cost function on a line.
Rectangular case (n < m): Uses dynamic programming to find the optimal assignment that matches all n sources to a subset of the m targets, minimizing total distance. The DP recurrence is:
dp[i][j] = min(dp[i][j-1], dp[i-1][j-1] + cost(x[i], y[j]))
This finds the minimum cost to match the first i sources to the first j targets.
Complexity:
Time: O(n*m) for rectangular, O(n log n) for square
Space: O(n*m) for DP table
A list with components:
match: Integer vector of length n with 1-based column indices
total_cost: Total cost of the assignment
# Square case: equal number of sources and targets x <- c(1.5, 3.2, 5.1) y <- c(2.0, 3.0, 5.5) result <- lap_solve_line_metric(x, y, cost = "L1") print(result) # Rectangular case: more targets than sources x <- c(1.0, 3.0, 5.0) y <- c(0.5, 2.0, 3.5, 4.5, 6.0) result <- lap_solve_line_metric(x, y, cost = "L2") print(result) # With unsorted inputs (will be sorted internally) x <- c(5.0, 1.0, 3.0) y <- c(4.5, 0.5, 6.0, 2.0, 3.5) result <- lap_solve_line_metric(x, y, cost = "L1") print(result)# Square case: equal number of sources and targets x <- c(1.5, 3.2, 5.1) y <- c(2.0, 3.0, 5.5) result <- lap_solve_line_metric(x, y, cost = "L1") print(result) # Rectangular case: more targets than sources x <- c(1.0, 3.0, 5.0) y <- c(0.5, 2.0, 3.5, 4.5, 6.0) result <- lap_solve_line_metric(x, y, cost = "L2") print(result) # With unsorted inputs (will be sorted internally) x <- c(5.0, 1.0, 3.0) y <- c(4.5, 0.5, 6.0, 2.0, 3.5) result <- lap_solve_line_metric(x, y, cost = "L1") print(result)
Performs optimal one-to-one matching between two datasets using linear assignment problem (LAP) solvers. Supports blocking, distance constraints, and various distance metrics.
match_couples( left, right = NULL, vars = NULL, distance = "euclidean", weights = NULL, scale = FALSE, auto_scale = FALSE, max_distance = Inf, calipers = NULL, block_id = NULL, ignore_blocks = FALSE, require_full_matching = FALSE, method = "auto", return_unmatched = TRUE, return_diagnostics = FALSE, parallel = FALSE, replace = FALSE, ratio = 1L, check_costs = TRUE, sigma = NULL )match_couples( left, right = NULL, vars = NULL, distance = "euclidean", weights = NULL, scale = FALSE, auto_scale = FALSE, max_distance = Inf, calipers = NULL, block_id = NULL, ignore_blocks = FALSE, require_full_matching = FALSE, method = "auto", return_unmatched = TRUE, return_diagnostics = FALSE, parallel = FALSE, replace = FALSE, ratio = 1L, check_costs = TRUE, sigma = NULL )
left |
Data frame of "left" units (e.g., treated, cases) |
right |
Data frame of "right" units (e.g., control, controls) |
vars |
Variable names to use for distance computation |
distance |
Distance metric: "euclidean", "manhattan", "mahalanobis", or a custom function |
weights |
Optional named vector of variable weights |
scale |
Scaling method: FALSE (none), "standardize", "range", or "robust" |
auto_scale |
If TRUE, automatically check variable health and select scaling method (default: FALSE) |
max_distance |
Maximum allowed distance (pairs exceeding this are forbidden) |
calipers |
Named list of per-variable maximum absolute differences |
block_id |
Column name containing block IDs (for stratified matching) |
ignore_blocks |
If TRUE, ignore block_id even if present |
require_full_matching |
If TRUE, error if any units remain unmatched |
method |
LAP solver: "auto", "hungarian", "jv", "gabow_tarjan", etc. |
return_unmatched |
Include unmatched units in output |
return_diagnostics |
Include detailed diagnostics in output |
parallel |
Enable parallel processing for blocked matching. Requires 'future' and 'future.apply' packages. Can be:
|
replace |
If TRUE, allow matching with replacement (same right unit can be matched to multiple left units). Default: FALSE. |
ratio |
Integer, number of right units to match per left unit. Default: 1 (one-to-one matching). For k:1 matching, set ratio = k. |
check_costs |
If TRUE, check distance distribution for potential problems and provide helpful warnings before matching (default: TRUE) |
sigma |
Optional covariance matrix for Mahalanobis distance. If NULL
(default), the pooled sample covariance is used. Only relevant when
|
This function finds the matching that minimizes total distance among all
feasible matchings, subject to constraints. Use greedy_couples() for
faster approximate matching on large datasets.
A list with class "matching_result" containing:
pairs: Tibble of matched pairs with distances
unmatched: List of unmatched left and right IDs
info: Matching diagnostics and metadata
# Basic matching left <- data.frame(id = 1:5, x = c(1, 2, 3, 4, 5), y = c(2, 4, 6, 8, 10)) right <- data.frame(id = 6:10, x = c(1.1, 2.2, 3.1, 4.2, 5.1), y = c(2.1, 4.1, 6.2, 8.1, 10.1)) result <- match_couples(left, right, vars = c("x", "y")) print(result$pairs) # With constraints result <- match_couples(left, right, vars = c("x", "y"), max_distance = 1, calipers = list(x = 0.5)) # With blocking left$region <- c("A", "A", "B", "B", "B") right$region <- c("A", "A", "B", "B", "B") blocks <- matchmaker(left, right, block_type = "group", block_by = "region") result <- match_couples(blocks$left, blocks$right, vars = c("x", "y"))# Basic matching left <- data.frame(id = 1:5, x = c(1, 2, 3, 4, 5), y = c(2, 4, 6, 8, 10)) right <- data.frame(id = 6:10, x = c(1.1, 2.2, 3.1, 4.2, 5.1), y = c(2.1, 4.1, 6.2, 8.1, 10.1)) result <- match_couples(left, right, vars = c("x", "y")) print(result$pairs) # With constraints result <- match_couples(left, right, vars = c("x", "y"), max_distance = 1, calipers = list(x = 0.5)) # With blocking left$region <- c("A", "A", "B", "B", "B") right$region <- c("A", "A", "B", "B", "B") blocks <- matchmaker(left, right, block_type = "group", block_by = "region") result <- match_couples(blocks$left, blocks$right, vars = c("x", "y"))
A generic function that converts any couplr matching result into a single
analysis-ready data frame with weights, subclass, and
distance columns. This is the couplr equivalent of MatchIt's
match.data().
match_data(result, ...) ## S3 method for class 'matching_result' match_data(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'full_matching_result' match_data(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'cem_result' match_data(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'subclass_result' match_data(result, data = NULL, ...)match_data(result, ...) ## S3 method for class 'matching_result' match_data(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'full_matching_result' match_data(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'cem_result' match_data(result, left, right, left_id = "id", right_id = "id", ...) ## S3 method for class 'subclass_result' match_data(result, data = NULL, ...)
result |
A couplr result object (matching_result, full_matching_result, cem_result, or subclass_result) |
... |
Additional arguments passed to methods |
left |
Data frame of left (treated) units |
right |
Data frame of right (control) units |
left_id |
Name of ID column in left (default: |
right_id |
Name of ID column in right (default: |
data |
Data frame containing all units (for CEM and subclassification, left and right are not always needed separately) |
The output format is compatible with downstream packages like cobalt,
WeightIt, and marginaleffects. The stacked (long) format with
treatment and weights columns is the standard layout expected
by these tools.
A tibble with all original variables plus standardized columns:
Unit identifier
1 for left/treated, 0 for right/control
Matching weights
Matched group/stratum identifier
Matching distance (where applicable)
set.seed(42) left <- data.frame(id = 1:5, age = c(25, 35, 45, 55, 65)) right <- data.frame(id = 6:15, age = runif(10, 20, 70)) result <- match_couples(left, right, vars = "age") md <- match_data(result, left, right) head(md)set.seed(42) left <- data.frame(id = 1:5, age = c(25, 35, 45, 55, 65)) right <- data.frame(id = 6:15, age = runif(10, 20, 70)) result <- match_couples(left, right, vars = "age") md <- match_data(result, left, right) head(md)
Constructs blocks (strata) for matching, using either grouping variables or clustering algorithms. Returns the input data frames with block IDs assigned, along with block summary statistics.
matchmaker( left, right, block_type = c("none", "group", "cluster"), block_by = NULL, block_vars = NULL, block_method = "kmeans", n_blocks = NULL, min_left = 1, min_right = 1, drop_imbalanced = FALSE, imbalance_threshold = Inf, return_dropped = TRUE, ... )matchmaker( left, right, block_type = c("none", "group", "cluster"), block_by = NULL, block_vars = NULL, block_method = "kmeans", n_blocks = NULL, min_left = 1, min_right = 1, drop_imbalanced = FALSE, imbalance_threshold = Inf, return_dropped = TRUE, ... )
left |
Data frame of "left" units (e.g., treated, cases) |
right |
Data frame of "right" units (e.g., control, controls) |
block_type |
Type of blocking to use:
|
block_by |
Variable name(s) for grouping (if block_type = "group") |
block_vars |
Variable names for clustering (if block_type = "cluster") |
block_method |
Clustering method (if block_type = "cluster"):
|
n_blocks |
Target number of blocks (for clustering) |
min_left |
Minimum number of left units per block |
min_right |
Minimum number of right units per block |
drop_imbalanced |
Drop blocks with extreme imbalance |
imbalance_threshold |
Maximum allowed |n_left - n_right| / max(n_left, n_right) |
return_dropped |
Include dropped blocks in output |
... |
Additional arguments passed to clustering function |
This function does NOT perform matching - it only creates the block structure.
Use match_couples() or greedy_couples() to perform matching within blocks.
A list with class "matchmaker_result" containing:
left: Left data frame with block_id column added
right: Right data frame with block_id column added
block_summary: Summary statistics for each block
dropped: Information about dropped blocks (if any)
info: Metadata about blocking process
# Group blocking left <- data.frame(id = 1:10, region = rep(c("A", "B"), each = 5), x = rnorm(10)) right <- data.frame(id = 11:20, region = rep(c("A", "B"), each = 5), x = rnorm(10)) blocks <- matchmaker(left, right, block_type = "group", block_by = "region") print(blocks$block_summary) # Clustering blocks <- matchmaker(left, right, block_type = "cluster", block_vars = "x", n_blocks = 3)# Group blocking left <- data.frame(id = 1:10, region = rep(c("A", "B"), each = 5), x = rnorm(10)) right <- data.frame(id = 11:20, region = rep(c("A", "B"), each = 5), x = rnorm(10)) blocks <- matchmaker(left, right, block_type = "group", block_by = "region") print(blocks$block_summary) # Clustering blocks <- matchmaker(left, right, block_type = "cluster", block_vars = "x", n_blocks = 3)
Computes optimal pixel assignment from A to B and returns the final transported frame (without intermediate animation frames).
pixel_morph( imgA, imgB, n_frames = 16L, mode = c("color_walk", "exact", "recursive"), lap_method = "jv", maximize = FALSE, quantize_bits = 5L, downscale_steps = 0L, alpha = 1, beta = 0, patch_size = 1L, upscale = 1, show = interactive() )pixel_morph( imgA, imgB, n_frames = 16L, mode = c("color_walk", "exact", "recursive"), lap_method = "jv", maximize = FALSE, quantize_bits = 5L, downscale_steps = 0L, alpha = 1, beta = 0, patch_size = 1L, upscale = 1, show = interactive() )
imgA |
Source image (file path or magick image object) |
imgB |
Target image (file path or magick image object) |
n_frames |
Internal parameter for rendering (default: 16) |
mode |
Assignment algorithm: "color_walk" (default), "exact", or "recursive" |
lap_method |
LAP solver method (default: "jv") |
maximize |
Logical, maximize instead of minimize cost (default: FALSE) |
quantize_bits |
Color quantization for "color_walk" mode (default: 5) |
downscale_steps |
Number of 2x reductions before computing assignment (default: 0) |
alpha |
Weight for color distance in cost function (default: 1) |
beta |
Weight for spatial distance in cost function (default: 0) |
patch_size |
Tile size for tiled modes (default: 1) |
upscale |
Post-rendering upscaling factor (default: 1) |
show |
Logical, display result in viewer (default: interactive()) |
This function returns a SHARP, pixel-perfect transport of A's pixels to positions determined by the assignment to B.
Key Points:
Assignment computed using: cost = alpha * color_dist + beta * spatial_dist
B's COLORS influence assignment but DO NOT appear in output
Result has A's colors arranged to match B's layout
No motion blur (unlike intermediate frames in animation)
See pixel_morph_animate for detailed explanation of
assignment vs rendering semantics.
Assignment is guaranteed to be a bijection (permutation) ONLY when:
downscale_steps = 0 (no resolution changes)
mode = "exact" with patch_size = 1
With downscaling or tiled modes, assignment may have:
Overlaps: Multiple source pixels map to same destination (last write wins)
Holes: Some destinations never filled (remain transparent)
If assignment is not a bijection (due to downscaling or tiling), a warning will be issued. The result may contain:
Overlapped pixels (multiple sources -> one destination)
Transparent holes (some destinations unfilled)
For guaranteed pixel-perfect results, use:
pixel_morph(A, B, mode = "exact", downscale_steps = 0)
magick image object of the final transported frame
pixel_morph_animate for animated version
if (requireNamespace("magick", quietly = TRUE)) { imgA <- system.file("extdata/icons/circleA_40.png", package = "couplr") imgB <- system.file("extdata/icons/circleB_40.png", package = "couplr") if (nzchar(imgA) && nzchar(imgB)) { result <- pixel_morph(imgA, imgB, n_frames = 4, show = FALSE) } }if (requireNamespace("magick", quietly = TRUE)) { imgA <- system.file("extdata/icons/circleA_40.png", package = "couplr") imgB <- system.file("extdata/icons/circleB_40.png", package = "couplr") if (nzchar(imgA) && nzchar(imgB)) { result <- pixel_morph(imgA, imgB, n_frames = 4, show = FALSE) } }
Creates an animated morph by computing optimal pixel assignment from image A to image B, then rendering intermediate frames showing the transport.
pixel_morph_animate( imgA, imgB, n_frames = 16L, fps = 10L, format = c("gif", "webp", "mp4"), outfile = NULL, show = interactive(), mode = c("color_walk", "exact", "recursive"), lap_method = "jv", maximize = FALSE, quantize_bits = 5L, downscale_steps = 0L, alpha = 1, beta = 0, patch_size = 1L, upscale = 1 )pixel_morph_animate( imgA, imgB, n_frames = 16L, fps = 10L, format = c("gif", "webp", "mp4"), outfile = NULL, show = interactive(), mode = c("color_walk", "exact", "recursive"), lap_method = "jv", maximize = FALSE, quantize_bits = 5L, downscale_steps = 0L, alpha = 1, beta = 0, patch_size = 1L, upscale = 1 )
imgA |
Source image (file path or magick image object) |
imgB |
Target image (file path or magick image object) |
n_frames |
Integer number of animation frames (default: 16) |
fps |
Frames per second for playback (default: 10) |
format |
Output format: "gif", "webp", or "mp4" |
outfile |
Optional output file path |
show |
Logical, display animation in viewer (default: interactive()) |
mode |
Assignment algorithm: "color_walk" (default), "exact", or "recursive" |
lap_method |
LAP solver method (default: "jv") |
maximize |
Logical, maximize instead of minimize cost (default: FALSE) |
quantize_bits |
Color quantization for "color_walk" mode (default: 5) |
downscale_steps |
Number of 2x reductions before computing assignment (default: 0) |
alpha |
Weight for color distance in cost function (default: 1) |
beta |
Weight for spatial distance in cost function (default: 0) |
patch_size |
Tile size for tiled modes (default: 1) |
upscale |
Post-rendering upscaling factor (default: 1) |
CRITICAL: This function has two separate phases with different semantics:
Phase 1 - Assignment Computation:
The assignment is computed by minimizing:
cost(i,j) = alpha * color_distance(A[i], B[j]) +
beta * spatial_distance(pos_i, pos_j)
This means B's COLORS influence which pixels from A map to which positions.
Phase 2 - Rendering (Transport-Only):
The renderer uses ONLY A's colors:
Intermediate frames: A's pixels move along paths with motion blur
Final frame: A's pixels at their assigned positions (sharp, no blur)
B's colors NEVER appear in the output
Result: You get A's colors rearranged to match B's geometry/layout.
B influences WHERE pixels go (via similarity in cost function)
B does NOT determine WHAT COLORS appear in output
Final image has A's palette arranged to mimic B's structure
For pure spatial rearrangement (ignore B's colors in assignment):
pixel_morph_animate(A, B, alpha = 0, beta = 1)
For color-similarity matching (default):
pixel_morph_animate(A, B, alpha = 1, beta = 0)
For hybrid (color + spatial):
pixel_morph_animate(A, B, alpha = 1, beta = 0.2)
Assignment is guaranteed to be a bijection (permutation) ONLY when:
downscale_steps = 0 (no resolution changes)
mode = "exact" with patch_size = 1
With downscaling or tiled modes, assignment may have:
Overlaps: Multiple source pixels map to same destination (last write wins)
Holes: Some destinations never filled (remain transparent)
A warning is issued if overlaps/holes are detected in the final frame.
Invisibly returns a list with animation object and metadata:
animation |
magick animation object |
width |
Image width in pixels |
height |
Image height in pixels |
assignment |
Integer vector of 1-based assignment indices (R convention) |
n_pixels |
Total number of pixels |
mode |
Mode used for matching |
upscale |
Upscaling factor applied |
if (requireNamespace("magick", quietly = TRUE)) { imgA <- system.file("extdata/icons/circleA_40.png", package = "couplr") imgB <- system.file("extdata/icons/circleB_40.png", package = "couplr") if (nzchar(imgA) && nzchar(imgB)) { outfile <- tempfile(fileext = ".gif") pixel_morph_animate(imgA, imgB, outfile = outfile, n_frames = 4, show = FALSE) } }if (requireNamespace("magick", quietly = TRUE)) { imgA <- system.file("extdata/icons/circleA_40.png", package = "couplr") imgB <- system.file("extdata/icons/circleB_40.png", package = "couplr") if (nzchar(imgA) && nzchar(imgB)) { outfile <- tempfile(fileext = ".gif") pixel_morph_animate(imgA, imgB, outfile = outfile, n_frames = 4, show = FALSE) } }
Produces a Love plot (dot plot) of standardized differences.
## S3 method for class 'balance_diagnostics' plot(x, type = c("love", "histogram", "variance"), threshold = 0.1, ...)## S3 method for class 'balance_diagnostics' plot(x, type = c("love", "histogram", "variance"), threshold = 0.1, ...)
x |
A balance_diagnostics object |
type |
Type of plot: "love" (default), "histogram", or "variance" |
threshold |
Threshold line for standardized differences (default: 0.1) |
... |
Additional arguments passed to plotting functions |
The balance_diagnostics object (invisibly)
Produces a histogram of pairwise distances from a matching result.
## S3 method for class 'matching_result' plot(x, type = c("histogram", "density", "ecdf"), ...)## S3 method for class 'matching_result' plot(x, type = c("histogram", "density", "ecdf"), ...)
x |
A matching_result object |
type |
Type of plot: "histogram" (default), "density", or "ecdf" |
... |
Additional arguments passed to plotting functions |
The matching_result object (invisibly)
Plot method for sensitivity analysis (base graphics)
## S3 method for class 'sensitivity_analysis' plot(x, alpha = 0.05, ...)## S3 method for class 'sensitivity_analysis' plot(x, alpha = 0.05, ...)
x |
A sensitivity_analysis object |
alpha |
Significance level (default: 0.05) |
... |
Additional arguments passed to plot |
The sensitivity_analysis object (invisibly)
Main preprocessing function that orchestrates variable health checks, categorical encoding, and automatic scaling selection.
preprocess_matching_vars( left, right, vars, auto_scale = TRUE, scale_method = "auto", check_health = TRUE, remove_problematic = TRUE, verbose = TRUE )preprocess_matching_vars( left, right, vars, auto_scale = TRUE, scale_method = "auto", check_health = TRUE, remove_problematic = TRUE, verbose = TRUE )
left |
Data frame of left units |
right |
Data frame of right units |
vars |
Character vector of variable names |
auto_scale |
Logical, whether to perform automatic preprocessing (default: TRUE) |
scale_method |
Scaling method: "auto", "standardize", "range", "robust", or FALSE |
check_health |
Logical, whether to check variable health (default: TRUE) |
remove_problematic |
Logical, automatically exclude constant/all-NA variables (default: TRUE) |
verbose |
Logical, whether to print warnings (default: TRUE) |
A list with class "preprocessing_result" containing:
left: Preprocessed left data frame
right: Preprocessed right data frame
vars: Final variable names (after exclusions)
health: Variable health diagnostics
scaling_method: Selected scaling method
excluded_vars: Variables that were excluded
warnings: List of warnings issued
Print Method for Balance Diagnostics
## S3 method for class 'balance_diagnostics' print(x, ...)## S3 method for class 'balance_diagnostics' print(x, ...)
x |
A balance_diagnostics object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Print Method for CEM Results
## S3 method for class 'cem_result' print(x, ...)## S3 method for class 'cem_result' print(x, ...)
x |
A cem_result object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Print Method for Distance Objects
## S3 method for class 'distance_object' print(x, ...)## S3 method for class 'distance_object' print(x, ...)
x |
A distance_object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Print Method for Full Matching Results
## S3 method for class 'full_matching_result' print(x, ...)## S3 method for class 'full_matching_result' print(x, ...)
x |
A full_matching_result object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Prints a summary and the table of results for a batch of assignment
problems solved with lap_solve_batch().
## S3 method for class 'lap_solve_batch_result' print(x, ...)## S3 method for class 'lap_solve_batch_result' print(x, ...)
x |
A |
... |
Additional arguments passed to |
Invisibly returns the input object x.
Print method for k-best assignment results
## S3 method for class 'lap_solve_kbest_result' print(x, ...)## S3 method for class 'lap_solve_kbest_result' print(x, ...)
x |
A |
... |
Additional arguments passed to |
Invisibly returns the input object x.
Nicely prints a lap_solve_result object, including the assignments,
total cost, and method used.
## S3 method for class 'lap_solve_result' print(x, ...)## S3 method for class 'lap_solve_result' print(x, ...)
x |
A |
... |
Additional arguments passed to |
Invisibly returns the input object x.
Print method for matching results
## S3 method for class 'matching_result' print(x, ...)## S3 method for class 'matching_result' print(x, ...)
x |
A matching_result object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Print method for matchmaker results
## S3 method for class 'matchmaker_result' print(x, ...)## S3 method for class 'matchmaker_result' print(x, ...)
x |
A matchmaker_result object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Print method for preprocessing result
## S3 method for class 'preprocessing_result' print(x, ...)## S3 method for class 'preprocessing_result' print(x, ...)
x |
A preprocessing_result object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Print method for sensitivity analysis
## S3 method for class 'sensitivity_analysis' print(x, ...)## S3 method for class 'sensitivity_analysis' print(x, ...)
x |
A sensitivity_analysis object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Print Method for Subclassification Results
## S3 method for class 'subclass_result' print(x, ...)## S3 method for class 'subclass_result' print(x, ...)
x |
A subclass_result object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Print method for variable health
## S3 method for class 'variable_health' print(x, ...)## S3 method for class 'variable_health' print(x, ...)
x |
A variable_health object |
... |
Additional arguments (ignored) |
Invisibly returns the input object x.
Matches treated and control units based on estimated propensity scores.
Fits a logistic regression model (or accepts a pre-fitted one), computes
logit propensity scores, and calls match_couples() with a caliper on
the logit scale.
ps_match( formula = NULL, data = NULL, treatment = NULL, ps_model = NULL, caliper_sd = 0.2, method = "auto", replace = FALSE, ratio = 1L, ... )ps_match( formula = NULL, data = NULL, treatment = NULL, ps_model = NULL, caliper_sd = 0.2, method = "auto", replace = FALSE, ratio = 1L, ... )
formula |
Formula for propensity score model (treatment ~ covariates).
Required if |
data |
Combined dataset containing both treated and control units |
treatment |
Name of the binary treatment column (0/1 or logical) |
ps_model |
Pre-fitted |
caliper_sd |
Caliper width in standard deviations of logit(PS). Default: 0.2 (Rosenbaum and Rubin recommendation). |
method |
LAP solver method (default: "auto") |
replace |
If TRUE, match with replacement (default: FALSE) |
ratio |
Integer k for k:1 matching (default: 1) |
... |
Additional arguments passed to |
The propensity score is the probability of treatment assignment conditional on observed covariates. Matching is performed on the logit of the propensity score (Rosenbaum and Rubin 1985), which provides better distributional properties than matching on the raw probability scale.
The default caliper of 0.2 SD of logit(PS) is recommended by Austin (2011) as removing approximately 98% of bias.
A matching_result object with additional propensity score info
in result$info$ps_model and result$info$caliper_value.
set.seed(42) n <- 100 data <- data.frame( id = seq_len(n), treated = rbinom(n, 1, 0.4), age = rnorm(n, 50, 10), income = rnorm(n, 50000, 15000) ) result <- ps_match(treated ~ age + income, data = data, treatment = "treated") print(result)set.seed(42) n <- 100 data <- data.frame( id = seq_len(n), treated = rbinom(n, 1, 0.4), age = rnorm(n, 50, 10), income = rnorm(n, 50000, 15000) ) result <- ps_match(treated ~ age + income, data = data, treatment = "treated") print(result)
Assesses how sensitive a matched comparison is to hidden bias using Rosenbaum bounds on the Wilcoxon signed-rank statistic.
sensitivity_analysis( result, left, right, outcome_var, gamma = seq(1, 3, by = 0.25), alternative = c("greater", "less", "two.sided"), left_id = "id", right_id = "id" )sensitivity_analysis( result, left, right, outcome_var, gamma = seq(1, 3, by = 0.25), alternative = c("greater", "less", "two.sided"), left_id = "id", right_id = "id" )
result |
A matching_result object from |
left |
Original left (treated) dataset |
right |
Original right (control) dataset |
outcome_var |
Name of the outcome column in |
gamma |
Numeric vector of sensitivity parameters (default: seq(1, 3, by = 0.25)). Gamma = 1 means no hidden bias. |
alternative |
Direction of the test: "greater" (default), "less", or "two.sided" |
left_id |
Name of ID column in left (default: "id") |
right_id |
Name of ID column in right (default: "id") |
Rosenbaum (2002, Chapter 4) bounds quantify how much hidden bias (an unobserved confounder) would be needed to explain away the observed treatment effect. The sensitivity parameter Gamma represents the maximum ratio of treatment odds between two matched units:
Gamma = 1: No hidden bias (standard Wilcoxon test)
Gamma = 2: One unit could be twice as likely to receive treatment due to an unobserved factor
The function computes upper and lower bounds on the p-value of the Wilcoxon signed-rank test under each level of hidden bias. A finding is "insensitive to bias" if p_upper remains below 0.05 even at large Gamma.
An S3 object of class sensitivity_analysis containing:
Tibble with columns: gamma, t_stat, p_upper, p_lower
Number of matched pairs analyzed
Smallest gamma at which p_upper > 0.05
Direction of test
Rosenbaum, P.R. (2002). Observational Studies, 2nd edition. Springer.
set.seed(42) left <- data.frame(id = 1:20, x = rnorm(20), outcome = rnorm(20, 1, 1)) right <- data.frame(id = 21:40, x = rnorm(20), outcome = rnorm(20, 0, 1)) result <- match_couples(left, right, vars = "x") sens <- sensitivity_analysis(result, left, right, outcome_var = "outcome") print(sens)set.seed(42) left <- data.frame(id = 1:20, x = rnorm(20), outcome = rnorm(20, 1, 1)) right <- data.frame(id = 21:40, x = rnorm(20), outcome = rnorm(20, 0, 1)) result <- match_couples(left, right, vars = "x") sens <- sensitivity_analysis(result, left, right, outcome_var = "outcome") print(sens)
Compute an entropy-regularized optimal transport plan using the 'Sinkhorn-Knopp' algorithm. Unlike other LAP solvers that return a hard 1-to-1 assignment, this returns a soft assignment (doubly stochastic matrix).
sinkhorn( cost, lambda = 10, tol = 1e-09, max_iter = 1000, r_weights = NULL, c_weights = NULL )sinkhorn( cost, lambda = 10, tol = 1e-09, max_iter = 1000, r_weights = NULL, c_weights = NULL )
cost |
Numeric matrix of transport costs. |
lambda |
Regularization parameter (default 10). Higher values produce sharper (more deterministic) transport plans; lower values produce smoother distributions. Typical range: 1-100. |
tol |
Convergence tolerance (default 1e-9). |
max_iter |
Maximum iterations (default 1000). |
r_weights |
Optional numeric vector of row marginals (source distribution). Default is uniform. Will be normalized to sum to 1. |
c_weights |
Optional numeric vector of column marginals (target distribution). Default is uniform. Will be normalized to sum to 1. |
The 'Sinkhorn-Knopp' algorithm solves the entropy-regularized optimal transport problem:
subject to row sums = r_weights and column sums = c_weights.
The entropy term H(P) encourages spread in the transport plan. As lambda -> Inf, the solution approaches the standard (unregularized) optimal transport.
Key differences from standard LAP solvers:
Returns a soft assignment (probabilities) not a hard 1-to-1 matching
Supports unequal marginals (weighted distributions)
Differentiable, making it useful in ML pipelines
Very fast: O(n^2) per iteration with typically O(1/tol^2) iterations
Use sinkhorn_to_assignment() to round the soft assignment to a hard matching.
A list with elements:
transport_plan — numeric matrix, the optimal transport plan P.
Row sums approximate r_weights, column sums approximate c_weights.
cost — the transport cost <C, P> (without entropy term).
u, v — scaling vectors (P = diag(u) * K * diag(v) where K = exp(-lambda*C)).
converged — logical, whether the algorithm converged.
iterations — number of iterations used.
lambda — the regularization parameter used.
Cuturi, M. (2013). 'Sinkhorn Distances': Lightspeed Computation of Optimal Transport. Advances in Neural Information Processing Systems, 26.
assignment() for hard 1-to-1 matching, sinkhorn_to_assignment()
to round soft assignments.
cost <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE) # Soft assignment with default parameters result <- sinkhorn(cost) print(round(result$transport_plan, 3)) # Sharper assignment (higher lambda) result_sharp <- sinkhorn(cost, lambda = 50) print(round(result_sharp$transport_plan, 3)) # With custom marginals (more mass from row 1) result_weighted <- sinkhorn(cost, r_weights = c(0.5, 0.25, 0.25)) print(round(result_weighted$transport_plan, 3)) # Round to hard assignment hard_match <- sinkhorn_to_assignment(result) print(hard_match)cost <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE) # Soft assignment with default parameters result <- sinkhorn(cost) print(round(result$transport_plan, 3)) # Sharper assignment (higher lambda) result_sharp <- sinkhorn(cost, lambda = 50) print(round(result_sharp$transport_plan, 3)) # With custom marginals (more mass from row 1) result_weighted <- sinkhorn(cost, r_weights = c(0.5, 0.25, 0.25)) print(round(result_weighted$transport_plan, 3)) # Round to hard assignment hard_match <- sinkhorn_to_assignment(result) print(hard_match)
Convert a soft transport plan from sinkhorn() to a hard 1-to-1 assignment
using greedy rounding.
sinkhorn_to_assignment(result)sinkhorn_to_assignment(result)
result |
Either a result from |
Greedy rounding iteratively assigns each row to its most probable column,
ensuring no column is assigned twice. This may not give the globally optimal
hard assignment; for that, use the transport plan as a cost matrix with
assignment().
Integer vector of column assignments (1-based), same format as
assignment().
cost <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE) result <- sinkhorn(cost, lambda = 20) hard_match <- sinkhorn_to_assignment(result) print(hard_match)cost <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE) result <- sinkhorn(cost, lambda = 20) hard_match <- sinkhorn_to_assignment(result) print(hard_match)
Divides units into K strata based on quantiles of the propensity score, then computes within-stratum weights for treatment effect estimation. This is a simple, transparent approach to propensity score adjustment that allows visual inspection of balance within each subclass.
subclass_match( formula = NULL, data = NULL, treatment = NULL, n_subclasses = 5L, ps = NULL, ps_model = NULL, estimand = "ATT" )subclass_match( formula = NULL, data = NULL, treatment = NULL, n_subclasses = 5L, ps = NULL, ps_model = NULL, estimand = "ATT" )
formula |
Formula for propensity score model (e.g.,
|
data |
Data frame containing all variables |
treatment |
Character, name of the binary treatment column (0/1) |
n_subclasses |
Integer, number of subclasses to create (default: 5). Cochran (1968) showed that 5 subclasses removes over 90\ a single covariate. |
ps |
Optional pre-computed numeric vector of propensity scores
(one per row in |
ps_model |
Optional pre-fitted |
estimand |
Target estimand: |
The algorithm:
Estimate propensity scores via logistic regression (or use pre-computed scores)
Divide the propensity score distribution into K quantile-based strata
For each stratum, check overlap (both treated and control units present)
Compute within-stratum weights based on the target estimand:
ATT: Treated units get weight 1; control units get
weight n_treated_in_stratum / n_control_in_stratum
ATE: Both groups get weight proportional to stratum size relative to total sample
ATC: Control units get weight 1; treated units get
weight n_control_in_stratum / n_treated_in_stratum
An S3 object of class c("subclass_result", "couplr_result")
containing:
Tibble with columns id, side, subclass,
ps, weight
Tibble with per-subclass statistics: counts, mean PS, and overlap status
List with n_subclasses, estimand, n_left,
n_right, method, vars
set.seed(42) n <- 200 data <- data.frame( id = 1:n, age = rnorm(n, 40, 10), income = rnorm(n, 50000, 15000) ) data$treatment <- rbinom(n, 1, plogis(-2 + 0.05 * data$age)) result <- subclass_match(treatment ~ age + income, data, treatment = "treatment") print(result)set.seed(42) n <- 200 data <- data.frame( id = 1:n, age = rnorm(n, 40, 10), income = rnorm(n, 50000, 15000) ) data$treatment <- rbinom(n, 1, plogis(-2 + 0.05 * data$age)) result <- subclass_match(treatment ~ age + income, data, treatment = "treatment") print(result)
Summary method for balance diagnostics
## S3 method for class 'balance_diagnostics' summary(object, ...)## S3 method for class 'balance_diagnostics' summary(object, ...)
object |
A balance_diagnostics object |
... |
Additional arguments (ignored) |
A list containing summary statistics (invisibly)
Summary Method for Distance Objects
## S3 method for class 'distance_object' summary(object, ...)## S3 method for class 'distance_object' summary(object, ...)
object |
A distance_object |
... |
Additional arguments (ignored) |
Invisibly returns the input object.
Extract summary information from k-best assignment results.
## S3 method for class 'lap_solve_kbest_result' summary(object, ...)## S3 method for class 'lap_solve_kbest_result' summary(object, ...)
object |
An object of class |
... |
Additional arguments (unused). |
A tibble with one row per solution containing:
rank: solution rank
solution_id: solution identifier
total_cost: total cost of the solution
n_assignments: number of assignments in the solution
Summary method for matching results
## S3 method for class 'matching_result' summary(object, ...)## S3 method for class 'matching_result' summary(object, ...)
object |
A matching_result object |
... |
Additional arguments (ignored) |
A list containing summary statistics (invisibly)
Summary method for sensitivity analysis
## S3 method for class 'sensitivity_analysis' summary(object, ...)## S3 method for class 'sensitivity_analysis' summary(object, ...)
object |
A sensitivity_analysis object |
... |
Additional arguments (ignored) |
A list containing summary statistics (invisibly)
Apply new constraints to a precomputed distance object without recomputing the underlying distances. This is useful for exploring different constraint scenarios quickly.
update_constraints(dist_obj, max_distance = Inf, calipers = NULL)update_constraints(dist_obj, max_distance = Inf, calipers = NULL)
dist_obj |
A distance_object from |
max_distance |
Maximum allowed distance (pairs with distance > max_distance become Inf) |
calipers |
Named list of per-variable calipers |
This function creates a new distance_object with modified constraints applied to the cost matrix. The original distance_object is not modified.
Constraints:
max_distance: Sets cost to Inf for pairs exceeding this threshold
calipers: Per-variable restrictions (e.g., calipers = list(age = 5))
The function returns a new object rather than modifying in place, following R's copy-on-modify semantics.
A new distance_object with updated cost_matrix
left <- data.frame(id = 1:5, age = c(25, 30, 35, 40, 45)) right <- data.frame(id = 6:10, age = c(24, 29, 36, 41, 44)) dist_obj <- compute_distances(left, right, vars = "age") # Apply constraints constrained <- update_constraints(dist_obj, max_distance = 2) result <- match_couples(constrained)left <- data.frame(id = 1:5, age = c(25, 30, 35, 40, 45)) right <- data.frame(id = 6:10, age = c(24, 29, 36, 41, 44)) dist_obj <- compute_distances(left, right, vars = "age") # Apply constraints constrained <- update_constraints(dist_obj, max_distance = 2) result <- match_couples(constrained)