Package: corrselect 3.2.2

corrselect: Correlation-Based and Model-Based Predictor Pruning

Provides functions for predictor pruning using association-based and model-based approaches. Includes corrPrune() for fast correlation-based pruning, modelPrune() for VIF-based regression pruning, and exact graph-theoretic algorithms (Eppstein–Löffler–Strash, Bron–Kerbosch) for exhaustive subset enumeration. Supports linear models, GLMs, and mixed models ('lme4', 'glmmTMB').
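The exhaustive-enumeration idea in the description can be stated as a graph problem. Variables are nodes, and two variables are joined by an edge when their absolute correlation is below the threshold. Each maximal clique of that threshold graph is then a maximal subset in which no pair is too strongly correlated. The base-R sketch below illustrates this formulation with plain Bron–Kerbosch (no pivoting). It is not corrselect's own C++-backed implementation. The helper names `threshold_graph()` and `bron_kerbosch()` are hypothetical, and treating the threshold as a strict `|r| < threshold` cutoff is an assumption of the sketch.

```r
# Illustrative sketch only: threshold_graph() and bron_kerbosch() are
# hypothetical helpers, not functions exported by corrselect.

# Build the threshold graph: an edge joins i and j when |r_ij| < threshold
# (the strict inequality is an assumption of this sketch).
threshold_graph <- function(R, threshold) {
  A <- abs(R) < threshold
  diag(A) <- FALSE  # no self-loops
  A
}

# Enumerate all maximal cliques with basic Bron-Kerbosch (no pivoting).
# Each clique is a maximal set of variables whose pairwise |r| < threshold.
bron_kerbosch <- function(A) {
  cliques <- list()
  recurse <- function(R_set, P, X) {
    if (length(P) == 0L && length(X) == 0L) {
      # R_set cannot be extended any further: record it as a maximal clique
      cliques[[length(cliques) + 1L]] <<- sort(R_set)
      return(invisible(NULL))
    }
    for (v in P) {  # R iterates over P as it was on entry to the loop
      nb <- unname(which(A[v, ]))
      recurse(c(R_set, v), intersect(P, nb), intersect(X, nb))
      P <- setdiff(P, v)  # v has been fully explored from this branch
      X <- union(X, v)    # so exclude it from later sibling branches
    }
    invisible(NULL)
  }
  recurse(integer(0), seq_len(nrow(A)), integer(0))
  cliques
}

# Toy correlation matrix: x1/x2 and x3/x4 are strongly correlated pairs.
R <- matrix(c(1.00, 0.90, 0.10, 0.20,
              0.90, 1.00, 0.30, 0.10,
              0.10, 0.30, 1.00, 0.85,
              0.20, 0.10, 0.85, 1.00),
            nrow = 4, dimnames = list(paste0("x", 1:4), paste0("x", 1:4)))

res <- bron_kerbosch(threshold_graph(R, threshold = 0.7))
subsets <- lapply(res, function(idx) colnames(R)[idx])
print(subsets)
```

With a threshold of 0.7 the sketch returns four maximal subsets, {x1, x3}, {x1, x4}, {x2, x3} and {x2, x4}. Each one keeps exactly one variable from each strongly correlated pair. This is why enumerating all maximal subsets, rather than returning a single pruned set, can expose several equally valid choices.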

Authors: Gilles Colling [aut, cre, cph]

Downloads:
  • corrselect_3.2.2.tar.gz (source)
  • corrselect_3.2.2.zip (Windows: r-4.7, r-4.6, r-4.5)
  • corrselect_3.2.2.tgz (macOS: r-4.6-x86_64, r-4.6-arm64, r-4.5-x86_64, r-4.5-arm64)
  • corrselect_3.2.2.tar.gz (Linux binaries: r-4.7-arm64, r-4.7-x86_64, r-4.6-arm64, r-4.6-x86_64)
  • corrselect_3.2.2.tgz (WebAssembly: r-4.6-emscripten)

# Install 'corrselect' in R:
install.packages('corrselect', repos = c('https://gcol33.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/gcol33/corrselect/issues

Pkgdown/docs site: https://gillescolling.com

Uses libs:
  • c++: GNU Standard C++ Library v3

Topics: correlation, enumeration, feature-selection, glm, graph-algorithms, machine-learning, mixed-models, multicollinearity, regression, statistics, variable-selection, vif, cpp

Score 6.18, 2 stars, 15 scripts, 509 downloads, 7 exports, 2 dependencies

Last updated from: 2fb982f0d8. Checks: 13 OK. Indexed: yes.

Target                 Result  Time
linux-devel-arm64      OK      200
linux-devel-x86_64     OK      204
source / vignettes     OK      312
linux-release-arm64    OK      211
linux-release-x86_64   OK      228
macos-release-arm64    OK      172
macos-release-x86_64   OK      205
macos-oldrel-arm64     OK      157
macos-oldrel-x86_64    OK      213
windows-devel          OK      455
windows-release        OK      712
windows-oldrel         OK      430
wasm-release           OK      226

Exports: assocSelect, CorrCombo, corrPrune, corrSelect, corrSubset, MatSelect, modelPrune

Dependencies: Rcpp, S7

Comparison with Alternatives
Overview | Evaluation Dataset | Comparison 1: caret::findCorrelation() | Method | Execution | Distribution Comparison | Comparison | Applications | Comparison 2: Boruta | Sequential Application | Comparison 3: glmnet (LASSO/Ridge) | Coefficient Comparison | Comparison 4: modelPrune() vs Manual VIF Removal | Manual Implementation | modelPrune() Comparison | Visual: VIF Comparison | Summary | Method Selection | corrselect Distinguishing Features | Integrated Workflow | References | See Also | Session Info

Last update: 2026-05-31
Started: 2025-11-20

Quick Start
Installation | What corrselect Does | Interface Hierarchy | Level 1: Simple Pruning | Level 2: Structured Subset Selection | Level 3: Low-Level Matrix Interface | Quick Examples | corrPrune(): Association-Based Pruning | modelPrune(): VIF-Based Pruning | corrSelect(): Enumerate All Maximal Subsets | assocSelect(): Mixed-Type Data | Protecting Variables | Threshold Selection | Interface Selection Guide | Quick Reference | corrPrune() | modelPrune() | corrSelect() | assocSelect() | MatSelect() | corrSubset() | Troubleshooting | See Also | Session Info

Last update: 2026-03-23
Started: 2025-11-20

Theory and Formulation
Overview | Contents | Terminology | Association measure | Association matrix | Threshold (τ) | Valid subset | Maximal valid subset | Threshold graph | Clique | Maximal clique | Forced-in variables (force_in) | ELS (Eppstein–Löffler–Strash) | Bron–Kerbosch | Greedy mode | Exact mode | Auto mode | Intuitive Overview | The Core Idea | How corrselect Works | Why "Maximal" Not "Maximum"? | Toy Example (4 Variables) | Graph Representation | Visual Graph Representation | Network Visualization with cor_example | Finding Maximal Cliques | Key Insight | Problem Formulation | Intuitive Problem Statement | Formal Problem Statement | Association Matrix | Threshold Constraint | Maximal Valid Subsets | Graph-Theoretic Interpretation | Why Graphs? | Threshold Graph | Maximal Cliques | Example: 6-Variable Threshold Graph | From Theory to Implementation | Threshold (τ) → threshold argument | Maximal cliques → Returned subsets | Forced-in set (F) → force_in argument | Search type → mode and method arguments | Association matrix (A) → Data input and matrix-based functions | Graph density → Performance considerations | Example mapping | Search Algorithms | Exact Enumeration | Eppstein–Löffler–Strash (ELS) | Pseudocode for Practitioners | Greedy Heuristic | Algorithm Pseudocode | Bron–Kerbosch with Pivoting | Forced Variables | Graph Modification | Correlation vs Association | Complexity Analysis | Output Structure | Design Philosophy | Why Hard Threshold Not Soft Constraint? | Why Graph Algorithms Not Optimization? | Why Pairwise Associations Only? | Why Enumerate All Solutions Not Just Return One? | References | Graph-Theoretic Algorithms | Multicollinearity and Variable Selection | Association Measures | Threshold Graph Theory | Computational Complexity | Applications | See Also | Session Info

Last update: 2026-03-23
Started: 2025-11-25

Advanced Topics
Overview | 1. Understanding the Algorithms | 1.1 Exact vs Greedy: When to Use Each | Exact Mode (Graph-Theoretic) | Greedy Mode (Heuristic) | Auto Mode (Recommended) | 1.2 Complexity Analysis with Benchmarks | 1.3 Deterministic Tie-Breaking | 1.4 Grouped Pruning | Basic Usage | Parameters | When to Use | 2. Custom Engines for modelPrune() | 2.1 Understanding Custom Engines | What is a Custom Engine? | Built-in Criteria | How Custom Engines Work | Engine Structure Requirements | 2.2 Example: INLA Engine (Bayesian Spatial Models) | Background | Implementation | How It Works | 2.3 Example: mgcv Engine (GAMs) | 2.4 Example: Custom Criterion (AIC-Based) | 2.5 Validation and Error Handling | Automatic Validation | Validation Checklist | Debugging Tips | 3. Exact Subset Enumeration | 3.1 When You Need ALL Maximal Subsets | 3.2 Exploring Multiple Subsets | 3.3 Extracting Specific Subsets | 3.4 Advanced: Domain-Specific Subset Selection | 4. Performance Optimization | 4.1 Precomputed Correlation Matrices | 4.2 Memory Considerations for Large Data | Memory-Efficient Correlation Computation | Sparse Correlation Matrices | 4.3 Parallel Processing Strategies | 4.4 Choosing the Right Mode | 5. Troubleshooting | 5.1 Common Errors and Solutions | Error: "No valid subsets found" | Error: force_in variables conflict with threshold | Error: VIF computation fails in modelPrune() | 5.2 Threshold Selection Guidance | For corrPrune() (Correlation Threshold) | For modelPrune() (VIF Limit) | Empirical Approach: Visualize First | 5.3 Handling Edge Cases | Single Predictor After Pruning | All Variables Removed | Mixed-Type Data | 6. Best Practices | 6.1 Workflow Recommendations | For Exploratory Analysis | For Publication-Quality Analysis | 6.2 Combining with Other Methods | 7. Summary | Key Takeaways | Algorithms | Custom Engines | Optimization | Troubleshooting | 8. References | See Also | Session Info

Last update: 2026-02-24
Started: 2025-11-20

Complete Workflows: Real-World Examples
Overview | Workflow 1: Ecological Modeling | Data | Correlation-based pruning | Correlation distribution | Fit models | Model comparison | Visualization | Coefficient stability | Workflow 2: Survey Data Analysis | Prune with protected variables | Construct coverage | Model satisfaction | Workflow 3: High-Dimensional Data | Greedy pruning | Dimensionality reduction | Exact vs greedy comparison | Classification | Workflow 4: Mixed Models | Prune fixed effects | Final model | VIF verification | VIF reduction visualization | Summary | See Also | References | Session Info

Last update: 2026-02-24
Started: 2025-11-20