Package: vectra 0.9.0

vectra: Columnar Query Engine for Larger-than-RAM Data

A minimal columnar query engine with lazy execution on datasets larger than RAM. Provides 'dplyr'-like verbs (filter(), select(), mutate(), group_by(), summarise(), joins, window functions) and common aggregations (n(), sum(), mean(), min(), max(), sd(), first(), last()), backed by a pure C11 pull-based execution engine and a custom on-disk format ('.vtr'). Reads and writes 'GeoTIFF' (including tiled and 'BigTIFF' layouts) and a tiled raster format ('.vec') with overview pyramids and time cubes for larger-than-RAM raster data. Streams vector operations (spatial transforms, point-in-polygon and nearest-feature joins including a two-sided grid-partitioned join, select-by-location, clip, erase, dissolve, rasterization, polygonization, and contouring) through 'sf'. Runs raster operations (zonal statistics, focal windows, terrain derivatives, resampling or reprojection via warp, polygon masking, map algebra, and mosaicking) in native C or over the tiled '.vec' format, one batch or tile at a time, so the data never has to fit in RAM.
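The description lists sd() among the aggregations of a pull-based engine that works on data larger than RAM. vectra's engine is written in C and its internals are not described on this page. The following is therefore only a base-R sketch of the general idea, assuming a mergeable-summary design. Each batch shrinks to a tiny state (count, mean, sum of squared deviations), and an associative combine merges those states. The names `empty_state`, `batch_summary`, `combine_summary`, and `streamed_mean_sd` are hypothetical and are not vectra exports.

```r
# Conceptual sketch in base R; NOT vectra's C engine or its API.
# Each batch is reduced to a small state (n, mean, M2), and states are merged
# with an associative combine, so the whole column is never held at once.

empty_state <- list(n = 0, mean = 0, m2 = 0)  # identity element of the combine

# Summarise one batch: count, mean, and sum of squared deviations (M2).
batch_summary <- function(x) {
  if (length(x) == 0) return(empty_state)
  m <- mean(x)
  list(n = length(x), mean = m, m2 = sum((x - m)^2))
}

# Merge two batch summaries (pairwise update of Chan, Golub and LeVeque).
combine_summary <- function(a, b) {
  n <- a$n + b$n
  if (n == 0) return(empty_state)
  delta <- b$mean - a$mean
  list(
    n = n,
    mean = a$mean + delta * b$n / n,
    m2 = a$m2 + b$m2 + delta^2 * a$n * b$n / n
  )
}

# Split a vector into fixed-size batches and fold the summaries together.
streamed_mean_sd <- function(x, batch_size) {
  batches <- split(x, ceiling(seq_along(x) / batch_size))
  state <- Reduce(combine_summary, lapply(batches, batch_summary),
                  init = empty_state)
  c(mean = state$mean, sd = sqrt(state$m2 / (state$n - 1)))
}

x <- c(2.5, 3.1, 4.7, 1.2, 9.8, 6.4, 5.5, 7.3, 0.9, 3.3)
streamed_mean_sd(x, batch_size = 3)
```

`combine_summary()` is associative and has `empty_state` as its identity. The batch size therefore affects only floating-point rounding, not the result. That property is what lets a reduction like this stream in a single pass.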

Authors: Gilles Colling [aut, cre, cph]

Downloads:
  • Source: vectra_0.9.0.tar.gz
  • Windows: vectra_0.9.0.zip (r-4.7, r-4.6, r-4.5)
  • macOS: vectra_0.9.0.tgz (r-4.6-x86_64, r-4.6-arm64, r-4.5-x86_64, r-4.5-arm64)
  • Linux: vectra_0.9.0.tar.gz (r-4.7-arm64, r-4.7-x86_64, r-4.6-arm64, r-4.6-x86_64)
  • WebAssembly: vectra_0.9.0.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
vectra/json (API)

# Install 'vectra' in R:
install.packages('vectra', repos = c('https://gcol33.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/gcol33/vectra/issues

Pkgdown/docs site: https://gillescolling.com

Uses libs:
  • openmp: GCC OpenMP (GOMP) support library

Topics: c, columnar, data-analysis, large-data, query-engine, openmp

Score 7.77 · 8 stars · 1 package · 32 scripts · 311 downloads · 96 exports · 8 dependencies

Last updated from: 10ecd302ae. Checks: 13 OK. Indexed: yes.

Target                  Result  Time
linux-devel-arm64       OK      187
linux-devel-x86_64      OK      184
source / vignettes      OK      270
linux-release-arm64     OK      193
linux-release-x86_64    OK      185
macos-release-arm64     OK      132
macos-release-x86_64    OK      325
macos-oldrel-arm64      OK      115
macos-oldrel-x86_64     OK      220
windows-devel           OK      200
windows-release         OK      227
windows-oldrel          OK      201
wasm-release            OK      169

Exports: across, anti_join, append_vtr, arrange, bind_cols, bind_rows, block_fuzzy_lookup, block_lookup, chunk_feeder, collect, collect_chunked, collect_sf, contours, count, create_index, cross_join, delete_vtr, desc, diff_vtr, distinct, explain, filter, focal, full_join, fuzzy_join, glimpse, grid, group_by, group_map, group_modify, has_index, inner_join, left_join, link, lookup, mask, materialize, mosaic, mutate, offload, polygonize, proximity, pull, rast_calc, rasterize, reframe, relocate, rename, right_join, select, semi_join, slice, slice_head, slice_max, slice_min, slice_tail, spatial_clip, spatial_dissolve, spatial_filter, spatial_join, spatial_map, spatial_overlay, summarise, summarize, tally, tbl, tbl_csv, tbl_sqlite, tbl_tiff, tbl_xlsx, terrain, tiff_band_names, tiff_crs, tiff_extract_points, tiff_metadata, transmute, ungroup, vec_build_overviews, vec_close_raster, vec_extract_points, vec_open_raster, vec_raster_layout, vec_raster_times, vec_read_pixel_series, vec_read_time_slice, vec_read_window, vec_to_tiff, vec_write_raster, vec_write_time_cube, vtr_schema, warp, write_csv, write_sqlite, write_tiff, write_vtr, zonal

Dependencies: cli, glue, libgeos, lifecycle, rlang, tidyselect, vctrs, withr

Out-of-core GIS with vectra
The streaming envelope | From a raster to one row per cell | Select by location | Rasterize a streamed point set | Terrain on a streamed DEM | Back from raster to vector | Masking and raster math | Distance to the nearest feature | The cost model | Where to go next

Last update: 2026-06-25
Started: 2026-06-25

Getting Started with vectra
Introduction | Writing and reading data | Filtering and selecting | Transforming columns | String operations | Aggregation | Sorting and slicing | Joins | Window functions | Dates and times | String similarity | Tree traversal | Format backends | Indexes | Incremental operations | Materialized blocks | Inspecting the plan | Where to go next | Cleanup

Last update: 2026-06-25
Started: 2026-03-24

Offloading: streaming, monoids, and out-of-core fits
Two cost models | When a reduction streams in one pass | The offload functor | Localizing coupling by sorting and partitioning | The monoidal reduce, and the law it rests on | The cost tiers | A note on the abstraction | Choosing a path

Last update: 2026-06-11
Started: 2026-06-11

Species Distribution Models
What vectra does for a distribution model | Align coordinates before extracting | Build a predictor table | Fit in memory when the table fits | Stream when the table is larger than memory | A single pass with collect_chunked() | An out-of-core GLM with chunk_feeder() | Choosing a path

Last update: 2026-06-11
Started: 2026-06-11

Format Backends
Introduction | The .vtr format | CSV | SQLite | Excel | GeoTIFF | Integer pixel types | Embedded metadata | Point extraction | Streaming conversion pipelines | Batch size | Format comparison

Last update: 2026-04-05
Started: 2026-04-04

vectra Engine Reference
What vectra is | Execution model | Pull-based pipeline | Selection vectors (zero-copy filtering) | Columnar storage | Data sources | Output sinks | Supported verbs | Transformation verbs | Aggregation verbs | Ordering verbs | Join verbs | Window functions | Other verbs | tidyselect support | Supported types | Base types | Annotated types | Coercion rules | Arithmetic and comparison expressions | Join key coercion | bind_rows coercion | NA semantics | Storage | Propagation | is.na() | Ordering guarantees | Streaming vs materializing | Streaming nodes (constant memory per batch) | Materializing nodes | External sort (spill-to-disk) | Join memory model | The .vtr file format | Layout | Version history | Encoding and compression (v4) | Query optimizer | Predicate pushdown | Column pruning | Hidden mutate insertion | explain() contract | Hash indexes (.vtri) | Creating indexes | Index format | How the scan uses indexes | Performance characteristics | Materialized blocks | Exact lookups | Fuzzy lookups | Use case | OpenMP parallelization | Which operations parallelize | Thread safety | Current limitations | Fallback behavior | Grouping preservation | Package conflicts

Last update: 2026-04-05
Started: 2026-03-06

Joins
Introduction | Key specification | Left join | Inner join | Right and full joins | Filtering joins: semi and anti | Cross join | Fuzzy joins | Multi-column keys | Key coercion and NA handling | Memory and performance | Practical guidance

Last update: 2026-04-05
Started: 2026-04-04

Star Schemas and Lookup
The flat-table problem | The star schema concept | Setting up a schema | Looking up dimension columns | Match reporting | Named keys | Join modes | Reusing the schema | Practical patterns | Pattern 1: filtering before lookup | Pattern 2: aggregation after lookup | Pattern 3: multiple dimensions in one aggregation | Pattern 4: writing results back | When not to use a schema

Last update: 2026-04-05
Started: 2026-04-05

Indexing and Query Optimization
Introduction | Zone-map pruning | Hash indexes | Composite indexes | Case-insensitive indexes | %in% acceleration | Column pruning | Predicate pushdown | Reading explain() output | Materialized blocks | Practical guidance

Last update: 2026-04-04
Started: 2026-04-04

String Operations and Fuzzy Matching
Introduction | Basic string functions | Trimming whitespace | Case conversion | String length | Extracting substrings | Prefix and suffix tests | String concatenation | paste0: no separator | paste: custom separator | Multi-column paste | Pattern matching: fixed strings | Filtering with grepl | Replacing with gsub and sub | sub: first match only | Pattern matching: regex | Regex filtering with grepl | Regex replacement with gsub | sub with regex: first match only | str_extract: capturing structured parts | Fuzzy string matching | Levenshtein distance | Damerau-Levenshtein distance | Jaro-Winkler similarity | Choosing a distance metric | Filtering by distance threshold | Ranking by similarity | Fuzzy joins | Basic fuzzy join | Methods: dl, levenshtein, jw | Blocking for performance | Thread control | Block lookups | Exact lookups | Case-insensitive lookups | Fuzzy lookups | Performance | Tier 1: byte-level operations (essentially free) | Tier 2: pattern matching (depends on pattern complexity) | Tier 3: fuzzy distance computation (CPU-bound) | Two-pass string building | Practical guidance | Which distance metric for which use case | Choosing max_dist thresholds | Handling encoding issues | Common cleaning patterns | Combining approaches

Last update: 2026-04-04
Started: 2026-04-04

Working with Large Data
Introduction | Streaming pipelines | Batch sizing | Append workflows | Delete and tombstones | Diff between snapshots | External sort | Streaming joins | Multi-file workflows | Format conversion ETL | Memory budget planning | Cleanup

Last update: 2026-04-04
Started: 2026-04-04

Readme and manuals

Help Manual

Help page | Topics
Apply a function across multiple columns | across
Append rows to an existing .vtr file | append_vtr
Sort rows by column values | arrange
Bind rows or columns from multiple vectra tables | bind_cols, bind_rows
Fuzzy-match query keys against a materialized block | block_fuzzy_lookup
Probe a materialized block by column value | block_lookup
Turn a query into a resettable chunk generator | chunk_feeder
Execute a lazy query and return a data.frame | collect
Fold a function over a query, one batch at a time | collect_chunked, collect_chunked.default, collect_chunked.vectra_node, collect_chunked.vectra_partition
Materialize a spatial query as an sf object | collect_sf
Extract contour iso-lines from a streamed raster | contours
Count observations by group | count, tally
Create a hash index on a .vtr file column | create_index
Cross join two vectra tables | cross_join
Logically delete rows from a .vtr file | delete_vtr
Mark a column for descending sort order | desc
Compute the logical diff between two .vtr files | diff_vtr
Keep distinct/unique rows | distinct
Print the execution plan for a vectra query | explain
Filter rows of a vectra query | filter
Moving-window (focal) statistics over a streamed raster | focal
Fuzzy join two vectra tables by string distance | fuzzy_join
Get a glimpse of a vectra table | glimpse
Define a uniform grid for a partitioned spatial join | grid
Group a vectra query by columns | group_by
Apply a function to each shard of a partition | group_map, group_map.vectra_partition, group_modify, group_modify.vectra_partition
Check if a hash index exists for a .vtr column | has_index
Limit results to first n rows | head.vectra_node
Join two vectra tables | anti_join, full_join, inner_join, left_join, right_join, semi_join
Define a link between a fact table and a dimension table | link
Look up columns from linked dimension tables | lookup
Mask a streamed raster to a polygon layer | mask
Materialize a vectra node into a reusable in-memory block | materialize
Merge aligned rasters onto a common grid | mosaic
Add or transform columns | mutate
Spill a query to disk and stream it back (the offload functor) | offload
Vectorise a raster into polygons | polygonize
Print a vectra query node | print.vectra_node
Euclidean distance to the nearest feature (proximity) | proximity
Extract a single column as a vector | pull
Cellwise calculation over aligned rasters (map algebra) | rast_calc
Rasterize a streamed point layer onto a fixed grid | rasterize
Summarise with variable-length output per group | reframe
Relocate columns | relocate
Rename columns | rename
Select columns from a vectra query | select
Select rows by position | slice
Select first or last rows | slice_head, slice_max, slice_min, slice_tail
Clip or erase a streamed layer against a resident mask | spatial_clip
Dissolve geometries by group | spatial_dissolve
Keep streamed rows by their spatial relation to a resident layer | spatial_filter
Spatial join a streamed query against a resident sf object | spatial_join
Stream a query through an sf transform | spatial_map
Self-overlay a polygon layer into disjoint pieces (QGIS-style Union) | spatial_overlay
Summarise grouped data | summarise, summarize
Create a lazy table reference from a .vtr file | tbl
Create a lazy table reference from a CSV file | tbl_csv
Create a lazy table reference from a SQLite database | tbl_sqlite
Create a lazy table reference from a GeoTIFF raster | tbl_tiff
Create a lazy table reference from an Excel (.xlsx) file | tbl_xlsx
Terrain derivatives from a streamed elevation raster | terrain
Read per-band names from a GeoTIFF | tiff_band_names
Read CRS metadata from a GeoTIFF | tiff_crs
Extract raster values at point coordinates | tiff_extract_points
Read GDAL_METADATA from a GeoTIFF | tiff_metadata
Keep only columns from mutate expressions | transmute
Remove grouping from a vectra query | ungroup
Build overview pyramids for a .vec raster | vec_build_overviews
Close a .vec raster handle | vec_close_raster
Extract band values at (x, y) points from a .vec raster | vec_extract_points
Open a .vec raster | vec_open_raster
Tile layout of an open .vec raster | vec_raster_layout
Distinct time stamps stored in a .vec time cube | vec_raster_times
Read the full time series at a single pixel from a .vec time cube | vec_read_pixel_series
Read a single time slice from a .vec time cube | vec_read_time_slice
Read a window of pixels from a .vec raster | vec_read_window
Export a .vec raster to GeoTIFF | vec_to_tiff
Write a raster matrix or 3D array to a .vec raster file | vec_write_raster
Write a 4D time-cube raster to .vec | vec_write_time_cube
Create a star schema over linked vectra tables | vtr_schema
Resample or reproject a streamed raster onto a target grid | warp
Write query results or a data.frame to a CSV file | write_csv
Write query results or a data.frame to a SQLite table | write_sqlite
Write query results to a GeoTIFF file | write_tiff
Write data to a .vtr file | write_vtr
Summarise raster values within zones | zonal