Every desktop GIS runs its toolbox one in-memory layer at a time. Open a layer, run the tool, write the result. The working layer has to fit in RAM, and a country-scale point set or a national DEM often does not.
vectra runs the same toolbox one batch at a time. A query is pulled through the C engine in fixed-size row groups, each spatial step works on the batch in front of it, and the transformed batch spills back to disk as a fresh lazy node. Peak memory is one batch and whatever small resident layer the step needs, so a billion-row layer flows past a fixed memory budget.
Topology stays with sf and terra: vectra adds no GEOS or GDAL link. Geometry rides through the engine as hex-encoded WKB in an ordinary string column, and the coordinate reference system is carried on the node. What vectra contributes is the streaming envelope around the operations a desktop GIS keeps resident.
A raster is a dense grid of values. To feed it into a tabular pipeline, read it as one row per cell, each cell a record of its coordinate and value. That is the bridge between the raster cube and every verb in the engine.
vec_open_raster() opens a .vec raster cube,
and vec_extract_points() samples it at coordinates. A whole
national climate grid stays on disk while the occurrence points it is
queried at stay in memory.
clim <- vec_open_raster("worldclim_bio.vec")
occ <- read.csv("occurrences.csv") # species, x, y
occ <- cbind(occ, vec_extract_points(clim, occ$x, occ$y))
head(occ)For a GeoTIFF the same sampling is
tiff_extract_points(). From here the points are an ordinary
vectra table, ready for any verb.
The most-used vector tool keeps the features that stand in a spatial
relation to another layer: points inside a study region, parcels
touching a road, patches within a buffer. spatial_filter()
streams the large layer and tests each batch against a small resident
locator with an sf predicate, keeping or dropping whole features.
region <- sf::st_read("study_area.gpkg", quiet = TRUE)
inside <- tbl("occurrences.vtr") |>
spatial_filter(region, coords = c("x", "y"), crs = 4326)The result carries the same schema as the input, with the outside
features removed. negate = TRUE keeps the outside features
instead. To cut geometry along the region rather than keep whole
features, spatial_clip() intersects each batch with the
mask, and erase = TRUE keeps the part outside it.
To tag each feature with the polygon it falls in rather than filter,
spatial_join() appends the polygon attributes. When both
layers are larger than RAM, partition = grid(cellsize) bins
them to a grid and joins one shard at a time, with the right layer
itself a streamed node.
Folding a point set into a grid is the headline larger-than-RAM
operation: the grid fits in memory, the points do not.
rasterize() maps each point to its cell through the
geotransform and accumulates a per-cell value in C, one batch at a time,
no spill. This is the monoid-fold tier: bounded memory, a single
pass.
# Point density on a continental grid, streamed from a billion-row file.
density <- tbl("gbif_points.vtr") |>
rasterize(extent = c(-180, -90, 180, 90), res = 0.1,
fun = "count", path = "density.vec")fun is the per-cell reduction: "count",
"sum", "mean", "min",
"max". With a field it aggregates that column
rather than tallies points. The output is a .vec cube or an
in-memory matrix.
Going the other way, zonal() summarises a raster within
zones, streaming the grid one tile-row strip at a time and folding
per-zone moments in memory.
Moving-window operations read each output tile expanded by the kernel
radius (a halo read), so a national DEM never has to be resident.
focal() applies an arbitrary weight window, and
terrain() derives slope, aspect, hillshade, TPI, roughness,
and TRI with Horn’s method, matching terra::terrain(). When
path is given the result streams straight back to a new
.vec, so neither the input nor the output band is ever
fully in memory.
To resample or reproject, warp() walks the output one
tile-row strip at a time, projects each strip’s pixel centres into the
source CRS (via PROJ when the two CRSs differ), and samples the bounded
source window with nearest, bilinear, or cubic interpolation, matching
terra::project().
rasterize() carries points onto a grid;
polygonize() carries a grid back to vector. It reads one
tile-row strip at a time and dissolves equal-valued cells into one
polygon per value, so a classified land-cover raster becomes a polygon
layer out of core.
contours() traces iso-lines from a continuous surface
with marching squares over the same haloed strip pass, then joins each
level’s segments into continuous lines.
mask() clips a raster to a polygon layer one strip at a
time, keeping the pixels whose centre falls inside it, the raster
counterpart of spatial_clip().
rast_calc() evaluates a cellwise expression across
aligned rasters, reading each input strip by strip. A vegetation index,
a reclassification, or arithmetic across layers is one expression over
the named rasters.
ndvi <- rast_calc(list(nir = "nir.vec", red = "red.vec"),
(nir - red) / (nir + red), path = "ndvi.vec")mosaic() merges rasters that share a resolution and cell
grid onto their union, resolving overlap with mean,
first, last, min,
max, or sum, one output strip at a time.
proximity() gives every cell its Euclidean distance to
the nearest feature cell, in CRS units. Features are the non-NA cells,
or the cells whose value is in target. The transform is
separable: a pass along the rows, a pass along the columns, each linear
in the line length, with an out-of-core transpose between them so the
whole grid is never resident. Squared distances scale by the x and y
resolution, so the answer is exact on non-square cells.
dist_to_road <- proximity("roads.vec", path = "road_distance.vec")
sea_distance <- proximity("landcover.vec", target = 0) # 0 = waterCost-distance, which accumulates a per-cell friction along the path,
is a global shortest-path problem and stays resident:
collect() the raster and run a resident solver for it.
Each operation states its memory tier, so the toolbox reads as a cost model rather than a grab-bag. The tier tells you what a run holds resident.
spatial_filter(), spatial_clip(),
rasterize(), zonal(), and every
spatial_map() recipe.focal(), terrain(), warp() over
tile-and-halo reads, proximity() over its transpose passes,
spatial_dissolve(), and the partitioned
spatial_join().spatial_overlay() and operations such as Voronoi, Delaunay,
convex hull, and global neighbour graphs. For these, collect the result
and run sf or terra on it; vectra does not pretend to stream what is
global by nature.That boundary is the honest part. The streaming envelope covers the operations whose memory can be bounded, and names the ones it cannot.