2 Utilities
This chapter provides a brief description of the utility functions used in this material. Most of these functions are packaged in the R package {egvtools}, which was created specifically for this work.
2.1 R package egvtools
{egvtools} provides a coherent set of wrappers and utilities that facilitate the reproducible and efficient creation of large-scale EGVs on real datasets. The package relies on robust building blocks — {terra}, {sf}, {sfarrow}, {exactextractr} and {whitebox} — and standardises input/output, naming conventions and multi-scale zonal statistics, ensuring that the pipelines are repeatable across machines and projects.
The package was developed for the project ‘HiQBioDiv: High-resolution quantification of biodiversity for conservation and management’, which was funded by the Latvian Council of Science (Ref. No. VPP-VARAM-DABA-2024/1-0002), to simplify our work and to facilitate the reproduction of our results. Five of the functions are strictly for replication, while others are useful for a wider audience.
Package can be installed from GitHub with:
or obtained as a Docker container with all the necessary system and software dependencies.
2.1.1 Reproduction only functions
These functions are small wrappers, that helps to recreate our working environments - template files and their locations in the file tree.
These functions are:
download_raster_templates()
— fetch template rasters from Zenodo repository and place them in user specified location on disk, or by default - the one we used. By default this functions links to version 2.0.0 of the dataset;download_vector_templates()
- fetch template vector grids/points from Zenodo repository and place them in user specified location on disk, or by default - the one we used. By default this functions links to version 1.0.1 of the dataset;radius_function() — extracts summary statistics from raster layers using buffered polygon zones of multiple radii and rasterizes them onto a common template grid. Insternally connected to exact parts of the file names used in this project. If they are kept, can be used in other places.
2.1.2 General purpose functions
Each of those functions are small workflows themselves, that can be combined into larger workflows and used more widely, than for Latvia.
tile_vector_grid()
— tile template (vector) grid for chunked processing. The function internally is linked to our file naming convention. As long as it is maintained, function can be used to create tiled grid from any {sfarrow} parquet grid file;tiled_buffers()
— precompute buffered tiles for multiple radii around points. The function internally is linked to our file naming convention. As long as it is maintained, function can be used to create tiled polygons with buffers around points from any {sfarrow} parquet grid file. There are three buffering modes: dense (buffers the best-matching pts100*.parquet (prefers pts100_sauzeme.parquet) for each tile by radii_dense (default: 500, 1250, 3000, 10000 m ensuring that every analysis grid cell has desired buffer. Computationally heavy in the following workflows), sparse (uses a file to radius mapping and is highly generalizable), and specified (the same as sparse, but with one single point file). In our workflows we used the sparse mode with default mapping;create_backgrounds() — a wrapper around
terra::ifel()
to build consistent background rasters. This function better guards coordinate reference system and how it is stored, while also guarding spatial cover, resolution, coordinate reference system, exact pixel matching, etc. Creation of layers with default background values is faster than recreating them several times in workflows preparing EGVs;polygon2input() — rasterize polygons to input layers. Handles only polygon data, other geometry types need to buffered. Rasterizes polygon/multipolygon sf data to a raster aligned to a template GeoTIFF. Rasterization targets a raster::RasterLayer built from the template (so grids normally match). Projection is optional (project_mode). Missing values are counted only over valid template cells. User may optionally restrict the result with a raster mask (restrict_to) using numeric values or bracketed range strings (e.g., “(0,5]”, “[10,)”). Remaining NA cells can be filled by covering with a background raster (background_raster) or a constant (background_value). For large rasters, heavy steps (projection/mask/cover) can stream to disk via terra_todisk=TRUE.
input2egv() — normalize/align a fine-resolution input raster to a (coarser) EGV template, optionally cover missing values and/or fill gaps (IDW via Whitebox), and write the result to disk. Designed for large runs: fast gap counting (inside template footprint only), optional filling, tuned GDAL write options, and controlled terra memory/temp behavior.
downscale2egv() — downscale coarse rasters to a template grid (CRS, resolution, extent), masks to the template footprint, and optionally: (1) fills NoData gaps using WhiteboxTools’ IDW-based fill_missing_data, and (2) applies IDW smoothing to reduce blockiness from low-resolution inputs.
distance2egv() — computes Euclidean distance (in map units) from cells matching a set of class values in an input raster to all cells of an EGV template grid, then writes a Float32 GeoTIFF aligned to the template. Designed to work with rasters produced by
polygon2input()
.landscape_function() — computes a {landscapemetrics} metric (default “lsm_l_shdi”), optionally with extra lm_args, that yields one value per zone and per input layer. Runs tile-by-tile (by tile_field), writes per-tile rasters, merges to final per-layer GeoTIFF(s), then performs gap analysis (NA count within the template footprint and optional maximum gap width) and optional IDW gap filling via WhiteboxTools. Returns a compact data.frame with per-layer stats and timing.
2.2 Other utility functions
Other handy functions repeatedly used, not included in {egvtools} are stored
in egvs02.02_UtilityFunctions.R
file, located in Data/RScipts_final
.
ensure_multipolygons()
- rather agressive function to createMULTIPOLYGON
geometries fromGEOMETRYCOLLECTION
Code
if(!require(sf)) {install.packages("sf"); require(sf)}
if(!require(gdalUtilities)) {install.packages("gdalUtilities"); require(gdalUtilities)}
ensure_multipolygons <- function(X) {
library(sf)
library(gdalUtilities)
tmp1 <- tempfile(fileext = ".gpkg")
tmp2 <- tempfile(fileext = ".gpkg")
st_write(X, tmp1)
ogr2ogr(tmp1, tmp2, f = "GPKG", nlt = "MULTIPOLYGON")
Y <- st_read(tmp2)
st_sf(st_drop_geometry(X), geom = st_geometry(Y))
}