
Paanwaris Paansri and Luis E. Escobar
The bean package provides a tool to address a
fundamental challenge in species distribution modeling (SDM, or
ecological niche modeling, ENM): sampling bias.
Occurrence records for species are rarely collected through a
systematic, stratified process. Instead, they often cluster in easily
accessible areas (like roads and cities) or in well-studied research
sites. This spatial bias can translate into an environmental
bias, where the model incorrectly learns that the species is
associated with the environmental conditions of those heavily sampled
areas, rather than its true ecological requirements.
bean tackles this problem by thinning occurrence data in
environmental space. The goal is to create a more
uniform distribution of points across the species’ observed
environmental niche, reducing the influence of densely clustered
records. This allows for the construction of a more accurate
fundamental niche volume, which can then be projected
into geographic space to create a less biased prediction of
environmentally suitable areas.
The name bean reflects the core principle of the method:
ensuring that each “pod” (a grid cell in environmental space) contains
only a specified number of “beans” (occurrence points).
bean operates by shifting the focus from geographic space to environmental space:
Environmental Gridding: Divides the environmental hypercube into “pods”.
Objective Thinning: Reduces clusters to a specified density per pod.
Niche Delineation: Fits ellipsoids to thinned data to define the fundamental niche.
Projection: Maps the corrected niche back into geographic space for less biased predictions.
The development version of bean can be installed from
GitHub:
# Install devtools if needed
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install bean
devtools::install_github("paanwaris/bean")
To load the package:
library(bean)
A typical bean workflow consists of these key steps:
The prepare_bean() function cleans raw occurrence data
by removing missing coordinates and extracting environmental values from
raster layers. This ensures all subsequent analyses use a clean, scaled
dataset.
See the Preparing bean vignette.
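The cleaning step can be illustrated with a small base-R sketch. It does not call prepare_bean(), whose arguments are not shown here. The data frame `occ`, its column names, and the helper `clean_occurrences()` are hypothetical, the environmental values are assumed to have been extracted from the rasters already, and "scaled" is assumed to mean z-score standardization.

```r
# Hypothetical occurrence table: coordinates plus already-extracted
# environmental values (column names are illustrative only).
occ <- data.frame(
  x    = c(100.1, 100.2, NA, 100.4, 100.5),
  y    = c(14.1, NA, 14.3, 14.4, 14.5),
  temp = c(25.0, 26.0, 27.0, 28.0, 30.0),
  prec = c(1200, 1300, 1250, 1100, 1400)
)

# Illustrative helper (not part of bean): drop incomplete records,
# then standardize each environmental column to mean 0 and sd 1.
clean_occurrences <- function(df, coord_cols, env_cols) {
  keep <- stats::complete.cases(df[, c(coord_cols, env_cols)])
  out <- df[keep, , drop = FALSE]
  out[env_cols] <- lapply(out[env_cols], function(v) as.numeric(scale(v)))
  rownames(out) <- NULL
  out
}

occ_clean <- clean_occurrences(occ, c("x", "y"), c("temp", "prec"))
```

Standardizing the variables puts them on a common scale, so a grid cell in environmental space is not dominated by the variable with the largest units.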
Instead of arbitrary thinning, find_env_resolution()
selects a kernel-density bandwidth for each
environmental variable (Sheather–Jones plug-in by default, with
Silverman and Scott rules also available). The bandwidth is a
statistically defensible choice for the edge length of an environmental
grid cell: it is the scale at which the empirical density of
observations becomes smooth.
See the Finding the environmental resolution vignette.
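To give a feel for the three bandwidth rules named above, the sketch below computes them with base R's stats functions on simulated data. It is not the internals of find_env_resolution(), which are not shown here. The data frame `env` and its column names are made up for illustration.

```r
# Simulated environmental values (illustrative only)
set.seed(1)
env <- data.frame(
  temp = rnorm(200, mean = 25, sd = 3),
  prec = rnorm(200, mean = 1200, sd = 150)
)

# One bandwidth per variable and rule, using base R selectors:
#   bw.SJ   - Sheather-Jones plug-in
#   bw.nrd0 - Silverman's rule of thumb
#   bw.nrd  - Scott's variation of the rule of thumb
bandwidths <- sapply(env, function(v) {
  c(
    sheather_jones = stats::bw.SJ(v),
    silverman      = stats::bw.nrd0(v),
    scott          = stats::bw.nrd(v)
  )
})

bandwidths
```

Each column of `bandwidths` is a candidate cell edge length for that variable, expressed in the variable's own units.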
bean offers two core thinning methods:
Stochastic (thin_env_nd): Randomly samples one
“bean” from each occupied “pod”.
Deterministic (thin_env_center): Generates a new
point at the exact center of every occupied grid cell.
See the Apply thinning vignette.
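The two strategies can be sketched in base R as follows. This is a minimal illustration of grid-based thinning in environmental space, not the code behind thin_env_nd() or thin_env_center(). The simulated data, the resolutions in `res`, and the helper `pick_one()` are assumptions made for the example.

```r
# Simulated records: a dense cluster plus scattered points (illustrative)
set.seed(42)
env <- data.frame(
  temp = c(rnorm(80, 25, 0.5), runif(20, 15, 35)),
  prec = c(rnorm(80, 1200, 30), runif(20, 500, 2000))
)
res <- c(temp = 1, prec = 100)  # hypothetical cell edge lengths

# Assign every record to a grid cell ("pod") via an integer index per axis
cell_index <- mapply(function(v, r) floor(v / r), env, res)
pod_id <- apply(cell_index, 1, paste, collapse = "_")

# Stochastic: keep one randomly chosen record from each occupied pod.
# The length check avoids sample()'s special case for a single number.
pick_one <- function(i) if (length(i) == 1) i else sample(i, 1)
keep <- vapply(split(seq_len(nrow(env)), pod_id), pick_one, integer(1))
thinned_random <- env[keep, ]

# Deterministic: replace each occupied pod by a point at its center
occupied <- unique(cell_index)
thinned_center <- as.data.frame(sweep(occupied + 0.5, 2, res, `*`))
```

Both results contain one row per occupied pod. The stochastic version keeps real observations, while the deterministic version is reproducible without a seed but returns synthetic points.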
The fit_ellipsoid() function formalizes the
environmental niche by fitting a bivariate ellipse or a multivariate
ellipsoid around the thinned points.
See the Niche delineation vignette.
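One common way to define such an ellipsoid is through the centroid and covariance matrix of the points, with a chi-square cutoff on the squared Mahalanobis distance. The sketch below uses that approach with base R only. Whether fit_ellipsoid() works this way is not stated here, so treat the method, the 0.95 level, and the simulated `thinned` data as assumptions.

```r
# Simulated thinned records in two environmental dimensions (illustrative)
set.seed(7)
thinned <- data.frame(
  temp = rnorm(100, 25, 3),
  prec = rnorm(100, 1200, 150)
)

# Ellipsoid parameters: center and shape
centroid <- colMeans(thinned)
covmat <- stats::cov(thinned)

# Assumed confidence level; the cutoff is a chi-square quantile with
# one degree of freedom per environmental variable
level <- 0.95
cutoff <- stats::qchisq(level, df = ncol(thinned))

# Squared Mahalanobis distance of each record to the centroid
d2 <- stats::mahalanobis(thinned, center = centroid, cov = covmat)
inside <- d2 <= cutoff

mean(inside)  # share of records falling inside the ellipsoid
```

The same code works unchanged with more than two environmental variables, because the centroid, covariance matrix, and cutoff all scale with the number of columns.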
Using the learned niche, predict() projects the results
back to geographic space. This step uses the ellipsoid-based
approach to calculate suitability scores from the delineated
niche boundaries.
See the Prediction and mapping vignette.
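As a rough sketch of how an ellipsoid can be turned into suitability scores, the example below scores a set of map cells by their squared Mahalanobis distance to the niche centroid. The scoring rule (exp(-0.5 * d2) inside the ellipsoid, 0 outside), the ellipsoid parameters, and the `cells` table are assumptions for illustration and may differ from what bean's predict() method does.

```r
# Hypothetical ellipsoid parameters (centroid and covariance matrix)
centroid <- c(temp = 25, prec = 1200)
covmat <- matrix(
  c(9, 0, 0, 22500),
  nrow = 2,
  dimnames = list(names(centroid), names(centroid))
)
cutoff <- stats::qchisq(0.95, df = length(centroid))

# Hypothetical environmental values of map cells; in a real workflow
# these would come from the raster layers
cells <- expand.grid(
  temp = seq(15, 35, by = 5),
  prec = seq(600, 1800, by = 300)
)

# Suitability decays with distance from the centroid and is set to
# zero outside the ellipsoid boundary
d2 <- stats::mahalanobis(cells, center = centroid, cov = covmat)
cells$suitability <- ifelse(d2 <= cutoff, exp(-0.5 * d2), 0)
```

Writing the `suitability` column back to the cells' geographic positions yields the map: conditions at the centroid score 1, and scores fall toward 0 at the niche boundary.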
For full demonstrations of the protocol, check the package vignettes:
# Data Preparation & Visualization
vignette("data-preparation")
#> Warning: vignette 'data-preparation' not found
# Objective Thinning in Environmental Space
vignette("environmental-thinning")
#> Warning: vignette 'environmental-thinning' not found
# Niche Delineation & Suitability Mapping
vignette("niche-modeling")
#> Warning: vignette 'niche-modeling' not found
To maintain high standards of code quality and documentation, we used LLM-based AI tools while developing this package, for grammatical polishing and for exploring technical implementation strategies for specialized functions. We manually checked and tested all code and documentation refined with these tools.