README

The bean package provides a tool to address a fundamental challenge in species distribution modeling (SDM, or ecological niche modeling, ENM): sampling bias. Occurrence records for species are rarely collected through a systematic, stratified process. Instead, they often cluster in easily accessible areas (like roads and cities) or in well-studied research sites. This spatial bias can translate into an environmental bias, where the model incorrectly learns that the species is associated with the environmental conditions of those heavily sampled areas, rather than its true ecological requirements.

bean tackles this problem by thinning occurrence data in environmental space. The goal is to create a more uniform distribution of points across the species’ observed environmental niche, reducing the influence of densely clustered records. This supports the construction of a more accurate fundamental niche volume, which can then be projected into geographic space to produce a less biased prediction of environmentally suitable areas.

The name bean reflects the core principle of the method: ensuring that each “pod” (a grid cell in environmental space) contains only a specified number of “beans” (occurrence points).
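The pod-and-bean idea can be sketched in a few lines of base R. This is only an illustration under stated assumptions: thin_env_grid() is a hypothetical helper, not a function exported by bean, and the cell sizes and simulated values are made up for the example.

```r
# Illustrative sketch only: thin_env_grid() is a hypothetical helper,
# not part of the bean API.
thin_env_grid <- function(env, cell_size, max_per_cell = 1L) {
  env <- as.matrix(env)
  # Assign each point to a "pod": the integer grid-cell index along each axis
  cell_index <- floor(sweep(env, 2, cell_size, "/"))
  pod <- apply(cell_index, 1, paste, collapse = "_")
  # Keep at most max_per_cell randomly chosen "beans" (row indices) per pod
  keep <- unlist(lapply(split(seq_len(nrow(env)), pod), function(rows) {
    if (length(rows) <= max_per_cell) {
      rows
    } else {
      rows[sample.int(length(rows), max_per_cell)]
    }
  }), use.names = FALSE)
  sort(keep)
}

set.seed(1)
# Simulated environmental values: a dense cluster plus a sparse background
env <- rbind(
  cbind(temp = rnorm(200, 0, 0.2), precip = rnorm(200, 0, 0.2)),
  cbind(temp = runif(50, -2, 2), precip = runif(50, -2, 2))
)

kept <- thin_env_grid(env, cell_size = c(0.5, 0.5), max_per_cell = 1L)
thinned <- env[kept, , drop = FALSE]

# Number of records before and after thinning
c(before = nrow(env), after = nrow(thinned))
```

The dense cluster collapses to a handful of records while the sparse background is mostly retained, which is the intended effect of thinning in environmental rather than geographic space.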

Package Description

bean operates by shifting the focus from geographic space to environmental space.

Installing the Package

# Install devtools if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")

# Install bean
devtools::install_github("paanwaris/bean")

library(bean)

The bean Protocol: Step-by-Step

1. Data Preparation

The prepare_bean() function cleans raw occurrence data by removing records with missing coordinates and extracting environmental values from raster layers. This ensures that all subsequent analyses use a clean, scaled dataset.
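As a rough picture of what this kind of cleaning involves, the base R sketch below drops records with missing coordinates and standardizes the environmental columns. It does not call prepare_bean(): the column names and values are hypothetical, the environmental columns stand in for values that would be extracted from rasters, and z-score scaling is an assumption about what "scaled" means here.

```r
# Hypothetical raw records; temp and precip stand in for raster-extracted values
occ <- data.frame(
  species = "Sp. A",
  lon     = c(-95.1, NA, -94.8, -96.0, -95.5, -94.2),
  lat     = c(38.9, 39.2, NA, 37.5, 38.1, 39.0),
  temp    = c(12.4, 13.0, 11.8, 14.1, 12.9, 11.5),
  precip  = c(900, 870, 950, 780, 860, 990)
)

# Drop records with a missing longitude or latitude
clean <- occ[complete.cases(occ[, c("lon", "lat")]), ]

# Standardize each environmental column (assumed z-score scaling)
env_cols <- c("temp", "precip")
clean[env_cols] <- lapply(clean[env_cols], function(x) as.numeric(scale(x)))

clean
```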

2. Objective Grid Resolution

Rather than relying on an arbitrarily chosen thinning resolution, find_env_resolution() selects a kernel-density bandwidth for each environmental variable (Sheather–Jones plug-in by default, with Silverman and Scott rules also available). The bandwidth is a statistically defensible choice for the edge length of an environmental grid cell: it is the scale at which the empirical density of observations becomes smooth.
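The three bandwidth rules named above all have counterparts in base R's stats package, so the idea can be explored without bean. The sketch below is not the implementation of find_env_resolution(); it only shows, on simulated data with hypothetical variable names, how per-variable bandwidths can be computed and compared.

```r
set.seed(42)
# Simulated environmental values at occurrence points (hypothetical variables)
env <- data.frame(
  temp   = rnorm(300, mean = 15, sd = 4),
  precip = rgamma(300, shape = 5, scale = 150)
)

# One bandwidth per variable and rule, using base R selectors
bandwidths <- sapply(env, function(x) c(
  sheather_jones = stats::bw.SJ(x),    # Sheather-Jones plug-in
  silverman      = stats::bw.nrd0(x),  # Silverman's rule of thumb
  scott          = stats::bw.nrd(x)    # Scott's variation (factor 1.06)
))
print(round(bandwidths, 3))

# Candidate grid-cell edge lengths, one per environmental axis
cell_size <- bandwidths["sheather_jones", ]
```

Each bandwidth is expressed in the units of its own variable, so the resulting grid cells are generally rectangular rather than square unless the variables have been scaled first.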

3. Apply Thinning

4. Niche Delineation

The fit_ellipsoid() function formalizes the environmental niche by fitting a bivariate ellipse or multivariate ellipsoid around the thinned points.
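One common way to define such an ellipsoid is through the sample centroid and covariance matrix of the points, with the boundary set at a chi-square quantile. The sketch below follows that approach on simulated data; it is an assumption about the general technique, not a description of how fit_ellipsoid() is implemented, and the 95% level is chosen only for illustration.

```r
set.seed(7)
# Simulated thinned points in a two-dimensional environmental space
pts <- cbind(temp = rnorm(150), precip = rnorm(150))
pts[, "precip"] <- 0.6 * pts[, "temp"] + 0.8 * pts[, "precip"]  # add correlation

# Ellipsoid parameters: center and shape
centroid   <- colMeans(pts)
cov_matrix <- stats::cov(pts)

# Boundary as a squared Mahalanobis radius (assumes roughly normal scatter)
level     <- 0.95
radius_sq <- stats::qchisq(level, df = ncol(pts))

# Semi-axis lengths come from the eigenvalues of the covariance matrix
eig       <- eigen(cov_matrix, symmetric = TRUE)
semi_axes <- sqrt(eig$values * radius_sq)

# Proportion of points that fall inside the ellipse
inside <- stats::mahalanobis(pts, centroid, cov_matrix) <= radius_sq
mean(inside)
```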

Suitability projection back to geographic space is provided by the companion package nicheR, which supplies a predict() method. bean_ellipsoid objects carry "nicheR_ellipsoid" as a second S3 class, so once nicheR is attached its predict() method dispatches on them automatically, with no conversion step required. Until nicheR is on CRAN, the ellipsoid’s centroid, cov_matrix, and pre-computed Sigma_inv fields can be used directly with stats::mahalanobis() to compute pixel-level distances on a raster stack.
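The manual route can be sketched as follows. The list below is a hand-built stand-in, not a real bean_ellipsoid: only the field names centroid, cov_matrix, and Sigma_inv come from the description above, and every number is made up. The pixel matrix stands in for the cell values of a raster stack, with one row per pixel and one column per layer.

```r
# Hand-built stand-in for a fitted ellipsoid (values are hypothetical)
ell <- list(
  centroid   = c(temp = 15, precip = 800),
  cov_matrix = matrix(c(9, 60, 60, 10000), nrow = 2,
                      dimnames = list(c("temp", "precip"), c("temp", "precip")))
)
ell$Sigma_inv <- solve(ell$cov_matrix)  # pre-computed inverse covariance

# Pixel values: one row per raster cell, one column per environmental layer
pixels <- cbind(temp = c(15, 18, 24, NA), precip = c(800, 900, 400, 700))

# Squared Mahalanobis distance per pixel; cells with missing values stay NA
ok <- stats::complete.cases(pixels)
d2 <- rep(NA_real_, nrow(pixels))
d2[ok] <- stats::mahalanobis(
  pixels[ok, , drop = FALSE],
  center   = ell$centroid,
  cov      = ell$Sigma_inv,
  inverted = TRUE  # tells mahalanobis() that cov is already inverted
)
d2
```

Passing Sigma_inv with inverted = TRUE avoids re-inverting the covariance matrix on every call, which matters when the same ellipsoid is applied to many raster tiles. The resulting vector can be written back into a raster layer with whichever raster package holds the stack.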

Checking the Vignettes

# Data Preparation & Visualization
vignette("data-preparation")

# Objective Thinning in Environmental Space
vignette("environmental-thinning")

# Niche Delineation & Suitability Mapping
vignette("niche-modeling")

Acknowledgments

The bean package was adapted from the excellent work done in the nicheR package. We are grateful to the nicheR developers for their robust framework, which allows bean_ellipsoid objects to use its predict() methods. If you use the spatial projection features with bean, please be sure to install and cite nicheR.

Installing nicheR

# Install nicheR from CRAN (this requires that nicheR has been released there)
install.packages("nicheR")
library(nicheR)

Citation

Note on AI usage

To maintain high standards of code quality and documentation, we used LLM-based AI tools in developing this package, specifically for grammatical polishing and for exploring technical implementation strategies for specialized functions. We manually checked and tested all code and documentation refined with these tools.

bean 🫛

Ecological Motivation