Glossary

Short definitions of the jargon that appears across the SFI docs. Cross-link from any reference page with :term:`PASTIS` (etc.).

ABP

Active Brownian Particle — a self-propelled particle carrying a position and a heading angle, the canonical active-matter model. See the ABP gallery demos.

AIC

Akaike Information Criterion. Penalises a support of cardinality k by 2k; the classical “thin” prior.

Basis

A parameter-free dictionary of state functions (Basis) — the model class of the linear estimators, and the linear-in-θ fast path of the parametric estimators. See Building bases.

beam search

The default sparse-search strategy of sparsify_force() (the PASTIS original): a beam of candidate supports is grown and pruned by the information criterion.

BIC

Bayesian Information Criterion. Penalises a support of cardinality k by k log N, where N is the number of observations; stricter than AIC at large N.
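As a rough illustration of how the AIC and BIC penalties compare, the sketch below evaluates both for a fixed support size. The helper names `aic_penalty` and `bic_penalty` are hypothetical and are not part of the SFI API; only the formulas 2k and k log N come from the definitions above.

```python
import math


def aic_penalty(k: int) -> float:
    """AIC penalty for a support of k active terms (hypothetical helper)."""
    return 2.0 * k


def bic_penalty(k: int, n_samples: int) -> float:
    """BIC penalty for k active terms fitted on n_samples observations."""
    return k * math.log(n_samples)


# BIC becomes the stricter criterion once log N > 2, i.e. from N = 8 upward.
for n in (5, 8, 1000):
    print(n, aic_penalty(3), bic_penalty(3, n))
```

For very small samples the ordering flips, which is why the "stricter" statement is qualified by sample size.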

bootstrapped trajectory

A trajectory simulated from the inferred force and diffusion (simulate_bootstrapped_trajectory()), used as a qualitative validation and for error propagation.

conditional NLL

The negative log-likelihood seen as a function of \((\mathbf{D}, \Lambda)\) with the model parameters \(\theta\) held at their fitted values — minimised once to refine the profiled noise levels.

degradation

Standardised synthetic data imperfections — added measurement noise, downsampling, frame loss, motion blur — applied via SFI.trajectory.degrade to quantify estimator sensitivity.

errors-in-variables

Regression bias arising when the regressors themselves carry noise. In SFI, localization noise enters both the finite-difference velocities and the basis evaluations at measured positions, biasing the linear estimators on nonlinear systems; the parametric estimators correct it via the skip-trick instrument.

Extras

User-defined fields attached to a TrajectoryCollection and passed to state functions at evaluation time — extras_global (per experiment) and extras_local (per particle). Used for box sizes, species labels, neighbour lists, trap centres, and other contextual data.

G_mode

The Gram-matrix construction mode of the linear estimators: "rectangle", "trapeze", "shift" (overdamped), plus "doubleshift" (underdamped).

Gauss–Newton

Linearisation-then-least-squares method for parametric inference, the fast path for linear-in-θ bases (inner="gn"). Replaces the Hessian of the loss with \(J^\top J\) of the test-function Jacobian.

Gram matrix

\(G_{\alpha\beta} = \langle \phi_\alpha, \phi_\beta \rangle\), the normal-equation matrix assembled by SFI.integrate from time-averaged basis evaluations.
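A minimal NumPy sketch of a time-averaged Gram matrix, assuming the inner product is a plain average over time points. This is an illustration only: `gram_matrix` is a hypothetical helper, and it does not reproduce what SFI.integrate does internally or any of its G_mode variants.

```python
import numpy as np


def gram_matrix(phi: np.ndarray) -> np.ndarray:
    """Return G[a, b] = mean over t of phi[t, a] * phi[t, b].

    phi has shape (T, n_basis): one row of basis evaluations per time point.
    """
    return phi.T @ phi / phi.shape[0]


# Toy 1D trajectory and the polynomial basis {1, x, x^2}.
x = np.linspace(-1.0, 1.0, 201)
phi = np.stack([np.ones_like(x), x, x**2], axis=1)
G = gram_matrix(phi)
print(G)
```

The result is symmetric by construction, and entries coupling even and odd basis functions vanish here because the toy samples are symmetric about zero.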

held-out NMSE

The residual-based normalised mean-square error of a fitted force on an independent test collection (inf.holdout_score(test) after coll.split_time(...)), with the diffusion noise floor subtracted. A side feature for data-abundant scenarios — SFI’s default validation (force_predicted_MSE + diagnostics) costs no data; the held-out score is a bias detector whose resolution is set by χ² fluctuations.

Heun

Stochastic Heun predictor–corrector integrator (weak order 2); the default scheme of OverdampedProcess (method="heun"). method="euler" selects the classical Euler–Maruyama integrator (weak order 1).
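The difference between the two schemes can be sketched for a 1D overdamped process with constant diffusion. The functions below are hypothetical stand-ins written for this glossary, not the OverdampedProcess implementation; they assume additive noise, so the same noise increment is reused in the predictor and the corrector.

```python
import numpy as np


def euler_maruyama_step(x, force, D, dt, xi):
    """One Euler-Maruyama step: drift evaluated at the start of the step."""
    noise = np.sqrt(2.0 * D * dt) * xi
    return x + force(x) * dt + noise


def heun_step(x, force, D, dt, xi):
    """One stochastic Heun step for constant D (predictor, then corrector)."""
    noise = np.sqrt(2.0 * D * dt) * xi
    x_pred = x + force(x) * dt + noise  # Euler predictor
    return x + 0.5 * (force(x) + force(x_pred)) * dt + noise  # averaged drift


def integrate(step, x0, force, D, dt, n_steps, rng):
    """Apply `step` n_steps times, drawing one Gaussian number per step."""
    x = x0
    for _ in range(n_steps):
        x = step(x, force, D, dt, rng.standard_normal())
    return x


def force(x):
    """Linear restoring force of a harmonic trap with unit stiffness."""
    return -x


# Noise-free check (D = 0): the exact solution at t = 1 is exp(-1).
rng = np.random.default_rng(0)
x_euler = integrate(euler_maruyama_step, 1.0, force, 0.0, 0.1, 10, rng)
x_heun = integrate(heun_step, 1.0, force, 0.0, 0.1, 10, rng)
print(x_euler, x_heun, np.exp(-1.0))
```

In this noise-free limit the Heun result lies much closer to the exact value than the Euler result at the same step size.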

information criterion

A penalised-likelihood score used to compare sparse supports: PASTIS (recommended), AIC, BIC.

instrument

In errors-in-variables regression, a quantity correlated with the true regressor but uncorrelated with its measurement noise, used to build an unbiased estimating equation.

Interactor

A local K-body interaction rule (Interactor) that is dispatched over a neighbour graph to build a global multi-particle Basis/PSF/SF. See Particle systems.

Itô convention

SDE interpretation in which the noise amplitude of the stochastic increment \(\sqrt{2D(x_t)}\,dW_t\) is evaluated at the left endpoint of the time step. See Physics Reference.

JAX persistent cache

On-disk cache of compiled JAX traces, opt-in via SFI_JAX_CACHE_DIR=~/.cache/sfi/jax_cache. Saves seconds to minutes per session on repeated runs.

L-BFGS

Limited-memory BFGS, a quasi-Newton optimiser; the inner solver of the parametric estimator for nonlinear-in-θ PSF families (inner="lbfgs") and for the \((D, \Lambda)\) profile.

LASSO

L1-penalised least-squares for sparse model selection. Implemented as LassoStrategy.

Layout

The grid declaration of the experimental SPDE toolbox (GridLayout): named field sectors on a regular grid with boundary conditions, providing differential operators and symmetry-aware embedding. See Structured fields: Layout, Sectors, and Embed.

linear estimators

The closed-form estimator family: infer_force_linear(), infer_diffusion_linear(), compute_diffusion_constant(). A projection onto a basis — no initial guess, no iterations — exact in the fine-sampling, low-noise limit, biased outside it. See Choosing an estimator.

local-precision NLL

Negative log-likelihood weighted by the inverse of the locally estimated noise covariance, used in the parametric path to handle heteroscedastic measurement noise. See Parametric windowed estimators — concepts.

M_mode

The moment/kinematics convention of the linear estimators. Overdamped: "auto" (noise-aware selection), "Ito", "Ito-shift", "Strato". Underdamped: "symmetric" (the "auto" resolution), "early", "anticipated".

mask

The boolean validity array (shape (T, N)) attached to each dataset, encoding missing frames and particles entering or leaving. Honoured automatically by state functions and estimators.

measurement-noise covariance

The covariance \(\Lambda\) of the localization error on each recorded position. Estimated jointly with the diffusion by the Vestergaard method (linear estimators, exposed as inf.Lambda) or profiled natively (parametric estimators).

moment estimator

A closed-form estimator built from low-order moments of the increments — used to initialise the parametric \((\mathbf{D}, \Lambda)\) profile, and, in the linear estimators, selected by the M_mode convention.

neighbour list

CSR-encoded list of neighbour indices for each particle, used by pair-interaction bases. Built host-side via SFI.utils.neighbors.build_neighbor_csr() between JIT chunks; see AGENTS.md §4.8.
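To make the CSR layout concrete, here is a brute-force sketch that builds an (indptr, indices) pair from positions and a cutoff. It is an assumption-laden illustration: `build_csr_neighbours` is a hypothetical function, it ignores periodic boundaries, and it is not the signature or algorithm of SFI.utils.neighbors.build_neighbor_csr().

```python
import numpy as np


def build_csr_neighbours(pos, cutoff):
    """Brute-force O(N^2) neighbour search returning (indptr, indices).

    The neighbours of particle i are indices[indptr[i]:indptr[i + 1]].
    """
    pos = np.asarray(pos, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    adjacent = (dist < cutoff) & ~np.eye(len(pos), dtype=bool)  # no self-pairs
    counts = adjacent.sum(axis=1)
    indptr = np.concatenate([[0], np.cumsum(counts)])
    indices = np.nonzero(adjacent)[1]  # row-major order groups by particle
    return indptr, indices


pos = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
indptr, indices = build_csr_neighbours(pos, cutoff=2.0)
print(indptr, indices)
```

Particles 0 and 1 are mutual neighbours and particle 2 has none, so its slice of `indices` is empty.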

NMSE

Normalised Mean Square Error — the canonical force/diffusion accuracy metric: mean squared error of the inferred field divided by the mean square of the true field. Available as inf.NMSE_force after compare_to_exact(); inf.force_predicted_MSE is the a-priori estimate that needs no ground truth.
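The definition above translates directly into a few lines of NumPy. The function `nmse` below is a hypothetical standalone helper for illustration; it is not how inf.NMSE_force is computed internally.

```python
import numpy as np


def nmse(inferred, exact):
    """Mean squared error of `inferred` divided by the mean square of `exact`."""
    inferred = np.asarray(inferred, dtype=float)
    exact = np.asarray(exact, dtype=float)
    return np.mean((inferred - exact) ** 2) / np.mean(exact**2)


exact = np.array([1.0, -2.0, 3.0])
inferred = 1.1 * exact  # uniform 10% overestimate of the field
score = nmse(inferred, exact)
print(score)
```

A uniform 10% relative error gives an NMSE of about 0.01, since the metric is quadratic in the relative error.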

parametric estimators

The likelihood-based estimator family: infer_force(), infer_diffusion(). One or more RK4 flow steps per observation interval, windowed-precision NLL, native \((\mathbf{D}, \Lambda)\) profiling; robust to measurement noise and coarse sampling, accepts nonlinear-in-θ models. See Choosing an estimator.

Pareto front

The error-vs-sparsity frontier explored by sparsify_force(); the returned SparsityResult stores it and can be re-queried under any criterion without re-running the search.

particles

The N axis of a trajectory’s (T, N, d) state array — the independent or interacting bodies tracked over time (cells, colloids, agents, …). State functions declare how they consume this axis through pdepth: pdepth=0 evaluates one particle at a time (the same law applied independently to each), while pdepth=1 sees all particles together for interactions. The particle count may vary over time; the mask records entries and exits. See Particle systems.

PASTIS

Parsimonious Stochastic Inference — the canonical information criterion used by sparsify_force(). Penalises support cardinality with a Bayes-factor-like prior set by p. Gerardos & Ronceray, Phys. Rev. Lett. 135, 167401 (2025).

PBC

Periodic Boundary Conditions — wrap-around boundaries on a box or grid. Minimum-image inter-particle displacements are computed by SFI.bases.pairs.pbc_displacement().
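The minimum-image convention can be sketched as follows for an orthogonal box. The function `minimum_image` is a hypothetical helper written for this entry; its name, arguments, and sign convention are assumptions and need not match SFI.bases.pairs.pbc_displacement().

```python
import numpy as np


def minimum_image(xi, xj, box):
    """Shortest displacement from xi to xj in a periodic orthogonal box."""
    d = np.asarray(xj, dtype=float) - np.asarray(xi, dtype=float)
    return d - box * np.round(d / box)


box = np.array([10.0, 10.0])
# Two particles on opposite sides of the box along x.
d = minimum_image([9.5, 5.0], [0.5, 5.0], box)
print(d)
```

The raw difference along x is -9, but the nearest periodic image of the second particle sits one unit to the right of the first.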

per-dataset parameter

A model parameter taking an independent inferred value per dataset of a pooled multi-experiment collection, selected through the reserved dataset_index extra: per_dataset_scalar() (parametric estimators) or dataset_indicator() one-hot features (linear estimators). The per-particle analogue lives inside make_interactor() kernels via the reserved particle_index extra. To reproduce a single experiment, fold the model at one index with specialize(), which removes the dataset_index dependence (see specialize).

profiling

Internal estimation of nuisance parameters — in SFI, the diffusion level \(\mathbf{D}\) and measurement-noise covariance \(\Lambda\) during a parametric fit — so the user does not have to supply them. Skipped entirely when both are passed explicitly.

PSF

Parametric State Function (PSF) — a model family \(F(x;\theta)\) with a named parameter tree; the model class of nonlinear parametric inference. See Models and state functions.

rank

The tensor rank of a state-function output: 0 = scalar, 1 = vector (forces), 2 = matrix (diffusion tensors).

RK4

Classical fourth-order Runge–Kutta scheme; used by the parametric estimator to integrate the deterministic drift flow over each observation interval.
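For reference, one step of the classical scheme looks like this. The sketch assumes an autonomous drift f(x) and is a generic textbook implementation with a hypothetical name, not the parametric estimator's internal code.

```python
import math


def rk4_step(f, x, dt):
    """Advance dx/dt = f(x) by one classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


# Linear relaxation dx/dt = -x from x(0) = 1; the exact value at t = 1 is exp(-1).
x = 1.0
for _ in range(10):
    x = rk4_step(lambda y: -y, x, 0.1)
print(x, math.exp(-1.0))
```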

secant velocity

Centred finite-difference velocity \(v_t = (x_{t+1} - x_{t-1})/(2\Delta t)\) used by the underdamped diagnostics and the ULI residual. See Diagnostics.
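The formula above can be applied to a sampled trajectory in one line of NumPy. The helper name `secant_velocity` is hypothetical and the sketch assumes uniformly spaced samples; it is not the routine used by the underdamped diagnostics.

```python
import numpy as np


def secant_velocity(x, dt):
    """Centred difference (x[t+1] - x[t-1]) / (2 dt) at interior time points."""
    x = np.asarray(x, dtype=float)
    return (x[2:] - x[:-2]) / (2.0 * dt)


dt = 0.1
t = np.arange(11) * dt
x = t**2  # smooth test path with exact velocity 2 t
v = secant_velocity(x, dt)
print(v)
```

The output has two fewer entries than the input because the first and last samples have no centred neighbours. For a quadratic path the centred difference is exact.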

Sector

A named component group within a Layout (e.g. a scalar field U, a Q-tensor), addressed when building SPDE bases.

SF

State Function with frozen parameters (SF) — the evaluable object produced by a fit, ready for Langevin simulation.

skip-trick

The errors-in-variables instrument of the parametric Gauss–Newton path: test functions are evaluated at temporally separated (skipped) observations, decorrelating the instrument from the measurement noise of the residual and restoring consistency. On by default (eiv="auto").

SPDE

Stochastic Partial Differential Equation — field dynamics on a regular grid, where the drift is a spatial-operator functional of the field. SFI infers them via composable stencil operators (experimental toolbox); see Spatial field inference (SPDE).

specialize

Collapse a pooled model to one experiment’s standalone single-condition form: specialize() folds every per-dataset parameter at a chosen dataset_index (per-dataset arrays reduce to that index’s slice; one-hot indicators become constant) so the result does not read dataset_index. Used by simulate_bootstrapped_trajectory() to export a clean single-trajectory model.

STLSQ

Sequential Thresholded Least Squares — the SINDy-style strategy. Implemented as STLSQStrategy.
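The generic algorithm alternates a least-squares fit with hard thresholding of small coefficients. The sketch below is a textbook-style version with hypothetical names and a fixed iteration count; it makes no claim about the options or behaviour of STLSQStrategy.

```python
import numpy as np


def stlsq(X, y, threshold, n_iter=10):
    """Sequential thresholded least squares on the linear model y = X @ coef."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0  # prune weak terms
        active = ~small
        if not active.any():
            break
        # Refit using only the surviving columns.
        coef[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
    return coef


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
true_coef = np.array([2.0, 0.0, -3.0, 0.0, 0.0])
y = X @ true_coef + 0.01 * rng.standard_normal(200)
coef = stlsq(X, y, threshold=0.5)
print(coef)
```

With weak noise and well-separated coefficients, the three inactive terms are pruned to exactly zero and the two active ones are refitted.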

Stratonovich convention

SDE interpretation in which the noise amplitude of the stochastic increment is evaluated at the mid-point of the time step. Required for state-dependent D. See Physics Reference.

trapeze

The trapezoidal Gram construction (G_mode="trapeze"), which symmetrises basis evaluations across each interval and removes the leading finite-Δt bias of the rectangle rule. Amri et al., Phys. Rev. Research 6, 043030 (2024).

ULI

Underdamped Langevin Inference — the position-only inference scheme for inertial systems (Brückner, Ronceray & Broedersz, Phys. Rev. Lett. 125, 058103 (2020)), implemented by UnderdampedLangevinInference.

velocity reconstruction

The underdamped engine’s internal estimation of unobserved velocities from positions (secant differences with bias-corrected moments). You never supply velocities; see Underdamped systems.

Vestergaard

The covariance-based constant-diffusion estimator (after Vestergaard et al.), which fits the diffusion and the localization-error covariance jointly; selected by compute_diffusion_constant(method="noisy") — the noise-robust choice of compute_diffusion_constant() and its "auto" selection when noise is detected. Vestergaard, C. L., Blainey, P. C. & Flyvbjerg, H., Optimal estimation of diffusion coefficients from single-particle trajectories, Phys. Rev. E 89, 022726 (2014).
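The idea behind a covariance-based estimator can be sketched in 1D. The code assumes free diffusion, uniform sampling, and no motion blur, in which case the localization error shows up as a negative covariance between consecutive displacements. The function `cve_estimate` is a hypothetical illustration of that idea, not the code behind compute_diffusion_constant().

```python
import numpy as np


def cve_estimate(x, dt):
    """Estimate (D, sigma^2) from a noisy 1D track, assuming no motion blur."""
    dx = np.diff(x)
    mean_sq = np.mean(dx**2)  # 2 D dt + 2 sigma^2 in expectation
    mean_cov = np.mean(dx[1:] * dx[:-1])  # -sigma^2 in expectation
    D = mean_sq / (2.0 * dt) + mean_cov / dt
    sigma2 = -mean_cov
    return D, sigma2


# Synthetic free diffusion observed with Gaussian localization error.
rng = np.random.default_rng(0)
D_true, sigma_true, dt, n = 1.0, 0.5, 0.1, 200_000
true_path = np.cumsum(np.sqrt(2.0 * D_true * dt) * rng.standard_normal(n))
observed = true_path + sigma_true * rng.standard_normal(n)
D_hat, sigma2_hat = cve_estimate(observed, dt)
print(D_hat, sigma2_hat)
```

A naive estimate from the mean squared displacement alone would return about 3.5 for these parameters, because it attributes the localization error to diffusion. The covariance term removes that bias.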

WeakNoise

The clean-data constant-diffusion estimator of compute_diffusion_constant; assumes negligible localization error.

weights (multi-experiment)

Per-dataset unnormalised multipliers of a TrajectoryCollection, applied to every estimator (force, diffusion, parametric): "pool" (default — pool all increments on equal footing), "per_dataset" (each experiment counts equally), or an explicit array. Within-dataset weighting is intrinsic to each estimator (force per-Δt, diffusion per-point). See Trajectory data.

windowed precision

The banded inverse covariance of the parametric residuals over a short time window, providing the weights of the parametric NLL. Captures the correlations that measurement noise induces between consecutive residuals.