Start here: is SFI right for my data?
SFI is built for learning interpretable stochastic dynamics from time-ordered observations of a continuous state. It is most useful when you care about the drift or force field, the diffusion or noise level, sparse term selection, or bootstrapped simulation of the inferred model.
Tip
New to SFI? The Getting started: end-to-end inference (Ornstein–Uhlenbeck) tutorial walks through a complete workflow (simulate, infer the force and diffusion, select a minimal model, validate, and save) on a simple synthetic example. It is the fastest way to see the whole pipeline in action before routing your own data below.
When SFI is a good fit
SFI is usually a good fit if:
each observation is a continuous-valued state: positions, angles, velocities, concentrations, field values, abundances of large populations, or another trusted state variable;
stochastic fluctuations are part of the dynamics, not just a nuisance layered on top of a deterministic fit;
adjacent frames are close enough in time that they remain correlated;
you want an explicit dynamical law rather than a black-box forecaster.
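A quick way to check the "adjacent frames remain correlated" criterion before loading anything into SFI is to measure the lag-1 autocorrelation of each coordinate. The sketch below is plain NumPy and does not use the SFI API; the helper name `lag1_autocorrelation` and the synthetic Ornstein–Uhlenbeck parameters are illustrative assumptions. A value close to 1 suggests frames are finely sampled relative to the dynamics, while a value near 0 suggests the frames are effectively independent snapshots.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Pearson correlation between consecutive frames of a 1D series."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

# Synthetic Ornstein-Uhlenbeck trajectory with relaxation time tau = 1
# (illustrative parameters, not taken from the SFI documentation).
rng = np.random.default_rng(0)
tau, D, dt, n = 1.0, 0.5, 0.01, 50_000
kicks = np.sqrt(2 * D * dt) * rng.standard_normal(n - 1)
x = np.zeros(n)
for i in range(n - 1):
    x[i + 1] = x[i] - x[i] / tau * dt + kicks[i]

fine = lag1_autocorrelation(x)           # one frame every 0.01 tau
coarse = lag1_autocorrelation(x[::500])  # one frame every 5 tau

print(f"fine sampling:   lag-1 autocorrelation = {fine:.3f}")
print(f"coarse sampling: lag-1 autocorrelation = {coarse:.3f}")
```

On your own data, run the same helper on each state coordinate; there is no sharp threshold, but a value far below 1 is a hint that the coarse-sampling guidance further down applies.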
When SFI is usually the wrong tool
SFI is usually not the right tool if:
the data is categorical, count-based, event-based, or text-like rather than continuous coordinates;
the task is forecasting only, with no interest in an interpretable equation of motion;
the series is dominated by long memory, abrupt regime switches, interventions, or hidden controls that cannot be represented in the state;
you only have a few disconnected snapshots rather than a trajectory or a field movie.
Pick the right starting route
| Your data or question | Start here | Then read |
|---|---|---|
| One tracked object or a few observed coordinates | | |
| Noisy or coarsely-sampled recordings (localization error, low frame rate) | | |
| Position-only data with inertia or oscillations | | |
| Many interacting particles or agents | Aligning active Brownian particles — generic pairs API, Multi-experiment ABP inference | |
| Spatial fields on a regular grid (experimental) | Gray-Scott reaction-diffusion: SPDE inference, Discovering Toner–Tu hydrodynamics from agent-based flocking | |
| Large nonlinear parametric force models | | |
Choose the data container first
If your data is already in memory as an array, start with
from_arrays(). For a first fit, this is the
simplest and most predictable entry point.
If you have a tracked-particle table where particles appear and disappear
over time, use from_dataframe() (pandas, columns
addressed by name) or the lower-level from_columns().
If your trajectories are already on disk as CSV, Parquet, or HDF5, use
TrajectoryCollection.load — the file format is specified in
Trajectory file formats.
Pick an estimator family
SFI ships two first-class estimator families — route by data regime, not by habit:
Linear estimators (compute_diffusion_constant(), infer_force_linear(), infer_diffusion_linear()): a closed-form projection with no initial guess, seconds even on large datasets, exact in the fine-sampling, low-noise limit.
Parametric estimators (infer_force(), infer_diffusion()): an iterative likelihood fit that models measurement noise and finite sampling explicitly, and accepts any differentiable model (including nonlinear ones). More compute, more robustness.
If your recordings carry measurement (localization) noise, or the frame interval is coarse compared to the dynamics, start directly with the parametric estimators; the guide for that route is Measurement noise and coarse sampling. Otherwise the linear first pass below is the fastest start; the full trade-off table is in Choosing an estimator.
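To make "closed-form projection" concrete, here is a minimal NumPy sketch of the generic idea for a 1D overdamped process: the diffusion constant is read off the mean squared increment, and the drift coefficients come from a single least-squares projection of the increments onto a small basis. This is an illustration under simplifying assumptions (clean data, fine sampling, a hand-picked basis `{1, x}`), not SFI's implementation, and none of the variable names below belong to the SFI API.

```python
import numpy as np

rng = np.random.default_rng(1)
k, D, dt, n = 2.0, 0.5, 0.005, 100_000  # illustrative parameters

# Simulate dx = -k x dt + sqrt(2 D dt) * xi with the Euler-Maruyama scheme.
kicks = np.sqrt(2 * D * dt) * rng.standard_normal(n - 1)
x = np.zeros(n)
for i in range(n - 1):
    x[i + 1] = x[i] - k * x[i] * dt + kicks[i]

dx = np.diff(x)

# Diffusion estimate: mean squared increment divided by 2 dt.
D_hat = float(np.mean(dx**2) / (2 * dt))

# Drift estimate: least-squares projection of dx/dt onto the basis {1, x},
# with the basis evaluated at the start of each step.
basis = np.column_stack([np.ones(n - 1), x[:-1]])
coeffs, *_ = np.linalg.lstsq(basis, dx / dt, rcond=None)

print(f"D_hat = {D_hat:.3f}  (simulated with D = {D})")
print(f"drift ~ {coeffs[0]:+.3f} {coeffs[1]:+.3f} * x  (simulated with -{k} * x)")
```

No initial guess and no iteration are needed, which is why this family is fast. The same construction has no term for measurement noise or finite frame intervals, which is the gap the parametric estimators are described as filling.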
Default first pass (clean, well-sampled data)
load or build a TrajectoryCollection;
choose a small linear basis;
call compute_diffusion_constant(), infer_force_linear(), and compute_force_error();
if the basis is large or if you seek a minimal model, run sparsify_force().
On noisy or coarsely-sampled data the equivalent parametric pass is a
single call — inf.infer_force(B) — which profiles the diffusion
and measurement-noise levels automatically. When in doubt, run both:
agreement is itself a diagnostic, and disagreement measures the bias
the linear estimator cannot absorb.
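The kind of bias at stake can be illustrated with a textbook case: free Brownian motion observed with independent localization error of standard deviation sigma. A naive increment-based diffusion estimate is then inflated by roughly sigma²/dt. The sketch below is plain NumPy; `naive_diffusion` is a hypothetical helper written for this example and is not claimed to match what compute_diffusion_constant() does internally.

```python
import numpy as np

def naive_diffusion(x, dt):
    """Mean squared increment over 2 dt: unbiased only for noise-free data."""
    return float(np.mean(np.diff(x) ** 2) / (2 * dt))

rng = np.random.default_rng(2)
D, dt, sigma, n = 1.0, 0.01, 0.2, 200_000  # illustrative parameters

# True positions: free Brownian motion with diffusion constant D.
x_true = np.cumsum(np.sqrt(2 * D * dt) * rng.standard_normal(n))
# Observed positions: independent localization error added to every frame.
x_obs = x_true + sigma * rng.standard_normal(n)

D_clean = naive_diffusion(x_true, dt)
D_noisy = naive_diffusion(x_obs, dt)
D_expected = D + sigma**2 / dt  # each increment picks up 2 sigma^2 of extra variance

print(f"noise-free estimate: {D_clean:.2f}")
print(f"noisy estimate:      {D_noisy:.2f}  (expected about {D_expected:.2f})")
```

With these numbers the localization error is only 0.2 length units, yet it dominates the apparent diffusion because dt is small. A discrepancy of this type between two estimators is a signal to model the measurement noise explicitly rather than to enlarge the basis.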
Use diagnostics on real data
For experimental data, the main validation tool is the diagnostics suite:
from SFI.diagnostics import assess
inf.compute_force_error()
report = assess(inf, level="standard")
report.print_summary()
This is the fastest way to separate three common failure modes:
missing dynamics or a too-small basis: residual autocorrelation flags;
diffusion or noise mismatch: whitened residual standard deviation far from 1;
a biased model: the realised NMSE stays well above the predicted (sampling-noise) value (the MSE-consistency flag).
A redundant basis is handled separately by sparse selection (sparsify_force()).
On experimental data, the noise- and bias-type flags usually trace back to localization noise or coarse sampling — the cure is then the parametric estimators, not a bigger basis; see Measurement noise and coarse sampling.
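As a rough picture of what two of these checks measure, the following NumPy sketch computes whitened residuals for a 1D overdamped model and reports their standard deviation and lag-1 autocorrelation. It is a simplified stand-in written for illustration, assuming a known drift function and a scalar diffusion constant; it is not the code behind assess(), and the helper names are hypothetical.

```python
import numpy as np

def whitened_residuals(x, dt, drift, D):
    """Increments minus predicted drift, scaled by the predicted noise amplitude."""
    dx = np.diff(x)
    return (dx - drift(x[:-1]) * dt) / np.sqrt(2 * D * dt)

def lag1_autocorr(r):
    """Lag-1 autocorrelation of a residual series."""
    r = r - r.mean()
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))

# Synthetic Ornstein-Uhlenbeck data (illustrative parameters).
rng = np.random.default_rng(3)
k, D, dt, n = 1.0, 0.5, 0.01, 50_000
kicks = np.sqrt(2 * D * dt) * rng.standard_normal(n - 1)
x = np.zeros(n)
for i in range(n - 1):
    x[i + 1] = x[i] - k * x[i] * dt + kicks[i]

def true_drift(y):
    return -k * y

good = whitened_residuals(x, dt, true_drift, D)     # correct model
bad = whitened_residuals(x, dt, true_drift, D / 4)  # diffusion underestimated 4x

std_good, std_bad = float(good.std()), float(bad.std())
acf_good = lag1_autocorr(good)

print(f"correct model:     std = {std_good:.2f}, lag-1 autocorr = {acf_good:+.3f}")
print(f"D underestimated:  std = {std_bad:.2f}")
```

For a well-specified model the whitened residuals should look like unit-variance white noise. Underestimating the diffusion by a factor of four doubles their standard deviation, which is the "far from 1" signature described above.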
What to do next
If diagnostics look clean and the coefficients are interpretable, keep the linear workflow and consider sparse selection.
If diagnostics flag noise or sampling effects, switch to the parametric estimators: Measurement noise and coarse sampling.
If inertia matters but you only observe positions, switch to UnderdampedLangevinInference; see Underdamped systems.
If your system contains interacting agents, move to Particle systems.
If your state is a field on a grid, move to Spatial field inference (SPDE) (experimental).
If you need a model that is nonlinear in its parameters, use the parametric estimators with a PSF — see Choosing an estimator.