Start here: is SFI right for my data?

SFI is built for learning interpretable stochastic dynamics from time-ordered observations of a continuous state. It is most useful when you care about the drift or force field, the diffusion or noise level, sparse term selection, or bootstrapped simulation of the inferred model.

Tip

New to SFI? The Getting started: end-to-end inference (Ornstein–Uhlenbeck) tutorial walks through a complete workflow on a simple synthetic example: simulate, infer the force and diffusion, select a minimal model, validate, and save. It is the fastest way to see the whole pipeline in action before routing your own data below.

When SFI is a good fit

SFI is usually a good fit if:

  • each observation is a continuous-valued state: positions, angles, velocities, concentrations, field values, abundances of large populations, or another trusted state variable;

  • stochastic fluctuations are part of the dynamics, not just a nuisance layered on top of a deterministic fit;

  • adjacent frames are close enough in time that they remain correlated;

  • you want an explicit dynamical law rather than a black-box forecaster.
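One quick way to check the "adjacent frames remain correlated" criterion is to look at the lag-1 autocorrelation of a coordinate. The sketch below uses NumPy only and a synthetic Ornstein-Uhlenbeck trajectory; the helper `lag1_autocorrelation` and all parameter values are illustrative assumptions, not part of SFI.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a 1D series, after removing its mean."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.dot(centered[:-1], centered[1:]) / np.dot(centered, centered))

# Synthetic Ornstein-Uhlenbeck trajectory (Euler-Maruyama):
#   x[t+1] = x[t] - k * x[t] * dt + sqrt(2 * D * dt) * xi
rng = np.random.default_rng(0)
k, D, dt, n_steps = 1.0, 0.5, 0.01, 50_000
x = np.empty(n_steps)
x[0] = 0.0
kicks = rng.normal(0.0, np.sqrt(2 * D * dt), size=n_steps - 1)
for t in range(n_steps - 1):
    x[t + 1] = x[t] - k * x[t] * dt + kicks[t]

# Frame interval 0.01: much shorter than the relaxation time 1/k = 1.
rho_fine = lag1_autocorrelation(x)
# Keep one frame in 500: frame interval 5, i.e. five relaxation times.
rho_coarse = lag1_autocorrelation(x[::500])

print(f"lag-1 autocorrelation, fine sampling:   {rho_fine:.3f}")
print(f"lag-1 autocorrelation, coarse sampling: {rho_coarse:.3f}")
```

A value close to 1 means consecutive frames still carry dynamical information; a value near 0 means the frames are effectively independent snapshots.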

When SFI is usually the wrong tool

SFI is usually not the right tool if:

  • the data is categorical, count-based, event-based, or text-like rather than continuous coordinates;

  • the task is forecasting only, with no interest in an interpretable equation of motion;

  • the series is dominated by long memory, abrupt regime switches, interventions, or hidden controls that cannot be represented in the state;

  • you only have a few disconnected snapshots rather than a trajectory or a field movie.

Pick the right starting route

| Your data or question | Start here | Then read |
|---|---|---|
| One tracked object or a few observed coordinates | Experimental-data workflow template | Trajectory data, Running inference |
| Noisy or coarsely-sampled recordings (localization error, low frame rate) | Measurement noise and coarse sampling | Experimental-data workflow template |
| Position-only data with inertia or oscillations | Underdamped systems | Van der Pol oscillator — underdamped inference |
| Many interacting particles or agents | Particle systems | Aligning active Brownian particles — generic pairs API, Multi-experiment ABP inference |
| Spatial fields on a regular grid (experimental) | Spatial field inference (SPDE) | Gray-Scott reaction-diffusion: SPDE inference, Discovering Toner–Tu hydrodynamics from agent-based flocking |
| Large nonlinear parametric force models | Neural-network force field — Müller-Brown potential | Running inference |

Choose the data container first

If your data is already in memory as an array, start with from_arrays(). For a first fit, this is the simplest and most predictable entry point.

If you have a tracked-particle table where particles appear and disappear over time, use from_dataframe() (pandas, columns addressed by name) or the lower-level from_columns().

If your trajectories are already on disk as CSV, Parquet, or HDF5, use TrajectoryCollection.load — the file format is specified in Trajectory file formats.

Pick an estimator family

SFI ships two first-class estimator families — route by data regime, not by habit:

  • Linear estimators: compute_diffusion_constant(), infer_force_linear(), infer_diffusion_linear(). These are closed-form projections: no initial guess, seconds even on large datasets, exact in the fine-sampling, low-noise limit.

  • Parametric estimators: infer_force(), infer_diffusion(). These run an iterative likelihood fit that models measurement noise and finite sampling explicitly, and accepts any differentiable model (including nonlinear ones). They cost more compute and give more robustness.

If your recordings carry measurement (localization) noise, or the frame interval is coarse compared to the dynamics, start directly with the parametric estimators — Measurement noise and coarse sampling is the guide. Otherwise the linear first pass below is the fastest start; the full trade-off table is in Choosing an estimator.
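To make "closed-form projection" concrete, here is a minimal NumPy-only sketch of the idea on a synthetic 1D Ornstein-Uhlenbeck trajectory: the diffusion comes from the mean squared increment, and the force from a least-squares projection of the increments onto a small basis. It illustrates the general principle under stated assumptions (clean data, fine sampling, a hand-picked basis {1, x}); it is not SFI's implementation and calls no SFI function.

```python
import numpy as np

# Simulate dx = -k x dt + sqrt(2 D dt) * xi with Euler-Maruyama.
rng = np.random.default_rng(1)
k_true, D_true, dt, n_steps = 2.0, 0.5, 0.005, 200_000
x = np.empty(n_steps)
x[0] = 0.0
kicks = rng.normal(0.0, np.sqrt(2 * D_true * dt), size=n_steps - 1)
for t in range(n_steps - 1):
    x[t + 1] = x[t] - k_true * x[t] * dt + kicks[t]

dx = np.diff(x)   # increments between adjacent frames
x_now = x[:-1]    # state at the start of each increment

# Diffusion: <dx^2> ~ 2 D dt when sampling is fine and measurement noise is absent.
D_hat = np.mean(dx**2) / (2 * dt)

# Force: project the velocity estimate dx/dt onto the basis {1, x}.
# The normal equations solve this in one linear-algebra step, with no initial guess.
basis = np.column_stack([np.ones_like(x_now), x_now])
coeffs, *_ = np.linalg.lstsq(basis, dx / dt, rcond=None)

print(f"D_hat = {D_hat:.3f} (true {D_true})")
print(f"force ~ {coeffs[0]:+.3f} {coeffs[1]:+.3f} * x (true slope {-k_true})")
```

Because the fit is a single linear solve, it scales well with dataset size, but it has no term for measurement noise or finite frame intervals, which is what the parametric family adds.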

Default first pass (clean, well-sampled data)

  1. load or build a TrajectoryCollection;

  2. choose a small linear basis;

  3. call compute_diffusion_constant(), infer_force_linear(), and compute_force_error();

  4. if the basis is large or if you seek a minimal model, run sparsify_force().
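The idea behind step 4 can be sketched without SFI: fit every subset of a small candidate basis and keep the one with the best penalized likelihood. The example below uses a Bayesian information criterion (BIC) over all subsets of {1, x, x^2, x^3} as a generic stand-in. The criterion, the exhaustive search, and every name here are assumptions chosen for illustration; the selection rule that sparsify_force() actually applies is not described in this section.

```python
import itertools
import numpy as np

# Synthetic Ornstein-Uhlenbeck data: the true force is -k * x only.
rng = np.random.default_rng(4)
k, D, dt, n_steps = 1.0, 0.5, 0.01, 100_000
x = np.empty(n_steps)
x[0] = 0.0
kicks = rng.normal(0.0, np.sqrt(2 * D * dt), size=n_steps - 1)
for t in range(n_steps - 1):
    x[t + 1] = x[t] - k * x[t] * dt + kicks[t]

dx = np.diff(x)
x_now = x[:-1]
D_hat = np.mean(dx**2) / (2 * dt)

# Deliberately redundant candidate basis.
names = ["1", "x", "x^2", "x^3"]
library = np.column_stack([np.ones_like(x_now), x_now, x_now**2, x_now**3])

def bic(cols):
    """BIC of a least-squares force fit restricted to the given basis columns."""
    A = library[:, cols]
    coef, *_ = np.linalg.lstsq(A, dx / dt, rcond=None)
    resid = dx - (A @ coef) * dt
    # -2 log-likelihood of Gaussian increments with variance 2 D dt (constants dropped).
    neg2_loglik = np.sum(resid**2) / (2 * D_hat * dt)
    return neg2_loglik + len(cols) * np.log(len(dx))

candidates = [list(c) for r in range(1, len(names) + 1)
              for c in itertools.combinations(range(len(names)), r)]
best = min(candidates, key=bic)
print("selected terms:", [names[i] for i in best])
```

Exhaustive search is only practical for a handful of basis functions; for larger bases a greedy or otherwise structured search is needed.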

On noisy or coarsely-sampled data the equivalent parametric pass is a single call — inf.infer_force(B) — which profiles the diffusion and measurement-noise levels automatically. When in doubt, run both: agreement is itself a diagnostic, and disagreement measures the bias the linear estimator cannot absorb.
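A short NumPy-only sketch shows the kind of bias the linear pass cannot absorb. For pure Brownian motion observed with independent Gaussian localization error of standard deviation sigma, the increment-based diffusion estimate is inflated by roughly sigma^2 / dt. The function `naive_diffusion` and all values below are illustrative assumptions, not SFI code.

```python
import numpy as np

rng = np.random.default_rng(2)
D_true, dt, sigma, n_steps = 1.0, 0.01, 0.1, 100_000

# True positions: free diffusion with coefficient D_true.
x_true = np.cumsum(rng.normal(0.0, np.sqrt(2 * D_true * dt), size=n_steps))
# Observed positions: add independent localization error to every frame.
x_obs = x_true + rng.normal(0.0, sigma, size=n_steps)

def naive_diffusion(x, dt):
    """Mean squared increment divided by 2 dt; ignores measurement noise."""
    dx = np.diff(x)
    return float(np.mean(dx**2) / (2 * dt))

D_clean = naive_diffusion(x_true, dt)
D_noisy = naive_diffusion(x_obs, dt)
expected_bias = sigma**2 / dt  # each increment picks up two independent errors

print(f"clean data:  D_hat = {D_clean:.3f}")
print(f"noisy data:  D_hat = {D_noisy:.3f}")
print(f"predicted:   D + sigma^2/dt = {D_true + expected_bias:.3f}")
```

With these values the noise contribution is as large as the true diffusion, so the naive estimate is off by about a factor of two even though the trajectory looks only slightly jittery.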

Use diagnostics on real data

For experimental data, the main validation tool is the diagnostics suite:

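from SFI.diagnostics import assess

# `inf` is the inference object from the force fit above (linear or parametric pass).
inf.compute_force_error()               # compute the force-error estimate first
report = assess(inf, level="standard")  # run the standard diagnostics level
report.print_summary()                  # print the report summary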

This is the fastest way to separate three common failure modes:

  • missing dynamics or a too-small basis: residual autocorrelation flags;

  • diffusion or noise mismatch: whitened residual standard deviation far from 1;

  • a biased model: the realised NMSE stays well above the predicted (sampling-noise) value (the MSE-consistency flag). A redundant basis is handled separately by sparse selection (sparsify_force()).

On experimental data, the noise- and bias-type flags usually trace back to localization noise or coarse sampling — the cure is then the parametric estimators, not a bigger basis; see Measurement noise and coarse sampling.
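The residual-based flags can be illustrated by hand. The sketch below whitens the increments of a synthetic Ornstein-Uhlenbeck trajectory with the true force and diffusion, then repeats the computation after adding localization noise to the observations. It is a NumPy-only illustration of the quantities named above (whitened residual standard deviation, residual autocorrelation); the helper names are hypothetical and this is not how assess() is implemented.

```python
import numpy as np

def whitened_residuals(x, dt, force, D):
    """Increments minus the predicted drift, scaled by the predicted noise amplitude."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    return (dx - force(x[:-1]) * dt) / np.sqrt(2 * D * dt)

def lag1_autocorrelation(r):
    c = r - r.mean()
    return float(np.dot(c[:-1], c[1:]) / np.dot(c, c))

# Synthetic trajectory with known force -k x and diffusion D.
rng = np.random.default_rng(3)
k, D, dt, n_steps, sigma = 1.0, 0.5, 0.01, 100_000, 0.1
x = np.empty(n_steps)
x[0] = 0.0
kicks = rng.normal(0.0, np.sqrt(2 * D * dt), size=n_steps - 1)
for t in range(n_steps - 1):
    x[t + 1] = x[t] - k * x[t] * dt + kicks[t]
x_obs = x + rng.normal(0.0, sigma, size=n_steps)  # add localization noise

def true_force(y):
    return -k * y

r_clean = whitened_residuals(x, dt, true_force, D)
r_noisy = whitened_residuals(x_obs, dt, true_force, D)

std_clean, ac_clean = float(r_clean.std()), lag1_autocorrelation(r_clean)
std_noisy, ac_noisy = float(r_noisy.std()), lag1_autocorrelation(r_noisy)

print(f"clean: residual std = {std_clean:.2f}, lag-1 autocorrelation = {ac_clean:+.2f}")
print(f"noisy: residual std = {std_noisy:.2f}, lag-1 autocorrelation = {ac_noisy:+.2f}")
```

On clean data the residuals look like unit white noise. With localization noise the standard deviation rises well above 1 and consecutive residuals become anticorrelated, even though the force model is exactly right, which is why a bigger basis does not help in that case.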

What to do next