Playbook — apply inference to a dataset

Note

Prerequisite: read AGENTS.md at the repository root for the canonical imports and the “do not re-implement” rule.

This playbook is the standard recipe for running SFI on a dataset (experimental or synthetic). Every step names the canonical class or function to use — do not re-implement any of them.

1. Decide overdamped vs underdamped

Data

Model

Class

Only positions observed

Overdamped

OverdampedLangevinInference

Only positions observed, but inertia matters (ballistic regime)

Underdamped

UnderdampedLangevinInference

Positions and velocities observed

Underdamped (pass V)

UnderdampedLangevinInference

Heuristic: if <Δx² / (2 dim · D · Δt)> is far from 1, the overdamped assumption is violated; switch to underdamped.

2. Load the data

Always use the trajectory containers. Do not build mask or increment arrays by hand.

import numpy as np
from SFI.trajectory import TrajectoryCollection

X = np.load("positions.npy")          # shape (T, d) or (T, N, d)
coll = TrajectoryCollection.from_arrays(X=X, dt=0.01)

Multi-experiment data: pass a list of arrays, or build individual TrajectoryDataset objects with from_arrays() and combine them. For masked data (missing frames) pass mask= — the collection handles all mask-aware arithmetic downstream. For CSV / Parquet / HDF5 input, use TrajectoryCollection.load (format spec: Trajectory file formats); for synthetic degradation (noise, downsampling, motion blur) see SFI.trajectory.degrade.

3. Choose a basis

Use the ready-made builders in SFI.bases. Only fall back to make_basis(func, ...) for truly custom functional forms.

You want…

Use

Polynomial in position up to order n

monomials_up_to(order=n, dim=d, rank="vector")

Polynomial in position and velocity

monomials_up_to(order=n, dim=d, include_v=True, rank="vector")

Coordinate-wise linear map

linear_basis(dim=d)

Constant (isotropic) diffusion

identity_matrix_basis(dim=d)

Symmetric-matrix diffusion

symmetric_matrix_basis(dim=d)

Pair interactions

SFI.bases.pairs builders

Spatial differential operators (SPDE)

SFI.bases.spde builders

Fully custom

SFI.statefunc.make_basis(func, dim, rank, n_features, labels)

Note

For ABP / active-matter models, compose SFI.bases.pairs primitives (heading_vector, pbc_displacement, wrap_angle). See examples/_gallery_utils/abp.py for a worked example.

Rule of thumb: start with monomials_up_to(order=3, dim=d, rank="vector") for the force, identity_matrix_basis(dim=d) for a constant-D fit, and sparsify downstream.

4. Run inference — pick the estimator family

Two first-class estimator families; route by data regime (full regime table: Running inference):

Linear estimators (fast, closed-form)

compute_diffusion_constant()infer_force_linear()infer_diffusion_linear(). Use when measurement noise is negligible and data are well-sampled — exact in that limit, seconds even on large datasets.

Parametric estimators (robust; compute-intensive)

infer_force() (RK4-integrated flow + Gauss–Newton) → infer_diffusion() (state-dependent D via local-precision NLL). Use when positions are noisy (\(y = x + \eta\)), when \(F\,\Delta t\) is not small, or when the model is nonlinear in its parameters.

Linear example:

import SFI
from SFI.bases import monomials_up_to, identity_matrix_basis

inf = SFI.OverdampedLangevinInference(coll)
inf.compute_diffusion_constant(method="auto")

B_force = monomials_up_to(order=3, dim=coll.d, rank="vector")
inf.infer_force_linear(B_force, M_mode="Strato")

B_diff = identity_matrix_basis(dim=coll.d)  # or symmetric_matrix_basis
inf.infer_diffusion_linear(B_diff)

inf.compute_force_error()
inf.compute_diffusion_error()

Parametric example:

# Build a *parametric* force F(x; θ) instead of a linear basis
from SFI.statefunc import make_psf

def force_fn(x, *, params):
    return -params["k"] * x

F = make_psf(force_fn, dim=coll.d, rank=1,
             params={"k": ()}, labels=["-k x"])

inf = SFI.OverdampedLangevinInference(coll)
inf.infer_force(F)
inf.infer_diffusion(B_diff)

5. (Optional) Sparsify

After infer_force_linear(), call sparsify_force() to identify which basis terms the data actually supports.

result = inf.sparsify_force(criterion="PASTIS", p=0.1)
# or: criterion="AIC" / "BIC"
# method kwargs: "beam" (default), "greedy", "stlsq", "lasso"

result.all_ic(verbose=True)   # summary table across all criteria

For held-out scoring and precision/recall against a known ground truth:

from SFI.inference.sparse import overlap_metrics, predictive_nmse

6. Report and validate

inf.print_report()                 # console summary
report = inf.report_dict()         # serialisable dict

# Compare to ground truth (simulations only)
inf.compare_to_exact(model_exact=proc)

# Bootstrapped trajectory from the inferred model
coll_boot, proc_boot = inf.simulate_bootstrapped_trajectory(key)

7. Save / load

from SFI.inference import save_model, load_model
save_model(inf.force_inferred, "F.npz")
# reloading needs a template supplying the basis / PSF structure
F_loaded = load_model("F.npz", template=inf.force_inferred)

# Full inference object
inf.save_results("run.json")

8. Plot

If you produce figures (gallery demo, paper figure, diagnostic), follow GALLERY_STYLE_GUIDE.md at the repository root: build dark-theme figures with dark_fig() and the SFI_COLORS palette; never use pure black. (Gallery demos call the apply_style() helper from examples/_gallery_utils/helpers.py instead, which is only importable inside the examples tree.)

from SFI.utils.plotting import dark_fig, SFI_COLORS
fig, ax = dark_fig()

9. Anti-patterns — do not do this

  • np.diff(X, axis=0) / dt for increments — use the collection’s increments; it is mask-aware.

  • Writing your own Euler-Maruyama loop — use SFI.langevin.OverdampedProcess.

  • Hand-rolling polynomial features — use SFI.bases.monomials_up_to().

  • Manually assembling Gram matrices or running lstsq — the inference engines already do this and track the covariance.

  • Thresholding coefficients by hand — use sparsify_force().

10. See also