Playbook — apply inference to a dataset¶

Note

Prerequisite: read AGENTS.md at the repository root for the canonical imports and the “do not re-implement” rule.

This playbook is the standard recipe for running SFI on a dataset (experimental or synthetic). Every step names the canonical class or function to use — do not re-implement any of them.

1. Decide overdamped vs underdamped¶

Data	Model	Class
Only positions observed	Overdamped	`OverdampedLangevinInference`
Only positions observed, but inertia matters (ballistic regime)	Underdamped	`UnderdampedLangevinInference`
Positions and velocities observed	Underdamped (pass `V`)	`UnderdampedLangevinInference`

Heuristic: if <Δx² / (2 dim · D · Δt)> is far from 1, the overdamped assumption is violated; switch to underdamped.

2. Load the data¶

Always use the trajectory containers. Do not build mask or increment arrays by hand.

import numpy as np
from SFI.trajectory import TrajectoryCollection

X = np.load("positions.npy")          # shape (T, d) or (T, N, d)
coll = TrajectoryCollection.from_arrays(X=X, dt=0.01)

Multi-experiment data: pass a list of arrays, or build individual TrajectoryDataset objects with from_arrays() and combine them. For masked data (missing frames) pass mask= — the collection handles all mask-aware arithmetic downstream. For CSV / Parquet / HDF5 input, use TrajectoryCollection.load (format spec: Trajectory file formats); for synthetic degradation (noise, downsampling, motion blur) see SFI.trajectory.degrade.

3. Choose a basis¶

Use the ready-made builders in SFI.bases. Only fall back to make_basis(func, ...) for truly custom functional forms.

You want…	Use
Polynomial in position up to order n	`monomials_up_to(order=n, dim=d, rank="vector")`
Polynomial in position and velocity	`monomials_up_to(order=n, dim=d, include_v=True, rank="vector")`
Coordinate-wise linear map	`linear_basis(dim=d)`
Constant (isotropic) diffusion	`identity_matrix_basis(dim=d)`
Symmetric-matrix diffusion	`symmetric_matrix_basis(dim=d)`
Pair interactions	`SFI.bases.pairs` builders
Spatial differential operators (SPDE)	`SFI.bases.spde` builders
Fully custom	`SFI.statefunc.make_basis(func, dim, rank, n_features, labels)`

Note

For ABP / active-matter models, compose SFI.bases.pairs primitives (heading_vector, pbc_displacement, wrap_angle). See examples/_gallery_utils/abp.py for a worked example.

Rule of thumb: start with monomials_up_to(order=3, dim=d, rank="vector") for the force, identity_matrix_basis(dim=d) for a constant-D fit, and sparsify downstream.

4. Run inference — pick the estimator family¶

Two first-class estimator families; route by data regime (full regime table: Running inference):

Linear estimators (fast, closed-form): compute_diffusion_constant() → infer_force_linear() → infer_diffusion_linear(). Use when measurement noise is negligible and data are well-sampled — exact in that limit, seconds even on large datasets.
Parametric estimators (robust; compute-intensive): infer_force() (RK4-integrated flow + Gauss–Newton) → infer_diffusion() (state-dependent D via local-precision NLL). Use when positions are noisy (\(y = x + \eta\)), when \(F\,\Delta t\) is not small, or when the model is nonlinear in its parameters.

Linear example:

import SFI
from SFI.bases import monomials_up_to, identity_matrix_basis

inf = SFI.OverdampedLangevinInference(coll)
inf.compute_diffusion_constant(method="auto")

B_force = monomials_up_to(order=3, dim=coll.d, rank="vector")
inf.infer_force_linear(B_force, M_mode="Strato")

B_diff = identity_matrix_basis(dim=coll.d)  # or symmetric_matrix_basis
inf.infer_diffusion_linear(B_diff)

inf.compute_force_error()
inf.compute_diffusion_error()

Parametric example:

# Build a *parametric* force F(x; θ) instead of a linear basis
from SFI.statefunc import make_psf

def force_fn(x, *, params):
    return -params["k"] * x

F = make_psf(force_fn, dim=coll.d, rank=1,
             params={"k": ()}, labels=["-k x"])

inf = SFI.OverdampedLangevinInference(coll)
inf.infer_force(F)
inf.infer_diffusion(B_diff)

5. (Optional) Sparsify¶

After infer_force_linear(), call sparsify_force() to identify which basis terms the data actually supports.

result = inf.sparsify_force(criterion="PASTIS", p=0.1)
# or: criterion="AIC" / "BIC"
# method kwargs: "beam" (default), "greedy", "stlsq", "lasso"

result.all_ic(verbose=True)   # summary table across all criteria

For held-out scoring and precision/recall against a known ground truth:

from SFI.inference.sparse import overlap_metrics, predictive_nmse

6. Report and validate¶

inf.print_report()                 # console summary
report = inf.report_dict()         # serialisable dict

# Compare to ground truth (simulations only)
inf.compare_to_exact(model_exact=proc)

# Bootstrapped trajectory from the inferred model
coll_boot, proc_boot = inf.simulate_bootstrapped_trajectory(key)

7. Save / load¶

from SFI.inference import save_model, load_model
save_model(inf.force_inferred, "F.npz")
# reloading needs a template supplying the basis / PSF structure
F_loaded = load_model("F.npz", template=inf.force_inferred)

# Full inference object
inf.save_results("run.json")

8. Plot¶

If you produce figures (gallery demo, paper figure, diagnostic), follow GALLERY_STYLE_GUIDE.md at the repository root: build dark-theme figures with dark_fig() and the SFI_COLORS palette; never use pure black. (Gallery demos call the apply_style() helper from examples/_gallery_utils/helpers.py instead, which is only importable inside the examples tree.)

from SFI.utils.plotting import dark_fig, SFI_COLORS
fig, ax = dark_fig()

9. Anti-patterns — do not do this¶

np.diff(X, axis=0) / dt for increments — use the collection’s increments; it is mask-aware.
Writing your own Euler-Maruyama loop — use SFI.langevin.OverdampedProcess.
Hand-rolling polynomial features — use SFI.bases.monomials_up_to().
Manually assembling Gram matrices or running lstsq — the inference engines already do this and track the covariance.
Thresholding coefficients by hand — use sparsify_force().

10. See also¶

Worked examples: Gallery (examples/gallery/ou_demo.py, lorenz_demo.py, experimental_workflow_demo.py are the closest templates).
User guides: Running inference, Trajectory data, Building bases.