Playbook: apply inference to a dataset
Note
Prerequisite: read AGENTS.md at the repository root for the
canonical imports and the “do not re-implement” rule.
This playbook is the standard recipe for running SFI on a dataset (experimental or synthetic). Every step names the canonical class or function to use — do not re-implement any of them.
1. Decide overdamped vs underdamped
| Data | Model | Class |
|---|---|---|
| Only positions observed | Overdamped | |
| Only positions observed, but inertia matters (ballistic regime) | Underdamped | |
| Positions and velocities observed | Underdamped (pass | |
Heuristic: if <Δx² / (2 dim · D · Δt)> is far from 1, the overdamped
assumption is violated; switch to underdamped.
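The arithmetic behind this heuristic can be sketched in plain NumPy. This is an illustration on synthetic, unmasked arrays only, not the library's diagnostic: the helper name `overdamped_ratio` is hypothetical, D is assumed known, and the ratio is read here as the per-frame squared displacement summed over dimensions. On real data the increments should come from the trajectory collection, which is mask-aware.

```python
import numpy as np

def overdamped_ratio(X, D, dt):
    """Mean of |dx|^2 / (2 * dim * D * dt) for one unmasked (T, d) trajectory.

    Hypothetical helper for illustration; not part of SFI.
    """
    dx = X[1:] - X[:-1]                  # frame-to-frame displacements, shape (T-1, d)
    dim = X.shape[1]
    sq = np.sum(dx**2, axis=1)           # squared displacement per frame
    return float(np.mean(sq / (2.0 * dim * D * dt)))

rng = np.random.default_rng(0)
D_true, dt, T, d = 0.5, 0.01, 20_000, 2  # illustrative values

# Pure Brownian motion: the overdamped picture holds, so the ratio sits near 1.
steps = np.sqrt(2 * D_true * dt) * rng.standard_normal((T, d))
X_brownian = np.cumsum(steps, axis=0)
ratio_brownian = overdamped_ratio(X_brownian, D_true, dt)

# Constant-velocity (ballistic) motion: displacements scale as dt, not sqrt(dt),
# so the ratio falls far below 1 at small dt.
v = np.array([1.0, 0.0])
X_ballistic = np.arange(T)[:, None] * dt * v
ratio_ballistic = overdamped_ratio(X_ballistic, D_true, dt)

print(f"Brownian ratio:  {ratio_brownian:.3f}")
print(f"ballistic ratio: {ratio_ballistic:.3f}")
```

A ratio close to 1 is consistent with the overdamped model; a ratio far from 1, as in the ballistic case, points to the underdamped one.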
2. Load the data
Always use the trajectory containers. Do not build mask or increment arrays by hand.
import numpy as np
from SFI.trajectory import TrajectoryCollection
X = np.load("positions.npy") # shape (T, d) or (T, N, d)
coll = TrajectoryCollection.from_arrays(X=X, dt=0.01)
Multi-experiment data: pass a list of arrays, or build individual
TrajectoryDataset objects with from_arrays() and
combine them. For masked data (missing frames) pass mask= — the
collection handles all mask-aware arithmetic downstream. For CSV /
Parquet / HDF5 input, use TrajectoryCollection.load (format spec:
Trajectory file formats); for
synthetic degradation (noise, downsampling, motion blur) see
SFI.trajectory.degrade.
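To try the loading snippet without experimental data, any array with one of the documented shapes will do. The sketch below writes a placeholder `positions.npy` containing free Brownian motion, using NumPy only. The parameter values and the temporary path are arbitrary assumptions, and this is merely a stand-in file: for realistic synthetic data, the library's own simulator remains the recommended route.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(1)
T, N, d = 1_000, 5, 2          # frames, particles, spatial dimensions
D, dt = 0.5, 0.01              # illustrative values only

# Free Brownian motion: cumulative sum of independent Gaussian steps.
steps = np.sqrt(2 * D * dt) * rng.standard_normal((T, N, d))
X_multi = np.cumsum(steps, axis=0)   # shape (T, N, d): several particles
X_single = X_multi[:, 0, :]          # shape (T, d): a single particle

# Round-trip through a .npy file, as in the loading snippet above.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "positions.npy"
    np.save(path, X_single)
    X = np.load(path)

print(X.shape, X_multi.shape)
```

Either `X` or `X_multi` has a shape that the loading snippet's comment lists as accepted.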
3. Choose a basis
Use the ready-made builders in SFI.bases. Only fall back to
make_basis(func, ...) for truly custom functional forms.
| You want… | Use |
|---|---|
| Polynomial in position up to order n | |
| Polynomial in position and velocity | |
| Coordinate-wise linear map | |
| Constant (isotropic) diffusion | |
| Symmetric-matrix diffusion | |
| Pair interactions | |
| Spatial differential operators (SPDE) | |
| Fully custom | |
Note
For ABP / active-matter models, compose SFI.bases.pairs
primitives (heading_vector, pbc_displacement, wrap_angle).
See examples/_gallery_utils/abp.py for a worked example.
Rule of thumb: start with monomials_up_to(order=3, dim=d,
rank="vector") for the force, identity_matrix_basis(dim=d) for a
constant-D fit, and sparsify downstream.
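One reason to sparsify downstream is that a full polynomial basis grows quickly with dimension. The sketch below only counts coefficients; it does not build features. It rests on two assumptions about the builder that the text does not confirm: the constant term is included, and `rank="vector"` carries one copy of each monomial per force component. Both function names are hypothetical.

```python
from math import comb

def n_monomials(order: int, dim: int) -> int:
    """Count monomials x1^a1 * ... * xd^ad with a1 + ... + ad <= order.

    Includes the constant term (assumption about the basis builder).
    """
    return comb(order + dim, dim)

def n_force_coefficients(order: int, dim: int) -> int:
    """Assumed coefficient count for a vector-valued force:
    one independent copy of each monomial per component."""
    return dim * n_monomials(order, dim)

for dim in (1, 2, 3):
    print(dim, n_monomials(3, dim), n_force_coefficients(3, dim))
```

Under these assumptions, order 3 gives 4, 10 and 20 monomials in one, two and three dimensions, hence 4, 20 and 60 force coefficients.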
4. Run inference: pick the estimator family
Two first-class estimator families; route by data regime (full regime table: Running inference):
- Linear estimators (fast, closed-form): compute_diffusion_constant() → infer_force_linear() → infer_diffusion_linear(). Use when measurement noise is negligible and data are well-sampled: exact in that limit, seconds even on large datasets.
- Parametric estimators (robust; compute-intensive): infer_force() (RK4-integrated flow + Gauss–Newton) → infer_diffusion() (state-dependent D via local-precision NLL). Use when positions are noisy (\(y = x + \eta\)), when \(F\,\Delta t\) is not small, or when the model is nonlinear in its parameters.
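Why measurement noise matters for the routing above can be seen with a naive estimator. For \(y = x + \eta\) with independent noise of variance \(\sigma^2\) per component, the mean squared increment of \(y\) picks up an extra \(2 d \sigma^2\), so a plain mean-squared-displacement estimate of D is biased upward by \(\sigma^2 / \Delta t\). The NumPy sketch below checks this on synthetic Brownian motion. `naive_D` is a hypothetical textbook estimator, not one of the SFI estimators, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
D_true, dt, sigma = 0.5, 0.01, 0.05      # illustrative values
T, d = 50_000, 2

# True positions x (free Brownian motion) and noisy observations y = x + eta.
x = np.cumsum(np.sqrt(2 * D_true * dt) * rng.standard_normal((T, d)), axis=0)
y = x + sigma * rng.standard_normal((T, d))

def naive_D(X, dt):
    """Textbook estimate <|dX|^2> / (2 * d * dt); ignores measurement noise."""
    dX = X[1:] - X[:-1]
    return float(np.mean(np.sum(dX**2, axis=1)) / (2 * X.shape[1] * dt))

D_clean = naive_D(x, dt)                 # close to D_true
D_noisy = naive_D(y, dt)                 # inflated by the noise
D_predicted = D_true + sigma**2 / dt     # expected value of the biased estimate

print(f"clean: {D_clean:.3f}  noisy: {D_noisy:.3f}  predicted: {D_predicted:.3f}")
```

The bias grows as \(\Delta t\) shrinks, which is why densely sampled but noisy positions call for a noise-aware estimator.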
Linear example:
import SFI
from SFI.bases import monomials_up_to, identity_matrix_basis
inf = SFI.OverdampedLangevinInference(coll)
inf.compute_diffusion_constant(method="auto")
B_force = monomials_up_to(order=3, dim=coll.d, rank="vector")
inf.infer_force_linear(B_force, M_mode="Strato")
B_diff = identity_matrix_basis(dim=coll.d) # or symmetric_matrix_basis
inf.infer_diffusion_linear(B_diff)
inf.compute_force_error()
inf.compute_diffusion_error()
Parametric example:
# Build a *parametric* force F(x; θ) instead of a linear basis
from SFI.statefunc import make_psf

def force_fn(x, *, params):
    return -params["k"] * x

F = make_psf(force_fn, dim=coll.d, rank=1,
             params={"k": ()}, labels=["-k x"])
inf = SFI.OverdampedLangevinInference(coll)
inf.infer_force(F)
inf.infer_diffusion(B_diff)  # B_diff as defined in the linear example
5. (Optional) Sparsify
After infer_force_linear(), call sparsify_force() to identify which
basis terms the data actually supports.
result = inf.sparsify_force(criterion="PASTIS", p=0.1)
# or: criterion="AIC" / "BIC"
# method kwargs: "beam" (default), "greedy", "stlsq", "lasso"
result.all_ic(verbose=True) # summary table across all criteria
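For orientation, the AIC and BIC options trade goodness of fit against the number of retained terms. The sketch below uses the textbook definitions, where lower is better. The sign and normalisation that SFI uses internally may differ, and PASTIS is not covered here because its definition is not given in this playbook. The log-likelihood values are made-up numbers for illustration.

```python
from math import log

def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion (textbook form): 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion (textbook form): k ln n - 2 ln L."""
    return k * log(n) - 2 * log_likelihood

n_samples = 500
# Two hypothetical candidate models: (number of basis terms kept, log-likelihood).
model_a = (3, -120.0)
model_b = (6, -118.5)

aic_a, aic_b = aic(model_a[1], model_a[0]), aic(model_b[1], model_b[0])
bic_a = bic(model_a[1], model_a[0], n_samples)
bic_b = bic(model_b[1], model_b[0], n_samples)

print(f"AIC: A={aic_a:.2f}  B={aic_b:.2f}")
print(f"BIC: A={bic_a:.2f}  B={bic_b:.2f}")
```

In this toy case both criteria prefer the three-term model: the three extra terms of model B do not improve the likelihood enough to pay for their penalty.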
For held-out scoring and precision/recall against a known ground truth:
from SFI.inference.sparse import overlap_metrics, predictive_nmse
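The quantities behind these two helpers can be illustrated independently of the library. The sketch below is not the signature or implementation of `overlap_metrics` or `predictive_nmse`. The function names are hypothetical, and normalising the squared error by the squared norm of the reference values is an assumed convention.

```python
import numpy as np

def support_precision_recall(selected, true_support):
    """Precision and recall of a selected set of basis-term indices."""
    selected, true_support = set(selected), set(true_support)
    hits = len(selected & true_support)
    precision = hits / len(selected) if selected else 1.0
    recall = hits / len(true_support) if true_support else 1.0
    return precision, recall

def nmse(predicted, reference):
    """Squared error normalised by the squared norm of the reference (assumed convention)."""
    predicted, reference = np.asarray(predicted, float), np.asarray(reference, float)
    return float(np.sum((predicted - reference) ** 2) / np.sum(reference**2))

# Toy example: indices of terms kept by sparsification vs. the true model.
precision, recall = support_precision_recall(selected=[0, 2, 5], true_support=[0, 2, 3, 4])
error = nmse(predicted=[1.0, 2.0, 1.0], reference=[1.0, 2.0, 2.0])

print(f"precision={precision:.3f}  recall={recall:.3f}  nmse={error:.3f}")
```

High precision with low recall means the selection is too conservative. The reverse means spurious terms were retained.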
6. Report and validate
inf.print_report() # console summary
report = inf.report_dict() # serialisable dict
# Compare to ground truth (simulations only)
inf.compare_to_exact(model_exact=proc)
# Bootstrapped trajectory from the inferred model
coll_boot, proc_boot = inf.simulate_bootstrapped_trajectory(key)
7. Save / load
from SFI.inference import save_model, load_model
save_model(inf.force_inferred, "F.npz")
# reloading needs a template supplying the basis / PSF structure
F_loaded = load_model("F.npz", template=inf.force_inferred)
# Full inference object
inf.save_results("run.json")
8. Plot
If you produce figures (gallery demo, paper figure, diagnostic), follow
GALLERY_STYLE_GUIDE.md at the repository root: build dark-theme
figures with dark_fig() and the SFI_COLORS palette; never use
pure black. (Gallery demos call the apply_style() helper from
examples/_gallery_utils/helpers.py instead, which is only importable
inside the examples tree.)
from SFI.utils.plotting import dark_fig, SFI_COLORS
fig, ax = dark_fig()
9. Anti-patterns: do not do this
- np.diff(X, axis=0) / dt for increments: use the collection’s increments; it is mask-aware.
- Writing your own Euler-Maruyama loop: use SFI.langevin.OverdampedProcess.
- Hand-rolling polynomial features: use SFI.bases.monomials_up_to().
- Manually assembling Gram matrices or running lstsq: the inference engines already do this and track the covariance.
- Thresholding coefficients by hand: use sparsify_force().
10. See also
- Worked examples: Gallery (examples/gallery/ou_demo.py, lorenz_demo.py, experimental_workflow_demo.py are the closest templates).
- User guides: Running inference, Trajectory data, Building bases.