SFI.inference.underdamped module¶
- class SFI.inference.underdamped.UnderdampedLangevinInference(data, *, max_memory_gb=1.0, **kwargs)[source]¶
Bases:
BaseLangevinInferenceStochastic Force Inference concrete class for underdamped systems (velocities unobserved).
- Core equations (Ito convention):
dx/dt = v dv/dt = F(x, v) + sqrt(2 D(x, v)) dW_t
Only x(t) is observed; v(t) is reconstructed from finite differences (“secant velocities”).
[Dynamical equations] Underdamped Langevin SDE
\[\frac{\mathrm{d}x}{\mathrm{d}t} = v, \qquad \frac{\mathrm{d}v}{\mathrm{d}t} = F(x,v) + \sqrt{2\,D(x,v)}\;\mathrm{d}W_t\]Only positions \(x(t)\) are observed; velocities are reconstructed from finite differences.
- coeff_block(block, *, field='force')¶
Return the coefficient (and covariance) slice for a basis sub-block.
Compound bases (e.g. a multi-kernel or time-Fourier library) pack several conceptual blocks into one flat coefficient vector. This returns the slice for one block without hand-computed offsets.
- Parameters:
block –
(start, stop)indices, aslice, anint, or aBasis(located by matching its labels as a contiguous run of the fitted basis labels).field ({"force", "diffusion"})
- Returns:
(coeffs, cov) –
coeffsis the 1-D slice;covis the matching covariance block (orNoneif no covariance is available).- Return type:
tuple
- compare_params_to_exact(theta_true, *, psf=None)¶
Compare inferred parametric coefficients to known ground truth.
For a model fitted with a parametric family, returns a per-parameter dict of absolute and relative error.
theta_truemay be a flat array (compared elementwise toforce_coefficients_full) or a{name: value}dict (unflattened from the fitted coefficients viapsf.unflatten_params, falling back toself.force_psf).- Returns:
{name: {"true", "inferred", "abs_error", "rel_error"}}; also stored asself.parameter_comparison.- Return type:
dict
- compare_to_exact(*, model_exact=None, data_exact=None, force_exact=None, diffusion_exact=None, maxpoints=1000)¶
Compare inferred vs exact using dt-weighted time means via the integrate() API.
This function evaluates the inferred force/diffusion against “exact” references on a (possibly exact/synthetic) dataset. It updates:
self.MSE_force / self.NMSE_force self.MSE_diffusion / self.NMSE_diffusion
Inputs: exact references¶
You can provide exact references in two ways:
Preferred: model_exact A model object (from SFI.langevin submodule) exposing:
model_exact.force_sf : exact force/drift (SF/StateExpr-like)
- model_exact.diffusion_sfexact diffusion (SF/StateExpr-like)
OR a constant (float or (d,d) matrix) via
model_exact.D
Explicit: force_exact, diffusion_exact - force_exact: SF/StateExpr-like callable returning (N, d) - diffusion_exact:
callable returning (N, d, d), OR
float meaning σ·I, OR
(d,d) matrix constant diffusion.
These are used if model_exact is not provided. If model_exact is provided, its members take precedence unless they are missing, in which case the explicit arguments can be used as fallback.
Velocity provisioning (underdamped)¶
If an evaluated expression advertises needs_v=True, this routine supplies:
v := dX/dt (secant velocity from the data stream)
i.e. it uses velocity(“dX”, “dt”) as the v=… keyword argument. This works for both exact and inferred expressions and keeps underdamped comparisons possible even when the dataset only stores positions.
Metrics¶
- Force:
e = Fe - Fh MSE_force = < e^T A_inv e > NMSE_force = MSE_force / < Fh^T A_inv Fh >
- Diffusion:
E = De - Dh MSE_diffusion = < tr(A_inv E A_inv E) >
[Error analysis] Normalized MSE metrics (force & diffusion)
\[\text{NMSE}_F = \frac{\langle (F_{\text{exact}} - \hat F)^\top A^{-1} (F_{\text{exact}} - \hat F) \rangle} {\langle \hat F^\top A^{-1} \hat F \rangle}\]\[\text{NMSE}_D = \frac{\langle \operatorname{tr}(A^{-1} E\, A^{-1} E) \rangle} {\langle \operatorname{tr}(A^{-1} \hat D\, A^{-1} \hat D) \rangle}\]- where \(E = D_{\text{exact}} - \hat D\).
NMSE_diffusion = MSE_diffusion / < tr(A_inv Dh A_inv Dh) >
where A_inv is self.A_inv (typically (2 D̄)^{-1} from the inferred constant diffusion normalization).
Subsampling¶
Uses a simple subsampling heuristic so that the total number of evaluated points is ~<= maxpoints, accounting for the maximum number of particles.
Requirements¶
self.A_inv must exist (run compute_diffusion_constant() or otherwise set A_inv).
The dataset must provide streams X, dt, and if any evaluated expr needs v: dX as well.
- Parameters:
maxpoints (int)
- Return type:
None
- comparison_scatter(*, model_exact=None, field='force', data=None, ax=None, maxpoints=2000, **plot_kw)¶
Scatter inferred-vs-exact force (or diffusion) along the trajectory.
Evaluates both fields on the data with
force_comparison_arrays()/diffusion_comparison_arrays()and renders them withSFI.utils.plotting.comparison_scatter()(identity line + Pearsonr+ MSE). Replaces hand-rolled exact-vs-inferred scatters in demos.- Parameters:
model_exact – Object exposing
force_sf/diffusion_sf/D.field ({"force", "diffusion"}) – Which field to compare.
data – Collection to evaluate on (default: training data).
ax – Target axes (default: current axes).
maxpoints (int) – Approximate number of points to evaluate.
**plot_kw – Forwarded to
SFI.utils.plotting.comparison_scatter().
- Return type:
matplotlib.axes.Axes
- compute_diffusion_constant(method='auto')[source]¶
Underdamped constant diffusion inference (ULI estimators).
Estimate measurement noise Λ with method “Lambda”.
If method == ‘auto’: choose ‘noisy’ if tr(Λ)>0 else ‘WeakNoise’.
Average chosen instantaneous estimator to obtain D̄.
Set A = 2 D̄ and derive A_inv, sqrtA, sqrtA_inv.
- Parameters:
method (str)
- Return type:
None
- compute_diffusion_error()¶
Estimate sampling error for diffusion inference.
Mirrors
compute_force_error()for the diffusion field. Uses the diffusion Gram matrix (normal matrix) and its inverse.For linear diffusion inference the moments covariance is proportional to the Gram matrix, giving
Cov(θ_D) = cov_factor * G_D⁻¹.Note
This error estimate is approximate. The diffusion inference is more complex than the force inference: diffusion coefficients are inferred from force residuals, a positive-definiteness constraint applies, and the simple covariance formula
Cov(θ_D) = cov_factor * G_D⁻¹may not capture all sources of uncertainty. Treat the result as a rough guide rather than a rigorous confidence interval.Notes
- self.diffusion_coefficients_covariancejnp.ndarray
Covariance matrix of the diffusion coefficients.
- self.diffusion_coefficients_stderrjnp.ndarray
Standard error for each diffusion coefficient.
- self.diffusion_informationfloat
Estimated information content of the inferred diffusion field.
- self.diffusion_predicted_MSEfloat
Predicted normalized mean squared error.
- compute_force_error()¶
Estimate sampling error for force inference.
[Error analysis] Force coefficient covariance & predicted error
\[\operatorname{Cov}(C) = G^{-1}, \qquad \mathbb{E}\!\left[\langle \delta F^\top A^{-1} \delta F \rangle\right] = \operatorname{Tr}\!\left(G\,\operatorname{Cov}(C)\right), \qquad I_F = \tfrac{1}{2}\,C^\top M, \qquad \text{NMSE}_{F,\text{pred}} = \frac{\operatorname{Tr}(G\cdot\operatorname{Cov}(C))}{C^\top M} = \frac{\operatorname{Tr}(G\cdot\operatorname{Cov}(C))}{2 I_F}\]Assumes the sampling error dominates; measurement noise and discretization biases are not addressed.
This method evaluates the covariance of the inferred force coefficients, the standard error, and computes the predicted normalized mean squared error (MSE) of the inferred force field. This analysis assumes that the sampling error dominates, and measurement noise or discretization biases are not explicitly addressed. It is common to OLI and ULI (by construction of the normal matrix G).
Notes
self.force_coefficients_covariance (jnp.ndarray): Covariance matrix of the force coefficients. self.force_coefficients_stderr (jnp.ndarray): Standard error for each force coefficient. self.force_information (float): Estimated information content of the inferred force field. self.force_predicted_MSE (float): Predicted normalized mean squared error of the inferred force field.
- diagnose(*, level='standard', **kwargs)¶
Run the consistency-check suite from
SFI.diagnostics.Convenience wrapper for
SFI.diagnostics.assess(). See its docstring for the availablelevelpresets.- Parameters:
level (str)
- diffusion_comparison_arrays(*, model_exact=None, diffusion_exact=None, data=None, maxpoints=2000)¶
Return
(D_exact, D_inferred)evaluated along the trajectory.Like
force_comparison_arrays()but for the diffusion field; a constant exact/inferred diffusion is broadcast to(n_points, d, d).- Parameters:
maxpoints (int)
- force_comparison_arrays(*, model_exact=None, force_exact=None, data=None, maxpoints=2000)¶
Return
(F_exact, F_inferred)evaluated along the trajectory.Evaluates the exact and inferred force on the (subsampled, masked) trajectory points, supplying finite-difference velocities for underdamped fields. Feeds
comparison_scatter(); also handy for custom diagnostics.- Parameters:
model_exact – Object exposing
force_sf(e.g. anOverdampedProcess).force_exact – Explicit callable exact force (overrides
model_exact).data – Collection to evaluate on (default: the training data).
maxpoints (int) – Approximate number of points to evaluate.
- Returns:
(F_exact, F_inferred)
- Return type:
tuple of ndarray, shape
(n_points, d)
- holdout_score(data, *, require_error=False)¶
Held-out NMSE of the fitted force on an independent collection.
A side feature for data-abundant scenarios: SFI estimates its own accuracy from the training data (
force_predicted_MSE) and validates fits through the diagnostics suite, neither of which costs any data. Reach for an explicit train/test split (TrajectoryCollection.split_time) only when data is plentiful, or to confirm a suspected bias floor: arationear 1 means the fit is sampling-limited, a ratio≫ 1means a bias floor (often measurement noise — see the noise-and-sampling guide).The score is the residual-based normalised mean-square error of
force_inferredondata, with the diffusion noise floor subtracted (a bias detector, not a precision instrument: its resolution is set by the χ² fluctuations of the residuals).- Parameters:
data (TrajectoryCollection) – Independent test data (e.g. the second half of
coll.split_time(0.8)).require_error (bool) – If True, run
compute_force_error()first when the predicted error is missing, soratiois always defined.
- Returns:
{"holdout_NMSE", "predicted_NMSE", "ratio", "excess_z", "n_obs"}. Also stored asself.force_holdout_NMSE.- Return type:
dict
Notes
Bases that read time-dependent extras are not supported on the held-out path (the residual builders pass extras unsliced).
- infer_diffusion(basis=None, *, theta_D0=None, integrator='rk4', n_substeps=1, maxiter=100)[source]¶
Infer state- and velocity-dependent diffusion D(x, v).
Requires a prior parametric
infer_force()call. Holds the fitted force fixed and minimises the pentadiagonal windowed conditional NLL over the diffusion parameters, withD(x, v; θ_D)evaluated at the shooting velocity.[Inference] State-dependent diffusion inference (underdamped)
With the force \(\hat F(x,v)\) held fixed, the state-dependent diffusion \(D(x,v;\theta_D)\) is optimised by minimising the windowed conditional NLL on pentadiagonal (bandwidth-2) covariance windows; \(\Lambda\) from the force inference is held fixed.
- Parameters:
basis (Basis (rank 2), PSF, or None) – Diffusion model. A rank-2
Basisgives the linear parameterisationD = Σ_j θ_j d_j; aPSF(optionallyneeds_v=True) is used directly. WhenNone(default),symmetric_matrix_basis(d)— all constant symmetricd×ddiffusion matrices.theta_D0 (dict, array, or None) – Initial diffusion parameters (default zeros).
integrator ({"rk4", "euler"}) – Flow predictor (default
"rk4", matchinginfer_force()).n_substeps (int) – Integrator micro-steps per Δt (default 1).
maxiter (int) – L-BFGS maximum iterations (default 100).
- Return type:
None
Notes
Sets
diffusion_inferred,diffusion_coefficients(anddiffusion_basiswhenbasisis aBasis), plus metadata.See also
- Parametric windowed estimators — algorithm and parameters
Full algorithm description.
- infer_diffusion_linear(basis, *, M_mode='noisy', G_mode='rectangle')[source]¶
Fit D(x, v) as a linear combination of basis functions.
- Parameters:
basis (Basis)
M_mode (str)
G_mode (str)
- Return type:
None
- infer_force(F, theta0=None, *, D=None, Lambda=None, integrator='rk4', n_substeps=1, inner='auto', eiv='auto', max_outer=5, inner_maxiter=80)[source]¶
Infer force F(x,v) with the minimal parametric estimator.
Built on
SFI.inference.parametric_core: the unobserved velocity is resolved by shooting through a single RK4 phase-space step, the 3-point residual covariance is pentadiagonal (bandwidth 2), and the parameters are found by direct Gauss–Newton (linear-in-θBasis, with the skip-trick errors-in-variables instrument) or frozen-precision L-BFGS (nonlinear-in-θPSF).(D, Λ)are profiled natively: moment-estimator init, then one windowed conditional-NLL refinement at the fitted θ.[Inference] Parametric windowed force inference (underdamped)
The phase-space dynamics \(\dot{x}=v,\;\mathrm{d}v = F(x,v)\mathrm{d}t + \sqrt{2D}\,\mathrm{d}W\) are factored into deterministic flow + noise. Three-point shooting residuals follow a pentadiagonal Gaussian whose local precision weights the Gauss–Newton normal equations; the process noise enters at \(\Delta t^3\). Under measurement noise the left factor is the η-clean skip instrument built from the lagged clean position pair (
eiv=True) — consistent where the naive MLE is velocity-EIV-biased.- Parameters:
F (PSF (
needs_v=True) or Basis) – Parametric force model. ABasisis converted to a PSF internally (coefficients initialised to zero) and PASTIS sparsification is enabled; it runs the fast direct-GN path.theta0 (dict, array, or None) – Initial force parameters (default: zeros).
D (array (d, d), optional) – Fixed diffusion matrix. If both
DandLambdaare given, noise profiling is skipped entirely (fast path).Lambda (array
(d, d), optional) – Fixed measurement-noise covariance.integrator ({"rk4", "euler"}) – Flow predictor (default
"rk4", a single 4th-order step). A single"euler"step cannot carry the force into the position update, so the skip-trick is auto-disabled (with a warning) forintegrator="euler", n_substeps=1.n_substeps (int) – Integrator micro-steps per observation interval (default 1 — the single-step minimal estimator).
inner ({"auto", "gn", "lbfgs"}) – Inner solver.
"auto"→ direct Gauss–Newton for a linearBasis, L-BFGS for aPSF.eiv ({"auto", True, False, float}) – Measurement-noise errors-in-variables instrument.
"auto"(default) →Truefor all models (interacting models use the same N-body flow for the instrument as for the residual);Trueforces the η-clean skip instrument;Falseis the plain MLE; a float in[0, 1]blends. GN path only.max_outer (int) – Outer Gauss–Newton / IRLS iterations (default 5).
inner_maxiter (int) – Inner L-BFGS iterations per outer step on the PSF path (default 80; raise for large nonlinear families, e.g. NNs).
- Return type:
None
Notes
Sets standard
force_*attributes:force_psf,force_G,force_G_pinv,force_coefficients_full,force_coefficients,force_support,force_moments,force_inferred,diffusion_average,A,A_inv,Lambda.When
Fis aBasis, additionally setsforce_basis,force_G_full, andforce_scorerso thatsparsify_force()can be called afterwards.See also
- Parametric windowed estimators — concepts
Mathematical foundations.
- Parametric windowed estimators — algorithm and parameters
Detailed algorithm description.
- infer_force_linear(basis, *, preset='auto', M_mode=None, G_mode=None, diffusion_method=None)[source]¶
Infer F(x, v) as a linear combination of basis functions.
The estimator is configured by a
presetthat consolidates the(M_mode, G_mode, diffusion_method)surface into two broadly-validated choices, with anautoswitch keyed on measurement noise. Power users can still pass any of the three modes explicitly to override the preset.- Parameters:
preset ({'auto', 'robust', 'clean', 'legacy-clean-v1.0', 'legacy-noisy-v1.0'}, default 'auto') –
'robust'—symmetric/trapeze/noisy. The broadly applicable default: flat across measurement noise within its feasible band and never catastrophic.'clean'—symmetric/trapeze/WeakNoise. A sharper refinement for verified noise-free data (helps mainly in the coarse-dt, low-noise corner); fragile under measurement noise.'auto'— pick'robust'when measurement noise is detected and'clean'otherwise, on the sign of the localization-noise estimateLambda_trace(>0-> noise present).Lambda_traceis reused from a priorcompute_diffusion_constant()call or estimated inline. Validated on thevdp_uli_10xbenchmark: the zero-crossing tracks thenoisy``<->``WeakNoisetransition with no harmful misroutes.'legacy-clean-v1.0'—early/rectangle/MSD. Reproduces the published SFI v1.0 clean-data convention (explicit only; never selected by'auto').'legacy-noisy-v1.0'—symmetric/rectangle/noisy. Reproduces the SFI v1.0 noisy-data convention (explicit only).
basis (Basis) – Rank-1 basis producing per-particle vectors. Must accept v=… keyword in evaluation.
M_mode (str, optional) – Override the preset’s kinematics / moments convention: ‘symmetric’ | ‘early’ | ‘anticipated’.
Noneuses the preset.G_mode (str, optional) – Override the preset’s Gram construction mode: ‘rectangle’ | ‘trapeze’ | ‘shift’ | ‘doubleshift’ (each returned transposed).
Noneuses the preset.diffusion_method (str, optional) – Override the preset’s instantaneous diffusion estimator used in the ULI correction term: ‘noisy’ | ‘WeakNoise’ | ‘MSD’.
Noneuses the preset (and re-enables theautomeasurement-noise switch).
- Return type:
None
- static load_results(path)¶
Reload a dict previously saved by
save_results().See
SFI.inference.serialization.load_results()for details.- Return type:
dict
- predict_time_profile(basis_block, t, *, field='force', x=None)¶
Evaluate a (time-dependent) basis block’s coefficient profile at
t.Contracts the fitted coefficients of
basis_blockwith the basis’s own evaluation at timest(via the reservedtimeextra), returning the time profile — e.g.-k(t)for thexblock of a time-Fourier trap. Avoids re-deriving the design matrix by hand.- Parameters:
field (str)
- print_report()¶
Print a summary report of the inference results.
Provides insights into the inferred diffusion and force fields, along with error metrics such as sampling error, trajectory length, discretization bias, and measurement noise.
- report_dict()¶
Return a structured summary of inference results as a dictionary.
This is the machine-readable counterpart of
print_report(). All values are plain Python scalars or numpy arrays (no JAX arrays).- Returns:
Keys include
"diffusion_average","Lambda","force_information","force_predicted_MSE","NMSE_force","NMSE_diffusion", and others when available. Missing quantities are omitted.- Return type:
dict
- save_results(path)¶
Save
report_dict()to<path>.npz+<path>.json.See
SFI.inference.serialization.save_results()for details.- Return type:
Path
- simulate_bootstrapped_trajectory(key, oversampling=1, simulate=True, dataset=0)[source]¶
Simulate an underdamped Langevin trajectory using the inferred force and diffusion.
- Parameters:
key – JAX random key.
oversampling (int) – number of integration substeps between saved frames.
simulate (bool) – if True, runs a simulation initialized from the first row of data; if False, returns an uninitialized process object.
dataset (int) – which experiment of a pooled fit to reproduce. The inferred model is collapsed to this condition via
specialize()(per-dataset parameters folded atdataset); the resulting process is a standalone single-condition model that does not readdataset_index. Defaults to 0.
- Returns:
(TrajectoryCollection, UnderdampedProcess) Else: UnderdampedProcess
- Return type:
If simulate=True
- sparsify_force(*, criterion='PASTIS', p=0.05, method='beam', max_k=None, **strategy_kwargs)¶
Sparsify the inferred force by selecting a subset of basis functions.
Builds a Pareto front of sparse models using the chosen
method, then selects the model that maximises the given informationcriterion.- Parameters:
criterion (
"PASTIS"|"AIC"|"BIC"|"EBIC"|"SIC", default"PASTIS") – Information criterion for model selection.p (float, default 0.05) – Prior-scale parameter \(p_0\) for the PASTIS penalty.
method (str, default
"beam") –Search strategy. One of:
"beam"— bidirectional beam search (PASTIS original). Extra kwargs:beam_width(int, default 3),aic_patience(int, default 2)."greedy"— forward stepwise selection. Extra kwargs:direction("forward"|"backward"|"both", default"forward")."stlsq"— Sequential Thresholded Least Squares (SINDy-style). Extra kwargs:threshold(float or None),mode("relative"|"absolute"),n_thresholds(int)."lasso"— \(\ell_1\)-penalised coordinate descent. Extra kwargs:alpha(float or None),n_alphas(int)."hillclimb"— stochastic hill-climbing (Gerardos & Ronceray, 2025). Extra kwargs:ic,patience(int),seed(int or None).
max_k (int or None) – Maximum model size. Defaults to the full basis size.
**strategy_kwargs – Passed to the strategy constructor.
- Returns:
The full Pareto-front result, also stored as
self.force_sparsity_result.- Return type:
- summary(field='force')¶
Return a formatted coefficient table for the inferred model.
- Parameters:
field (
"force"or"diffusion") – Which inferred field to summarize.- Returns:
Multi-line table ready for
print().- Return type:
str