SFI.inference.underdamped module¶

class SFI.inference.underdamped.UnderdampedLangevinInference(data, *, max_memory_gb=1.0, **kwargs)[source]¶

Bases: BaseLangevinInference

Stochastic Force Inference concrete class for underdamped systems (velocities unobserved).

Core equations (Ito convention):: dx/dt = v dv/dt = F(x, v) + sqrt(2 D(x, v)) dW_t

Only x(t) is observed; v(t) is reconstructed from finite differences (“secant velocities”).

[Dynamical equations] Underdamped Langevin SDE

\[\frac{\mathrm{d}x}{\mathrm{d}t} = v, \qquad \frac{\mathrm{d}v}{\mathrm{d}t} = F(x,v) + \sqrt{2\,D(x,v)}\;\mathrm{d}W_t\]

Only positions \(x(t)\) are observed; velocities are reconstructed from finite differences.

coeff_block(block, *, field='force')¶

Return the coefficient (and covariance) slice for a basis sub-block.

Compound bases (e.g. a multi-kernel or time-Fourier library) pack several conceptual blocks into one flat coefficient vector. This returns the slice for one block without hand-computed offsets.

Parameters:

block – (start, stop) indices, a slice, an int, or a Basis (located by matching its labels as a contiguous run of the fitted basis labels).
field ({"force", "diffusion"})

Returns:

(coeffs, cov) – coeffs is the 1-D slice; cov is the matching covariance block (or None if no covariance is available).

Return type:

tuple

compare_params_to_exact(theta_true, *, psf=None)¶

Compare inferred parametric coefficients to known ground truth.

For a model fitted with a parametric family, returns a per-parameter dict of absolute and relative error. theta_true may be a flat array (compared elementwise to force_coefficients_full) or a {name: value} dict (unflattened from the fitted coefficients via psf.unflatten_params, falling back to self.force_psf).

Returns:: {name: {"true", "inferred", "abs_error", "rel_error"}}; also stored as self.parameter_comparison.
Return type:: dict

compare_to_exact(*, model_exact=None, data_exact=None, force_exact=None, diffusion_exact=None, maxpoints=1000)¶

Compare inferred vs exact using dt-weighted time means via the integrate() API.

This function evaluates the inferred force/diffusion against “exact” references on a (possibly exact/synthetic) dataset. It updates:

self.MSE_force / self.NMSE_force self.MSE_diffusion / self.NMSE_diffusion

Inputs: exact references¶

You can provide exact references in two ways:

Preferred: model_exact A model object (from SFI.langevin submodule) exposing:
- model_exact.force_sf : exact force/drift (SF/StateExpr-like)
- model_exact.diffusion_sfexact diffusion (SF/StateExpr-like)
  OR a constant (float or (d,d) matrix) via model_exact.D
Explicit: force_exact, diffusion_exact - force_exact: SF/StateExpr-like callable returning (N, d) - diffusion_exact:
- callable returning (N, d, d), OR
- float meaning σ·I, OR
- (d,d) matrix constant diffusion.
These are used if model_exact is not provided. If model_exact is provided, its members take precedence unless they are missing, in which case the explicit arguments can be used as fallback.

Velocity provisioning (underdamped)¶

If an evaluated expression advertises needs_v=True, this routine supplies:

v := dX/dt (secant velocity from the data stream)

i.e. it uses velocity(“dX”, “dt”) as the v=… keyword argument. This works for both exact and inferred expressions and keeps underdamped comparisons possible even when the dataset only stores positions.

Metrics¶

Force:: e = Fe - Fh MSE_force = < e^T A_inv e > NMSE_force = MSE_force / < Fh^T A_inv Fh >
Diffusion:: E = De - Dh MSE_diffusion = < tr(A_inv E A_inv E) >

[Error analysis] Normalized MSE metrics (force & diffusion)

\[\text{NMSE}_F = \frac{\langle (F_{\text{exact}} - \hat F)^\top A^{-1} (F_{\text{exact}} - \hat F) \rangle} {\langle \hat F^\top A^{-1} \hat F \rangle}\]

\[\text{NMSE}_D = \frac{\langle \operatorname{tr}(A^{-1} E\, A^{-1} E) \rangle} {\langle \operatorname{tr}(A^{-1} \hat D\, A^{-1} \hat D) \rangle}\]

where \(E = D_{\text{exact}} - \hat D\).: NMSE_diffusion = MSE_diffusion / < tr(A_inv Dh A_inv Dh) >

where A_inv is self.A_inv (typically (2 D̄)^{-1} from the inferred constant diffusion normalization).

Subsampling¶

Uses a simple subsampling heuristic so that the total number of evaluated points is ~<= maxpoints, accounting for the maximum number of particles.

Requirements¶

self.A_inv must exist (run compute_diffusion_constant() or otherwise set A_inv).
The dataset must provide streams X, dt, and if any evaluated expr needs v: dX as well.

Parameters:: maxpoints (int)
Return type:: None

comparison_scatter(*, model_exact=None, field='force', data=None, ax=None, maxpoints=2000, **plot_kw)¶

Scatter inferred-vs-exact force (or diffusion) along the trajectory.

Evaluates both fields on the data with force_comparison_arrays() / diffusion_comparison_arrays() and renders them with SFI.utils.plotting.comparison_scatter() (identity line + Pearson r + MSE). Replaces hand-rolled exact-vs-inferred scatters in demos.

Parameters:

model_exact – Object exposing force_sf / diffusion_sf / D.
field ({"force", "diffusion"}) – Which field to compare.
data – Collection to evaluate on (default: training data).
ax – Target axes (default: current axes).
maxpoints (int) – Approximate number of points to evaluate.
**plot_kw – Forwarded to SFI.utils.plotting.comparison_scatter().

Return type:

matplotlib.axes.Axes

compute_diffusion_constant(method='auto')[source]¶

Underdamped constant diffusion inference (ULI estimators).

Estimate measurement noise Λ with method “Lambda”.

If method == ‘auto’: choose ‘noisy’ if tr(Λ)>0 else ‘WeakNoise’.

Average chosen instantaneous estimator to obtain D̄.

Set A = 2 D̄ and derive A_inv, sqrtA, sqrtA_inv.

Parameters:: method (str)
Return type:: None

compute_diffusion_error()¶

Estimate sampling error for diffusion inference.

Mirrors compute_force_error() for the diffusion field. Uses the diffusion Gram matrix (normal matrix) and its inverse.

For linear diffusion inference the moments covariance is proportional to the Gram matrix, giving Cov(θ_D) = cov_factor * G_D⁻¹.

Note

This error estimate is approximate. The diffusion inference is more complex than the force inference: diffusion coefficients are inferred from force residuals, a positive-definiteness constraint applies, and the simple covariance formula Cov(θ_D) = cov_factor * G_D⁻¹ may not capture all sources of uncertainty. Treat the result as a rough guide rather than a rigorous confidence interval.

Notes

self.diffusion_coefficients_covariancejnp.ndarray: Covariance matrix of the diffusion coefficients.
self.diffusion_coefficients_stderrjnp.ndarray: Standard error for each diffusion coefficient.
self.diffusion_informationfloat: Estimated information content of the inferred diffusion field.
self.diffusion_predicted_MSEfloat: Predicted normalized mean squared error.

compute_force_error()¶

Estimate sampling error for force inference.

[Error analysis] Force coefficient covariance & predicted error

\[\operatorname{Cov}(C) = G^{-1}, \qquad \mathbb{E}\!\left[\langle \delta F^\top A^{-1} \delta F \rangle\right] = \operatorname{Tr}\!\left(G\,\operatorname{Cov}(C)\right), \qquad I_F = \tfrac{1}{2}\,C^\top M, \qquad \text{NMSE}_{F,\text{pred}} = \frac{\operatorname{Tr}(G\cdot\operatorname{Cov}(C))}{C^\top M} = \frac{\operatorname{Tr}(G\cdot\operatorname{Cov}(C))}{2 I_F}\]

Assumes the sampling error dominates; measurement noise and discretization biases are not addressed.

This method evaluates the covariance of the inferred force coefficients, the standard error, and computes the predicted normalized mean squared error (MSE) of the inferred force field. This analysis assumes that the sampling error dominates, and measurement noise or discretization biases are not explicitly addressed. It is common to OLI and ULI (by construction of the normal matrix G).

Notes

self.force_coefficients_covariance (jnp.ndarray): Covariance matrix of the force coefficients. self.force_coefficients_stderr (jnp.ndarray): Standard error for each force coefficient. self.force_information (float): Estimated information content of the inferred force field. self.force_predicted_MSE (float): Predicted normalized mean squared error of the inferred force field.

diagnose(*, level='standard', **kwargs)¶

Run the consistency-check suite from SFI.diagnostics.

Convenience wrapper for SFI.diagnostics.assess(). See its docstring for the available level presets.

Parameters:: level (str)

diffusion_comparison_arrays(*, model_exact=None, diffusion_exact=None, data=None, maxpoints=2000)¶

Return (D_exact, D_inferred) evaluated along the trajectory.

Like force_comparison_arrays() but for the diffusion field; a constant exact/inferred diffusion is broadcast to (n_points, d, d).

Parameters:: maxpoints (int)

force_comparison_arrays(*, model_exact=None, force_exact=None, data=None, maxpoints=2000)¶

Return (F_exact, F_inferred) evaluated along the trajectory.

Evaluates the exact and inferred force on the (subsampled, masked) trajectory points, supplying finite-difference velocities for underdamped fields. Feeds comparison_scatter(); also handy for custom diagnostics.

Parameters:

model_exact – Object exposing force_sf (e.g. an OverdampedProcess).
force_exact – Explicit callable exact force (overrides model_exact).
data – Collection to evaluate on (default: the training data).
maxpoints (int) – Approximate number of points to evaluate.

Returns:

(F_exact, F_inferred)

Return type:

tuple of ndarray, shape (n_points, d)

get_diffusion_timeop(method)[source]¶

Parameters:: method (str)
Return type:: TimeOperand

holdout_score(data, *, require_error=False)¶

Held-out NMSE of the fitted force on an independent collection.

A side feature for data-abundant scenarios: SFI estimates its own accuracy from the training data (force_predicted_MSE) and validates fits through the diagnostics suite, neither of which costs any data. Reach for an explicit train/test split (TrajectoryCollection.split_time) only when data is plentiful, or to confirm a suspected bias floor: a ratio near 1 means the fit is sampling-limited, a ratio ≫ 1 means a bias floor (often measurement noise — see the noise-and-sampling guide).

The score is the residual-based normalised mean-square error of force_inferred on data, with the diffusion noise floor subtracted (a bias detector, not a precision instrument: its resolution is set by the χ² fluctuations of the residuals).

Parameters:

data (TrajectoryCollection) – Independent test data (e.g. the second half of coll.split_time(0.8)).
require_error (bool) – If True, run compute_force_error() first when the predicted error is missing, so ratio is always defined.

Returns:

{"holdout_NMSE", "predicted_NMSE", "ratio", "excess_z", "n_obs"}. Also stored as self.force_holdout_NMSE.

Return type:

dict

Notes

Bases that read time-dependent extras are not supported on the held-out path (the residual builders pass extras unsliced).

infer_diffusion(basis=None, *, theta_D0=None, integrator='rk4', n_substeps=1, maxiter=100)[source]¶

Infer state- and velocity-dependent diffusion D(x, v).

Requires a prior parametric infer_force() call. Holds the fitted force fixed and minimises the pentadiagonal windowed conditional NLL over the diffusion parameters, with D(x, v; θ_D) evaluated at the shooting velocity.

[Inference] State-dependent diffusion inference (underdamped)

With the force \(\hat F(x,v)\) held fixed, the state-dependent diffusion \(D(x,v;\theta_D)\) is optimised by minimising the windowed conditional NLL on pentadiagonal (bandwidth-2) covariance windows; \(\Lambda\) from the force inference is held fixed.

Parameters:

basis (Basis (rank 2), PSF, or None) – Diffusion model. A rank-2 Basis gives the linear parameterisation D = Σ_j θ_j d_j; a PSF (optionally needs_v=True) is used directly. When None (default), symmetric_matrix_basis(d) — all constant symmetric d×d diffusion matrices.
theta_D0 (dict, array, or None) – Initial diffusion parameters (default zeros).
integrator ({"rk4", "euler"}) – Flow predictor (default "rk4", matching infer_force()).
n_substeps (int) – Integrator micro-steps per Δt (default 1).
maxiter (int) – L-BFGS maximum iterations (default 100).

Return type:

None

Notes

Sets diffusion_inferred, diffusion_coefficients (and diffusion_basis when basis is a Basis), plus metadata.