Diagnostics¶
Once an inference object is fitted, the next question is should we
trust it? The SFI.diagnostics submodule answers that by
recomputing standardised residuals from the fitted force and the
inferred constant diffusion, then testing whether they look like an
independent \(\mathcal N(0, 1)\) sample — the signature of a
correctly specified model.
Quick start¶
import SFI
from SFI.diagnostics import assess
inf = SFI.OverdampedLangevinInference(collection)
inf.compute_diffusion_constant()
inf.infer_force_linear(basis)
inf.compute_force_error()
report = assess(inf, level="standard") # or inf.diagnose()
report.print_summary()
The same call works for UnderdampedLangevinInference,
multi-particle and multi-dataset collections, and for the linear,
parametric, and nonlinear inference paths. Data access is routed
through TrajectoryDataset.make_batch_producer (the same streaming
layer used internally by SFI.integrate), so masked frames and a
variable per-step dt are handled uniformly.
What is computed¶
For an overdamped fit, the residual is the Euler–Maruyama innovation
whitened by \((2\bar D\,\Delta t)^{-1/2}\) so that, under a correct model, the standardised residual \(z_t\) is \(\mathcal N(0, I)\).
For an underdamped fit (positions only), the residual is the symmetric finite-difference acceleration minus the fitted force — the same quantity the underdamped estimator itself fits:
with \(\hat x_t = \tfrac13(x_{t-1} + x_t + x_{t+1})\) and \(\hat v_t = (x_{t+1} - x_{t-1}) / (2\Delta t)\). The noise of this second difference is the second difference of integrated white noise, with variance \(\tfrac23\,(2\bar D)/\Delta t\); this sets the whitening. Because adjacent accelerations share two of three positions, the residual is a moving-average process in time, so the builder keeps every second valid index — leaving the pooled innovations serially independent.
The standard suite (level="standard", the default) runs:
Residual moments — pooled mean, standard deviation, skewness, and excess kurtosis, with a per-component breakdown.
Autocorrelation — Ljung–Box on \(z_t\) and on \(z_t^2\) (the latter detects volatility clustering / a mis-estimated diffusion).
Normality — Kolmogorov–Smirnov against \(\mathcal N(0, 1)\), plus the raw Q–Q data.
MSE consistency — compares the realised mean-square residual to the predicted (sampling-noise) NMSE via a sampling-noise-aware chi-square z-score; \(|z| > 5\) flags bias or a mis-specified diffusion.
The "minimal" level computes the residual moments only.
Output¶
assess() returns a
DiagnosticsReport carrying a residuals
section and a meta dict (backend, regime, sample sizes). The
report exposes:
print_summary()— a concise human-readable table with✓/✗marks at a chosen significance level, ending in a-- Flags --block;flag_issues()— the list of warnings (one string per failing check) thatprint_summaryuses;
Plotting helpers¶
from SFI.diagnostics import (
plot_qq, plot_residual_histogram, plot_residual_acf, plot_summary,
)
fig = plot_summary(report) # 1×3 figure, all panels at once
plot_qq()— normal Q–Q with the \(y = x\) reference;plot_residual_histogram()— histogram with the \(\mathcal N(0, 1)\) density overlaid;plot_residual_acf()— autocorrelation of \(z\) and \(z^2\) with the 95 % Bartlett band;plot_summary()— the three panels side by side.
All panels accept either a fitted inferer or a precomputed
DiagnosticsReport.
Interpreting flags¶
[autocorr/ljung_box]— the model is missing a time-correlated feature (for example a constant-only fit on a mean-reverting process).[autocorr/ljung_box_squared]— the diffusion is mis-estimated or state-dependent.[normality/ks]— the residuals are non-Gaussian: rare events not captured by the basis, or a non-Gaussian noise structure.[moments/std]— a whitened standard deviation \(\not\approx 1\) usually means \(\bar D\) is wrong (try a differentcompute_diffusion_constant()method).[moments/mean]— a non-zero residual mean points to a systematic drift bias.[mse_consistency]— realised NMSE far above predicted: the model is biased; widen the basis or runsparsify_force()and refit.
Each printed flag carries the corresponding action hint inline
("[mse_consistency] … — … consider the parametric estimator
(infer_force)"); pass hints=False to
flag_issues() or
print_summary() for bare
statistics (machine parsing).
How visible a flag is depends on the sampling interval. At very fine \(\Delta t\) the diffusion estimate can absorb a weak force-misspecification, leaving the marginal tests looking clean; coarser sampling makes the leftover structure show up in the autocorrelation and MSE-consistency checks.
When a flag points beyond the linear estimators¶
On experimental data, [mse_consistency], [moments/std], and
[autocorr/ljung_box_squared] very often trace back to measurement
noise or coarse sampling rather than a wrong basis. The cure is then
not a bigger basis but the noise-aware parametric estimators
(infer_force() / infer_diffusion()), which model the
measurement-noise covariance \(\Lambda\) explicitly and
replace the Euler secant with an RK4 flow step:
[mse_consistency]with localization noise suspected — refit with the parametric estimator on a fresh inference object and compare; Measurement noise and coarse sampling walks through it.[moments/std]far from 1 after trying bothcompute_diffusion_constant()methods — the noise level is biasing the diffusion estimate; the parametric estimators profile \((\mathbf{D}, \Lambda)\) jointly.[autocorr/ljung_box]that persists after widening the basis — suspect coarse sampling; the parametric RK4 flow step extends the usable \(\Delta t\) range.
If the parametric and linear fits agree, the flags point back at the
model itself: widen or restructure the basis
(Building bases) and re-run sparsify_force().
When data is plentiful, a suspected bias floor can also be confirmed
with an explicit train/test split — coll.split_time(0.8) +
inf.holdout_score(test), and assess(inf, data=test) for
held-out diagnostics. This is a side feature: the predicted error and
this suite cost no data, which is SFI’s preferred, data-efficient
route.
Classifying overdamped vs. underdamped dynamics¶
Note
This classifier is experimental — the discriminator and verdict
thresholds may still change, and the "inconclusive" band is
deliberately conservative.
Before choosing an inference engine you can ask the data which regime it is
in. classify_dynamics() (experimental) reads raw
positions and returns an "OD" / "UD" / "inconclusive" verdict,
robust to high localization noise and coarse sampling:
from SFI import classify_dynamics
from SFI.diagnostics import plot_dynamics_order
report = classify_dynamics(collection)
report.print_summary() # verdict + tau_v, sigma, D, AICc, scaling slope
plot_dynamics_order(report) # rho2(dt) and apparent-KE scaling panels
The discriminator is the lag-resolved displacement covariance
\(C_k=\langle\Delta x_t\cdot\Delta x_{t+k}\rangle\). White localization
noise touches only \(C_0\) and \(C_1\), so lag-2 statistics are
measurement-noise-immune; scanning \(\Delta t\) separates the overdamped
force confound (which vanishes as \(\Delta t\to0\)) from genuine momentum
persistence (which saturates). The verdict combines a model-free scaling
test, a parametric fit of the diffusion + inertia + localization covariance
model (recovering the momentum relaxation time \(\tau_v\)), and an
overdamped-fit residual-autocorrelation cross-check. At coarse sampling
(\(\gamma\,\Delta t\gtrsim1\)) momentum is unresolved and the verdict is
"inconclusive" by design. The method assumes white localization noise;
strong memory (a generalized-Langevin / viscoelastic bath) can also produce
velocity persistence and is out of scope. A worked example ships in the
gallery (Overdamped or underdamped? Classifying dynamics from data).
References¶
Ljung & Box (1978); Diebold, Gunther & Tay (1998).