SFI.diagnostics.residuals module

Per-backend residual builders.

Each builder takes a fitted inference object and returns a ResidualBundle containing pooled standardised residuals \(z = \Sigma^{-1/2} r\), where \(\Sigma\) is the residual covariance (written \(C\) below), ready to feed into the statistical tests.

Measurement-noise-aware, banded whitening

Both the overdamped and the underdamped residual carry two sources of correlation that whitening each residual on its own, with the thermal covariance alone, ignores:

  • Measurement noise \(\Sigma_\eta\). The diagnostic residual covariance is \(C = \text{(thermal)} + c\,\Sigma_\eta\), not the thermal part alone, with \(c = 2\) for the overdamped increment and \(c = 6/\Delta t^4\) for the underdamped acceleration (see the residual conventions below). The estimator’s profiled \(\Sigma_\eta\) (inferer.Lambda) is folded into C so that a well-recovered but noisy fit still whitens to unit variance instead of tripping every flag. On clean data \(\Sigma_\eta\approx 0\) and this reduces to the thermal whitening.

  • Serial correlation. Localisation error is shared between neighbouring residuals, so the residual series is a moving-average process: the overdamped increment series is MA(1) with lag-1 block \(-\Sigma_\eta\), and the underdamped acceleration series, after keeping every second index, is MA(1) with lag-1 block \(\Sigma_\eta/\Delta t^4\). A banded whitening, namely the sequential block-Cholesky innovations of the block-tridiagonal residual covariance (_sequential_innovations()), decorrelates the stream, exactly paralleling the parametric core’s banded precision. On clean data the off-diagonal block vanishes and the innovations coincide with the marginal whitening.

The whitened stream z, which feeds the moment, normality and autocorrelation tests, uses the banded innovations. The per-row Mahalanobis norms z_squared_norms, which feed the chi-square / MSE-consistency bias check, keep the marginal noise-aware form, because it preserves a slowly varying force bias that the innovations would partly difference out.
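The banded whitening described above can be illustrated with a minimal NumPy sketch. It is not the library's _sequential_innovations(), whose signature and internals are not shown here; the function name sequential_innovations_sketch and its arguments are hypothetical, and the sketch assumes a single unmasked series with constant blocks C (diagonal) and B (lag-1).

```python
import numpy as np


def sequential_innovations_sketch(r, C, B):
    """Whiten a series whose covariance is block-tridiagonal (illustrative only).

    r : (T, d) residuals, ordered in time.
    C : (d, d) diagonal block, Cov(r_t, r_t).
    B : (d, d) lag-1 block, Cov(r_t, r_{t-1}).
    Returns z : (T, d) standardised innovations.
    """
    r = np.asarray(r, dtype=float)
    z = np.empty_like(r)
    e_prev, S_prev = None, None
    for t in range(r.shape[0]):
        if t == 0:
            # No past to condition on: the first innovation is the residual itself.
            e, S = r[0], C
        else:
            # Gain G = B S_prev^{-1}; S_prev is symmetric, so solve instead of inverting.
            G = np.linalg.solve(S_prev, B.T).T
            e = r[t] - G @ e_prev   # residual minus its prediction from the past
            S = C - G @ B.T         # innovation covariance (Schur complement)
        L = np.linalg.cholesky(S)   # S = L L^T
        z[t] = np.linalg.solve(L, e)
        e_prev, S_prev = e, S
    return z
```

With B set to zero the loop reduces to the marginal whitening \(L^{-1} r_t\) with \(C = L L^\top\), which mirrors the clean-data limit stated above. For the overdamped case one would pass \(B = -\Sigma_\eta\).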

Residual conventions

Overdamped:

\[r_{t,n} = X_{t+1,n} - X_{t,n} - F(X_{t,n})\,\Delta t, \qquad C_{t} = 2\,\bar D\,\Delta t + 2\,\Sigma_\eta,\]

with lag-1 covariance \(-\Sigma_\eta\). For the linear path the thermal part is the exact ML residual; for the parametric path it is an approximation that is nevertheless consistent (whitened residuals should have unit variance and no autocorrelation if the model is well specified).
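The measurement-noise terms can be checked by hand under a simple assumed noise model: observed positions \(X_t = x_t + \eta_t\), with \(\eta_t\) zero-mean, independent between frames and of the thermal noise, and of covariance \(\Sigma_\eta\). Neglecting the effect of \(\eta_t\) on the force evaluation, the noise part of \(r_t\) is \(\eta_{t+1} - \eta_t\), so

\[\mathrm{Cov}(\eta_{t+1}-\eta_t) = 2\,\Sigma_\eta, \qquad \mathrm{Cov}(\eta_{t+2}-\eta_{t+1},\ \eta_{t+1}-\eta_t) = -\Sigma_\eta,\]

and all covariances at lag 2 or more vanish. Only \(\eta_{t+1}\) is shared between consecutive increments, and it enters them with opposite signs, which gives the negative lag-1 block.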

Underdamped: symmetric acceleration \(\hat a_t = (X_{t+1} - 2X_t + X_{t-1})/\Delta t^2\),

\[r_t = \hat a_t - F(\hat x_t, \hat v_t), \qquad C_t = \tfrac23\,\frac{2\bar D}{\Delta t} + \frac{6\,\Sigma_\eta}{\Delta t^4}.\]
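The factor 6 follows from the same assumed noise model as in the overdamped case (\(X_t = x_t + \eta_t\), with \(\eta_t\) zero-mean, independent between frames, of covariance \(\Sigma_\eta\)). The noise part of \(\hat a_t\) is \((\eta_{t+1} - 2\eta_t + \eta_{t-1})/\Delta t^2\), whose covariances at lags 0, 1 and 2 are

\[\frac{(1+4+1)\,\Sigma_\eta}{\Delta t^4} = \frac{6\,\Sigma_\eta}{\Delta t^4}, \qquad \frac{(-2-2)\,\Sigma_\eta}{\Delta t^4} = -\frac{4\,\Sigma_\eta}{\Delta t^4}, \qquad \frac{\Sigma_\eta}{\Delta t^4},\]

with nothing beyond lag 2. When only every second index is kept, as described for build_underdamped_residuals(), the lag-2 term of the full series becomes the lag-1 term of the kept series, which is consistent with the lag-1 block \(\Sigma_\eta/\Delta t^4\) quoted for the banded whitening.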

For both regimes residuals are pooled across time, particles, and spatial components, applying the dataset’s dynamic_mask (for overdamped) or its 1-step erosion (for underdamped, which needs three consecutive valid observations).
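The 1-step erosion can be sketched as follows, assuming the mask is a boolean per-frame validity array of shape (T, N). How SFI actually stores dynamic_mask is not shown here, so both that layout and the helper name erode_mask_one_step are assumptions made for illustration.

```python
import numpy as np


def erode_mask_one_step(mask):
    """Keep frame t only if frames t-1, t and t+1 are all valid.

    mask : (T, N) boolean array, True where an observation is valid.
    Returns a (T, N) boolean array; the first and last frames are always False
    because they lack a neighbour on one side.
    """
    mask = np.asarray(mask, dtype=bool)
    eroded = np.zeros_like(mask)
    eroded[1:-1] = mask[:-2] & mask[1:-1] & mask[2:]
    return eroded
```

A single missing frame therefore invalidates three acceleration residuals: the one centred on the gap and its two neighbours.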

class SFI.diagnostics.residuals.ResidualBundle(z, z_components, z_squared_norms, force_quadratic_form, mean_dt, n_obs, d, regime, backend, n_particles, nmse_excess_factor=1.0, whitened=<factory>)

Bases: object

Standardised residuals + metadata.

Variables:
  • z (np.ndarray) – Whitened residuals, shape (K,). Pooled across time, particles and spatial components after masking.

  • z_components (np.ndarray) – Whitened residuals organised by spatial component, shape (K_per_component, d). Used for per-axis statistics.

  • z_squared_norms (np.ndarray) – Per-row squared Mahalanobis norm \(r_t^\top \Sigma_t^{-1} r_t\), shape (K_per_row,). Used for the diffusion / “chi-square” check.

  • force_quadratic_form (np.ndarray) – Per-row quadratic form \(F^\top A^{-1} F\) evaluated on the same valid samples used to build z. Pre-computing it here avoids a second evaluation of F in the MSE-consistency check downstream.

  • mean_dt (float) – Average step size used in the residual construction.

  • n_obs (int) – Number of valid (un-masked) observations used to build z.

  • d (int) – Spatial dimension.

  • regime (str) – "OD" or "UD".

  • backend (str) – Coarse tag of the inference path ("linear", "parametric", "nonlinear"). For diagnostic display only.

  • n_particles (int) – Maximum number of particles in any dataset.

  • nmse_excess_factor (float) – Conversion factor from the chi-square excess to the force NMSE in mse_consistency(). 1.0 for the overdamped increment residual; KAPPA_UD for the underdamped acceleration residual (see that constant for the derivation).

  • whitened (list of (np.ndarray, np.ndarray)) – Per-dataset (z_full, mask) pairs with z_full of shape (K, N, d) (time-major) and mask of shape (K, N). Kept so that autocorrelation can be measured strictly along time, per particle and per component — pooling the flattened z stream would mix particles and components at short lags.
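To illustrate why the (z_full, mask) pairs are kept, here is a minimal sketch of a lagged autocorrelation measured strictly along time. It is not the library's own test; the function name lagged_autocorrelation is hypothetical, and the sketch assumes the whitened residuals are already zero-mean with unit variance, so that the mean lagged product estimates the autocorrelation directly.

```python
import numpy as np


def lagged_autocorrelation(whitened, lag=1):
    """Pooled autocorrelation of whitened residuals at a given time lag (lag >= 1).

    whitened : iterable of (z_full, mask), z_full of shape (K, N, d), mask (K, N).
    Products are formed only between two frames of the same particle and the
    same spatial component, and only where both frames are valid.
    """
    total, count = 0.0, 0
    for z_full, mask in whitened:
        z_full = np.asarray(z_full, dtype=float)
        mask = np.asarray(mask, dtype=bool)
        both = mask[lag:] & mask[:-lag]        # (K - lag, N): both frames valid
        prod = z_full[lag:] * z_full[:-lag]    # (K - lag, N, d)
        total += prod[both].sum()
        count += both.sum() * z_full.shape[-1]
    return total / count if count else float("nan")
```

Computing the same quantity on the flattened z would pair the last component of one particle with the first component of the next at lag 1, which is the mixing the whitened field is meant to avoid.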

Parameters:
  • z (ndarray)

  • z_components (ndarray)

  • z_squared_norms (ndarray)

  • force_quadratic_form (ndarray)

  • mean_dt (float)

  • n_obs (int)

  • d (int)

  • regime (str)

  • backend (str)

  • n_particles (int)

  • nmse_excess_factor (float)

  • whitened (list)

backend: str
d: int
force_quadratic_form: ndarray
mean_dt: float
n_obs: int
n_particles: int
nmse_excess_factor: float = 1.0
regime: str
whitened: list
z: ndarray
z_components: ndarray
z_squared_norms: ndarray
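As a rough illustration of how these fields might be read, the sketch below computes three summary numbers from any object exposing z, z_squared_norms and d. The helper summarise_bundle is hypothetical and is not part of SFI; it relies only on the standard fact that a squared Mahalanobis norm in d dimensions has expectation d when the covariance is correct.

```python
import numpy as np


def summarise_bundle(bundle):
    """Quick sanity numbers for a ResidualBundle-like object (illustrative only).

    For a well-specified model one expects z_mean near 0, z_var near 1 and
    chi2_per_dof near 1.
    """
    z = np.asarray(bundle.z, dtype=float)
    q = np.asarray(bundle.z_squared_norms, dtype=float)
    return {
        "z_mean": float(z.mean()),
        "z_var": float(z.var()),
        "chi2_per_dof": float(q.mean() / bundle.d),
    }
```

A chi2_per_dof well above 1 with z_var close to 1 would be in line with the slowly varying force bias that the marginal form is said to preserve, but interpreting the numbers is left to the statistical tests themselves.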

SFI.diagnostics.residuals.build_overdamped_residuals(inferer, data=None)

Build standardised Euler–Maruyama residuals for an OD inferer.

Routes data access through TrajectoryDataset.make_batch_producer — the same low-level streaming layer used by SFI.integrate — so multi-particle, masked, and multi-dataset trajectories are handled transparently.

Works for any overdamped inference path (linear, parametric, nonlinear) as long as inferer.force_inferred is callable and inferer.A_inv is available.

Return type:

ResidualBundle

SFI.diagnostics.residuals.build_residuals(inferer, data=None)

Dispatch to the OD / UD residual builder based on the engine class.

If data is given, the residuals are evaluated on that independent TrajectoryCollection instead of the training data; this is the held-out path used by holdout_score.

Return type:

ResidualBundle

SFI.diagnostics.residuals.build_underdamped_residuals(inferer, data=None)

Build standardised innovations for a UD inferer from the symmetric acceleration residual.

Uses the symmetric ULI kinematics that the underdamped force estimator itself fits (see SFI.inference.underdamped):

\[\hat x = \tfrac13(X_{t-1}+X_t+X_{t+1}), \quad \hat v = \frac{X_{t+1}-X_{t-1}}{2\Delta t}, \quad \hat a = \frac{X_{t+1}-2X_t+X_{t-1}}{\Delta t^2},\]

and forms the residual \(r_t = \hat a - F(\hat x, \hat v)\). Its thermal noise covariance is \(\tfrac23 A/\Delta t\) (see KAPPA_UD), which is the thermal term of \(C_t\) above with \(A\) in place of \(2\bar D\); with measurement noise the diagonal block gains \(6\Sigma_\eta/\Delta t^4\). The thermal part of the residual is MA(1), so only every second valid index is kept, which removes the thermal lag-1 correlation. The remaining measurement-noise correlation (lag-1 block \(\Sigma_\eta/\Delta t^4\) in the kept series) is removed by the banded innovations whitening, leaving a serially independent stream.
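The kinematics and the subsampling can be sketched for a single unmasked trajectory with constant time step. The function names and the force callable below are hypothetical; the real builder streams batches through TrajectoryDataset.make_batch_producer and respects masks, which this sketch does not attempt to reproduce.

```python
import numpy as np


def symmetric_kinematics(X, dt):
    """Symmetric three-point estimates at the interior frames t = 1 .. T-2.

    X : (T, ...) array of positions sampled at constant step dt.
    Returns (x_hat, v_hat, a_hat), each with leading length T - 2.
    """
    X = np.asarray(X, dtype=float)
    x_hat = (X[:-2] + X[1:-1] + X[2:]) / 3.0
    v_hat = (X[2:] - X[:-2]) / (2.0 * dt)
    a_hat = (X[2:] - 2.0 * X[1:-1] + X[:-2]) / dt**2
    return x_hat, v_hat, a_hat


def acceleration_residual(X, dt, force):
    """Residual a_hat - F(x_hat, v_hat), keeping every second interior frame.

    force : callable (x_hat, v_hat) -> array shaped like x_hat (an assumption).
    """
    x_hat, v_hat, a_hat = symmetric_kinematics(X, dt)
    r = a_hat - force(x_hat, v_hat)
    return r[::2]  # drop every other index to remove the thermal lag-1 term
```

For a trajectory with constant acceleration the second difference is exact, so a force callable returning that constant gives a residual of zero up to rounding.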

Like build_overdamped_residuals(), all data access uses TrajectoryDataset.make_batch_producer so masking and multi-dataset / multi-particle pooling are handled by the same streaming layer that powers SFI.integrate.

Return type:

ResidualBundle