.. _diagnostics_user_guide:

Diagnostics
===========

Once an inference object is fitted, the next question is *should we
trust it?* The :mod:`SFI.diagnostics` submodule answers that by
recomputing standardised residuals from the fitted force and the
inferred constant diffusion, then testing whether they look like an
independent :math:`\mathcal N(0, 1)` sample — the signature of a
correctly specified model.

Quick start
-----------

.. code-block:: python

   import SFI
   from SFI.diagnostics import assess

   inf = SFI.OverdampedLangevinInference(collection)
   inf.compute_diffusion_constant()
   inf.infer_force_linear(basis)
   inf.compute_force_error()

   report = assess(inf, level="standard")   # or inf.diagnose()
   report.print_summary()

The same call works for :class:`~SFI.UnderdampedLangevinInference`,
multi-particle and multi-dataset collections, and for the linear,
parametric, and nonlinear inference paths.  Data access is routed
through ``TrajectoryDataset.make_batch_producer`` (the same streaming
layer used internally by :mod:`SFI.integrate`), so masked frames and a
variable per-step ``dt`` are handled uniformly.

What is computed
----------------

For an **overdamped** fit, the residual is the Euler–Maruyama
innovation

.. math::

   r_t = \Delta x_t - F(x_t)\,\Delta t,
   \qquad \mathrm{Cov}(r_t) = 2\,\bar D\,\Delta t,

whitened by :math:`(2\bar D\,\Delta t)^{-1/2}` so that, under a correct
model, the standardised residual :math:`z_t` is :math:`\mathcal N(0, I)`.

For an **underdamped** fit (positions only), the residual is the
symmetric finite-difference acceleration minus the fitted force — the
same quantity the underdamped estimator itself fits:

.. math::

   r_t = \hat a_t - F(\hat x_t, \hat v_t),
   \qquad
   \hat a_t = \frac{x_{t+1} - 2x_t + x_{t-1}}{\Delta t^2},

with :math:`\hat x_t = \tfrac13(x_{t-1} + x_t + x_{t+1})` and
:math:`\hat v_t = (x_{t+1} - x_{t-1}) / (2\Delta t)`.  The noise of this
second difference is the second difference of integrated white noise,
with variance :math:`\tfrac23\,(2\bar D)/\Delta t`; this sets the
whitening.  Because adjacent accelerations share two of three
positions, the residual is a moving-average process in time, so the
builder keeps every second valid index — leaving the pooled
innovations serially independent.

The standard suite (``level="standard"``, the default) runs:

* **Residual moments** — pooled mean, standard deviation, skewness, and
  excess kurtosis, with a per-component breakdown.
* **Autocorrelation** — Ljung–Box on :math:`z_t` and on :math:`z_t^2`
  (the latter detects volatility clustering / a mis-estimated
  diffusion).
* **Normality** — Kolmogorov–Smirnov against :math:`\mathcal N(0, 1)`,
  plus the raw Q–Q data.
* **MSE consistency** — compares the realised mean-square residual to
  the predicted (sampling-noise) NMSE via a sampling-noise-aware
  chi-square z-score; :math:`|z| > 5` flags bias or a mis-specified
  diffusion.

The ``"minimal"`` level computes the residual moments only.

Output
------

:func:`~SFI.diagnostics.assess` returns a
:class:`~SFI.diagnostics.DiagnosticsReport` carrying a ``residuals``
section and a ``meta`` dict (backend, regime, sample sizes).  The
report exposes:

* :meth:`~SFI.diagnostics.DiagnosticsReport.print_summary` — a concise
  human-readable table with ``✓/✗`` marks at a chosen significance
  level, ending in a ``-- Flags --`` block;
* :meth:`~SFI.diagnostics.DiagnosticsReport.flag_issues` — the list of
  warnings (one string per failing check) that ``print_summary`` uses;
* :meth:`~SFI.diagnostics.DiagnosticsReport.to_dict` /
  :meth:`~SFI.diagnostics.DiagnosticsReport.to_json` — JSON-clean
  serialisation.

Plotting helpers
----------------

.. code-block:: python

   from SFI.diagnostics import (
       plot_qq, plot_residual_histogram, plot_residual_acf, plot_summary,
   )

   fig = plot_summary(report)        # 1×3 figure, all panels at once

* :func:`~SFI.diagnostics.plot_qq` — normal Q–Q with the :math:`y = x`
  reference;
* :func:`~SFI.diagnostics.plot_residual_histogram` — histogram with the
  :math:`\mathcal N(0, 1)` density overlaid;
* :func:`~SFI.diagnostics.plot_residual_acf` — autocorrelation of
  :math:`z` and :math:`z^2` with the 95 % Bartlett band;
* :func:`~SFI.diagnostics.plot_summary` — the three panels side by side.

All panels accept either a fitted inferer or a precomputed
:class:`~SFI.diagnostics.DiagnosticsReport`.

Interpreting flags
------------------

* ``[autocorr/ljung_box]`` — the model is missing a time-correlated
  feature (for example a constant-only fit on a mean-reverting process).
* ``[autocorr/ljung_box_squared]`` — the diffusion is mis-estimated or
  state-dependent.
* ``[normality/ks]`` — the residuals are non-Gaussian: rare events not
  captured by the basis, or a non-Gaussian noise structure.
* ``[moments/std]`` — a whitened standard deviation
  :math:`\not\approx 1` usually means :math:`\bar D` is wrong (try a
  different :meth:`~SFI.inference.OverdampedLangevinInference.compute_diffusion_constant` ``method``).
* ``[moments/mean]`` — a non-zero residual mean points to a systematic
  drift bias.
* ``[mse_consistency]`` — realised NMSE far above predicted: the model
  is biased; widen the basis or run :meth:`sparsify_force` and refit.

Each printed flag carries the corresponding action hint inline
(``"[mse_consistency] … — … consider the parametric estimator
(infer_force)"``); pass ``hints=False`` to
:meth:`~SFI.diagnostics.DiagnosticsReport.flag_issues` or
:meth:`~SFI.diagnostics.DiagnosticsReport.print_summary` for bare
statistics (machine parsing).

How visible a flag is depends on the sampling interval.  At very fine
:math:`\Delta t` the diffusion estimate can absorb a weak
force-misspecification, leaving the marginal tests looking clean;
coarser sampling makes the leftover structure show up in the
autocorrelation and MSE-consistency checks.

When a flag points beyond the linear estimators
-----------------------------------------------

On experimental data, ``[mse_consistency]``, ``[moments/std]``, and
``[autocorr/ljung_box_squared]`` very often trace back to measurement
noise or coarse sampling rather than a wrong basis.  The cure is then
not a bigger basis but the noise-aware **parametric estimators**
(:meth:`~SFI.inference.OverdampedLangevinInference.infer_force` / :meth:`~SFI.inference.OverdampedLangevinInference.infer_diffusion`), which model the
measurement-noise covariance :math:`\Lambda` explicitly and
replace the Euler secant with an RK4 flow step:

* ``[mse_consistency]`` with localization noise suspected — refit with
  the parametric estimator on a fresh inference object and compare;
  :doc:`/inference/noise_and_sampling` walks through it.
* ``[moments/std]`` far from 1 after trying both
  :meth:`~SFI.inference.OverdampedLangevinInference.compute_diffusion_constant` methods — the noise level is biasing
  the diffusion estimate; the parametric estimators profile
  :math:`(\mathbf{D}, \Lambda)` jointly.
* ``[autocorr/ljung_box]`` that persists after widening the basis —
  suspect coarse sampling; the parametric RK4 flow step extends the
  usable :math:`\Delta t` range.

If the parametric and linear fits *agree*, the flags point back at the
model itself: widen or restructure the basis
(:doc:`/bases/user_guide`) and re-run :meth:`sparsify_force`.

When data is plentiful, a suspected bias floor can also be confirmed
with an explicit train/test split — ``coll.split_time(0.8)`` +
``inf.holdout_score(test)``, and ``assess(inf, data=test)`` for
held-out diagnostics.  This is a side feature: the predicted error and
this suite cost no data, which is SFI's preferred, data-efficient
route.

.. _dynamics-order-classifier:

Classifying overdamped vs. underdamped dynamics
-----------------------------------------------

.. note::

   This classifier is **experimental** — the discriminator and verdict
   thresholds may still change, and the ``"inconclusive"`` band is
   deliberately conservative.

Before choosing an inference engine you can ask the data which regime it is
in.  :func:`~SFI.diagnostics.classify_dynamics` (experimental) reads raw
positions and returns an ``"OD"`` / ``"UD"`` / ``"inconclusive"`` verdict,
robust to high localization noise and coarse sampling:

.. code-block:: python

    from SFI import classify_dynamics
    from SFI.diagnostics import plot_dynamics_order

    report = classify_dynamics(collection)
    report.print_summary()        # verdict + tau_v, sigma, D, AICc, scaling slope
    plot_dynamics_order(report)   # rho2(dt) and apparent-KE scaling panels

The discriminator is the lag-resolved displacement covariance
:math:`C_k=\langle\Delta x_t\cdot\Delta x_{t+k}\rangle`.  White localization
noise touches only :math:`C_0` and :math:`C_1`, so lag-2 statistics are
measurement-noise-immune; scanning :math:`\Delta t` separates the overdamped
force confound (which vanishes as :math:`\Delta t\to0`) from genuine momentum
persistence (which saturates).  The verdict combines a model-free scaling
test, a parametric fit of the diffusion + inertia + localization covariance
model (recovering the momentum relaxation time :math:`\tau_v`), and an
overdamped-fit residual-autocorrelation cross-check.  At coarse sampling
(:math:`\gamma\,\Delta t\gtrsim1`) momentum is unresolved and the verdict is
``"inconclusive"`` by design.  The method assumes *white* localization noise;
strong memory (a generalized-Langevin / viscoelastic bath) can also produce
velocity persistence and is out of scope.  A worked example ships in the
gallery (*Overdamped or underdamped? Classifying dynamics from data*).

References
----------

Ljung & Box (1978); Diebold, Gunther & Tay (1998).