.. _noise-and-sampling:

Measurement noise and coarse sampling
=====================================

Real trajectories are rarely clean. Localization error blurs every position, and the camera frame rate fixes a sampling interval :math:`\Delta t` that may be coarse compared to the dynamics. Both imperfections bias the linear estimators in ways that **more data will not fix**, so they call for the parametric estimators instead. This page shows how to recognise the symptoms and what to run.

Recognising the symptoms
------------------------

You likely have a measurement-noise or sampling problem when:

- **Diagnostics flag it.** After a linear fit, :func:`SFI.diagnostics.assess` reports ``[mse_consistency]`` (the realised error is well above the predicted, sampling-noise value), residual ``[autocorr]`` flags, or a whitened residual standard deviation far from 1. On experimental data these flags usually point to noise or coarse sampling rather than to a wrong basis.
- **The diffusion estimators disagree.** ``compute_diffusion_constant(method="noisy")`` (noise-aware) and ``method="WeakNoise"`` (clean-data) give clearly different values, or ``inf.Lambda``, the estimated measurement-noise covariance, is comparable to :math:`2 D \Delta t`, the covariance of the diffusive displacement over one interval.
- **The error plateaus.** Adding more data keeps shrinking the *predicted* error (``inf.force_predicted_MSE``) while the realised error against held-out data stalls: you have hit a bias floor.

Why the linear estimators acquire a bias
----------------------------------------

Two distinct mechanisms are at work, both arising from the finite-difference construction.

**Errors-in-variables (measurement noise).** The linear estimators regress finite-difference velocities on basis functions evaluated at the *measured* positions. Localization noise :math:`\eta` of covariance :math:`\Lambda` enters both sides of this regression: it adds a variance of order :math:`2\Lambda/\Delta t^2` to the velocity estimate, and it perturbs the regressors.
On nonlinear systems the resulting errors-in-variables bias is proportional to the noise level and does not average away with longer trajectories.

**Euler secant (coarse sampling).** The linear estimators approximate the drift over one interval by the straight-line secant :math:`(\mathbf{x}_{t+\Delta t} - \mathbf{x}_t)/\Delta t`. When :math:`\Delta t` is no longer small compared to the dynamical timescales, the secant mis-tracks the curved true flow and the estimate acquires an :math:`O(\Delta t)` bias.

See :ref:`parametric-concept` for the quantitative treatment of both effects.

The parametric workflow
-----------------------

The parametric estimators address both mechanisms natively. A single RK4 flow step per observation interval replaces the Euler secant, the measurement noise :math:`\Lambda` is part of the observation model, and the *skip-trick* errors-in-variables instrument keeps the estimating equation consistent under noise.

.. code-block:: python

   from SFI import OverdampedLangevinInference
   from SFI.bases import monomials_up_to

   # `coll` is your trajectory data collection (its construction is not shown here)
   inf = OverdampedLangevinInference(coll)
   B = monomials_up_to(order=3, dim=2, rank='vector')

   inf.infer_force(B)          # profiles (D, Λ) automatically
   inf.infer_diffusion()       # optional: defaults to a symmetric-matrix basis
   inf.compute_force_error()
   inf.print_report()

Notes:

- The force model passed to ``infer_force`` (``B`` above) can be a :class:`~SFI.statefunc.Basis` (fast Gauss–Newton path, with PASTIS sparsification wired in) or any differentiable :class:`~SFI.statefunc.PSF`; see :ref:`choosing-an-estimator`.
- The noise and diffusion levels :math:`(\mathbf{D}, \Lambda)` are **profiled automatically**: closed-form moment estimators initialise them, and one conditional-NLL refinement updates them at the fitted parameters. There is nothing to tune.
- If you know the noise from calibration (e.g. the localization precision of your microscope), pass it explicitly. Remember that :math:`\Lambda` is a covariance: a precision quoted as a standard deviation :math:`\sigma` corresponds to :math:`\sigma^2` per coordinate. If the diffusion is also known, pass it too; this skips profiling entirely:

  .. code-block:: python

     inf.infer_force(B, D=D_known, Lambda=Sigma_known)  # fast path

- The errors-in-variables instrument is on by default (``eiv="auto"``); you should not need to touch it.

**Runtime expectations.** The parametric fit is iterative, so on large problems expect minutes where the linear estimators take seconds. On moderate data the gap is small: an underdamped solve at :math:`T \approx 10^4`, :math:`n = 8` runs in about 20 s on a single laptop CPU core, versus about 10 s for the linear estimator (see :ref:`parametric-algorithm` for scaling).

**Cross-checking.** Running both estimator families on the same basis is itself a diagnostic. If they agree, noise and sampling effects are under control and you can keep the cheaper linear workflow. If they disagree, trust the parametric fit: the discrepancy is a measure of the linear bias.

Worked examples and validation
------------------------------

- :doc:`/gallery/experimental_workflow_demo`: an end-to-end experimental pipeline in which the diagnostics flag localization noise and the parametric estimator removes the bias.

.. seealso::

   - :ref:`choosing-an-estimator`: the regime table.
   - :ref:`parametric-concept`: the observation model and estimator theory.
   - :ref:`parametric-algorithm`: algorithm details and the full parameter reference.
   - :doc:`/inference/underdamped`: noise is doubly harmful for inertial systems; the underdamped page covers the specifics.
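
The disagreement between the two diffusion estimators (the second symptom above) can be reproduced outside SFI with a few lines of NumPy. This is a standalone sketch, not SFI code, and the helper ``diffusion_estimates`` is hypothetical. It assumes pure 1-D Brownian motion with white Gaussian localization noise of variance :math:`\Lambda` and no motion blur. Because consecutive displacements share one noise term, :math:`\langle \Delta y^2 \rangle = 2D\Delta t + 2\Lambda` and :math:`\langle \Delta y_n \Delta y_{n+1} \rangle = -\Lambda`. A clean-data estimate :math:`\langle \Delta y^2 \rangle / 2\Delta t` therefore overshoots by :math:`\Lambda/\Delta t`, while adding the lag-one covariance removes the bias. SFI's ``method="noisy"`` is not assumed to use this exact formula.

.. code-block:: python

   import numpy as np


   def diffusion_estimates(y, dt):
       """Clean-data and noise-aware estimates from a 1-D trajectory y sampled every dt.

       Hypothetical helper. Assumes pure diffusion plus white Gaussian
       localization noise (no drift, no motion blur).
       """
       dy = np.diff(y)                            # finite-difference displacements
       msd1 = np.mean(dy**2)                      # expected value: 2*D*dt + 2*Lambda
       cov1 = np.mean(dy[:-1] * dy[1:])           # expected value: -Lambda
       D_clean = msd1 / (2 * dt)                  # ignores noise: biased up by Lambda/dt
       D_noisy = msd1 / (2 * dt) + cov1 / dt      # covariance-based, noise-aware
       Lambda = -cov1                             # estimated localization variance
       return D_clean, D_noisy, Lambda


   rng = np.random.default_rng(0)
   dt, D_true, Lambda_true, T = 0.01, 1.0, 0.01, 200_000
   x = np.cumsum(np.sqrt(2 * D_true * dt) * rng.standard_normal(T))   # true Brownian path
   y = x + np.sqrt(Lambda_true) * rng.standard_normal(T)              # measured positions

   D_naive, D_noisy, Lambda_hat = diffusion_estimates(y, dt)
   ratio = Lambda_hat / (2 * D_noisy * dt)    # noise variance vs. diffusive step variance
   print(f"clean-data D = {D_naive:.2f}, noise-aware D = {D_noisy:.2f}")
   print(f"Lambda = {Lambda_hat:.4f}, Lambda / (2 D dt) = {ratio:.2f}")

With these settings :math:`\Lambda / (2D\Delta t) = 0.5`, so the clean-data estimate lands near twice the true :math:`D = 1` while the noise-aware one recovers it. The same negative lag-one correlation between displacements is one plausible source of residual ``[autocorr]`` flags.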
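
The Euler-secant bias described under *Why the linear estimators acquire a bias* already shows up on the simplest linear system. This standalone NumPy sketch is not SFI code, and the helpers ``rk4_step``, ``fit_euler`` and ``fit_rk4`` are hypothetical. It assumes a 1-D Ornstein–Uhlenbeck process :math:`dx = -kx\,dt + \sqrt{2D}\,dW`, sampled exactly at a coarse interval with :math:`k\Delta t = 0.5` and no measurement noise. It then compares a least-squares fit of the Euler secant with a fit that matches one RK4 step of the model flow to each observed interval. This is the idea behind the parametric estimators' flow step, reduced to a scalar drift with no noise model.

.. code-block:: python

   import numpy as np


   def rk4_step(x, k, dt):
       """One classical RK4 step of the linear flow dx/dt = -k x."""
       f = lambda u: -k * u
       s1 = f(x)
       s2 = f(x + 0.5 * dt * s1)
       s3 = f(x + 0.5 * dt * s2)
       s4 = f(x + dt * s3)
       return x + dt * (s1 + 2 * s2 + 2 * s3 + s4) / 6


   def fit_euler(x, dt):
       """Least-squares fit of the secant velocity (x[n+1] - x[n]) / dt against -k x[n]."""
       v = np.diff(x) / dt
       return -np.dot(x[:-1], v) / np.dot(x[:-1], x[:-1])


   def fit_rk4(x, dt):
       """Least-squares fit of x[n+1] against rk4_step(x[n], k, dt)."""
       # rk4_step is linear in x, so the best multiplier is a plain regression slope
       a_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
       # The RK4 multiplier rk4_step(1, k, dt) decreases in k for k*dt < ~1.59,
       # so bisect on that bracket for the k that reproduces a_hat.
       k_lo, k_hi = 0.0, 1.5 / dt
       for _ in range(60):
           k_mid = 0.5 * (k_lo + k_hi)
           if rk4_step(1.0, k_mid, dt) > a_hat:
               k_lo = k_mid
           else:
               k_hi = k_mid
       return 0.5 * (k_lo + k_hi)


   # Exact sampling of an Ornstein-Uhlenbeck process dx = -k x dt + sqrt(2 D) dW
   rng = np.random.default_rng(1)
   k_true, D, dt, N = 1.0, 1.0, 0.5, 100_000     # coarse sampling: k_true * dt = 0.5
   a = np.exp(-k_true * dt)                      # exact one-interval decay factor
   sd = np.sqrt(D / k_true * (1 - a**2))         # exact conditional standard deviation
   xi = rng.standard_normal(N)
   x = np.empty(N)
   x[0] = np.sqrt(D / k_true) * xi[0]            # start in the stationary distribution
   for n in range(N - 1):
       x[n + 1] = a * x[n] + sd * xi[n + 1]

   k_euler, k_rk4 = fit_euler(x, dt), fit_rk4(x, dt)
   print(f"true k = {k_true}, Euler secant k = {k_euler:.3f}, RK4 step k = {k_rk4:.3f}")

For this linear drift the Euler fit converges to :math:`(1 - e^{-k\Delta t})/\Delta t \approx 0.79` however long the trajectory, while the residual relative bias of the RK4 fit is of order :math:`(k\Delta t)^4`. Halving :math:`\Delta t` roughly halves the Euler bias, which is the :math:`O(\Delta t)` behaviour stated above.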