SFI.trajectory.degrade module

SFI.trajectory.degrade

Degrade synthetic trajectories to mimic real data:

  • motion blur (temporal window average)

  • downsampling

  • additive measurement noise

  • ROI filtering (mask points outside a region)

  • random data loss

Two front doors:

  1. Dataset/Collection API (recommended for internal use):
     • degrade_dataset(ds, …)
     • degrade_collection(coll, …)

  2. Columns API (back-compat for I/O scripts):
     • degrade_columns(meta, particle_idx, time_idx, state_vectors, …)

Why two? Column flow is convenient for simple scripts and file round-trips; dataset/collection flow keeps everything rectangular so we can blur/downsample time-dependent extras cleanly without flatten/unflatten gymnastics.

Extras semantics

  • extras_global:
    • arrays with leading shape (T, …) are blurred/downsampled along time like X; other entries are passed through unchanged.

  • extras_local:
    • arrays with shape (N, …): per-particle constants → unchanged

    • arrays with shape (T, N, …): blurred/downsampled along time like X
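The shape rules above can be sketched in NumPy. This is only an illustration under stated assumptions, not the module's implementation: process_extras_local and degrade_time are hypothetical names, and the time operation is reduced to plain strided downsampling.

```python
import numpy as np

def degrade_time(arr, downsample):
    """Hypothetical stand-in for the time-axis operation: keep every downsample-th frame."""
    return arr[::downsample]

def process_extras_local(extras, T, N, downsample):
    """Apply the extras_local shape rules to a dict of arrays (illustrative only)."""
    out = {}
    for key, arr in extras.items():
        arr = np.asarray(arr)
        if arr.ndim >= 2 and arr.shape[:2] == (T, N):
            # (T, N, ...): time-dependent, processed along time like X
            out[key] = degrade_time(arr, downsample)
        else:
            # (N, ...): per-particle constant, passed through unchanged
            out[key] = arr
    return out

T, N = 6, 3
extras = {"radius": np.ones(N), "force": np.zeros((T, N, 2))}  # hypothetical keys
result = process_extras_local(extras, T, N, downsample=2)
print(result["radius"].shape, result["force"].shape)
```

Dispatching on the leading shape keeps the per-particle constants untouched while the time-dependent arrays stay aligned with the degraded X.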

ROI filtering and data loss are applied through the mask (not by deleting rows), and additive noise changes values only, so tensor shapes remain intact. Flattening to columns (if needed) happens last.
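A minimal NumPy sketch of mask-based data loss, showing why shapes are preserved. The array names and the choice to drop an exact count of entries (rather than, say, an independent draw per entry) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, d = 5, 4, 2
X = rng.normal(size=(T, N, d))           # state tensor
mask = np.ones((T, N), dtype=bool)       # True where an observation is valid

data_loss_fraction = 0.25
valid = np.flatnonzero(mask)             # flat indices of currently valid entries
n_drop = int(round(data_loss_fraction * valid.size))
drop = rng.choice(valid, size=n_drop, replace=False)

new_mask = mask.copy()
new_mask.flat[drop] = False              # mark entries as missing; X keeps its shape

print(X.shape, new_mask.shape, int(new_mask.sum()))
```

Because only the mask changes, downstream code can keep working with rectangular (T, N, d) arrays and convert to columns at the very end.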

Cache-only extras (auto-generated structural tables)

Keys starting with _cache/ are considered auto-generated structural extras (e.g. CSR neighbor lists, stencil hyper tables). They are not degraded and are dropped from outputs, because any degradation/context change invalidates such cached structural objects. They can be regenerated on demand by calling the appropriate host-side preparation routine.
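The key-prefix rule can be illustrated with a plain dictionary filter. drop_cache_keys and the example keys are hypothetical; only the _cache/ prefix convention comes from the text above.

```python
def drop_cache_keys(extras):
    """Return a copy of extras without auto-generated _cache/ entries."""
    return {k: v for k, v in extras.items() if not k.startswith("_cache/")}

extras = {
    "box/dx": 0.1,
    "_cache/neighbors_csr": object(),   # hypothetical cached structural table
    "temperature": 300.0,
}
cleaned = drop_cache_keys(extras)
print(sorted(cleaned))
```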

SFI.trajectory.degrade.degrade_collection(coll, *, downsample=1, motion_blur=0, data_loss_fraction=0.0, noise=None, ROI=None, seed=None, reweight='pool')

Degrade all datasets in a collection and optionally recompute weights.

Parameters:
  • coll (TrajectoryCollection) – Input collection to degrade.

  • downsample (int) – Same semantics as in degrade_dataset().

  • motion_blur (int) – Same semantics as in degrade_dataset().

  • data_loss_fraction (float) – Same semantics as in degrade_dataset().

  • noise (None | float | ndarray) – Same semantics as in degrade_dataset().

  • ROI (None | float | ndarray | Callable[[ndarray], bool]) – Same semantics as in degrade_dataset().

  • seed (int | None) – Same semantics as in degrade_dataset().

  • reweight (Literal['pool', 'keep']) –

    Policy for updating collection-level weights after degradation:

    • "pool": recompute weights via with_weights("pool").

    • "keep": preserve the relative weights from coll.weights.

Returns:

New collection whose datasets have been degraded in the same way.

Return type:

TrajectoryCollection

Notes

This function is purely functional: the input collection is not modified.

SFI.trajectory.degrade.degrade_dataset(ds, *, downsample=1, motion_blur=0, data_loss_fraction=0.0, noise=None, ROI=None, seed=None)

Degrade a single TrajectoryDataset.

The function operates in tensor space; it returns a new dataset where:

  • X is motion-blurred over motion_blur + 1 frames and downsampled by downsample,

  • the mask is AND-reduced over the blur window, then modified by ROI and random data loss,

  • t (if present) is averaged over the blur window and downsampled; otherwise the scalar dt is multiplied by downsample,

  • extras are processed consistently (see module docstring).
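The blur and downsample steps listed above can be sketched in NumPy as follows. This is a sketch under assumptions, not the library's code: blur_and_downsample is a hypothetical name, and each window is assumed to start at a kept frame and extend forward by motion_blur frames (the actual window alignment is not stated here).

```python
import numpy as np

def blur_and_downsample(X, mask, downsample=1, motion_blur=0):
    """X: (T, N, d) array, mask: (T, N) boolean array."""
    if downsample < 1 or not (0 <= motion_blur < downsample):
        raise ValueError("need downsample >= 1 and 0 <= motion_blur < downsample")
    window = motion_blur + 1
    T = X.shape[0]
    starts = np.arange(0, T - window + 1, downsample)    # first frame of each window
    idx = starts[:, None] + np.arange(window)[None, :]   # (T_out, window) frame indices
    X_out = X[idx].mean(axis=1)        # average over the blur window
    mask_out = mask[idx].all(axis=1)   # AND-reduce: valid only if every frame is valid
    return X_out, mask_out

X = np.arange(6, dtype=float).reshape(6, 1, 1)
mask = np.ones((6, 1), dtype=bool)
mask[3, 0] = False                     # one missing observation
X_out, mask_out = blur_and_downsample(X, mask, downsample=3, motion_blur=1)
print(X_out[:, 0, 0], mask_out[:, 0])
```

Here the two windows cover frames (0, 1) and (3, 4); the second output frame is marked invalid because frame 3 was missing.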

Parameters:
  • ds (TrajectoryDataset) – Input dataset to degrade.

  • downsample (int) – Integer downsampling factor along the time axis (must be >= 1).

  • motion_blur (int) – Temporal averaging window size minus one. The actual blur window is motion_blur + 1 frames and must satisfy 0 <= motion_blur < downsample.

  • data_loss_fraction (float) – Fraction of currently valid entries to drop uniformly at random after ROI filtering (in [0, 1)).

  • noise (None | float | ndarray) – Additive Gaussian noise scale. If a float, isotropic noise with standard deviation noise is applied. If an array, broadcast to the state dimension.

  • ROI (None | float | ndarray | Callable[[ndarray], bool]) –

    Region-of-interest predicate or mask. Can be:

    • float: radial cutoff; keeps positions whose Euclidean norm ‖x‖₂ lies within the radius ROI,

    • (2, d) ndarray: axis-aligned box (row 0 = lower bound, row 1 = upper bound),

    • Callable[[np.ndarray], bool]: predicate evaluated on each observed position.

  • seed (int | None) – Optional RNG seed for the noise and data-loss generators.
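The three ROI forms described above could be turned into a keep-mask along these lines. roi_mask is a hypothetical helper written for illustration; in particular, whether the boundary of the radial cutoff or box is included is an assumption.

```python
import numpy as np

def roi_mask(X, ROI):
    """Return a boolean array of shape X.shape[:-1], True where a position is kept."""
    if ROI is None:
        return np.ones(X.shape[:-1], dtype=bool)
    if callable(ROI):
        # Predicate evaluated on each position vector of shape (d,)
        flat = X.reshape(-1, X.shape[-1])
        keep = np.array([bool(ROI(x)) for x in flat])
        return keep.reshape(X.shape[:-1])
    if np.isscalar(ROI):
        # Radial cutoff (boundary inclusion is an assumption of this sketch)
        return np.linalg.norm(X, axis=-1) <= ROI
    box = np.asarray(ROI)  # (2, d): row 0 = lower bound, row 1 = upper bound
    return np.all((X >= box[0]) & (X <= box[1]), axis=-1)

X = np.array([[[0.0, 0.0], [3.0, 4.0]]])              # shape (T=1, N=2, d=2)
radial = roi_mask(X, 2.0)
boxed = roi_mask(X, np.array([[-1.0, -1.0], [1.0, 1.0]]))
custom = roi_mask(X, lambda x: x[0] > 1.0)
print(radial, boxed, custom)
```

The resulting array would be AND-ed into the existing mask, so positions outside the region are flagged as missing rather than removed.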

Returns:

Degraded dataset with the same number of particles and, when downsample > 1, fewer time steps.

Return type:

TrajectoryDataset

SFI.trajectory.degrade.degrade_spatial_data(coll, *, downscale=2, method='mean', blur_radius=0, data_loss_fraction=0.0, noise=None, seed=None, mask_threshold=0.5, bc='noflux', prefix='box', order='C')

Degrade an SPDE-style collection in space (blur/coarsen/pixel-loss/noise).

Assumes the standard SPDE convention:
  • particle axis N is a flattened grid of shape grid_shape,

  • state dimension d is the number of fields per site.

dx is read from extras_global['{prefix}/dx'] and updated automatically; it does not need to be supplied here.

The box parameters stored under the '{prefix}/' keys ('box/' by default) are also updated, and structural extras whose keys start with _cache are dropped (they are regenerated on next use).
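As a rough picture of what the metadata update involves: coarsening by an integer factor multiplies the grid spacing and divides the grid shape. coarsen_box_metadata is a hypothetical helper, and the '{prefix}/grid_shape' key name is an assumption; only '{prefix}/dx' is named above.

```python
import numpy as np

def coarsen_box_metadata(extras_global, downscale, prefix="box"):
    """Return a copy of extras_global with grid shape and spacing coarsened."""
    out = dict(extras_global)
    shape = np.asarray(out[f"{prefix}/grid_shape"])          # assumed key name
    factors = np.broadcast_to(np.asarray(downscale), shape.shape)
    out[f"{prefix}/grid_shape"] = tuple(int(s) for s in shape // factors)
    out[f"{prefix}/dx"] = np.asarray(out[f"{prefix}/dx"], dtype=float) * factors
    return out

extras_global = {"box/grid_shape": (8, 6), "box/dx": 0.5}
new = coarsen_box_metadata(extras_global, downscale=2)
print(new["box/grid_shape"], new["box/dx"])
```

A tuple-valued downscale broadcasts per axis in the same way, which gives anisotropic spacing after coarsening.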

Parameters:
  • coll (TrajectoryCollection)

  • downscale (int | Tuple[int, ...])

  • method (Literal['mean', 'subsample'])

  • blur_radius (int)

  • data_loss_fraction (float)

  • noise (None | float | ndarray)

  • seed (int | None)

  • mask_threshold (float)

  • bc (Literal['noflux', 'pbc'])

  • prefix (str)

  • order (Literal['C', 'F'])

Return type:

TrajectoryCollection

SFI.trajectory.degrade.degrade_spatial_dataset(ds, *, downscale=1, method='mean', blur_radius=0, data_loss_fraction=0.0, noise=None, rng, mask_threshold=0.5, bc='noflux', prefix='box', order='C')

Spatial degradation of a single SPDE-style dataset.

Key invariants ensured by this routine

  1. The flattening convention is preserved (order="C" or "F").

  2. Box metadata (grid_shape, dx) is updated consistently after coarsening.

  3. Any prepared structural stencil payload is dropped so it is rebuilt for the new grid.

  4. Mask handling is conservative: a coarse cell is valid only if enough fine pixels are valid.
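The conservative mask rule in item 4 can be sketched for a 2D grid with block averaging. coarsen_2d is a hypothetical function, and two details are assumptions: the mean is taken over valid fine pixels only, and a coarse cell counts as valid when its valid fraction is at least mask_threshold.

```python
import numpy as np

def coarsen_2d(field, mask, downscale=2, mask_threshold=0.5):
    """Block-average a 2D field; field and mask both have shape (ny, nx)."""
    ny, nx = field.shape
    f = downscale
    if ny % f or nx % f:
        raise ValueError("grid shape must be divisible by downscale in this sketch")
    blocks = field.reshape(ny // f, f, nx // f, f)
    mblocks = mask.reshape(ny // f, f, nx // f, f)
    n_valid = mblocks.sum(axis=(1, 3))                     # valid fine pixels per cell
    total = np.where(mblocks, blocks, 0.0).sum(axis=(1, 3))
    coarse = total / np.maximum(n_valid, 1)                # mean over valid pixels only
    coarse_mask = n_valid / (f * f) >= mask_threshold      # enough valid pixels?
    return coarse, coarse_mask

field = np.arange(16, dtype=float).reshape(4, 4)
mask = np.ones((4, 4), dtype=bool)
mask[0, 1] = mask[1, 0] = mask[1, 1] = False   # only 1 of 4 pixels valid in top-left cell
coarse, coarse_mask = coarsen_2d(field, mask)
flat = coarse.reshape(-1, order="C")           # back to a flattened site axis
print(coarse_mask, flat)
```

Reshaping back with the same order argument used to unflatten the site axis is what keeps the flattening convention in item 1 consistent.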

Parameters:
  • ds (TrajectoryDataset)

  • downscale (int | Tuple[int, ...])

  • method (Literal['mean', 'subsample'])

  • blur_radius (int)

  • data_loss_fraction (float)

  • noise (None | float | ndarray)

  • rng (Generator)

  • mask_threshold (float)

  • bc (Literal['noflux', 'pbc'])

  • prefix (str)

  • order (Literal['C', 'F'])

Return type:

TrajectoryDataset