SFI.trajectory.degrade module

SFI.trajectory.degrade

Degrade synthetic trajectories to mimic real data:
- motion blur (temporal window average)
- downsampling
- additive measurement noise
- ROI filtering (mask points outside a region)
- random data loss

Two front doors:

Dataset/Collection API (recommended for internal use)
- degrade_dataset(ds, …)
- degrade_collection(coll, …)
Columns API (back-compat for I/O scripts)
- degrade_columns(meta, particle_idx, time_idx, state_vectors, …)

Why two? Column flow is convenient for simple scripts and file round-trips; dataset/collection flow keeps everything rectangular so we can blur/downsample time-dependent extras cleanly without flatten/unflatten gymnastics.

Extras semantics

extras_global:
- arrays with leading shape (T, …) are blurred/downsampled along time like X; other entries are passed through unchanged.
extras_local:
- arrays with shape (N, …): per-particle constants → unchanged
- arrays with shape (T, N, …): blurred/downsampled along time like X

ROI filtering and data loss are applied to the mask (not by deleting rows), and noise is added to the values in place, so tensor shapes remain intact. Flattening to columns (if needed) happens last.
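The shape-preserving behaviour described above can be illustrated with a small NumPy sketch. This is not the module's implementation: the helper name apply_noise_and_loss is hypothetical, X is assumed to have shape (T, N, d) with a boolean mask of shape (T, N), and dropping an exact count of valid entries (rather than, say, an independent draw per entry) is an assumption.

```python
import numpy as np

def apply_noise_and_loss(X, mask, *, noise=None, data_loss_fraction=0.0, seed=None):
    """Hypothetical helper: perturb X and thin out the mask without changing shapes."""
    rng = np.random.default_rng(seed)
    X_out = X.copy()
    mask_out = mask.copy()
    if noise is not None:
        # A float or a length-d array both broadcast against the last axis of X.
        X_out = X_out + rng.normal(size=X.shape) * np.asarray(noise)
    if data_loss_fraction > 0.0:
        valid = np.flatnonzero(mask_out)  # flat indices of currently valid entries
        n_drop = int(round(data_loss_fraction * valid.size))
        drop = rng.choice(valid, size=n_drop, replace=False)
        mask_out.flat[drop] = False       # mark as missing instead of deleting rows
    return X_out, mask_out

X = np.zeros((6, 4, 2))             # (T, N, d)
mask = np.ones((6, 4), dtype=bool)  # (T, N)
X_noisy, mask_lossy = apply_noise_and_loss(
    X, mask, noise=0.1, data_loss_fraction=0.25, seed=0
)
```

Both outputs keep the input shapes; only the number of True entries in the mask goes down.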

Cache-only extras (auto-generated structural tables)

Keys starting with _cache/ are considered auto-generated structural extras (e.g. CSR neighbor lists, stencil hyper tables). They are not degraded and are dropped from outputs, because any degradation/context change invalidates such cached structural objects. They can be regenerated on demand by calling the appropriate host-side preparation routine.
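As an illustration of the key convention only, filtering such entries out of an extras dictionary could look like the sketch below. The helper drop_cache_extras and the example keys other than the _cache/ prefix are hypothetical; the module's own handling may differ.

```python
def drop_cache_extras(extras):
    """Hypothetical helper: copy an extras dict without '_cache/' entries."""
    return {k: v for k, v in extras.items() if not k.startswith("_cache/")}

extras = {
    "box/dx": 0.5,
    "_cache/neighbors_csr": object(),  # stands in for a cached structural table
    "temperature": [1.0, 2.0],
}
clean = drop_cache_extras(extras)
```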

SFI.trajectory.degrade.degrade_collection(coll, *, downsample=1, motion_blur=0, data_loss_fraction=0.0, noise=None, ROI=None, seed=None, reweight='pool')

Degrade all datasets in a collection and optionally recompute weights.

Parameters:

coll (TrajectoryCollection) – Input collection to degrade.
downsample (int) – Same semantics as in degrade_dataset().
motion_blur (int) – Same semantics as in degrade_dataset().
data_loss_fraction (float) – Same semantics as in degrade_dataset().
noise (None | float | ndarray) – Same semantics as in degrade_dataset().
ROI (None | float | ndarray | Callable[[ndarray], bool]) – Same semantics as in degrade_dataset().
seed (int | None) – Same semantics as in degrade_dataset().
reweight (Literal['pool', 'keep']) –
Policy for updating collection-level weights after degradation:
- "pool": recompute weights via with_weights("pool").
- "keep": preserve the relative weights from coll.weights.

Returns:

New collection whose datasets have been degraded in the same way.

Return type:

TrajectoryCollection

Notes

This function is purely functional: the input collection is not modified.

SFI.trajectory.degrade.degrade_dataset(ds, *, downsample=1, motion_blur=0, data_loss_fraction=0.0, noise=None, ROI=None, seed=None)

Degrade a single TrajectoryDataset.

The function operates in tensor space; it returns a new dataset where:

- X is motion-blurred over motion_blur + 1 frames and downsampled by downsample,
- the mask is AND-reduced over the blur window, then modified by ROI and random data loss,
- t (if present) is averaged over the blur window and downsampled; otherwise the scalar dt is multiplied by downsample,
- extras are processed consistently (see module docstring).
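The temporal part of this description can be sketched in NumPy as follows. This is a minimal illustration, not the library's code: the helper name blur_and_downsample is hypothetical, X is assumed to be (T, N, d) with a (T, N) boolean mask, and the alignment of the blur window (the first motion_blur + 1 frames of each block of downsample frames) is an assumption, as is discarding trailing frames that do not fill a block.

```python
import numpy as np

def blur_and_downsample(X, mask, dt, *, downsample=1, motion_blur=0):
    """Hypothetical helper: window-average X, AND-reduce the mask, rescale dt."""
    if downsample < 1 or not (0 <= motion_blur < downsample):
        raise ValueError("need downsample >= 1 and 0 <= motion_blur < downsample")
    T_out = X.shape[0] // downsample
    window = motion_blur + 1
    # Group frames into blocks of `downsample` consecutive frames.
    Xb = X[: T_out * downsample].reshape(T_out, downsample, *X.shape[1:])
    mb = mask[: T_out * downsample].reshape(T_out, downsample, *mask.shape[1:])
    X_out = Xb[:, :window].mean(axis=1)    # average over the blur window
    mask_out = mb[:, :window].all(axis=1)  # valid only if every frame in the window is
    return X_out, mask_out, dt * downsample

X = np.arange(12, dtype=float).reshape(6, 2, 1)  # X[t, n, 0] = 2 * t + n
mask = np.ones((6, 2), dtype=bool)
mask[1, 0] = False                               # one missing observation
X_out, mask_out, dt_out = blur_and_downsample(
    X, mask, 0.1, downsample=3, motion_blur=1
)
```

Here the six input frames become two output frames, each averaging two consecutive frames, and the single missing observation invalidates the one output entry whose window contains it.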

Parameters:

ds (TrajectoryDataset) – Input dataset to degrade.
downsample (int) – Integer downsampling factor along the time axis (must be >= 1).
motion_blur (int) – Temporal averaging window size minus one. The actual blur window is motion_blur + 1 frames; motion_blur must satisfy 0 <= motion_blur < downsample.
data_loss_fraction (float) – Fraction of currently valid entries to drop uniformly at random after ROI filtering (in [0, 1)).
noise (None | float | ndarray) – Additive Gaussian noise scale. If a float, isotropic noise with standard deviation noise is applied. If an array, broadcast to the state dimension.
ROI (None | float | ndarray | Callable[[ndarray], bool]) –
Region-of-interest predicate or mask. Can be:
- float: radial cutoff — keeps positions with ‖x‖₂ ≤ ROI,
- (2, d) ndarray: axis-aligned box (row 0 = lower bound, row 1 = upper bound),
- Callable[[np.ndarray], bool]: predicate evaluated on each observed position.
seed (int | None) – Optional RNG seed for the noise and data-loss generators.
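The three accepted ROI forms can be illustrated with a short NumPy sketch. The helper roi_mask is hypothetical and not part of the module; it assumes X has shape (T, N, d) and that the full state vector is the position. The result would be combined with the existing mask via a logical AND.

```python
import numpy as np

def roi_mask(X, ROI):
    """Hypothetical helper: boolean (T, N) array, True where a position is inside ROI."""
    if ROI is None:
        return np.ones(X.shape[:-1], dtype=bool)
    if callable(ROI):
        # Evaluate the predicate once per observed position.
        flat = X.reshape(-1, X.shape[-1])
        keep = np.fromiter((bool(ROI(x)) for x in flat), dtype=bool, count=flat.shape[0])
        return keep.reshape(X.shape[:-1])
    roi = np.asarray(ROI, dtype=float)
    if roi.ndim == 0:
        # Radial cutoff: keep positions with Euclidean norm <= ROI.
        return np.linalg.norm(X, axis=-1) <= roi
    if roi.shape == (2, X.shape[-1]):
        # Axis-aligned box: row 0 is the lower bound, row 1 the upper bound.
        return np.all((X >= roi[0]) & (X <= roi[1]), axis=-1)
    raise ValueError("unsupported ROI specification")

X = np.array([[[0.0, 0.0], [3.0, 4.0]],
              [[1.0, 1.0], [6.0, 0.0]]])  # (T=2, N=2, d=2)
radial = roi_mask(X, 5.0)
box = roi_mask(X, np.array([[0.0, 0.0], [2.0, 2.0]]))
custom = roi_mask(X, lambda x: x[0] > 0.5)
```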

Returns:

Degraded dataset with the same number of particles but fewer time steps (when downsample > 1).

Return type:

TrajectoryDataset

SFI.trajectory.degrade.degrade_spatial_data(coll, *, downscale=2, method='mean', blur_radius=0, data_loss_fraction=0.0, noise=None, seed=None, mask_threshold=0.5, bc='noflux', prefix='box', order='C')

Degrade an SPDE-style collection in space (blur/coarsen/pixel-loss/noise).

Assumes the standard SPDE convention:

- the particle axis N is a flattened grid of shape grid_shape,
- the state dimension d is the number of fields per site.

dx is read from extras_global['{prefix}/dx'] and updated automatically; it does not need to be supplied here.

The box parameters stored under the '{prefix}/' keys ('box/' by default) are also updated, and structural extras whose keys start with _cache/ are dropped (they are regenerated on next use).
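To make the grid convention and the dx update concrete, here is a minimal NumPy sketch of block-mean coarsening. It is an illustration under assumptions, not the routine itself: the helper coarsen_mean is hypothetical, only order="C" flattening is handled, the grid shape is assumed divisible by the downscale factors, and dx is assumed to scale by the factor along each axis.

```python
import numpy as np

def coarsen_mean(X, grid_shape, dx, downscale=2):
    """Hypothetical helper: block-average (T, N, d) fields on a C-ordered grid."""
    T, N, d = X.shape
    ndim = len(grid_shape)
    factors = (downscale,) * ndim if isinstance(downscale, int) else tuple(downscale)
    if N != int(np.prod(grid_shape)):
        raise ValueError("particle axis does not match grid_shape")
    if any(g % f for g, f in zip(grid_shape, factors)):
        raise ValueError("grid_shape must be divisible by downscale")
    coarse_shape = tuple(g // f for g, f in zip(grid_shape, factors))
    # Split every grid axis into (coarse index, index within block).
    split = [s for c, f in zip(coarse_shape, factors) for s in (c, f)]
    blocks = X.reshape(T, *split, d)
    within_block_axes = tuple(range(2, 1 + 2 * ndim, 2))
    coarse = blocks.mean(axis=within_block_axes)  # (T, *coarse_shape, d)
    X_coarse = coarse.reshape(T, -1, d)           # flatten back to (T, N', d)
    dx_coarse = np.asarray(dx, dtype=float) * np.asarray(factors)
    return X_coarse, coarse_shape, dx_coarse

X = np.arange(16, dtype=float).reshape(1, 16, 1)  # one frame, 4 x 4 grid, one field
X_coarse, coarse_shape, dx_coarse = coarsen_mean(X, (4, 4), 0.5, downscale=2)
```

With a 4 x 4 grid and downscale=2, each coarse site is the mean of a 2 x 2 block of fine sites, and the grid spacing doubles.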

Parameters:

coll (TrajectoryCollection)
downscale (int | Tuple[int, ...])
method (Literal['mean', 'subsample'])
blur_radius (int)
data_loss_fraction (float)
noise (None | float | ndarray)
seed (int | None)
mask_threshold (float)
bc (Literal['noflux', 'pbc'])
prefix (str)
order (Literal['C', 'F'])

Return type:

TrajectoryCollection

SFI.trajectory.degrade.degrade_spatial_dataset(ds, *, downscale=1, method='mean', blur_radius=0, data_loss_fraction=0.0, noise=None, rng, mask_threshold=0.5, bc='noflux', prefix='box', order='C')

Spatial degradation of a single SPDE-style dataset.

Key invariants ensured by this routine

- The flattening convention is preserved (order="C" or "F").
- Box metadata (grid_shape, dx) is updated consistently after coarsening.
- Any prepared structural stencil payload is dropped so it is rebuilt for the new grid.
- Mask handling is conservative: a coarse cell is valid only if enough fine pixels are valid.
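One plausible reading of the conservative mask rule is a threshold on the fraction of valid fine pixels per coarse cell. The sketch below illustrates that reading only: the helper coarsen_mask is hypothetical, it assumes order="C" flattening and a (T, N) boolean mask, and whether the comparison against mask_threshold is inclusive is an assumption.

```python
import numpy as np

def coarsen_mask(mask, grid_shape, downscale=2, mask_threshold=0.5):
    """Hypothetical helper: a coarse cell is valid if enough fine pixels are valid."""
    T = mask.shape[0]
    ndim = len(grid_shape)
    factors = (downscale,) * ndim if isinstance(downscale, int) else tuple(downscale)
    coarse_shape = tuple(g // f for g, f in zip(grid_shape, factors))
    # Split every grid axis into (coarse index, index within block).
    split = [s for c, f in zip(coarse_shape, factors) for s in (c, f)]
    blocks = mask.reshape(T, *split).astype(float)
    within_block_axes = tuple(range(2, 1 + 2 * ndim, 2))
    valid_fraction = blocks.mean(axis=within_block_axes)  # (T, *coarse_shape)
    return (valid_fraction >= mask_threshold).reshape(T, -1)

mask = np.ones((1, 16), dtype=bool)  # one frame on a 4 x 4 grid
mask[0, [0, 1, 4]] = False           # three of four pixels missing in the top-left block
mask[0, 10] = False                  # one of four pixels missing in the bottom-right block
coarse_mask = coarsen_mask(mask, (4, 4), downscale=2, mask_threshold=0.5)
```

The top-left coarse cell has only a quarter of its pixels valid and is marked invalid, while the bottom-right cell, with three quarters valid, stays valid.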

Parameters:

ds (TrajectoryDataset)
downscale (int | Tuple[int, ...])
method (Literal['mean', 'subsample'])
blur_radius (int)
data_loss_fraction (float)
noise (None | float | ndarray)
rng (Generator)
mask_threshold (float)
bc (Literal['noflux', 'pbc'])
prefix (str)
order (Literal['C', 'F'])

Return type:

TrajectoryDataset