Trajectory file formats
This page specifies the on-disk formats read by TrajectoryCollection.load and written by TrajectoryCollection.save. Three formats are supported (CSV, Parquet, and HDF5), all sharing the same tabular layout: one row per observation.
from SFI.trajectory import TrajectoryCollection
coll = TrajectoryCollection.load("tracks.csv") # or .parquet / .h5
coll.save("tracks.parquet") # format from suffix
Table layout
Each row is one observation of one particle at one time step:
| Column | Required | Content |
|---|---|---|
| particle_id | optional | Integer track identifier. If absent, the file is a single-trajectory file (pass …). |
| time_step | yes | Integer time index (0-based after relabelling). |
| (remaining columns) | yes | State-vector components (positions, angles, concentrations, …). |
CSV files identify columns by position, not by name: by default
column 0 is the particle identifier, column 1 the time index, and every
remaining column without an extras prefix (below) is a state component.
Your columns can therefore be named particle_id, frame, x, y or
anything else. Parquet and HDF5 files identify columns by name and
must use the canonical names particle_id and time_step.
Rows containing NaNs are dropped on load; masked samples are dropped on save (only valid rows are written).
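The positional convention and the NaN rule can be illustrated without the library. The following is a minimal sketch using only the standard library; `read_positional_csv` is a hypothetical helper, not part of SFI, and it ignores extras columns and metadata headers.

```python
import csv
import io
import math


def read_positional_csv(text):
    """Parse CSV text by column position: 0 = particle, 1 = time, rest = state.

    Hypothetical helper for illustration only; rows containing NaN are skipped.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # names are read but never interpreted
    rows = []
    for record in reader:
        particle = int(record[0])
        time_index = int(record[1])
        state = [float(v) for v in record[2:]]
        if any(math.isnan(v) for v in state):
            continue  # mimic "rows containing NaNs are dropped on load"
        rows.append((particle, time_index, state))
    return header, rows


# Column names are arbitrary: only their positions matter in this sketch.
sample = """track,frame,x,y
0,0,-0.017995,-0.025163
0,1,nan,-0.100932
1,0,0.5,0.25
"""

header, rows = read_positional_csv(sample)
print(header)
print(rows)
```

The second data row contains a NaN and is therefore absent from `rows`, while the header names play no role in deciding which column is which.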
Extras columns
Per-observation metadata is carried in extra numeric columns, classified by a name prefix:
| Prefix | Kind | Example |
|---|---|---|
| | Global scalar (constant for the whole dataset) | |
| | Time-dependent global (depends on \(t\) only) | |
| | Per-particle constant (depends on particle only) | |
| | Time- and particle-dependent | |
On load these populate extras_global / extras_local of the
dataset and become available to state functions through the extras
mechanism (see Trajectory data).
Metadata header
A file can carry a YAML metadata mapping — most importantly the time
step dt:
- CSV: leading comment lines, a # --- opener followed by # key: value lines.

      # ---
      # dt: 0.01
      # description: 2D optical tweezer
      particle_id,frame,x,y
      0,0,-0.017995,-0.025163
      0,1,0.037124,-0.100932

  (Plain # key: value lines without the # --- opener are also accepted, as in examples/experimental_data/optical_tweezer.csv.)
- Parquet: the same YAML string stored in the table schema metadata under the key sfi_yaml_header.
- HDF5: one dataset per column inside a table group; the YAML string stored as the root attribute sfi_yaml_header.
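To make the CSV comment-header convention concrete, here is a minimal sketch that separates the leading comment lines from the table body. `split_comment_header` is a hypothetical helper, not the library's parser, and it assumes flat `key: value` pairs rather than general YAML.

```python
def split_comment_header(text):
    """Separate leading '# key: value' comment lines from the CSV body.

    Hypothetical helper: handles flat scalar keys only, not general YAML.
    """
    meta = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("#"):
            body_start = i  # first non-comment line starts the table
            break
        content = line[1:].strip()
        if content == "---" or not content:
            continue  # optional opener or blank comment line
        key, _, value = content.partition(":")
        meta[key.strip()] = value.strip()
    else:
        body_start = len(lines)  # the text held only comment lines
    return meta, lines[body_start:]


sample = """# ---
# dt: 0.01
# description: 2D optical tweezer
particle_id,frame,x,y
0,0,-0.017995,-0.025163
0,1,0.037124,-0.100932
"""

meta, body = split_comment_header(sample)
dt = float(meta["dt"])  # values stay strings until converted
print(dt, meta["description"])
print(body[0])
```

Because the opener line is simply skipped, the same function also accepts files that carry plain comment lines without it.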
Recognised keys:
- dt: scalar sampling interval (seconds, or your time unit). Accepted both at the top level and inside extras_global (files written by TrajectoryCollection.save() use the latter);
- extras_global: a mapping of arbitrary scalars or arrays. The special key t (a length-T vector) defines a non-uniform time axis and overrides dt;
- anything else is kept as free-form dataset metadata (coll.datasets[0].meta).
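The precedence between t and dt can be sketched in a few lines. `resolve_time_axis` is a hypothetical helper, not an SFI function. It assumes that a uniform axis starts at 0 and that a top-level dt is consulted before one inside extras_global; neither assumption is stated by the library.

```python
def resolve_time_axis(meta, n_steps):
    """Return the list of sample times implied by a metadata mapping.

    Hypothetical helper: extras_global["t"] wins over dt; dt may sit at the
    top level or inside extras_global (top level checked first by assumption).
    """
    extras = meta.get("extras_global", {})
    if "t" in extras:
        t = list(extras["t"])
        if len(t) != n_steps:
            raise ValueError("t must have one entry per time step")
        return t
    dt = meta.get("dt", extras.get("dt"))
    if dt is None:
        raise KeyError("no dt or t found in metadata")
    return [k * dt for k in range(n_steps)]  # uniform axis, assumed to start at 0


uniform = resolve_time_axis({"dt": 0.5}, 4)
nested = resolve_time_axis({"extras_global": {"dt": 0.5}}, 4)
irregular = resolve_time_axis(
    {"dt": 0.5, "extras_global": {"t": [0.0, 0.1, 0.4]}}, 3
)
print(uniform, nested, irregular)
```

In the last call the explicit t vector is returned unchanged even though dt is also present.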
Named columns
When a file does not follow the positional/canonical layout above, select the columns explicitly — by name for any format, or by index for CSV:
coll = TrajectoryCollection.load(
    "raw_tracks.csv",
    particle_column="particle",  # or an int index (CSV only)
    time_column="t",
    state_columns=("x", "y"),    # drops every other non-extras column
)
For in-memory tables, TrajectoryCollection.from_dataframe() is
the more convenient entry point (auto-detection of common column
names) — see Trajectory data.
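The idea of selecting columns by name or by index can be illustrated independently of the library. In this sketch `resolve_column` and `select_columns` are hypothetical helpers; they mimic the keyword names used above but say nothing about how SFI resolves columns internally.

```python
import csv
import io


def resolve_column(header, spec):
    """Map a column spec (name or positional index) to a positional index."""
    if isinstance(spec, int):
        if not 0 <= spec < len(header):
            raise IndexError(f"column index {spec} out of range")
        return spec
    return header.index(spec)  # raises ValueError for an unknown name


def select_columns(text, particle_column, time_column, state_columns):
    """Keep only the requested columns, in (particle, time, *state) order."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    wanted = [particle_column, time_column, *state_columns]
    indices = [resolve_column(header, spec) for spec in wanted]
    return [[record[i] for i in indices] for record in reader]


raw = """t,intensity,particle,x,y
0,3.2,7,0.1,0.2
1,3.1,7,0.3,0.4
"""

# Mix a name-based and an index-based spec; "intensity" is dropped.
selected = select_columns(
    raw, particle_column="particle", time_column=0, state_columns=("x", "y")
)
print(selected)
```

Values are left as strings here; type conversion is outside the scope of the sketch.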
Loading behaviour
TrajectoryCollection.load() accepts a single file or a directory
and takes two knobs:
- relabel=True (default): particle IDs are compressed to 0..N-1 and time indices shifted to start at 0. The original IDs are recorded in extras_local["original_particle_id"].
- compress_particles=False: when True, particles whose time supports do not overlap (with a 2-frame safety buffer) are packed into the same column slot. This is useful for open-boundary data where particles enter and leave the field of view, which otherwise makes the array width grow with the total number of unique tracks rather than the concurrent count. The mapping is stored in dataset.meta["particle_column_map"].
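Both knobs can be pictured with a small pure-Python sketch. The functions `relabel` and `pack_tracks` below are hypothetical and are not the library's implementation. In particular, the sorted ordering of relabelled IDs, the greedy first-fit strategy, and the reading of the safety buffer as "a slot is reusable only when the new track starts more than `buffer` frames after the previous one ends" are all assumptions.

```python
def relabel(observations):
    """Compress particle IDs to 0..N-1 and shift time indices to start at 0.

    observations: list of (particle_id, time_index) pairs.
    Returns the relabelled pairs and the original ID behind each new label.
    """
    original_ids = sorted({pid for pid, _ in observations})
    new_label = {pid: k for k, pid in enumerate(original_ids)}
    t0 = min(t for _, t in observations)
    relabelled = [(new_label[pid], t - t0) for pid, t in observations]
    return relabelled, original_ids


def pack_tracks(supports, buffer=2):
    """Greedy first-fit packing of tracks into column slots.

    supports: dict mapping track label -> (first_frame, last_frame).
    Returns a dict mapping track label -> slot index.
    """
    slot_end = []  # last occupied frame of each slot
    assignment = {}
    by_start = sorted(supports.items(), key=lambda kv: kv[1][0])
    for track, (first, last) in by_start:
        for slot, end in enumerate(slot_end):
            if first > end + buffer:  # far enough after the previous occupant
                slot_end[slot] = last
                assignment[track] = slot
                break
        else:
            slot_end.append(last)  # no reusable slot: open a new one
            assignment[track] = len(slot_end) - 1
    return assignment


pairs, original_ids = relabel([(17, 5), (17, 6), (42, 5), (99, 20)])
slots = pack_tracks({0: (0, 10), 1: (0, 30), 2: (12, 20), 3: (14, 40)})
print(pairs, original_ids)
print(slots)
```

In this example track 2 starts exactly two frames after track 0 ends, so under the assumed buffer rule it gets a new slot, whereas track 3 starts late enough to reuse slot 0. Four tracks therefore occupy three columns.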
Weights: every load initialises dataset weights with the default
"pool" policy; call coll.with_weights(...) after loading if you
need a different policy.
Multi-dataset directories
A collection with several datasets saves to a directory:
my_experiments/
├── ds_000.parquet
├── ds_001.parquet
└── manifest.yaml # records dataset names and filenames
TrajectoryCollection.load("my_experiments/") reconstructs the full
collection, one dataset per file.
Round trip
import jax.numpy as jnp
from SFI.trajectory import TrajectoryCollection
coll = TrajectoryCollection.from_arrays(X=jnp.zeros((100, 3, 2)), dt=0.05)
coll.save("run.parquet")
coll2 = TrajectoryCollection.load("run.parquet")
State arrays, masks, time axis, extras, and metadata survive the round trip, up to the loss of masked samples (which are never written).
See also
Trajectory data — constructing collections from arrays, columns, or files; masking; extras; combining experiments.
Trajectory API — full API of the trajectory layer.