Sparse model selection¶
When the basis library is larger than the true model, many inferred coefficients should be zero. SFI’s sparse model selection module identifies which basis functions are genuinely needed and sets the rest to zero, yielding interpretable, parsimonious models with controlled overfitting.
Quick start¶
After running infer_force_linear(),
call sparsify_force():
inf.infer_force_linear(basis)
result = inf.sparsify_force(criterion="PASTIS")
This performs a beam search (the default method) over all possible subsets of the basis, builds a Pareto front of information-gain vs. model size, and selects the best model according to the PASTIS information criterion.
The return value is a SparsityResult that
holds the full Pareto front and can be queried for any criterion:
# Re-select with a different criterion without re-running the search
# (BIC and EBIC require the trajectory time tau)
k, support, score, coeffs = result.select_by_ic("BIC", tau=tau)
# Summary table of all criteria (tau enables BIC/EBIC)
result.all_ic(tau=tau, verbose=True)
Choosing a search method¶
Four algorithms are available via the method keyword. Each
explores the combinatorial space of supports differently:
Method |
Description |
Strengths |
When to use |
|---|---|---|---|
|
Bidirectional beam search (PASTIS original). Expands ±1 neighbours of the best supports per cardinality and retains only the beam_width best. |
Good coverage of the support lattice; second-best tracking for robustness. |
Default choice for moderate libraries (\(p \lesssim 100\)). |
|
Forward, backward, or bidirectional stepwise selection. Adds or drops one feature per step. |
Very fast; one support per cardinality. |
First pass / screening on large libraries. |
|
Stochastic hill-climbing with random add/remove moves (Gerardos & Ronceray, 2025). One chain per cardinality. |
Escapes greedy traps; works at every complexity level. |
Good complement to beam/greedy for difficult problems. |
|
Sequential Thresholded Least Squares (SINDy-style). Iteratively solves and thresholds small coefficients. |
Familiar to SINDy users; works well with clean data. |
Benchmarking against SINDy-style pipelines (see note). |
|
\(\ell_1\)-penalised coordinate descent on the normal equations. Sweeps a regularisation path. |
Continuous shrinkage; warm-started path is efficient. |
Benchmarking against LASSO pipelines (see note). |
Note
STLSQ and LASSO are for benchmarking, not the default.
These two are deterministic-regression sparsifiers, included as a convenience for comparison — not as a competitor to SINDy or LASSO toolchains. They come from the deterministic-dynamics literature, where the data is essentially noise-free. SFI’s own territory is stochastic dynamics, and there the principled choice is a beam or hill-climbing search under PASTIS, which is calibrated to the fluctuations of the residual.
SFI can sparsify a deterministic system, but on noise-free data its
error estimates and noise-calibrated criteria (PASTIS, the predicted
NMSE) lose their statistical meaning — there is no residual
distribution to calibrate against. Reach for STLSQ/LASSO when you
want a like-for-like comparison with deterministic methods, and
prefer "beam" or "hillclimb" with PASTIS for genuine
stochastic data.
Examples¶
# Beam search with wider beam
result = inf.sparsify_force(
criterion="PASTIS",
method="beam",
beam_width=10,
)
# Greedy forward selection
result = inf.sparsify_force(
criterion="AIC",
method="greedy",
direction="forward",
)
# STLSQ with automatic threshold sweep
result = inf.sparsify_force(
criterion="BIC",
method="stlsq",
n_thresholds=30,
)
# Stochastic hill-climbing (PASTIS paper algorithm)
result = inf.sparsify_force(
criterion="PASTIS",
method="hillclimb",
patience=200,
seed=42,
)
# LASSO path
result = inf.sparsify_force(
criterion="PASTIS",
method="lasso",
n_alphas=50,
)
Method-specific parameters¶
Beam search (method="beam")
Parameter |
Default |
Description |
|---|---|---|
|
3 |
Number of candidates retained per cardinality level. |
|
2 |
Stop early when AIC has declined for this many consecutive closed levels. |
Greedy stepwise (method="greedy")
Parameter |
Default |
Description |
|---|---|---|
|
|
|
STLSQ (method="stlsq")
Parameter |
Default |
Description |
|---|---|---|
|
None (auto sweep) |
Fixed threshold value, or |
|
|
|
|
30 |
Number of threshold values in the automatic sweep. |
|
50 |
Maximum STLSQ iterations per threshold. |
Hill-climbing (method="hillclimb")
Parameter |
Default |
Description |
|---|---|---|
|
|
Information criterion used as the acceptance objective. |
|
200 |
Stop a chain after this many consecutive rejected moves. |
|
None |
Random seed for reproducibility. |
|
1e-3 |
PASTIS significance level \(p_0\). |
|
None |
Total trajectory time (required when |
|
0.5 |
EBIC tuning parameter \(\gamma \in [0,1]\). |
LASSO (method="lasso")
Parameter |
Default |
Description |
|---|---|---|
|
None (auto path) |
Fixed \(\alpha\), or |
|
50 |
Number of \(\alpha\) values in the path. |
|
1000 |
Maximum coordinate-descent iterations per \(\alpha\). |
|
1e-7 |
Convergence tolerance for coordinate descent. |
Information criteria¶
After the search, a Pareto front of (cardinality, information gain) pairs is available. The information criterion trades model fit against complexity:
where \(\mathcal{I}(k)\) is the log-likelihood gain with \(k\) basis terms out of \(p\) candidates.
Criterion |
Penalty |
Use case |
|---|---|---|
|
\(k\) |
When \(n_0\) is small relative to data size (Akaike, 1974). |
|
\(\frac{1}{2}\,k\,\ln\tau\) |
Bayesian model comparison; requires the total trajectory time \(\tau\) (Schwarz, 1978; continuous-time form from Gerardos & Ronceray, 2025). |
|
\(\frac{1}{2}\,k\,\ln\tau + 2\gamma\,\ln\binom{n_0}{k}\) |
Extended BIC for large libraries; \(\gamma\!\in\![0,1]\) tunes the combinatorial penalty (Chen & Chen, 2008). Requires \(\tau\). |
|
\(k\,\ln(n_0 / p_0)\) |
Default for SFI. \(p_0\) is the significance level (default 1e-3) controlling overfitting tolerance (Gerardos & Ronceray, 2025). |
Note
BIC and EBIC require the total trajectory time \(\tau\).
Pass tau=... to select_by_ic() or all_ic(). When
tau is not provided, all_ic() includes only AIC and
PASTIS.
Tip
Which criterion? "PASTIS" is recommended for typical
experimental data. For very small bases, "AIC" or "BIC"
may be more appropriate since the PASTIS penalty scales as
\(\ln n_0\) and can be too harsh when \(n_0 < 10\). For
large libraries, "EBIC" with \(\gamma \approx 0.5\) adds a
combinatorial correction that improves selection consistency.
Inspecting the Pareto front¶
The SparsityResult exposes the full
Pareto front for plotting or further analysis:
import matplotlib.pyplot as plt
result = inf.sparsify_force(criterion="PASTIS", method="beam")
# Plot information gain vs. model size
ks = range(len(result.best_info_by_k))
plt.plot(ks, result.best_info_by_k, "o-")
plt.xlabel("Model size k")
plt.ylabel("Information gain I(k)")
plt.title("Pareto front")
# Overlay IC-selected model
k_star, support, score, coeffs = result.select_by_ic("PASTIS")
plt.axvline(k_star, color="r", ls="--", label=f"PASTIS: k*={k_star}")
plt.legend()
# Full IC comparison table (logged at INFO level)
summary = result.all_ic(verbose=True)
Benchmarking against ground truth¶
When the true support and coefficients are known (synthetic data):
from SFI.inference.sparse import overlap_metrics, predictive_nmse
om = overlap_metrics(true_support, support)
print(f"TP={om['TP']}, FP={om['FP']}, FN={om['FN']}, "
f"precision={om['prec']:.2f}, recall={om['rec']:.2f}")
# Predictive NMSE on held-out design matrix
nmse = predictive_nmse(Phi_test, true_support, true_coeffs,
support, coeffs)
print(f"Predictive NMSE: {nmse:.4f}")
The result.all_ic() method can optionally compute these metrics
for all criteria at once:
summary = result.all_ic(
true_support=true_support,
true_coeffs=true_coeffs,
Phi_test=Phi_test,
verbose=True,
)
Advanced: using strategies directly¶
For fine-grained control, instantiate a strategy and a
SparseScorer manually:
from SFI.inference.sparse import SparseScorer, BeamSearchStrategy
scorer = inf.force_scorer # created during infer_force_linear
strategy = BeamSearchStrategy(beam_width=20)
result = strategy.run(scorer, max_k=15)
# Or build a scorer from raw (M, G)
scorer = SparseScorer(M=M_vec, G=G_mat)
You can also write your own strategy by subclassing
SparsityStrategy and implementing
run(scorer, *, max_k) -> SparsityResult().
Troubleshooting¶
All coefficients are pruned by PASTIS.
The signal is too weak for the library size. Reduce the basis,
increase the trajectory length, or try criterion="AIC" which
has a weaker penalty.
Beam search is slow.
Reduce beam_width or max_k. For libraries with \(p > 50\),
consider using method="greedy" or method="lasso" as a fast
first pass.
STLSQ and LASSO disagree with beam search. This is expected. Different algorithms explore different parts of the support lattice. The beam search is the most thorough; STLSQ and LASSO provide complementary perspectives. Compare their Pareto fronts.
LASSO selects too many features.
Increase n_alphas for a finer regularisation path, or try
method="stlsq" which performs hard thresholding.
Further reading¶
Sparse model selection — Theory & design — mathematical background for the information criteria and search algorithms.