Sparse model selection¶

When the basis library is larger than the true model, many inferred coefficients should be zero. SFI’s sparse model selection module identifies which basis functions are genuinely needed and sets the rest to zero, yielding interpretable, parsimonious models with controlled overfitting.

Quick start¶

After running infer_force_linear(), call sparsify_force():

inf.infer_force_linear(basis)
result = inf.sparsify_force(criterion="PASTIS")

This performs a beam search (the default method) over all possible subsets of the basis, builds a Pareto front of information-gain vs. model size, and selects the best model according to the PASTIS information criterion.

The return value is a SparsityResult that holds the full Pareto front and can be queried for any criterion:

# Re-select with a different criterion without re-running the search
# (BIC and EBIC require the trajectory time tau)
k, support, score, coeffs = result.select_by_ic("BIC", tau=tau)

# Summary table of all criteria (tau enables BIC/EBIC)
result.all_ic(tau=tau, verbose=True)

Choosing a search method¶

Four algorithms are available via the method keyword. Each explores the combinatorial space of supports differently:

Method	Description	Strengths	When to use
`"beam"`	Bidirectional beam search (PASTIS original). Expands ±1 neighbours of the best supports per cardinality and retains only the beam_width best.	Good coverage of the support lattice; second-best tracking for robustness.	Default choice for moderate libraries (\(p \lesssim 100\)).
`"greedy"`	Forward, backward, or bidirectional stepwise selection. Adds or drops one feature per step.	Very fast; one support per cardinality.	First pass / screening on large libraries.
`"hillclimb"`	Stochastic hill-climbing with random add/remove moves (Gerardos & Ronceray, 2025). One chain per cardinality.	Escapes greedy traps; works at every complexity level.	Good complement to beam/greedy for difficult problems.
`"stlsq"`	Sequential Thresholded Least Squares (SINDy-style). Iteratively solves and thresholds small coefficients.	Familiar to SINDy users; works well with clean data.	Benchmarking against SINDy-style pipelines (see note).
`"lasso"`	\(\ell_1\)-penalised coordinate descent on the normal equations. Sweeps a regularisation path.	Continuous shrinkage; warm-started path is efficient.	Benchmarking against LASSO pipelines (see note).

Note

STLSQ and LASSO are for benchmarking, not the default.

These two are deterministic-regression sparsifiers, included as a convenience for comparison — not as a competitor to SINDy or LASSO toolchains. They come from the deterministic-dynamics literature, where the data is essentially noise-free. SFI’s own territory is stochastic dynamics, and there the principled choice is a beam or hill-climbing search under PASTIS, which is calibrated to the fluctuations of the residual.

SFI can sparsify a deterministic system, but on noise-free data its error estimates and noise-calibrated criteria (PASTIS, the predicted NMSE) lose their statistical meaning — there is no residual distribution to calibrate against. Reach for STLSQ/LASSO when you want a like-for-like comparison with deterministic methods, and prefer "beam" or "hillclimb" with PASTIS for genuine stochastic data.

Examples¶

# Beam search with wider beam
result = inf.sparsify_force(
    criterion="PASTIS",
    method="beam",
    beam_width=10,
)

# Greedy forward selection
result = inf.sparsify_force(
    criterion="AIC",
    method="greedy",
    direction="forward",
)

# STLSQ with automatic threshold sweep
result = inf.sparsify_force(
    criterion="BIC",
    method="stlsq",
    n_thresholds=30,
)

# Stochastic hill-climbing (PASTIS paper algorithm)
result = inf.sparsify_force(
    criterion="PASTIS",
    method="hillclimb",
    patience=200,
    seed=42,
)

# LASSO path
result = inf.sparsify_force(
    criterion="PASTIS",
    method="lasso",
    n_alphas=50,
)

Method-specific parameters¶

Beam search (method="beam")

Parameter	Default	Description
`beam_width`	3	Number of candidates retained per cardinality level.
`aic_patience`	2	Stop early when AIC has declined for this many consecutive closed levels.

Greedy stepwise (method="greedy")

Parameter	Default	Description
`direction`	`"forward"`	`"forward"`, `"backward"`, or `"both"` (merge both paths).

STLSQ (method="stlsq")

Parameter	Default	Description
`threshold`	None (auto sweep)	Fixed threshold value, or `None` for a log-spaced sweep.
`mode`	`"relative"`	`"relative"` (fraction of \(\max\|C\|\)) or `"absolute"` (fixed value).
`n_thresholds`	30	Number of threshold values in the automatic sweep.
`max_iter`	50	Maximum STLSQ iterations per threshold.

Hill-climbing (method="hillclimb")

Parameter	Default	Description
`ic`	`"PASTIS"`	Information criterion used as the acceptance objective.
`patience`	200	Stop a chain after this many consecutive rejected moves.
`seed`	None	Random seed for reproducibility.
`p_param`	1e-3	PASTIS significance level \(p_0\).
`tau`	None	Total trajectory time (required when `ic` is BIC or EBIC).
`gamma`	0.5	EBIC tuning parameter \(\gamma \in [0,1]\).

LASSO (method="lasso")

Parameter	Default	Description
`alpha`	None (auto path)	Fixed \(\alpha\), or `None` for a log-spaced path from \(\alpha_{\max}\) down to \(10^{-4}\alpha_{\max}\).
`n_alphas`	50	Number of \(\alpha\) values in the path.
`max_iter`	1000	Maximum coordinate-descent iterations per \(\alpha\).
`tol`	1e-7	Convergence tolerance for coordinate descent.

Information criteria¶

After the search, a Pareto front of (cardinality, information gain) pairs is available. The information criterion trades model fit against complexity:

\[\text{IC}(k) \;=\; \mathcal{I}(k) \;-\; \text{penalty}(k)\]

where \(\mathcal{I}(k)\) is the log-likelihood gain with \(k\) basis terms out of \(p\) candidates.

Criterion	Penalty	Use case
`"AIC"`	\(k\)	When \(n_0\) is small relative to data size (Akaike, 1974).
`"BIC"`	\(\frac{1}{2}\,k\,\ln\tau\)	Bayesian model comparison; requires the total trajectory time \(\tau\) (Schwarz, 1978; continuous-time form from Gerardos & Ronceray, 2025).
`"EBIC"`	\(\frac{1}{2}\,k\,\ln\tau + 2\gamma\,\ln\binom{n_0}{k}\)	Extended BIC for large libraries; \(\gamma\!\in\![0,1]\) tunes the combinatorial penalty (Chen & Chen, 2008). Requires \(\tau\).
`"PASTIS"`	\(k\,\ln(n_0 / p_0)\)	Default for SFI. \(p_0\) is the significance level (default 1e-3) controlling overfitting tolerance (Gerardos & Ronceray, 2025).

Note

BIC and EBIC require the total trajectory time \(\tau\). Pass tau=... to select_by_ic() or all_ic(). When tau is not provided, all_ic() includes only AIC and PASTIS.

Tip

Which criterion? "PASTIS" is recommended for typical experimental data. For very small bases, "AIC" or "BIC" may be more appropriate since the PASTIS penalty scales as \(\ln n_0\) and can be too harsh when \(n_0 < 10\). For large libraries, "EBIC" with \(\gamma \approx 0.5\) adds a combinatorial correction that improves selection consistency.

Inspecting the Pareto front¶

The SparsityResult exposes the full Pareto front for plotting or further analysis:

import matplotlib.pyplot as plt

result = inf.sparsify_force(criterion="PASTIS", method="beam")

# Plot information gain vs. model size
ks = range(len(result.best_info_by_k))
plt.plot(ks, result.best_info_by_k, "o-")
plt.xlabel("Model size k")
plt.ylabel("Information gain I(k)")
plt.title("Pareto front")

# Overlay IC-selected model
k_star, support, score, coeffs = result.select_by_ic("PASTIS")
plt.axvline(k_star, color="r", ls="--", label=f"PASTIS: k*={k_star}")
plt.legend()

# Full IC comparison table (logged at INFO level)
summary = result.all_ic(verbose=True)

Benchmarking against ground truth¶

When the true support and coefficients are known (synthetic data):

from SFI.inference.sparse import overlap_metrics, predictive_nmse

om = overlap_metrics(true_support, support)
print(f"TP={om['TP']}, FP={om['FP']}, FN={om['FN']}, "
      f"precision={om['prec']:.2f}, recall={om['rec']:.2f}")

# Predictive NMSE on held-out design matrix
nmse = predictive_nmse(Phi_test, true_support, true_coeffs,
                       support, coeffs)
print(f"Predictive NMSE: {nmse:.4f}")

The result.all_ic() method can optionally compute these metrics for all criteria at once:

summary = result.all_ic(
    true_support=true_support,
    true_coeffs=true_coeffs,
    Phi_test=Phi_test,
    verbose=True,
)

Advanced: using strategies directly¶

For fine-grained control, instantiate a strategy and a SparseScorer manually:

from SFI.inference.sparse import SparseScorer, BeamSearchStrategy

scorer = inf.force_scorer   # created during infer_force_linear
strategy = BeamSearchStrategy(beam_width=20)
result = strategy.run(scorer, max_k=15)

# Or build a scorer from raw (M, G)
scorer = SparseScorer(M=M_vec, G=G_mat)

You can also write your own strategy by subclassing SparsityStrategy and implementing run(scorer, *, max_k) -> SparsityResult().

Troubleshooting¶

All coefficients are pruned by PASTIS. The signal is too weak for the library size. Reduce the basis, increase the trajectory length, or try criterion="AIC" which has a weaker penalty.

Beam search is slow. Reduce beam_width or max_k. For libraries with \(p > 50\), consider using method="greedy" or method="lasso" as a fast first pass.

STLSQ and LASSO disagree with beam search. This is expected. Different algorithms explore different parts of the support lattice. The beam search is the most thorough; STLSQ and LASSO provide complementary perspectives. Compare their Pareto fronts.

LASSO selects too many features. Increase n_alphas for a finer regularisation path, or try method="stlsq" which performs hard thresholding.

Sparse model selection¶

Quick start¶

Choosing a search method¶

Examples¶

Method-specific parameters¶

Information criteria¶

Inspecting the Pareto front¶

Benchmarking against ground truth¶

Advanced: using strategies directly¶

Troubleshooting¶

Further reading¶