SFI.inference.sparse.result module¶

SFI.inference.sparse.result — Sparsity result container¶

SparsityResult is the return type of every search strategy. It stores the Pareto front (best info / support / coefficients per cardinality k) and provides information-criterion selection.

Supported information criteria¶

AIC — Akaike (1974), penalty k.
BIC — Schwarz (1978), penalty (k/2) ln τ. Uses the continuous-time formulation of Gerardos & Ronceray (2025).
EBIC — Chen & Chen (2008), BIC + 2 γ ln C(n₀, k).
PASTIS — Gerardos & Ronceray (2025), penalty k ln(n₀/p₀).
SIC — Secret Information Criterion (unpublished, Ronceray), penalty k ln(I_total).

class SFI.inference.sparse.result.SparsityResult(p, total_info, method, best_info_by_k=<factory>, best_support_by_k=<factory>, best_coeffs_by_k=<factory>, second_info_by_k=<factory>, second_support_by_k=<factory>)[source]¶

Bases: object

Frozen container for the output of a sparsity search.

Variables:

p (int) – Total number of candidate basis functions.
total_info (float) – Information gain of the full (dense) model.
method (str) – Name of the strategy that produced this result (e.g. "beam", "greedy", "stlsq", "lasso").
best_info_by_k (list[float]) – best_info_by_k[k] is the highest information gain found among all explored supports of cardinality k. Unexplored cardinalities are -inf.
best_support_by_k (list[list[int]]) – The support achieving best_info_by_k[k].
best_coeffs_by_k (list[Array | None]) – The corresponding coefficient vector.
second_info_by_k (list[float]) – Second-best information gain per k (for robustness diagnostics). May be all -inf if the strategy does not track runner-ups.
second_support_by_k (list[list[int]]) – Support achieving the second-best info per k.

Parameters:

p (int)
total_info (float)
method (str)
best_info_by_k (list)
best_support_by_k (list)
best_coeffs_by_k (list)
second_info_by_k (list)
second_support_by_k (list)

all_ic(*, p_param=0.001, tau=None, gamma=0.5, true_support=None, true_coeffs=None, Phi_test=None, verbose=True)[source]¶

Compute all information criteria and optionally compare to ground truth.

Parameters:

p_param (float) – PASTIS significance level.
tau (float or None) – Total trajectory time. If provided, BIC and EBIC are included; otherwise they are skipped.
gamma (float, default 0.5) – EBIC tuning parameter.
true_support (optional) – Ground-truth support and coefficients for overlap metrics.
true_coeffs (optional) – Ground-truth support and coefficients for overlap metrics.
Phi_test (optional Array) – Held-out design matrix for predictive NMSE.
verbose (bool) – If True, log a summary table at INFO level.

Returns:

Keyed by IC name, each value is a dict with k, support, score, coeffs, and optionally overlap and predictive-NMSE entries.

Return type:

dict

best_coeffs_by_k: list¶

best_info_by_k: list¶

best_support_by_k: list¶

method: str¶

p: int¶

second_info_by_k: list¶

second_support_by_k: list¶

select_by_ic(name, *, p_param=0.001, tau=None, gamma=0.5)[source]¶

Return the support that maximises a given information criterion.

[Model selection] Information criteria for sparse model selection

\[\begin{split}\text{AIC}(k) &= \mathcal{I}(k) - k \\ \text{BIC}(k) &= \mathcal{I}(k) - \tfrac{1}{2}\,k\,\ln\tau \\ \text{EBIC}(k) &= \text{BIC}(k) - 2\gamma\,\ln\binom{n_0}{k} \\ \text{PASTIS}(k) &= \mathcal{I}(k) - k\,\ln(n_0 / p_0) \\ \text{SIC}(k) &= \mathcal{I}(k) - k\,\ln(\mathcal{I}_{\text{total}})\end{split}\]

where \(\mathcal{I}(k)\) is the log-likelihood gain with k basis terms out of \(n_0\) candidates, \(\tau\) is the total trajectory time, \(p_0\) is the PASTIS significance level, and \(\gamma \in [0,1]\) controls EBIC stringency.

References

AIC — Akaike, H. (1974). “A new look at the statistical model identification.” IEEE Trans. Automat. Control, 19(6), 716–723.
BIC — Schwarz, G. (1978). “Estimating the dimension of a model.” Ann. Statist., 6(2), 461–464. The continuous-time formulation \(\tfrac{k}{2}\ln\tau\) follows from the Laplace approximation of the SDE marginal likelihood (Gerardos & Ronceray, 2025).
EBIC — Chen, J. & Chen, Z. (2008). “Extended Bayesian information criteria for model selection with large model spaces.” Biometrika, 95(3), 759–771.
PASTIS — Gerardos, A. & Ronceray, P. (2025). “Principled model selection for stochastic dynamics.”
SIC — Unpublished (Ronceray).

Parameters:

name ("AIC" | "BIC" | "EBIC" | "PASTIS" | "SIC") – Information criterion to maximise.
p_param (float, default 1e-3) – Significance level \(p_0\) for the PASTIS penalty.
tau (float or None) – Total trajectory time. Required for BIC and EBIC.
gamma (float, default 0.5) – EBIC tuning parameter (\(\gamma \in [0,1]\)). Only used when name is "EBIC".

Returns:

k_star (int) – Selected model size.
support (list[int]) – Basis-function indices of the chosen model.
score (float) – Value of the information criterion at k_star.
coeffs (Array or None) – Coefficient vector for the selected support.

Return type:

Tuple[int, List[int], float, Array | None]

total_info: float¶