Performance and scalability

This page covers the computational characteristics of interlace: time and memory complexity, practical size limits, and guidance on when the optional CHOLMOD and BOBYQA extras improve performance.

For installation instructions and guidance on whether to install these extras, see FAQ: solver choice and FAQ: optimizer choice.


How fitting works

interlace.fit() uses profiled REML: the fixed effects are eliminated analytically, reducing the optimisation to a search over variance parameters only. For a model with K grouping factors the search space has K + 1 parameters (one variance per factor plus the residual variance), regardless of how many observations or group levels the data contain.

Each evaluation of the REML criterion requires:

  1. Forming the random-effect design matrix Z (sparse, n × Σᵢ gᵢ where gᵢ is the number of levels in factor i).

  2. Factorising the updated covariance matrix — this is the dominant cost.

  3. Solving a linear system to recover the fixed effects and BLUPs.

The number of REML evaluations is typically 30–150 for the default L-BFGS-B optimizer and 50–200 for BOBYQA.
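
As an illustration of step 1, a sparse indicator matrix Z for two crossed grouping factors can be assembled directly with scipy. This is a sketch of the structure only, using a toy data frame, and does not reflect interlace's internals:

import numpy as np
import pandas as pd
from scipy import sparse

def indicator_matrix(codes, n_levels):
    """One non-zero per observation: the column is that row's group level."""
    n = len(codes)
    return sparse.csr_matrix(
        (np.ones(n), (np.arange(n), codes)), shape=(n, n_levels)
    )

df = pd.DataFrame({"firm": ["a", "b", "a", "c"], "year": [1, 1, 2, 2]})
blocks = []
for factor in ["firm", "year"]:
    codes, levels = pd.factorize(df[factor])
    blocks.append(indicator_matrix(codes, len(levels)))

Z = sparse.hstack(blocks, format="csr")  # n × Σᵢ gᵢ columns
print(Z.shape, Z.nnz)                    # (4, 5) with 8 non-zeros: one per observation per factor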


Time complexity

Step                       | Dense path (default)   | Sparse path (CHOLMOD)
---------------------------|------------------------|---------------------------
Covariance factorisation   | O(G³) per iteration    | O(G · nnz) per iteration¹
Fixed-effect solve         | O(p² · n)              | O(p² · n)
BLUP recovery              | O(G · p)               | O(G · p)

G = Σᵢ gᵢ is the total number of random-effect levels across all grouping factors. nnz is the number of non-zeros in the sparse factor.

Key implication: fitting time grows cubically with the total number of group levels on the dense path. Doubling the number of groups (e.g. from 500 to 1 000) increases factorisation cost roughly 8×. CHOLMOD reduces this to near-linear in the non-zero structure of the problem.

¹ After symbolic analysis (done once), each numeric refactorisation reuses the sparsity pattern, so the per-iteration cost is proportional to the fill-in of the factor, not G³.
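
The cubic growth on the dense path is easy to verify with a dense Cholesky factorisation in NumPy. The snippet below is a standalone illustration of the scaling, not a benchmark of interlace itself; absolute times depend on BLAS and hardware:

import time
import numpy as np

def time_cholesky(G, reps=5):
    rng = np.random.default_rng(0)
    A = rng.standard_normal((G, G))
    C = A @ A.T + G * np.eye(G)          # symmetric positive definite test matrix
    t0 = time.perf_counter()
    for _ in range(reps):
        np.linalg.cholesky(C)
    return (time.perf_counter() - t0) / reps

for G in (500, 1000, 2000):
    print(f"G = {G:5d}: {time_cholesky(G) * 1e3:7.1f} ms")
# Expect roughly an 8x jump per doubling of G, matching the O(G³) cost above.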


Memory complexity

Object                         | Size
-------------------------------|------------------------------------------------------------
Random-effect design matrix Z  | O(n · K) — sparse, one non-zero per observation per factor
Dense covariance matrix        | O(G²)
Sparse Cholesky factor         | O(G · fill-in) — typically O(G log G) for crossed designs

For most practical models Z is very sparse and fits easily in memory. The dense covariance matrix becomes the bottleneck around G ≈ 5 000 group levels (≈ 200 MB for float64); CHOLMOD reduces this significantly.
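
The 200 MB figure follows directly from the dense storage cost:

# Back-of-envelope memory for a dense G × G float64 covariance matrix
G = 5_000
print(f"{G * G * 8 / 1e6:.0f} MB")  # 200 MB at G = 5 000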


Practical size limits

The table below gives indicative fitting times measured on a laptop-class machine (Apple M-series, single thread). Actual times vary with observation count, number of predictors, and convergence behaviour.

Observations (n) | Total group levels (G) | Dense path    | CHOLMOD path
-----------------|------------------------|---------------|--------------
1 000            | 50                     | < 1 s         | < 1 s
10 000           | 200                    | 1–3 s         | 1–2 s
50 000           | 500                    | 5–15 s        | 3–8 s
100 000          | 1 000                  | 30–90 s       | 8–20 s
500 000          | 5 000                  | minutes       | 30–90 s
1 000 000        | 10 000                 | not practical | 2–5 min

These are rough orders of magnitude. Models with more fixed-effect predictors or more variance components (more grouping factors) will be slower; well-conditioned models with few iterations will be faster.

Rule of thumb: if G > 500, install CHOLMOD. If n > 100 000, profile your specific model before deploying in a pipeline.
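
G is cheap to check before fitting. A minimal sketch, assuming a pandas DataFrame with grouping columns named firm and year (substitute your own factors):

grouping = ["firm", "year"]
G = sum(df[col].nunique() for col in grouping)
print(f"Total group levels G = {G}")  # G > 500: consider installing CHOLMOD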


When CHOLMOD helps most

CHOLMOD’s advantage comes from exploiting the sparsity of the crossed random-effect structure. The benefit is largest when:

  • G is large — hundreds or thousands of group levels. Below ~200 total levels the overhead of sparse bookkeeping can outweigh the savings.

  • The design is unbalanced or sparse — many (group level i, group level j) combinations are unobserved. A fully balanced design (every combination observed equally) produces a denser factor and narrows the gap.

  • You are refitting the same model structure repeatedly — e.g. a bootstrap loop or a grid search over fixed-effect specifications. CHOLMOD amortises the symbolic analysis cost across fits.
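
As an example of the last point, a simple row-resampling bootstrap refits the same structure many times. This is a sketch: the column names and the coefficients attribute are illustrative rather than documented, and grouped data often calls for a cluster or parametric bootstrap instead:

import numpy as np

rng = np.random.default_rng(0)
estimates = []
for _ in range(200):
    idx = rng.integers(0, len(df), size=len(df))      # resample rows with replacement
    res = interlace.fit("y ~ x", data=df.iloc[idx], groups=["firm", "year"])
    estimates.append(res.coefficients["x"])           # illustrative attribute name

print(f"Bootstrap SE of x: {np.std(estimates, ddof=1):.4f}")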

CHOLMOD provides little benefit when:

  • G < 200 (dense path is fast enough).

  • The design is perfectly balanced (dense factor, minimal sparsity gain).

  • The environment makes compiling SuiteSparse difficult (e.g. restricted CI images).


When BOBYQA helps most

BOBYQA (gradient-free) does not directly speed up fitting — it typically uses more REML evaluations than L-BFGS-B. Its advantage is robustness, not raw speed:

  • Models where variance components approach zero (the gradient becomes unreliable near the boundary of the parameter space).

  • Models that fail to converge with L-BFGS-B, requiring retries or manual rescaling.

  • Matching lme4’s default algorithm for numerical parity.

If L-BFGS-B converges cleanly, it is faster. Switch to BOBYQA when convergence is the problem, not fitting time.
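
Switching optimizers is a one-argument change. The keyword below is a hypothetical illustration; the exact argument name is covered in FAQ: optimizer choice:

result = interlace.fit(
    "y ~ x", data=df, groups="firm",
    optimizer="bobyqa",  # hypothetical keyword; see FAQ: optimizer choice for the real name
)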


Reducing fitting time in practice

1. Standardise predictors

The REML surface is more spherical when predictors are on similar scales. Standardising continuous predictors typically reduces iteration counts by 30–50%:

from sklearn.preprocessing import StandardScaler

df["x_scaled"] = StandardScaler().fit_transform(df[["x"]])
model = interlace.fit("y ~ x_scaled", data=df, groups="firm")

2. Install CHOLMOD for large models

pip install "interlace-lme[cholmod]"

No code change required — interlace detects and uses CHOLMOD automatically.

3. Reduce the number of variance parameters

Each additional grouping factor adds one variance parameter and increases G. If a factor has a near-zero variance component (check result.variance_components), consider dropping it:

# Check whether a grouping factor contributes
for g, v in result.variance_components.items():
    icc = v / (sum(result.variance_components.values()) + result.scale)
    print(f"{g}: ICC = {icc:.3f}")
# If ICC < 0.01, the factor may not be worth the computational cost

4. Fit with ML for model comparison, then refit with REML

When comparing fixed-effect structures via likelihood-ratio tests, fit the candidate models with method="ML" (likelihood-ratio comparisons of fixed effects are only valid between ML fits) and do a final method="REML" fit for the reported estimates:

# Fast comparison fits
m1 = interlace.fit("y ~ x1",      data=df, groups="g", method="ML")
m2 = interlace.fit("y ~ x1 + x2", data=df, groups="g", method="ML")

# Final reported model
best = interlace.fit("y ~ x1 + x2", data=df, groups="g", method="REML")
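
The comparison between the two nested ML fits is a standard likelihood-ratio test. The loglik attribute below is an illustrative name for the fitted log-likelihood, not a documented part of the result object:

from scipy.stats import chi2

# m2 has one extra fixed-effect parameter (x2), so the reference
# distribution is chi-squared with 1 degree of freedom.
lr = 2 * (m2.loglik - m1.loglik)   # loglik is an illustrative attribute name
p_value = chi2.sf(lr, df=1)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")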

5. Profile before optimising

Use Python’s time module or cProfile to find where fitting time goes before installing extras or restructuring data:

import time

t0 = time.perf_counter()
result = interlace.fit("y ~ x", data=df, groups=["g1", "g2"])
print(f"Fit took {time.perf_counter() - t0:.2f}s")

Known limitations

  • Single-threaded — interlace does not currently parallelise the REML optimisation. Parallelism is available at the model level: fitting multiple independent models concurrently with concurrent.futures or similar (see the sketch after this list).

  • In-memory only — the full dataset must fit in RAM. For very large datasets, consider pre-aggregating or sampling before fitting.

  • Two or three grouping factors — while the API accepts any number of groups, practical fitting with more than three crossed factors is largely untested and may be slow or numerically unstable.

  • Balanced vs unbalanced — highly unbalanced designs (some groups with 1–2 observations) increase the condition number of the covariance matrix and may require BOBYQA or predictor rescaling for reliable convergence.
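
A minimal sketch of model-level parallelism with the standard library, assuming df and the formula list are defined at module level so worker processes can import them:

from concurrent.futures import ProcessPoolExecutor

FORMULAS = ["y ~ x1", "y ~ x1 + x2", "y ~ x1 * x2"]

def fit_one(formula):
    # Each independent model is fitted in its own process
    return interlace.fit(formula, data=df, groups="firm")

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(fit_one, FORMULAS))

If the fitted result objects are large or not picklable, have fit_one return only the numbers you need (for example, the coefficient estimates).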


Where to go next

  • FAQ — convergence troubleshooting, solver and optimizer switching

  • Installation — installing CHOLMOD and BOBYQA extras (includes decision guide)

  • Model Comparison Guide — ML vs REML, fitting multiple models efficiently

  • Feature comparison — how interlace compares to statsmodels and lme4 on supported structures