Performance and scalability

This page covers the computational characteristics of interlace: time and memory complexity, practical size limits, and guidance on when the optional CHOLMOD and BOBYQA extras improve performance.

For installation instructions and guidance on whether to install these extras, see FAQ: solver choice and FAQ: optimizer choice.


How fitting works

interlace.fit() uses profiled REML: the fixed effects are eliminated analytically, reducing the optimisation to a search over variance parameters only. For a model with K grouping factors the search space has K + 1 parameters (one variance per factor plus the residual variance), regardless of how many observations or group levels the data contain.

Each evaluation of the REML criterion requires:

  1. Forming the random-effect design matrix Z (sparse, n × Σᵢ gᵢ where gᵢ is the number of levels in factor i).

  2. Factorising the updated covariance matrix — this is the dominant cost.

  3. Solving a linear system to recover the fixed effects and BLUPs.

The number of REML evaluations is typically 30–150 for the default L-BFGS-B optimizer and 50–200 for BOBYQA.
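
As an illustration of step 1, a sparse indicator matrix Z for two crossed grouping factors can be assembled directly with scipy. This is a sketch of the structure only, using a toy data frame, and does not reflect interlace's internals:

import numpy as np
import pandas as pd
from scipy import sparse

def indicator_matrix(codes, n_levels):
    """One non-zero per observation: the column is that row's group level."""
    n = len(codes)
    return sparse.csr_matrix(
        (np.ones(n), (np.arange(n), codes)), shape=(n, n_levels)
    )

df = pd.DataFrame({"firm": ["a", "b", "a", "c"], "year": [1, 1, 2, 2]})
blocks = []
for factor in ["firm", "year"]:
    codes, levels = pd.factorize(df[factor])
    blocks.append(indicator_matrix(codes, len(levels)))

Z = sparse.hstack(blocks, format="csr")  # n × Σᵢ gᵢ columns
print(Z.shape, Z.nnz)                    # (4, 5) with 8 non-zeros: one per observation per factor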


Time complexity

Step                       | Dense path (default)   | Sparse path (CHOLMOD)
---------------------------|------------------------|---------------------------
Covariance factorisation   | O(G³) per iteration    | O(G · nnz) per iteration¹
Fixed-effect solve         | O(p² · n)              | O(p² · n)
BLUP recovery              | O(G · p)               | O(G · p)

G = Σᵢ gᵢ is the total number of random-effect levels across all grouping factors. nnz is the number of non-zeros in the sparse factor.

Key implication: fitting time grows cubically with the total number of group levels on the dense path. Doubling the number of groups (e.g. from 500 to 1 000) increases factorisation cost roughly 8×. CHOLMOD reduces this to near-linear in the non-zero structure of the problem.

¹ After symbolic analysis (done once), each numeric refactorisation reuses the sparsity pattern, so the per-iteration cost is proportional to the fill-in of the factor, not G³.
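
The cubic growth on the dense path is easy to verify with a dense Cholesky factorisation in NumPy. The snippet below is a standalone illustration of the scaling, not a benchmark of interlace itself; absolute times depend on BLAS and hardware:

import time
import numpy as np

def time_cholesky(G, reps=5):
    rng = np.random.default_rng(0)
    A = rng.standard_normal((G, G))
    C = A @ A.T + G * np.eye(G)          # symmetric positive definite test matrix
    t0 = time.perf_counter()
    for _ in range(reps):
        np.linalg.cholesky(C)
    return (time.perf_counter() - t0) / reps

for G in (500, 1000, 2000):
    print(f"G = {G:5d}: {time_cholesky(G) * 1e3:7.1f} ms")
# Expect roughly an 8x jump per doubling of G, matching the O(G³) cost above.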


Memory complexity

Object                         | Size
-------------------------------|------------------------------------------------------------
Random-effect design matrix Z  | O(n · K) — sparse, one non-zero per observation per factor
Dense covariance matrix        | O(G²)
Sparse Cholesky factor         | O(G · fill-in) — typically O(G log G) for crossed designs

For most practical models Z is very sparse and fits easily in memory. The dense covariance matrix becomes the bottleneck around G ≈ 5 000 group levels (≈ 200 MB for float64); CHOLMOD reduces this significantly.
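
The 200 MB figure follows directly from the dense storage cost:

# Back-of-envelope memory for a dense G × G float64 covariance matrix
G = 5_000
print(f"{G * G * 8 / 1e6:.0f} MB")  # 200 MB at G = 5 000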


Practical size limits

The table below gives indicative fitting times measured on a laptop-class machine (Apple M-series, single thread). Actual times vary with observation count, number of predictors, and convergence behaviour.

Observations (n) | Total group levels (G) | Dense path    | CHOLMOD path
-----------------|------------------------|---------------|--------------
1 000            | 50                     | < 1 s         | < 1 s
10 000           | 200                    | 1–3 s         | 1–2 s
50 000           | 500                    | 5–15 s        | 3–8 s
100 000          | 1 000                  | 30–90 s       | 8–20 s
500 000          | 5 000                  | minutes       | 30–90 s
1 000 000        | 10 000                 | not practical | 2–5 min

These are rough orders of magnitude. Models with more fixed-effect predictors or more variance components (more grouping factors) will be slower; well-conditioned models with few iterations will be faster.

Rule of thumb: if G > 500, install CHOLMOD. If n > 100 000, profile your specific model before deploying in a pipeline.
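
G is cheap to check before fitting. A minimal sketch, assuming a pandas DataFrame with grouping columns named firm and year (substitute your own factors):

grouping = ["firm", "year"]
G = sum(df[col].nunique() for col in grouping)
print(f"Total group levels G = {G}")  # G > 500: consider installing CHOLMOD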


When CHOLMOD helps most

CHOLMOD’s advantage comes from exploiting the sparsity of the crossed random-effect structure. The benefit is largest when:

  • G is large — hundreds or thousands of group levels. Below ~200 total levels the overhead of sparse bookkeeping can outweigh the savings.

  • The design is unbalanced or sparse — many (group level i, group level j) combinations are unobserved. A fully balanced design (every combination observed equally) produces a denser factor and narrows the gap.

  • You are refitting the same model structure repeatedly — e.g. a bootstrap loop or a grid search over fixed-effect specifications. CHOLMOD amortises the symbolic analysis cost across fits.
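
As an example of the last point, a simple row-resampling bootstrap refits the same structure many times. This is a sketch: the column names and the coefficients attribute are illustrative rather than documented, and grouped data often calls for a cluster or parametric bootstrap instead:

import numpy as np

rng = np.random.default_rng(0)
estimates = []
for _ in range(200):
    idx = rng.integers(0, len(df), size=len(df))      # resample rows with replacement
    res = interlace.fit("y ~ x", data=df.iloc[idx], groups=["firm", "year"])
    estimates.append(res.coefficients["x"])           # illustrative attribute name

print(f"Bootstrap SE of x: {np.std(estimates, ddof=1):.4f}")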

CHOLMOD provides little benefit when:

  • G < 200 (dense path is fast enough).

  • The design is perfectly balanced (dense factor, minimal sparsity gain).

  • The environment makes compiling SuiteSparse difficult (e.g. restricted CI images).


When BOBYQA helps most

BOBYQA (gradient-free) does not directly speed up fitting — it typically uses more REML evaluations than L-BFGS-B. Its advantage is robustness, not raw speed:

  • Models where variance components approach zero (the gradient becomes unreliable near the boundary of the parameter space).

  • Models that fail to converge with L-BFGS-B, requiring retries or manual rescaling.

  • Matching lme4’s default algorithm for numerical parity.

If L-BFGS-B converges cleanly, it is faster. Switch to BOBYQA when convergence is the problem, not fitting time.
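
Switching optimizers is a one-argument change. The keyword below is a hypothetical illustration; the exact argument name is covered in FAQ: optimizer choice:

result = interlace.fit(
    "y ~ x", data=df, groups="firm",
    optimizer="bobyqa",  # hypothetical keyword; see FAQ: optimizer choice for the real name
)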


Reducing fitting time in practice

1. Standardise predictors

The REML surface is more spherical when predictors are on similar scales. Standardising continuous predictors typically reduces iteration counts by 30–50%:

from sklearn.preprocessing import StandardScaler

df["x_scaled"] = StandardScaler().fit_transform(df[["x"]])
model = interlace.fit("y ~ x_scaled", data=df, groups="firm")

2. Install CHOLMOD for large models

pip install "interlace-lme[cholmod]"

No code change required — interlace detects and uses CHOLMOD automatically.

3. Reduce the number of variance parameters

Each additional grouping factor adds one variance parameter and increases G. If a factor has a near-zero variance component (check result.variance_components), consider dropping it:

# Check whether a grouping factor contributes
for g, v in result.variance_components.items():
    icc = v / (sum(result.variance_components.values()) + result.scale)
    print(f"{g}: ICC = {icc:.3f}")
# If ICC < 0.01, the factor may not be worth the computational cost

4. Fit with ML for model comparison, then refit with REML

When comparing fixed-effect structures via likelihood-ratio tests, fit the candidate models with method="ML" (likelihood-ratio comparisons of fixed effects are only valid between ML fits) and do a final method="REML" fit for the reported estimates:

# Fast comparison fits
m1 = interlace.fit("y ~ x1",      data=df, groups="g", method="ML")
m2 = interlace.fit("y ~ x1 + x2", data=df, groups="g", method="ML")

# Final reported model
best = interlace.fit("y ~ x1 + x2", data=df, groups="g", method="REML")
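
The comparison between the two nested ML fits is a standard likelihood-ratio test. The loglik attribute below is an illustrative name for the fitted log-likelihood, not a documented part of the result object:

from scipy.stats import chi2

# m2 has one extra fixed-effect parameter (x2), so the reference
# distribution is chi-squared with 1 degree of freedom.
lr = 2 * (m2.loglik - m1.loglik)   # loglik is an illustrative attribute name
p_value = chi2.sf(lr, df=1)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")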

5. Profile before optimising

Use Python’s time module or cProfile to find where fitting time goes before installing extras or restructuring data:

import time

t0 = time.perf_counter()
result = interlace.fit("y ~ x", data=df, groups=["g1", "g2"])
print(f"Fit took {time.perf_counter() - t0:.2f}s")

Known limitations

  • Single-threaded — interlace does not currently parallelise the REML optimisation. Parallelism is available at the model level: fitting multiple independent models concurrently with concurrent.futures or similar (see the sketch after this list).

  • In-memory only — the full dataset must fit in RAM. For very large datasets, consider pre-aggregating or sampling before fitting.

  • Two or three grouping factors — while the API accepts any number of groups, practical fitting with more than three crossed factors is largely untested and may be slow or numerically unstable.

  • Balanced vs unbalanced — highly unbalanced designs (some groups with 1–2 observations) increase the condition number of the covariance matrix and may require BOBYQA or predictor rescaling for reliable convergence.
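
A minimal sketch of model-level parallelism with the standard library, assuming df and the formula list are defined at module level so worker processes can import them:

from concurrent.futures import ProcessPoolExecutor

FORMULAS = ["y ~ x1", "y ~ x1 + x2", "y ~ x1 * x2"]

def fit_one(formula):
    # Each independent model is fitted in its own process
    return interlace.fit(formula, data=df, groups="firm")

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(fit_one, FORMULAS))

If the fitted result objects are large or not picklable, have fit_one return only the numbers you need (for example, the coefficient estimates).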


Where to go next

  • FAQ — convergence troubleshooting, solver and optimizer switching

  • Installation — installing CHOLMOD and BOBYQA extras (includes decision guide)

  • Model Comparison Guide — ML vs REML, fitting multiple models efficiently

  • Feature comparison — how interlace compares to statsmodels and lme4 on supported structures