# Performance and scalability

This page covers the computational characteristics of `interlace`: time and memory
complexity, practical size limits, and guidance on when the optional CHOLMOD and BOBYQA
extras improve performance.

For installation instructions and the decision of *whether* to install these extras,
see [FAQ: solver choice](faq.md#solver-choice-cholmod-vs-default) and
[FAQ: optimizer choice](faq.md#optimizer-choice-bobyqa-vs-default).

---

## How fitting works

`interlace.fit()` uses **profiled REML**: the fixed effects are eliminated
analytically, reducing the optimisation to a search over variance parameters only. For
a model with `K` grouping factors the search space has `K + 1` parameters (one
variance per factor plus the residual variance), regardless of how many observations or
group levels the data contain.

Each evaluation of the REML criterion requires:

1. Forming the random-effect design matrix `Z` (sparse, `n × Σᵢ gᵢ` where `gᵢ` is the
   number of levels in factor `i`).
2. Factorising the updated covariance matrix — this is the dominant cost.
3. Solving a linear system to recover the fixed effects and BLUPs.

The number of REML evaluations is typically 30–150 for the default L-BFGS-B optimizer
and 50–200 for BOBYQA.

---

## Time complexity

| Step | Dense path (default) | Sparse path (CHOLMOD) |
|------|---------------------|-----------------------|
| Covariance factorisation | O(G³) per iteration | O(G · nnz) per iteration¹ |
| Fixed-effect solve | O(p² · n) | O(p² · n) |
| BLUP recovery | O(G · p) | O(G · p) |

`G = Σᵢ gᵢ` is the total number of random-effect levels across all grouping factors.
`nnz` is the number of non-zeros in the sparse factor.

**Key implication:** fitting time grows cubically with the total number of group levels
on the dense path. Doubling the number of groups (e.g. from 500 to 1 000) increases
factorisation cost roughly 8×. CHOLMOD reduces this to near-linear in the non-zero
structure of the problem.

¹ After symbolic analysis (done once), each numeric refactorisation reuses the sparsity
pattern, so the per-iteration cost is proportional to the fill-in of the factor, not G³.

---

## Memory complexity

| Object | Size |
|--------|------|
| Random-effect design matrix `Z` | O(n · K) — sparse, one non-zero per observation per factor |
| Dense covariance matrix | O(G²) |
| Sparse Cholesky factor | O(G · fill-in) — typically O(G log G) for crossed designs |

For most practical models `Z` is very sparse and fits easily in memory. The dense
covariance matrix becomes the bottleneck around G ≈ 5 000 group levels (≈ 200 MB for
float64); CHOLMOD reduces this significantly.

---

## Practical size limits

The table below gives indicative fitting times measured on a laptop-class machine
(Apple M-series, single thread). Actual times vary with observation count, number of
predictors, and convergence behaviour.

| Observations (n) | Total group levels (G) | Dense path | CHOLMOD path |
|:----------------:|:----------------------:|:----------:|:------------:|
| 1 000 | 50 | < 1 s | < 1 s |
| 10 000 | 200 | 1–3 s | 1–2 s |
| 50 000 | 500 | 5–15 s | 3–8 s |
| 100 000 | 1 000 | 30–90 s | 8–20 s |
| 500 000 | 5 000 | minutes | 30–90 s |
| 1 000 000 | 10 000 | not practical | 2–5 min |

These are rough orders of magnitude. Models with more fixed-effect predictors or more
variance components (more grouping factors) will be slower; well-conditioned models with
few iterations will be faster.

**Rule of thumb:** if G > 500, install CHOLMOD. If n > 100 000, profile your specific
model before deploying in a pipeline.

---

## When CHOLMOD helps most

CHOLMOD's advantage comes from exploiting the **sparsity of the crossed random-effect
structure**. The benefit is largest when:

- **G is large** — hundreds or thousands of group levels. Below ~200 total levels the
  overhead of sparse bookkeeping can outweigh the savings.
- **The design is unbalanced or sparse** — many (group level i, group level j)
  combinations are unobserved. A fully balanced design (every combination observed
  equally) produces a denser factor and narrows the gap.
- **You are refitting the same model structure repeatedly** — e.g. a bootstrap loop or
  a grid search over fixed-effect specifications. CHOLMOD amortises the symbolic
  analysis cost across fits.

CHOLMOD provides little benefit when:

- G < 200 (dense path is fast enough).
- The design is perfectly balanced (dense factor, minimal sparsity gain).
- The environment makes compiling SuiteSparse difficult (e.g. restricted CI images).

---

## When BOBYQA helps most

BOBYQA (gradient-free) does not directly speed up fitting — it typically uses *more*
REML evaluations than L-BFGS-B. Its advantage is **robustness**, not raw speed:

- Models where variance components approach zero (the gradient becomes unreliable near
  the boundary of the parameter space).
- Models that fail to converge with L-BFGS-B, requiring retries or manual rescaling.
- Matching lme4's default algorithm for numerical parity.

If L-BFGS-B converges cleanly, it is faster. Switch to BOBYQA when convergence is the
problem, not fitting time.

---

## Reducing fitting time in practice

### 1. Standardise predictors

The REML surface is more spherical when predictors are on similar scales. Standardising
continuous predictors typically reduces iteration counts by 30–50%:

```python
from sklearn.preprocessing import StandardScaler

df["x_scaled"] = StandardScaler().fit_transform(df[["x"]])
model = interlace.fit("y ~ x_scaled", data=df, groups="firm")
```

### 2. Install CHOLMOD for large models

```bash
pip install "interlace-lme[cholmod]"
```

No code change required — `interlace` detects and uses CHOLMOD automatically.

### 3. Reduce the number of variance parameters

Each additional grouping factor adds one variance parameter and increases G. If a
factor has a near-zero variance component (check `result.variance_components`), consider
dropping it:

```python
# Check whether a grouping factor contributes
for g, v in result.variance_components.items():
    icc = v / (sum(result.variance_components.values()) + result.scale)
    print(f"{g}: ICC = {icc:.3f}")
# If ICC < 0.01, the factor may not be worth the computational cost
```

### 4. Fit with ML for model comparison, then refit with REML

When comparing fixed-effect structures via likelihood-ratio tests, fit with `method="ML"`
(faster for multiple fits) and do a final `method="REML"` fit for reported estimates:

```python
# Fast comparison fits
m1 = interlace.fit("y ~ x1",      data=df, groups="g", method="ML")
m2 = interlace.fit("y ~ x1 + x2", data=df, groups="g", method="ML")

# Final reported model
best = interlace.fit("y ~ x1 + x2", data=df, groups="g", method="REML")
```

### 5. Profile before optimising

Use Python's `time` module or `cProfile` to find where fitting time goes before
installing extras or restructuring data:

```python
import time

t0 = time.perf_counter()
result = interlace.fit("y ~ x", data=df, groups=["g1", "g2"])
print(f"Fit took {time.perf_counter() - t0:.2f}s")
```

---

## Known limitations

- **Single-threaded** — `interlace` does not currently parallelise the REML
  optimisation. Parallelism is available at the model level: fitting multiple
  independent models concurrently with `concurrent.futures` or similar.
- **In-memory only** — the full dataset must fit in RAM. For very large datasets,
  consider pre-aggregating or sampling before fitting.
- **Two or three grouping factors** — while the API accepts any number of `groups`,
  practical fitting with more than three crossed factors is largely untested and may be
  slow or numerically unstable.
- **Balanced vs unbalanced** — highly unbalanced designs (some groups with 1–2
  observations) increase the condition number of the covariance matrix and may require
  BOBYQA or predictor rescaling for reliable convergence.

---

## Where to go next

- [FAQ](faq.md) — convergence troubleshooting, solver and optimizer switching
- [Installation](installation.md) — installing CHOLMOD and BOBYQA extras (includes decision guide)
- [Model Comparison Guide](model-comparison.md) — ML vs REML, fitting multiple models efficiently
- [Feature comparison](comparison.md) — how interlace compares to statsmodels and lme4 on supported structures