# Simulation and Bootstrap Guide

`interlace` provides two functions for simulation-based inference on fitted mixed models:

- **`simulate()`** — draw new response vectors from the fitted model, using the same random-effect structure and parameter estimates
- **`bootMer()`** — run a full parametric bootstrap: simulate, refit, and collect any statistic you define

Together these cover three common workflows: **bootstrap standard errors and confidence intervals**, **power analysis**, and **simulation-based model checking** (posterior predictive checks).

---

## `simulate()` — drawing from the fitted model

`simulate()` draws new response vectors from the fitted model, generating fresh random effects and residuals at the estimated parameters:

```
y* = X β + Z u* + ε*
```

where `u* ~ N(0, σ² ΛΛᵀ)` and `ε* ~ N(0, σ² Iₙ)`.

```python
import interlace
import numpy as np

# df: long-format data with columns score, time, treatment, and subject
result = interlace.fit("score ~ time + treatment", data=df, groups="subject")

# Draw 500 simulated response vectors (shape: n × 500)
y_sim = interlace.simulate(result, nsim=500, seed=42)
```

The return value is an `(n, nsim)` array — each column is one draw from the model's predictive distribution. A single draw (`nsim=1`) gives an `(n, 1)` array.

### When to use `simulate()` directly

`simulate()` is the building block for the two higher-level workflows below, but it is also useful on its own when you want full control over the simulated draws — for example, to compute a custom summary across replicates without the overhead of refitting the model.

---

## Bootstrap standard errors and CIs with `bootMer()`

`bootMer()` runs a parametric bootstrap: for each of `B` replicates it simulates `y*`, refits the model on the simulated data, and records a statistic vector. The default statistic collects `[fixed effects..., σ, θ...]`.

```python
boot = interlace.bootMer(result, B=500, seed=42, show_progress=True)

# Bootstrap SE for each parameter
se = boot.estimates.std(axis=0)
print(se)

# 95% percentile CIs (shape: p × 2, columns are [lower, upper])
ci = boot.ci(level=0.95)
print(ci)
```

`boot.estimates` has shape `(B, p)`, where `p` is the length of the statistic vector.

### Custom statistic

Pass any callable `(CrossedLMEResult) → np.ndarray` as the `statistic` argument:

```python
def fixed_effects_only(r):
    return np.asarray(r.fe_params)

boot = interlace.bootMer(result, statistic=fixed_effects_only, B=500, seed=42)

# Columns: one per fixed-effect coefficient
print(boot.estimates.shape)  # (500, p)
print(boot.ci())             # (p, 2)
```

### Bootstrap CIs vs Wald CIs

Bootstrap CIs are asymmetry-aware: if the sampling distribution of a variance component is right-skewed, the percentile CI will correctly be wider on the upper end. Wald CIs from `result.conf_int()` assume approximate normality, which is reasonable for fixed effects but can be misleading for variance components and their ratios.

Use `bootMer()` when:

- Reporting CIs for **variance components or the ICC** (right-skewed sampling distributions)
- The sample is small and normality of the fixed-effect estimators is in doubt
- You want a **custom statistic** with no closed-form SE (e.g. a ratio of coefficients, a predicted difference between two groups)

---

## Power analysis

Simulation is the standard approach to power analysis for mixed models, because closed-form power formulas assume balanced designs and ignore the variability introduced by estimating the variance components. The recipe:

1. Specify the data-generating model (effect sizes and variance components).
2. Simulate a dataset from that model.
3. Fit the model and test the effect of interest.
4. Repeat many times and count how often the test is significant.
```python
import numpy as np
import pandas as pd

import interlace

rng = np.random.default_rng(2024)


def power_lmm(
    n_subjects: int,
    n_obs: int,
    beta_treatment: float,
    sigma_subject: float = 1.0,
    sigma_resid: float = 1.0,
    B: int = 500,
    alpha: float = 0.05,
) -> float:
    """Estimated power for a two-group treatment effect in a random-intercept LMM."""
    n = n_subjects * n_obs
    significant = 0
    for _ in range(B):
        # Simulate data from the assumed model
        subject = np.repeat(np.arange(n_subjects), n_obs)
        treatment = (subject % 2).astype(float)        # half of the subjects treated
        u = rng.normal(0, sigma_subject, n_subjects)   # random intercepts
        eps = rng.normal(0, sigma_resid, n)
        y = beta_treatment * treatment + u[subject] + eps
        df = pd.DataFrame({"y": y, "treatment": treatment, "subject": subject})

        result = interlace.fit("y ~ treatment", data=df, groups="subject")

        # Test: is the treatment effect significant at level alpha?
        pval = result.pvalues["treatment"]
        if pval < alpha:
            significant += 1

    return significant / B


power = power_lmm(
    n_subjects=40,
    n_obs=5,
    beta_treatment=0.5,
)
print(f"Estimated power: {power:.2f}")
```

### Varying sample size

Wrap the function in a grid search to find the required `n`:

```python
results = {}
for n_subjects in [20, 30, 40, 60, 80]:
    results[n_subjects] = power_lmm(
        n_subjects=n_subjects, n_obs=5, beta_treatment=0.5, B=200
    )
    print(f"n={n_subjects}: power={results[n_subjects]:.2f}")
```

### Tips for power analysis

- **Use realistic variance components.** Power is very sensitive to the ICC (the ratio of between-subject variance to total variance). If you have pilot data, plug in its variance component estimates rather than guessing.
- **Match your analysis model to your data-generating model.** If the true model has random slopes but you power for random intercepts only, your power estimate will be optimistic.
- **Report the assumed effect size and variance components.** Power results are only interpretable relative to the assumed parameters.

---

## Simulation-based model checking

A posterior predictive check asks: do the data look like they could have come from the fitted model? This is a useful complement to residual plots, especially for catching systematic misfit (e.g. overdispersion, heavy tails, floor effects).

### Checking distributional assumptions

```python
import matplotlib.pyplot as plt

result = interlace.fit("score ~ time + treatment", data=df, groups="subject")

# Draw 200 simulated response vectors
y_sim = interlace.simulate(result, nsim=200, seed=0)

# Overlay observed and simulated densities
fig, ax = plt.subplots()
for j in range(y_sim.shape[1]):
    ax.hist(y_sim[:, j], bins=30, alpha=0.02, color="steelblue", density=True)
ax.hist(df["score"], bins=30, alpha=0.7, color="black", density=True, label="Observed")
ax.set_xlabel("score")
ax.legend()
plt.show()
```

If the observed histogram consistently falls in the tails of the simulated distribution, the model is misspecified in some way (e.g. skewed residuals, outliers, an incorrect link function for a GLMM).
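The overlay can also be summarized numerically. The sketch below is not a library feature — it is plain NumPy applied to the `y_sim` draws and `df` from the previous snippet — and checks how often each observed response falls inside its pointwise 95% band of simulated values:

```python
# Pointwise 95% band of simulated values for each observation (rows of y_sim)
lo = np.percentile(y_sim, 2.5, axis=1)
hi = np.percentile(y_sim, 97.5, axis=1)

# Fraction of observed responses that land inside their band
y_obs = df["score"].to_numpy()
coverage = np.mean((y_obs >= lo) & (y_obs <= hi))
print(f"Coverage of pointwise 95% bands: {coverage:.2f}")
```

Coverage well below 0.95 points to the same kinds of misfit as the overlay (heavy tails, outliers); coverage well above it suggests the fitted variances are larger than the data require.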
### Checking group-level summaries

```python
# Compare observed vs simulated group means
observed_group_means = df.groupby("subject")["score"].mean().values

sim_group_means = np.array([
    df.assign(score=y_sim[:, j]).groupby("subject")["score"].mean().values
    for j in range(y_sim.shape[1])
])  # shape: (nsim, n_subjects)

# 95% predictive interval for each group's mean
lo = np.percentile(sim_group_means, 2.5, axis=0)
hi = np.percentile(sim_group_means, 97.5, axis=0)

# How many observed group means fall outside the interval?
outside = np.mean((observed_group_means < lo) | (observed_group_means > hi))
print(f"Fraction of groups outside 95% PI: {outside:.2f}")
```

A well-calibrated model should have roughly 5% of group means outside the 95% predictive interval. Substantially more than 5% suggests the random-effect variance is underestimated.

### Checking a scalar summary statistic

For a single summary statistic `T(y)`, you can compute a simulation-based p-value:

```python
# Example: excess kurtosis of the residuals
from scipy.stats import kurtosis

y_obs = np.asarray(df["score"])
T_obs = kurtosis(y_obs - result.fittedvalues)

T_sim = np.array([
    kurtosis(y_sim[:, j] - result.fittedvalues)
    for j in range(y_sim.shape[1])
])

# Posterior predictive p-value: P(T_sim >= T_obs)
p_ppc = np.mean(T_sim >= T_obs)
print(f"PPC p-value (kurtosis): {p_ppc:.3f}")
```

A p-value near 0 or 1 indicates the observed statistic is extreme relative to what the model predicts.

---

## Tips

- **Seed everything.** Pass an integer `seed` to both `simulate()` and `bootMer()` for reproducible results.
- **Progress bar.** `bootMer(..., show_progress=True)` uses `tqdm` if installed. Install it with `pip install tqdm`.
- **Speed.** Each `bootMer` replicate refits the model from scratch. For large datasets or many replicates, budget accordingly — 500 replicates on a model that takes 0.5 s to fit take roughly 4 minutes.
- **Percentile CIs are not bias-corrected.** `BootResult.ci()` uses the raw percentile method. For small `B` or heavily skewed statistics, the BCa (bias-corrected and accelerated) method is more accurate, but it needs a larger `B` to be stable.

---

## See also

- {doc}`api/fit` — the `fit()` function and its parameters
- {doc}`variance-inference` — profile CIs and Wald CIs for variance components
- {doc}`model-comparison` — AIC/BIC and likelihood ratio tests