# Cross-Validation Guide

Cross-validation estimates how well a fitted model generalises to unseen data. For mixed models, the key challenge is **data leakage**: observations within the same group share a random effect, so splitting at the observation level lets the test set "peek" at training-set information. `interlace.cross_val()` avoids this by splitting at the **group level**.

---

## Why group-level CV?

Suppose you split randomly and a school appears in both training and test sets. The model's BLUP (best linear unbiased predictor) for that school is informed by its training observations, giving an optimistic error estimate on the test observations from the same school. Group-level CV holds out entire groups, giving a realistic picture of prediction error for *new, unseen groups*.
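To make the leakage concrete, here is a minimal sketch in plain NumPy/pandas (it does not use `interlace`; it only assumes a `df` with a `school_id` column, as in the examples throughout this guide). An observation-level split usually places some schools in both halves, while a group-level split keeps them disjoint:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observation-level split: the same school typically lands in both halves,
# so its random effect leaks from training into test.
obs_train = rng.random(len(df)) < 0.8
shared = set(df.loc[obs_train, "school_id"]) & set(df.loc[~obs_train, "school_id"])
print(f"Schools in both train and test: {len(shared)}")  # usually > 0

# Group-level split: partition the school *labels* instead, so every school
# is entirely in training or entirely in test.
schools = df["school_id"].unique()
train_schools = rng.choice(schools, size=int(0.8 * len(schools)), replace=False)
test_df = df[~df["school_id"].isin(train_schools)]
assert not (set(test_df["school_id"]) & set(train_schools))
```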
---

## Leave-one-group-out (LOGO)

The default strategy, `cv="logo"`, removes one group at a time, fits the model on the remaining data, and predicts on the held-out group. This is repeated for every group.

```python
from interlace import cross_val

cv = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",   # grouping factor used for both RE and CV folds
    cv="logo",            # default — can be omitted
    scoring="rmse",       # default
)
print(f"LOGO RMSE: {cv.mean:.3f} ± {cv.std:.3f}")
print(f"Per-fold scores: {cv.scores}")
```

LOGO is exact but can be slow when there are many groups, because it fits one model per unique group level.

---

## K-fold by groups

`cv="kfold"` randomly partitions the unique group labels into `k` folds (default `k=5`). Each fold holds out all observations from roughly `n_groups / k` groups. This is faster than LOGO while still preventing leakage.

```python
cv_k = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",
    cv="kfold",
    k=5,
    scoring="rmse",
)
print(f"5-fold CV RMSE: {cv_k.mean:.3f} ± {cv_k.std:.3f}")
```

The fold assignment uses a fixed seed (42) for reproducibility.

---

## Scoring

### Built-in metrics

| `scoring=` | Metric |
|------------|--------|
| `"rmse"` *(default)* | Root mean squared error |
| `"mae"` | Mean absolute error |

### Custom scoring function

Pass any callable with the signature `scorer(y_true, y_pred) -> float`:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

cv = cross_val(
    "score ~ hours_studied",
    data=df,
    groups="school_id",
    scoring=mape,
)
print(f"MAPE: {cv.mean:.1f}%")
```

---

## Passing fit arguments

Extra keyword arguments are forwarded to `interlace.fit()`. Use this to select a different optimizer, set the random-effect structure, or switch to ML estimation:

```python
cv = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",
    cv="logo",
    random=["(1 + hours_studied | school_id)"],  # random slopes
    optimizer="bobyqa",
)
```

---

## Inspecting per-fold models

Set `return_models=True` to store the fitted model and prediction details for each fold. Each element of `cv.fold_results` is a dict with keys:

- `"model"` — the fitted `CrossedLMEResult` for that fold
- `"train_groups"` — group labels used for training
- `"test_groups"` — group labels held out
- `"y_true"` — observed response in the test set
- `"y_pred"` — model predictions on the test set

```python
cv = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",
    cv="logo",
    return_models=True,
)

for fold in cv.fold_results:
    test_g = fold["test_groups"]
    rmse = np.sqrt(np.mean((fold["y_true"] - fold["y_pred"]) ** 2))
    print(f"Held-out group: {test_g}, RMSE: {rmse:.3f}")
```

---

## `CVResult` reference

`cross_val()` returns a `CVResult` dataclass:

| Attribute / property | Type | Description |
|----------------------|------|-------------|
| `scores` | `np.ndarray` | Per-fold score values (length = number of folds) |
| `mean` | `float` | Mean of `scores` |
| `std` | `float` | Standard deviation of `scores` (ddof=1) |
| `fold_results` | `list[dict] \| None` | Only populated when `return_models=True` |

---

## Tips

- **Prefer LOGO when groups are few** (< 20): each group gets exactly one held-out fold, and the variance estimate is more informative.
- **Prefer k-fold when groups are many** (≥ 50): LOGO becomes expensive; 5- or 10-fold CV is a good trade-off.
- **Interpret `cv.std` carefully**: with LOGO, fold scores are correlated (each fold removes one group from a common pool), so `std` underestimates the true uncertainty.
- **Use `return_models=True` sparingly**: storing one fitted model per fold can consume substantial memory for large datasets.

---

## See also

- [Model Comparison Guide](model-comparison.md) — LRT and AIC for nested models
- [Quickstart](quickstart.md) — basic `fit()` workflow
- {doc}`api/cross_val` — full `cross_val()` parameter reference
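---

As a closing sketch: because the k-fold seed is fixed, two `cross_val()` runs on the same data hold out the same groups in each fold, so their per-fold scores are paired and can be compared directly (the Model Comparison Guide linked above covers formal tests). The formulas below are illustrative:

```python
import numpy as np
from interlace import cross_val

# Both runs share the same group folds because the k-fold seed is fixed,
# so scores[i] refers to the same held-out groups in each run.
cv_base = cross_val("score ~ hours_studied", data=df,
                    groups="school_id", cv="kfold", k=5)
cv_full = cross_val("score ~ hours_studied + prior_gpa", data=df,
                    groups="school_id", cv="kfold", k=5)

# Positive differences: adding prior_gpa lowered RMSE on that fold.
diff = cv_base.scores - cv_full.scores
print(f"Per-fold RMSE change: {np.round(diff, 3)}")
print(f"prior_gpa helps on {np.sum(diff > 0)} of {len(diff)} folds")
```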