Cross-Validation Guide¶
Cross-validation estimates how well a fitted model generalises to unseen data.
For mixed models, the key challenge is data leakage: observations within the
same group share a random effect, so splitting at the observation level lets the
test set “peek” at training-set information. interlace.cross_val() avoids this
by splitting at the group level.
Why group-level CV?¶
Suppose you split randomly and a school appears in both training and test sets. The model’s BLUP for that school is informed by its training observations, giving an optimistic error estimate on the test observations from the same school. Group-level CV holds out entire groups, giving a realistic picture of prediction error for new, unseen groups.
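The leakage described above can be checked directly on a toy dataset. The following is a minimal sketch using only the standard library; the helper `shared_groups` and the school labels are made up for illustration and are not part of interlace.

```python
import random

def shared_groups(train_rows, test_rows):
    """Return the group labels that appear in both row sets."""
    return {g for g, _ in train_rows} & {g for g, _ in test_rows}

# Toy data: 6 schools with 10 observations each, stored as (school, obs_id).
rows = [(f"school_{s}", i) for s in range(6) for i in range(10)]

# Observation-level split: shuffle the rows and hold out the last 20%.
rng = random.Random(0)
shuffled = rows[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
row_train, row_test = shuffled[:cut], shuffled[cut:]

# Group-level split: hold out every observation from one school.
held_out = {"school_5"}
grp_train = [r for r in rows if r[0] not in held_out]
grp_test = [r for r in rows if r[0] in held_out]

print("row-level overlap:  ", sorted(shared_groups(row_train, row_test)))
print("group-level overlap:", sorted(shared_groups(grp_train, grp_test)))
```

Any school listed in the row-level overlap contributes observations to both sides of the split, which is the situation group-level CV is designed to rule out.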
Leave-one-group-out (LOGO)¶
The default strategy, cv="logo", removes one group at a time, fits the model on
the remaining data, and predicts on the held-out group. Repeat for every group.
```python
from interlace import cross_val

cv = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",  # grouping factor used for both RE and CV folds
    cv="logo",           # default; can be omitted
    scoring="rmse",      # default
)
print(f"LOGO RMSE: {cv.mean:.3f} ± {cv.std:.3f}")
print(f"Per-fold scores: {cv.scores}")
```
LOGO is exact but can be slow when there are many groups, because it fits one model per unique group level.
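To make the procedure concrete, here is a rough sketch of how leave-one-group-out folds could be generated from a column of group labels. The function `logo_folds` is a hypothetical illustration, not interlace's internal implementation, and it only produces index lists; the model fitting step is left out.

```python
def logo_folds(groups):
    """Yield (held_out_label, train_idx, test_idx) once per unique group."""
    labels = sorted(set(groups))
    for label in labels:
        # Every row of the held-out group goes to the test side.
        test_idx = [i for i, g in enumerate(groups) if g == label]
        train_idx = [i for i, g in enumerate(groups) if g != label]
        yield label, train_idx, test_idx

groups = ["A", "A", "B", "C", "C", "C"]
for label, train_idx, test_idx in logo_folds(groups):
    print(label, train_idx, test_idx)
```

The number of folds equals the number of unique labels, which is why the cost grows linearly with the number of groups.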
K-fold by groups¶
cv="kfold" randomly partitions the unique group labels into k folds (default
k=5). Each fold holds out all observations from roughly n_groups / k groups.
This is faster than LOGO while still preventing leakage.
```python
cv_k = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",
    cv="kfold",
    k=5,
    scoring="rmse",
)
print(f"5-fold CV RMSE: {cv_k.mean:.3f} ± {cv_k.std:.3f}")
```
The fold assignment uses a fixed seed (42) for reproducibility.
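One plausible way to partition group labels into folds with a fixed seed is sketched below. This is an assumption about the general technique, not a description of how interlace assigns folds; `group_kfold` is a hypothetical helper and its shuffling scheme may differ from the library's.

```python
import random

def group_kfold(groups, k=5, seed=42):
    """Partition the unique group labels into k folds; return a list of label sets."""
    labels = sorted(set(groups))
    if k > len(labels):
        raise ValueError("k cannot exceed the number of unique groups")
    random.Random(seed).shuffle(labels)
    # Dealing labels round-robin keeps fold sizes within one group of each other.
    return [set(labels[i::k]) for i in range(k)]

# 12 schools with 3 observations each.
groups = [f"school_{s}" for s in range(12) for _ in range(3)]
folds = group_kfold(groups, k=5)
for held_out in folds:
    test_idx = [i for i, g in enumerate(groups) if g in held_out]
    print(sorted(held_out), len(test_idx))
```

Because whole labels are assigned to folds, no school can appear on both sides of a split, and rerunning with the same seed reproduces the same partition.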
Scoring¶
Built-in metrics¶
| `scoring` | Metric |
|---|---|
| `"rmse"` | Root mean squared error |
| | Mean absolute error |
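For reference, the two metrics follow their standard definitions. The sketch below computes them in plain Python on a small made-up example; the function names `rmse` and `mae` are illustrative and are not claimed to match interlace's internals.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 10.0]
print(f"RMSE: {rmse(y_true, y_pred):.3f}")
print(f"MAE:  {mae(y_true, y_pred):.3f}")
```

RMSE penalises large residuals more heavily than MAE, so on the same predictions RMSE is never smaller than MAE.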
Custom scoring function¶
Pass any callable with the signature scorer(y_true, y_pred) -> float:
```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (undefined when y_true contains zeros)."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

cv = cross_val(
    "score ~ hours_studied",
    data=df,
    groups="school_id",
    scoring=mape,
)
print(f"MAPE: {cv.mean:.1f}%")
```
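MAPE divides by the observed value, so it breaks down if any test observation is exactly zero. A guarded variant with the same `scorer(y_true, y_pred) -> float` shape might look like the sketch below. The name `safe_mape` and the choice to skip zero targets are assumptions made for illustration; other conventions (such as raising an error) are equally defensible.

```python
def safe_mape(y_true, y_pred):
    """MAPE over the observations whose true value is nonzero."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        # No usable observations: report NaN rather than dividing by zero.
        return float("nan")
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)

# The middle observation has a true value of zero and is skipped.
print(safe_mape([100.0, 0.0, 50.0], [90.0, 5.0, 55.0]))
```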
Passing fit arguments¶
Extra keyword arguments are forwarded to interlace.fit(). Use this to select
a different optimizer, set the random-effect structure, or switch to ML:
```python
cv = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",
    cv="logo",
    random=["(1 + hours_studied | school_id)"],  # random slopes
    optimizer="bobyqa",
)
```
Inspecting per-fold models¶
Set `return_models=True` to store the fitted model and prediction details for
each fold. Each element of `cv.fold_results` is a dict with keys:

- `"model"`: the fitted `CrossedLMEResult` for that fold
- `"train_groups"`: group labels used for training
- `"test_groups"`: group labels held out
- `"y_true"`: observed response in the test set
- `"y_pred"`: model predictions on the test set
```python
import numpy as np

cv = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",
    cv="logo",
    return_models=True,
)

# Recompute the RMSE of each fold from its stored predictions.
for fold in cv.fold_results:
    test_g = fold["test_groups"]
    rmse = np.sqrt(np.mean((fold["y_true"] - fold["y_pred"]) ** 2))
    print(f"Held-out group: {test_g}, RMSE: {rmse:.3f}")
```
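A common follow-up is to rank folds by error to find the groups the model predicts worst. The sketch below does this on hand-written dictionaries that mimic the documented keys; the values are invented stand-ins for `cv.fold_results`, and `fold_rmse` is a hypothetical helper.

```python
import math

def fold_rmse(fold):
    """RMSE for one fold, computed from its "y_true" and "y_pred" entries."""
    errs = [(t - p) ** 2 for t, p in zip(fold["y_true"], fold["y_pred"])]
    return math.sqrt(sum(errs) / len(errs))

# Invented stand-in for cv.fold_results (the "model" key is omitted here).
fold_results = [
    {"test_groups": ["school_1"], "y_true": [70.0, 80.0], "y_pred": [72.0, 79.0]},
    {"test_groups": ["school_2"], "y_true": [60.0, 65.0], "y_pred": [50.0, 75.0]},
    {"test_groups": ["school_3"], "y_true": [90.0, 85.0], "y_pred": [88.0, 86.0]},
]

# Sort folds from worst to best prediction error.
ranked = sorted(fold_results, key=fold_rmse, reverse=True)
worst = ranked[0]
print(f"Worst fold: {worst['test_groups']}, RMSE: {fold_rmse(worst):.3f}")
```

Groups that land at the top of this ranking are natural candidates for a closer look at their residuals or covariates.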
CVResult reference¶
cross_val() returns a CVResult dataclass:
| Attribute / property | Type | Description |
|---|---|---|
| `scores` | | Per-fold score values (length = number of folds) |
| `mean` | | Mean of `scores` |
| `std` | | Standard deviation of `scores` |
| `fold_results` | | Only populated when `return_models=True` |
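As a mental model, a container with this shape could be written as the dataclass below. This is only a sketch: `CVResultSketch` is a hypothetical stand-in, the field types are guesses, and whether the real `std` uses the sample (n - 1) or population denominator is an assumption that should be checked against the library.

```python
import statistics
from dataclasses import dataclass, field

@dataclass
class CVResultSketch:
    """Hypothetical stand-in mirroring the attributes listed above."""
    scores: list
    fold_results: list = field(default_factory=list)  # empty unless models are kept

    @property
    def mean(self):
        return statistics.mean(self.scores)

    @property
    def std(self):
        # Assumption: sample standard deviation (n - 1 denominator).
        return statistics.stdev(self.scores)

result = CVResultSketch(scores=[1.0, 2.0, 3.0])
print(result.mean, result.std, result.fold_results)
```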
Tips¶
- Prefer LOGO when groups are few (< 20): each group gets exactly one held-out fold, and the variance estimate is more informative.
- Prefer `kfold` when groups are many (≥ 50): LOGO becomes expensive; 5- or 10-fold CV is a good trade-off.
- Interpret `cv.std` carefully: with LOGO, fold scores are correlated (each fold removes one group from a common pool), so `std` underestimates true uncertainty.
- Use `return_models=True` sparingly: storing one fitted model per fold can consume substantial memory for large datasets.
See also¶
- Model Comparison Guide: LRT and AIC for nested models
- Quickstart: basic `fit()` workflow
- cross_val: full `cross_val()` parameter reference