Cross-Validation Guide

Cross-validation estimates how well a fitted model generalises to unseen data. For mixed models, the key challenge is data leakage: observations within the same group share a random effect, so splitting at the observation level lets the test set “peek” at training-set information. interlace.cross_val() avoids this by splitting at the group level.


Why group-level CV?

Suppose you split randomly and a school appears in both training and test sets. The model’s BLUP for that school is informed by its training observations, giving an optimistic error estimate on the test observations from the same school. Group-level CV holds out entire groups, giving a realistic picture of prediction error for new, unseen groups.
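
To make the distinction concrete, here is a minimal sketch of a group-level split done by hand with NumPy, assuming a pandas DataFrame df with a school_id column. cross_val() does this internally; you never need to split by hand:

import numpy as np

# Hold out ~20% of schools, not 20% of rows. No school_id appears in
# both halves, so the held-out schools contribute nothing to the fit.
rng = np.random.default_rng(0)
schools = df["school_id"].unique()
test_schools = rng.choice(schools, size=max(1, len(schools) // 5), replace=False)

test_mask = df["school_id"].isin(test_schools)
train_df, test_df = df[~test_mask], df[test_mask]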



K-fold by groups

cv="kfold" randomly partitions the unique group labels into k folds (default k=5). Each fold holds out all observations from roughly n_groups / k groups. This is faster than LOGO while still preventing leakage.

from interlace import cross_val

cv_k = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",
    cv="kfold",
    k=5,
    scoring="rmse",
)

print(f"5-fold CV RMSE: {cv_k.mean:.3f} ± {cv_k.std:.3f}")

The fold assignment uses a fixed seed (42) for reproducibility.
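
For intuition, the assignment is roughly equivalent to the sketch below; the library's internal scheme may differ in detail, but the seed plays the same role:

import numpy as np

# Shuffle the unique group labels with a fixed seed, then split the
# shuffled sequence into k contiguous chunks -- one chunk per fold.
rng = np.random.default_rng(42)
shuffled = rng.permutation(df["school_id"].unique())
folds = np.array_split(shuffled, 5)  # folds[i] = groups held out in fold i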


Scoring

Built-in metrics

scoring=              Metric
"rmse" (default)      Root mean squared error
"mae"                 Mean absolute error
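
For reference, the two built-ins correspond to the usual formulas. The sketch below shows equivalent NumPy implementations; cross_val() computes these internally:

import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    return float(np.mean(np.abs(y_true - y_pred)))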

Custom scoring function

Pass any callable with the signature scorer(y_true, y_pred) -> float:

import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (undefined if y_true contains zeros)."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

cv = cross_val(
    "score ~ hours_studied",
    data=df,
    groups="school_id",
    scoring=mape,
)
print(f"MAPE: {cv.mean:.1f}%")

Passing fit arguments

Extra keyword arguments are forwarded to interlace.fit(). Use this to select a different optimizer, set the random-effect structure, or switch to maximum-likelihood (ML) estimation:

cv = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",
    cv="logo",
    random=["(1 + hours_studied | school_id)"],   # random slopes
    optimizer="bobyqa",
)

Inspecting per-fold models

Set return_models=True to store the fitted model and prediction details for each fold. Each element of cv.fold_results is a dict with keys:

  • "model" — the fitted CrossedLMEResult for that fold

  • "train_groups" — group labels used for training

  • "test_groups" — group labels held out

  • "y_true" — observed response in the test set

  • "y_pred" — model predictions on the test set

cv = cross_val(
    "score ~ hours_studied + prior_gpa",
    data=df,
    groups="school_id",
    cv="logo",
    return_models=True,
)

for fold in cv.fold_results:
    test_g  = fold["test_groups"]
    rmse    = np.sqrt(np.mean((fold["y_true"] - fold["y_pred"]) ** 2))
    print(f"Held-out group: {test_g}, RMSE: {rmse:.3f}")

CVResult reference

cross_val() returns a CVResult dataclass:

Attribute / property   Type                 Description
scores                 np.ndarray           Per-fold score values (length = number of folds)
mean                   float                Mean of scores
std                    float                Standard deviation of scores (ddof=1)
fold_results           list[dict] | None    Only populated when return_models=True


Tips

  • Prefer LOGO when groups are few (< 20): each group gets exactly one held-out fold, and the variance estimate is more informative.

  • Prefer kfold when groups are many (≥ 50): LOGO becomes expensive; 5- or 10-fold CV is a good trade-off.

  • Interpret cv.std carefully: with LOGO, fold scores are correlated (each fold removes one group from a common pool), so std underestimates true uncertainty; see the sketch after this list.

  • Use return_models=True sparingly: storing one fitted model per fold can consume substantial memory for large datasets.
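
As a rough illustration of that caveat, the naive standard error below treats fold scores as independent, which they are not under LOGO. Read it as an optimistic lower bound, not a calibrated uncertainty; it uses only the documented CVResult attributes:

import numpy as np

# Naive standard error of the mean score. Under LOGO the fold scores
# are positively correlated, so this understates the true uncertainty.
se_naive = cv.std / np.sqrt(len(cv.scores))
print(f"Mean score: {cv.mean:.3f} (naive SE: {se_naive:.3f}, lower bound)")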


See also