Model Comparison

Comparing mixed models lets you test whether adding predictors or expanding the random-effect structure improves fit beyond what you’d expect by chance. This page covers the likelihood ratio test (LRT) workflow and when to use ML vs REML.


REML vs ML: which to use

| Goal | Estimator | Why |
| --- | --- | --- |
| Final parameter estimates (fixed effects, variance components) | method="REML" (default) | Unbiased variance estimates |
| Comparing models with different fixed effects | method="ML" | REML likelihoods are not comparable when the fixed-effect design differs |
| Comparing models with the same fixed effects, different random structure | Either (REML preferred) | REML likelihoods are comparable here; use REML for unbiased variance estimates |

Practical workflow:

  1. Use method="ML" to select fixed effects (test whether adding a predictor helps)

  2. Refit the winning model with method="REML" (default) to get the final estimates


Likelihood ratio test (LRT)

The LRT compares two nested models. The test statistic is:

χ² = 2 × (log-likelihood_full − log-likelihood_reduced)

Under the null hypothesis (the reduced model is adequate), the statistic asymptotically follows a χ² distribution with degrees of freedom equal to the number of added parameters.
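For instance, with illustrative log-likelihoods of −1233.41 (reduced) and −1224.20 (full), the statistic is 2 × (−1224.20 − (−1233.41)) = 18.42.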

Comparing fixed-effect structures

import interlace
import scipy.stats

# Both models must use method="ML"
m_reduced = interlace.fit(
    "rt ~ 1",           # intercept only
    data=df,
    groups=["subject", "item"],
    method="ML",
)

m_full = interlace.fit(
    "rt ~ condition",   # + condition effect
    data=df,
    groups=["subject", "item"],
    method="ML",
)

lrt_stat = 2 * (m_full.llf - m_reduced.llf)
df_diff  = 1  # one added parameter (condition coefficient)
p_value  = scipy.stats.chi2.sf(lrt_stat, df=df_diff)

print(f"LRT χ²({df_diff}) = {lrt_stat:.3f}, p = {p_value:.4f}")
# LRT χ²(1) = 18.42, p = 0.0000

Comparing random-effect structures

When adding a random slope, the number of added parameters depends on the parameterisation:

| Change | Added parameters |
| --- | --- |
| Add intercept-only term | 1 (variance) |
| Add correlated slope (1 + x \| g) | 2 (slope variance + intercept-slope covariance) |
| Add independent slope (1 + x \|\| g) | 1 (slope variance only, covariance = 0) |

# Baseline: random intercepts only for subject and item
m_intercept = interlace.fit(
    "rt ~ condition",
    data=df,
    groups=["subject", "item"],
    method="ML",
)

# Adds a correlated random slope for condition by subject
m_slopes = interlace.fit(
    "rt ~ condition",
    data=df,
    random=["(1 + condition | subject)", "(1 | item)"],
    method="ML",
)

lrt_stat = 2 * (m_slopes.llf - m_intercept.llf)
p_value  = scipy.stats.chi2.sf(lrt_stat, df=2)  # 2 extra params
print(f"LRT χ²(2) = {lrt_stat:.3f}, p = {p_value:.4f}")

AIC and BIC

For non-nested comparisons (e.g. two different fixed-effect structures where neither is a special case of the other), use information criteria:

print(f"AIC: {m_reduced.aic:.1f} vs {m_full.aic:.1f}")
print(f"BIC: {m_reduced.bic:.1f} vs {m_full.bic:.1f}")

Lower AIC/BIC is better. BIC penalises complexity more heavily than AIC and therefore tends to favour simpler models. As a rule of thumb, differences of less than 2 are not meaningful, while differences greater than 10 indicate strong support for the lower-scoring model.
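A small sketch applying those thresholds to the comparison above (treating the 2–10 range as moderate support follows common practice, not an interlace rule):

delta_aic = m_reduced.aic - m_full.aic   # positive values favour the full model
better = "full" if delta_aic > 0 else "reduced"
if abs(delta_aic) > 10:
    verdict = f"strong support for the {better} model"
elif abs(delta_aic) >= 2:
    verdict = f"moderate support for the {better} model"
else:
    verdict = "no meaningful difference"
print(f"ΔAIC = {delta_aic:.1f}: {verdict}")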


Step-by-step workflow

A practical model-building workflow for a typical analysis:

import interlace
import scipy.stats

# Step 1: Fit baseline with ML
m0 = interlace.fit("rt ~ 1", data=df, groups=["subject", "item"], method="ML")

# Step 2: Add predictors one at a time, test with LRT
m1 = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"], method="ML")
lrt = 2 * (m1.llf - m0.llf)
print(f"Adding condition: χ²(1) = {lrt:.2f}, p = {scipy.stats.chi2.sf(lrt, 1):.4f}")

m2 = interlace.fit("rt ~ condition + frequency", data=df, groups=["subject", "item"], method="ML")
lrt = 2 * (m2.llf - m1.llf)
print(f"Adding frequency: χ²(1) = {lrt:.2f}, p = {scipy.stats.chi2.sf(lrt, 1):.4f}")

# Step 3: Refit winning model with REML for final estimates
m_final = interlace.fit("rt ~ condition + frequency", data=df, groups=["subject", "item"])
# method="REML" is the default

print(m_final.fe_params)
print(m_final.variance_components)

anova() — likelihood-ratio test shortcut

The manual LRT (compute 2 * (m_full.llf - m_reduced.llf), look up the χ² p-value) works but is repetitive. interlace.anova() automates it, producing a two-row table that matches lme4’s anova.merMod() output:

import interlace

m0 = interlace.fit("rt ~ 1",         data=df, groups=["subject", "item"], method="ML")
m1 = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"], method="ML")

print(interlace.anova(m0, m1))
#    Df   AIC    BIC  logLik  deviance  Chisq  Chi Df  Pr(>Chisq)
# 0   4  ...    ...    ...      ...     NaN     NaN       NaN
# 1   5  ...    ...    ...      ...    18.4     1.0     0.0000

The simpler model is always shown first regardless of argument order. Chisq, Chi Df, and Pr(>Chisq) are NaN for the reduced model row.

anova() raises ValueError if either model was fitted with REML — a deliberate guard against comparing incomparable likelihoods.
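For example, the guard fires as soon as a REML fit slips into the comparison (a short sketch; the exact message text may differ):

m_reml = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"])  # REML by default

try:
    interlace.anova(m0, m_reml)
except ValueError as err:
    print(err)  # refit with method="ML" before comparing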

With update()

anova() and update() compose naturally for incremental model building:

m0 = interlace.fit("rt ~ 1", data=df, groups=["subject", "item"], method="ML")
m1 = m0.update(". ~ . + condition")
m2 = m1.update(". ~ . + frequency")

print(interlace.anova(m0, m1))
print(interlace.anova(m1, m2))

# Refit winner with REML for final estimates
m_final = m2.update(method="REML")

Iterative refinement with update()

Repeatedly calling interlace.fit() with slightly different formulas is verbose. The update() method reruns the fit with only the parts you want to change — formula, data, or any keyword argument — while inheriting the rest from the original call.

Dot notation

A . in the new formula expands to the corresponding part of the original:

# Original model
m0 = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"], method="ML")

# Add a predictor: . ~ . + frequency
m1 = m0.update(". ~ . + frequency")

# Remove a predictor: . ~ . - condition
m_reduced = m0.update(". ~ . - condition")

# Replace the response (LHS)
m_alt = m0.update("log_rt ~ .")

The original m0 is unchanged; update() always returns a new CrossedLMEResult.
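Because update() returns a fresh result, the original and updated fits stay available side by side, e.g. for a manual LRT (a short sketch reusing m0 and m1 from above):

assert m1 is not m0            # m0 is untouched by the update
lrt = 2 * (m1.llf - m0.llf)    # both fits remain available for comparison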

Changing the dataset

Pass data= to refit on a different frame — useful for sensitivity analyses or rolling-window designs:

# Refit on a filtered subset
m_large = m0.update(data=df[df["school_size"] > 200])

# Change data and formula together
m_sens = m0.update(". ~ . + frequency", data=df_filtered)

Overriding fit arguments

Any keyword accepted by interlace.fit() can be overridden:

# Switch from ML to REML for final estimates
m_reml = m1.update(method="REML")

# Switch optimizer
m_bobyqa = m0.update(optimizer="bobyqa")

Typical workflow

import interlace, scipy.stats

# 1. Build models incrementally with ML
m0 = interlace.fit("rt ~ 1",         data=df, groups=["subject", "item"], method="ML")
m1 = m0.update(". ~ . + condition")
m2 = m1.update(". ~ . + frequency")

# 2. Test each step with LRT
for reduced, full, name in [(m0, m1, "condition"), (m1, m2, "frequency")]:
    stat = 2 * (full.llf - reduced.llf)
    p    = scipy.stats.chi2.sf(stat, df=1)
    print(f"{name}: χ²(1) = {stat:.2f}, p = {p:.4f}")

# 3. Refit winner with REML for final estimates
m_final = m2.update(method="REML")
print(m_final.summary())

Notes and caveats

  • LRT p-values for variance components are conservative. The null hypothesis puts the parameter on the boundary of the parameter space (variance ≥ 0), so the true null distribution is a 50:50 mixture of χ² distributions rather than a single χ², and the naive χ² approximation yields p-values that are too large. A rule of thumb: halve the p-value, or compute the mixture p-value directly (see the sketch after this list). For fixed effects, the approximation is accurate.

  • Do not compare REML likelihoods across different fixed-effect structures. REML integrates out the fixed effects, so the likelihood depends on the fixed-effect design matrix — two models with different fixed effects have incomparable REML likelihoods.

  • AIC/BIC with REML: result.aic and result.bic are computed from the REML log-likelihood when method="REML". Use them only for models with identical fixed effects.
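The boundary correction from the first caveat can be computed directly. A minimal sketch: boundary_lrt_pvalue is a hypothetical helper, not part of interlace, implementing the common 50:50 mixture with χ²(0) treated as a point mass at zero.

import scipy.stats

def boundary_lrt_pvalue(stat, df_added):
    """50:50 mixture p-value: 0.5*chi2(df_added) + 0.5*chi2(df_added - 1).

    Suitable when one variance parameter lies on the boundary under the null;
    chi2(0) is a point mass at zero, so its tail probability is 0.
    """
    upper = scipy.stats.chi2.sf(stat, df_added)
    lower = scipy.stats.chi2.sf(stat, df_added - 1) if df_added > 1 else 0.0
    return 0.5 * (upper + lower)

# One added variance (df_added=1): equivalent to halving the naive p-value
print(boundary_lrt_pvalue(3.84, 1))   # ~0.025 rather than ~0.05

# Correlated slope (variance + covariance): mixture of chi2(1) and chi2(2)
print(boundary_lrt_pvalue(5.99, 2))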


See also