# Model Comparison Comparing mixed models lets you test whether adding predictors or expanding the random-effect structure improves fit beyond what you'd expect by chance. This page covers the likelihood ratio test (LRT) workflow and when to use ML vs REML. --- ## REML vs ML: which to use | Goal | Estimator | Why | |------|-----------|-----| | **Final parameter estimates** (fixed effects, variance components) | `method="REML"` *(default)* | Unbiased variance estimates | | **Comparing models with different fixed effects** | `method="ML"` | REML likelihoods are not comparable when the fixed-effect design differs | | **Comparing models with the same fixed effects, different random structure** | Either (REML preferred) | REML likelihoods *are* comparable here; use REML for unbiased variance estimates | **Practical workflow:** 1. Use `method="ML"` to select fixed effects (test whether adding a predictor helps) 2. Refit the winning model with `method="REML"` (default) to get the final estimates --- ## Likelihood ratio test (LRT) The LRT compares two nested models. The test statistic is: ``` χ² = 2 × (log-likelihood_full − log-likelihood_reduced) ``` Under the null hypothesis (reduced model is adequate), this follows a χ² distribution with degrees of freedom equal to the number of added parameters. ### Comparing fixed-effect structures ```python import interlace import scipy.stats # Both models must use method="ML" m_reduced = interlace.fit( "rt ~ 1", # intercept only data=df, groups=["subject", "item"], method="ML", ) m_full = interlace.fit( "rt ~ condition", # + condition effect data=df, groups=["subject", "item"], method="ML", ) lrt_stat = 2 * (m_full.llf - m_reduced.llf) df_diff = 1 # one added parameter (condition coefficient) p_value = scipy.stats.chi2.sf(lrt_stat, df=df_diff) print(f"LRT χ²({df_diff}) = {lrt_stat:.3f}, p = {p_value:.4f}") # LRT χ²(1) = 18.42, p = 0.0000 ``` ### Comparing random-effect structures When adding a random slope, the number of added parameters depends on the parameterisation: | Change | Added parameters | |--------|-----------------| | Add intercept-only term | 1 (variance) | | Add correlated slope `(1 + x \| g)` | 2 (slope variance + intercept-slope covariance) | | Add independent slope `(1 + x \|\| g)` | 1 (slope variance only, covariance = 0) | ```python m_intercept = interlace.fit( "rt ~ condition", data=df, groups=["subject", "item"], method="ML", ) m_slopes = interlace.fit( "rt ~ condition", data=df, random=["(1 + condition | subject)", "(1 | item)"], method="ML", ) lrt_stat = 2 * (m_slopes.llf - m_intercept.llf) p_value = scipy.stats.chi2.sf(lrt_stat, df=2) # 2 extra params print(f"LRT χ²(2) = {lrt_stat:.3f}, p = {p_value:.4f}") ``` --- ## AIC and BIC For non-nested comparisons (e.g. two different fixed-effect structures where neither is a special case of the other), use information criteria: ```python print(f"AIC: {m_reduced.aic:.1f} vs {m_full.aic:.1f}") print(f"BIC: {m_reduced.bic:.1f} vs {m_full.bic:.1f}") ``` Lower AIC/BIC is better. BIC penalises complexity more heavily and tends to favour simpler models. Differences < 2 are not meaningful; differences > 10 are strong. --- ## Step-by-step workflow A practical model-building workflow for a typical analysis: ```python import interlace import scipy.stats # Step 1: Fit baseline with ML m0 = interlace.fit("rt ~ 1", data=df, groups=["subject", "item"], method="ML") # Step 2: Add predictors one at a time, test with LRT m1 = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"], method="ML") lrt = 2 * (m1.llf - m0.llf) print(f"Adding condition: χ²(1) = {lrt:.2f}, p = {scipy.stats.chi2.sf(lrt, 1):.4f}") m2 = interlace.fit("rt ~ condition + frequency", data=df, groups=["subject", "item"], method="ML") lrt = 2 * (m2.llf - m1.llf) print(f"Adding frequency: χ²(1) = {lrt:.2f}, p = {scipy.stats.chi2.sf(lrt, 1):.4f}") # Step 3: Refit winning model with REML for final estimates m_final = interlace.fit("rt ~ condition + frequency", data=df, groups=["subject", "item"]) # method="REML" is the default print(m_final.fe_params) print(m_final.variance_components) ``` --- ## `anova()` — likelihood-ratio test shortcut The manual LRT (compute `2 * (m_full.llf - m_reduced.llf)`, look up the χ² p-value) works but is repetitive. `interlace.anova()` automates it, producing a two-row table that matches lme4's `anova.merMod()` output: ```python import interlace m0 = interlace.fit("rt ~ 1", data=df, groups=["subject", "item"], method="ML") m1 = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"], method="ML") print(interlace.anova(m0, m1)) # Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq) # 0 4 ... ... ... ... NaN NaN NaN # 1 5 ... ... ... ... 18.4 1.0 0.0000 ``` The simpler model is always shown first regardless of argument order. `Chisq`, `Chi Df`, and `Pr(>Chisq)` are `NaN` for the reduced model row. `anova()` raises `ValueError` if either model was fitted with REML — a deliberate guard against comparing incomparable likelihoods. ### With `update()` `anova()` and `update()` compose naturally for incremental model building: ```python m0 = interlace.fit("rt ~ 1", data=df, groups=["subject", "item"], method="ML") m1 = m0.update(". ~ . + condition") m2 = m1.update(". ~ . + frequency") print(interlace.anova(m0, m1)) print(interlace.anova(m1, m2)) # Refit winner with REML for final estimates m_final = m2.update(method="REML") ``` --- ## Iterative refinement with `update()` Repeatedly calling `interlace.fit()` with slightly different formulas is verbose. The `update()` method reruns the fit with only the parts you want to change — formula, data, or any keyword argument — while inheriting the rest from the original call. ### Dot notation A `.` in the new formula expands to the corresponding part of the original: ```python # Original model m0 = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"], method="ML") # Add a predictor: . ~ . + frequency m1 = m0.update(". ~ . + frequency") # Remove a predictor: . ~ . - condition m_reduced = m0.update(". ~ . - condition") # Replace the response (LHS) m_alt = m0.update("log_rt ~ .") ``` The original `m0` is unchanged; `update()` always returns a new `CrossedLMEResult`. ### Changing the dataset Pass `data=` to refit on a different frame — useful for sensitivity analyses or rolling-window designs: ```python # Refit on a filtered subset m_large = m0.update(data=df[df["school_size"] > 200]) # Change data and formula together m_sens = m0.update(". ~ . + frequency", data=df_filtered) ``` ### Overriding fit arguments Any keyword accepted by `interlace.fit()` can be overridden: ```python # Switch from ML to REML for final estimates m_reml = m1.update(method="REML") # Switch optimizer m_bobyqa = m0.update(optimizer="bobyqa") ``` ### Typical workflow ```python import interlace, scipy.stats # 1. Build models incrementally with ML m0 = interlace.fit("rt ~ 1", data=df, groups=["subject", "item"], method="ML") m1 = m0.update(". ~ . + condition") m2 = m1.update(". ~ . + frequency") # 2. Test each step with LRT for reduced, full, name in [(m0, m1, "condition"), (m1, m2, "frequency")]: stat = 2 * (full.llf - reduced.llf) p = scipy.stats.chi2.sf(stat, df=1) print(f"{name}: χ²(1) = {stat:.2f}, p = {p:.4f}") # 3. Refit winner with REML for final estimates m_final = m2.update(method="REML") print(m_final.summary()) ``` --- ## Notes and caveats - **LRT p-values for variance components are conservative.** The null hypothesis puts the parameter on the boundary of the parameter space (variance ≥ 0), so the chi-squared approximation is anti-conservative. A rule of thumb: halve the p-value, or use a mixture distribution. For fixed effects, the approximation is accurate. - **Do not compare REML likelihoods across different fixed-effect structures.** REML integrates out the fixed effects, so the likelihood depends on the fixed-effect design matrix — two models with different fixed effects have incomparable REML likelihoods. - **AIC/BIC with REML**: `result.aic` and `result.bic` are computed from the REML log-likelihood when `method="REML"`. Use them only for models with identical fixed effects. --- ## See also - [Concepts](concepts.md) — REML vs ML background - [Random Slopes Guide](random-slopes.md) — testing slope variance with LRT - {doc}`api/fit` — `method` and `optimizer` parameters