# lmer_influence_measures

Combined influence measures matching R's `HLMdiag::hlm_influence`. Returns Cook's D, leverage, DFBETAS, and MDFFITS for every observation.

```{eval-rst}
.. autofunction:: interlace.lmer_influence_measures
```

## Returned keys

| Key | Description |
|---|---|
| `cooks` | Cook's distance per observation (case-deletion or GLS-LOO, see below) |
| `hat` | Leverage used for threshold flagging (`hat_overall` for single-RE, `hat_fixef` for crossed multi-RE) |
| `hat_overall` | Full leverage H₁ + H₂ |
| `hat_fixef` | Fixed-effects-only leverage H₁ |
| `dfbetas` | DFBETAS matrix, shape `(n, p)` — one column per fixed-effect coefficient |
| `dffits` | MDFFITS per observation |
| `residuals` | Conditional residuals |
| `sigma` | Residual standard deviation √(scale) |

## `analytical=True`: GLS-LOO Woodbury fast path

By default (`analytical=False`), Cook's D and MDFFITS are computed by n case-deletion REML refits, one per observation, which is exact but slow for large datasets. With `analytical=True`, Cook's D is computed via the **GLS-LOO Woodbury identity**, fixing the variance components at the full-model estimates:

```
β̂₍₋ᵢ₎ = β̂ − V_β · tᵢ · εᵢ / (pᵢᵢ · (1 − h̃ᵢ))
D_i = εᵢ² · h̃ᵢ / (p · pᵢᵢ · (1 − h̃ᵢ)²)
```

This is **thousands of times faster** than the per-observation refits and matches `n_influential` counts exactly on large datasets (n ≥ 200):

| Dataset | `analytical=False` | `analytical=True` | Speedup |
|---|---|---|---|
| crossed, n=2000 | 12.65 s | 0.003 s | ~3700× |
| large scale, n=1000 | 7.48 s | 0.000 s | ~15000× |

**Requirements**: the model must be a `CrossedLMEResult` (not the statsmodels path) and must have been fitted with the default L-BFGS-B optimizer, so that `_A11` and `_W` are populated on the result object.

## Examples

### Case-deletion (exact, default)

```python
import interlace

result = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"])
measures = interlace.lmer_influence_measures(result)

# Flag observations with Cook's D > 4/n
n = result.nobs
flagged = measures["cooks"] > 4 / n
print(f"{flagged.sum()} influential observations")
```

### GLS-LOO Woodbury (fast, recommended for n > 200)

```python
measures = interlace.lmer_influence_measures(result, analytical=True)
```

### Parallel case-deletion refits

```python
# -1 = use all available CPUs
measures = interlace.lmer_influence_measures(result, n_jobs=-1, show_progress=True)
```

### Accessing DFBETAS

```python
import pandas as pd

dfbetas = pd.DataFrame(
    measures["dfbetas"],
    columns=result.fe_params.index,
)
print(dfbetas.abs().max())  # largest DFBETAS per coefficient
```

## See also

- {doc}`influence` — `hlm_influence()` for the lower-level case-deletion frame
- {doc}`leverage` — hat-matrix diagnostics
- {doc}`augment` — append Cook's D and leverage to the original DataFrame
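
## Combining Cook's D and leverage

The examples above flag on Cook's D alone; the returned `hat` key can be folded into the same mask. The sketch below is illustrative rather than part of the documented API: the 4/n and 2·p/n cut-offs are conventional rules of thumb, not thresholds built into `interlace`, and it reuses the `result` object and the `nobs` / `fe_params` attributes from the examples earlier on this page.

```python
import numpy as np

# Illustrative sketch: 4/n (Cook's D) and 2*p/n (leverage) are common
# rules of thumb, not interlace defaults.
measures = interlace.lmer_influence_measures(result, analytical=True)

n = result.nobs
p = len(result.fe_params)

high_cooks = np.asarray(measures["cooks"]) > 4 / n
high_hat = np.asarray(measures["hat"]) > 2 * p / n

both = high_cooks & high_hat
print(f"{both.sum()} observations exceed both cut-offs")
print(np.flatnonzero(both))  # row indices of the flagged observations
```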