lmer_influence_measures¶
Combined influence measures matching R’s HLMdiag::hlm_influence.
Returns Cook’s D, leverage, DFBETAS, and MDFFITS for every observation.
- lmer_influence_measures(model, optimizer='lbfgsb', show_progress=False, n_jobs=1, analytical=False)[source]¶
Compute all influence measures for an lmer model.
Matches R’s HLMdiag convention. This combines case-deletion Cook’s D / mdffits (via
hlm_influence()) with analytical leverage and DFBETAS — exactly mirroring R’s HLMdiag::hlm_influence.

- Parameters:
  - model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
  - optimizer (str) – Optimizer for case-deletion refits. See hlm_influence().
  - show_progress (bool) – Show a tqdm progress bar during case-deletion refits. Useful for large datasets (n > 500).
  - n_jobs (int) – Number of parallel worker processes. 1 = sequential (default), -1 = all CPUs. See hlm_influence() for details.
  - analytical (bool) – If True, use the GLS-LOO Woodbury formula instead of O(n) REML refits (see _gls_loo_influence()). This is thousands of times faster and matches n_influential counts exactly on large datasets (n ≥ 200), but fixes variance components at the full-model estimates. Requires a CrossedLMEResult with _A11 and _W populated. Only available for CrossedLMEResult (not the statsmodels path).
- Return type:
  dict[str, ndarray]
- Returns:
  dict with keys –
  - cooks — Cook’s D (case-deletion or GLS-LOO depending on analytical)
  - hat — leverage used for threshold flagging (overall for single-RE, fixef for crossed multi-RE — mirrors R)
  - hat_overall — full leverage H1 + H2
  - hat_fixef — fixed-effects leverage H1 only
  - dfbetas — DFBETAS matrix (analytical, same formula as R)
  - dffits — mdffits (case-deletion) or GLS-LOO MDFFITS
  - residuals — conditional residuals
  - sigma — residual standard deviation √(scale)
Notes
Cook’s D uses the Demidenko & Stukel (2005) case-deletion formula:
D_i = (1/p) (β̂ − β̂₍₋ᵢ₎)ᵀ V_β⁻¹ (β̂ − β̂₍₋ᵢ₎)
mdffits (returned as dffits) uses the case-deletion covariance:
MDFFITS_i = (1/p) (β̂ − β̂₍₋ᵢ₎)ᵀ V_β₍₋ᵢ₎⁻¹ (β̂ − β̂₍₋ᵢ₎)
Both require O(n) model refits unless analytical=True. For large datasets, consider
analytical=True (exact match on n_influential) or show_progress=True to monitor
refit progress.
DFBETAS is computed analytically using the fixed-effects design matrix and
conditional residuals, matching R’s implementation.
Leverage flagging uses hat_overall (H1 + H2) for single-RE models and hat_fixef
(H1 only) for crossed multi-RE models, exactly as R does when HLMdiag cannot
compute overall leverage for crossed random effects.
Examples
>>> measures = interlace.lmer_influence_measures(result)
>>> measures["cooks"]
Returned keys¶

| Key | Description |
|---|---|
| cooks | Cook’s distance per observation (case-deletion or GLS-LOO, see below) |
| hat | Leverage used for threshold flagging (overall for single-RE, fixef for crossed multi-RE) |
| hat_overall | Full leverage H₁ + H₂ |
| hat_fixef | Fixed-effects-only leverage H₁ |
| dfbetas | DFBETAS matrix (n observations × p coefficients) |
| dffits | MDFFITS per observation |
| residuals | Conditional residuals |
| sigma | Residual standard deviation √(scale) |
analytical=True: GLS-LOO Woodbury fast path¶
By default (analytical=False) Cook’s D and MDFFITS are computed by O(n) REML refits
(case-deletion), which is exact but slow for large datasets.
With analytical=True, Cook’s D is computed via the GLS-LOO Woodbury identity,
fixing variance components at the full-model estimates:
β̂₍₋ᵢ₎ = β̂ − V_β · tᵢ · εᵢ / (pᵢᵢ · (1 − h̃ᵢ))
D_i = εᵢ² · h̃ᵢ / (p · pᵢᵢ · (1 − h̃ᵢ)²)
This is thousands of times faster than O(n) refits and matches n_influential
counts exactly on large datasets (n ≥ 200):
| Dataset | analytical=False | analytical=True | Speedup |
|---|---|---|---|
| crossed, n=2000 | 12.65 s | 0.003 s | ~3700× |
| large scale, n=1000 | 7.48 s | 0.000 s | ~15000× |
Requirements: the model must be a CrossedLMEResult (not statsmodels path), and must
have been fitted with the default L-BFGS-B optimizer so that _A11 and _W are populated
on the result object.
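Once the residual, leverage, and weight terms are available, the GLS-LOO Cook’s D formula above reduces to scalar arithmetic per observation. A minimal sketch, with hypothetical stand-in values:

```python
# Illustrative per-observation quantities (hypothetical values):
# eps_i — conditional residual εᵢ, h_i — GLS leverage h̃ᵢ,
# p_ii — diagonal weight term pᵢᵢ, p — number of fixed effects.
eps_i, h_i, p_ii, p = 0.8, 0.2, 1.0, 3

# D_i = εᵢ² · h̃ᵢ / (p · pᵢᵢ · (1 − h̃ᵢ)²)
# with variance components held at the full-model estimates
D_i = eps_i**2 * h_i / (p * p_ii * (1.0 - h_i)**2)
```

Because no refit happens, the entire vector of D_i values can be computed in one vectorized pass, which is where the speedups in the table above come from.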
Examples¶
Case-deletion (exact, default)¶
import interlace
result = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"])
measures = interlace.lmer_influence_measures(result)
# Flag observations with Cook's D > 4/n
n = result.nobs
flagged = measures["cooks"] > 4 / n
print(f"{flagged.sum()} influential observations")
GLS-LOO Woodbury (fast, recommended for n > 200)¶
measures = interlace.lmer_influence_measures(result, analytical=True)
Parallel case-deletion refits¶
# -1 = use all available CPUs
measures = interlace.lmer_influence_measures(result, n_jobs=-1, show_progress=True)
Accessing DFBETAS¶
import pandas as pd
dfbetas = pd.DataFrame(
measures["dfbetas"],
columns=result.fe_params.index,
)
print(dfbetas.abs().max()) # largest DFBETAS per coefficient
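A common follow-up is to flag observations using the conventional |DFBETAS| > 2/√n cutoff. The threshold rule and the simulated matrix below are illustrative, not part of the API:

```python
import numpy as np

# Simulated stand-in for measures["dfbetas"]: 400 observations, 3 coefficients
n = 400
rng = np.random.default_rng(0)
dfbetas = rng.normal(scale=0.05, size=(n, 3))

threshold = 2 / np.sqrt(n)                      # conventional 2/sqrt(n) rule
flagged_rows = (np.abs(dfbetas) > threshold).any(axis=1)
```

flagged_rows marks observations whose deletion shifts at least one fixed-effect coefficient by more than the cutoff.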
See also¶
Influence diagnostics — hlm_influence() for the lower-level case-deletion frame
Leverage — hat-matrix diagnostics
Augment — append Cook’s D and leverage to the original DataFrame