lmer_influence_measures

Combined influence measures matching R’s HLMdiag::hlm_influence. Returns Cook’s D, leverage, DFBETAS, and MDFFITS for every observation.

lmer_influence_measures(model, optimizer='lbfgsb', show_progress=False, n_jobs=1, analytical=False)

Compute all influence measures for an lmer model.

Matches R’s HLMdiag::hlm_influence convention: combines case-deletion Cook’s D and mdffits (via hlm_influence()) with analytical leverage and DFBETAS.

Parameters:
  • model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.

  • optimizer (str) – Optimizer for case-deletion refits. See hlm_influence().

  • show_progress (bool) – Show a tqdm progress bar during case-deletion refits. Useful for large datasets (n > 500).

  • n_jobs (int) – Number of parallel worker processes. 1 = sequential (default), -1 = all CPUs. See hlm_influence() for details.

  • analytical (bool) – If True, use the GLS-LOO Woodbury formula instead of O(n) REML refits (see _gls_loo_influence()). This is thousands of times faster and matches n_influential counts exactly on large datasets (n ≥ 200), but fixes variance components at the full-model estimates. Requires a CrossedLMEResult with _A11 and _W populated (not available on the statsmodels path).

Return type:

dict[str, ndarray]

Returns:

dict with keys:
  • cooks — Cook’s D (case-deletion or GLS-LOO depending on analytical)

  • hat — leverage used for threshold flagging (overall for single-RE, fixef for crossed multi-RE — mirrors R)

  • hat_overall — full leverage H1 + H2

  • hat_fixef — fixed-effects leverage H1 only

  • dfbetas — DFBETAS matrix (analytical, same formula as R)

  • dffits — mdffits (case-deletion) or GLS-LOO MDFFITS

  • residuals — conditional residuals

  • sigma — residual standard deviation sqrt(scale)


Notes

Cook’s D uses the Demidenko & Stukel (2005) case-deletion formula:

D_i = (1/p) (β̂ − β̂₍₋ᵢ₎)ᵀ V_β⁻¹ (β̂ − β̂₍₋ᵢ₎)

mdffits (returned as dffits) uses the case-deletion covariance:

MDFFITS_i = (1/p) (β̂ − β̂₍₋ᵢ₎)ᵀ V_β₍₋ᵢ₎⁻¹ (β̂ − β̂₍₋ᵢ₎)

Both require O(n) model refits unless analytical=True. For large datasets consider analytical=True (exact match on n_influential) or show_progress=True to monitor refit progress.
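The two quadratic forms above differ only in which covariance matrix scales the coefficient shift. A minimal numpy sketch, assuming you already have the full-model estimate, a leave-one-out estimate, and the corresponding covariance matrices (the array names here are illustrative, not part of the API):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3                                          # number of fixed-effect coefficients
beta = rng.normal(size=p)                      # full-model fixed-effect estimate
beta_del = beta + 0.01 * rng.normal(size=p)    # estimate with observation i deleted
V_beta = np.eye(p) * 0.5                       # cov of beta from the full model
V_beta_del = np.eye(p) * 0.55                  # cov of beta from the deletion fit

diff = beta - beta_del
# Cook's D: quadratic form scaled by the full-model covariance
D_i = diff @ np.linalg.solve(V_beta, diff) / p
# MDFFITS: same quadratic form, but with the case-deletion covariance
mdffits_i = diff @ np.linalg.solve(V_beta_del, diff) / p
```

Because the deletion-fit covariance is typically slightly larger, MDFFITS is typically slightly smaller than Cook's D for the same observation.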

DFBETAS is computed analytically using the fixed-effects design matrix and conditional residuals, matching R’s implementation.

Leverage flagging uses hat_overall (H1+H2) for single-RE models and hat_fixef (H1 only) for crossed multi-RE models, exactly as R does when HLMdiag cannot compute overall leverage for crossed random effects.
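Whichever hat values are used, a common flagging rule of thumb (an assumption here, not something this function enforces) is to compare leverage against 2p/n:

```python
import numpy as np

# Illustrative leverage values; in practice use measures["hat"]
hat = np.array([0.01, 0.02, 0.35, 0.015, 0.55])
n = hat.size
p = 1  # hypothetical number of fixed-effect coefficients

high_leverage = hat > 2 * p / n   # 2p/n rule of thumb
```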

Examples

>>> measures = interlace.lmer_influence_measures(result)
>>> measures["cooks"]

Returned keys

Key           Description
-----------   ----------------------------------------------------------------
cooks         Cook’s distance per observation (case-deletion or GLS-LOO, see below)
hat           Leverage used for threshold flagging (hat_overall for single-RE, hat_fixef for crossed multi-RE)
hat_overall   Full leverage H₁ + H₂
hat_fixef     Fixed-effects-only leverage H₁
dfbetas       DFBETAS matrix, shape (n, p) — one column per fixed-effect coefficient
dffits        MDFFITS per observation
residuals     Conditional residuals
sigma         Residual standard deviation √(scale)

analytical=True: GLS-LOO Woodbury fast path

By default (analytical=False) Cook’s D and MDFFITS are computed by O(n) REML refits (case-deletion), which is exact but slow for large datasets.

With analytical=True, Cook’s D is computed via the GLS-LOO Woodbury identity, fixing variance components at the full-model estimates:

β̂₍₋ᵢ₎ = β̂ − V_β · tᵢ · εᵢ / (pᵢᵢ · (1 − h̃ᵢ))
D_i    = εᵢ² · h̃ᵢ / (p · pᵢᵢ · (1 − h̃ᵢ)²)

This is thousands of times faster than O(n) refits and matches n_influential counts exactly on large datasets (n ≥ 200):

Dataset               analytical=False   analytical=True   Speedup
-------------------   ----------------   ---------------   --------
crossed, n=2000       12.65 s            0.003 s           ~3700×
large scale, n=1000   7.48 s             0.000 s           ~15000×

Requirements: the model must be a CrossedLMEResult (not statsmodels path), and must have been fitted with the default L-BFGS-B optimizer so that _A11 and _W are populated on the result object.
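The LOO identity behind the fast path can be illustrated in the OLS special case, where V reduces to σ²I and h̃ᵢ is the ordinary hat value: the leave-one-out coefficients obtained without refitting match a brute-force refit exactly. This is a sketch under those simplifying assumptions, not the library’s implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
hat = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages h_ii

i = 7  # delete observation i
# LOO identity: beta_(-i) = beta - (X'X)^-1 x_i e_i / (1 - h_i)
beta_loo = beta - XtX_inv @ X[i] * resid[i] / (1 - hat[i])

# Brute-force refit without observation i
mask = np.arange(n) != i
beta_refit = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
```

In the mixed-model case the same algebra goes through the GLS weights, which is why the variance components must be held fixed at their full-model estimates.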

Examples

Case-deletion (exact, default)

import interlace

result = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"])
measures = interlace.lmer_influence_measures(result)

# Flag observations with Cook's D > 4/n
n = result.nobs
flagged = measures["cooks"] > 4 / n
print(f"{flagged.sum()} influential observations")

Parallel case-deletion refits

# -1 = use all available CPUs
measures = interlace.lmer_influence_measures(result, n_jobs=-1, show_progress=True)

Accessing DFBETAS

import pandas as pd

dfbetas = pd.DataFrame(
    measures["dfbetas"],
    columns=result.fe_params.index,
)
print(dfbetas.abs().max())   # largest DFBETAS per coefficient
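A common cutoff for DFBETAS (a rule of thumb, not enforced by the library) is 2/√n. A sketch with an illustrative matrix in place of measures["dfbetas"]:

```python
import numpy as np

# Illustrative DFBETAS matrix, shape (n, p); in practice use measures["dfbetas"]
n, p = 100, 2
rng = np.random.default_rng(2)
dfbetas = rng.normal(scale=0.05, size=(n, p))
dfbetas[3, 0] = 0.9   # plant one clearly influential value

flagged = np.abs(dfbetas) > 2 / np.sqrt(n)   # 2/sqrt(n) cutoff
rows = np.unique(np.nonzero(flagged)[0])     # observations flagged on any coefficient
```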

See also

  • Influence diagnostics — hlm_influence() for the lower-level case-deletion frame

  • Leverage — hat-matrix diagnostics

  • Augment — append Cook’s D and leverage to the original DataFrame