lmer_influence_measures¶
Combined influence measures matching R’s HLMdiag::hlm_influence.
Returns Cook’s D, leverage, DFBETAS, and MDFFITS for every observation.
- lmer_influence_measures(model, optimizer='lbfgsb', show_progress=False, n_jobs=1, analytical=False)[source]¶
Compute all influence measures for an lmer model.
Matches R’s HLMdiag convention. This combines case-deletion Cook’s D / mdffits (via
hlm_influence()) with analytical leverage and DFBETAS — exactly mirroring R’s HLMdiag::hlm_influence.

- Parameters:
  - model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
  - optimizer (str) – Optimizer for case-deletion refits. See hlm_influence().
  - show_progress (bool) – Show a tqdm progress bar during case-deletion refits. Useful for large datasets (n > 500).
  - n_jobs (int) – Number of parallel worker processes. 1 = sequential (default), -1 = all CPUs. See hlm_influence() for details.
  - analytical (bool) – If True, use the GLS-LOO Woodbury formula instead of O(n) REML refits (see _gls_loo_influence()). This is thousands of times faster and matches n_influential counts exactly on large datasets (n ≥ 200), but fixes variance components at the full-model estimates. Requires a CrossedLMEResult with _A11 and _W populated. Only available for CrossedLMEResult (not the statsmodels path).
- Return type:
  dict[str, ndarray]
- Returns:
  dict with keys –
  - cooks — Cook’s D (case-deletion or GLS-LOO depending on analytical)
  - hat — leverage used for threshold flagging (overall for single-RE, fixef for crossed multi-RE — mirrors R)
  - hat_overall — full leverage H1 + H2
  - hat_fixef — fixed-effects leverage H1 only
  - dfbetas — DFBETAS matrix (analytical, same formula as R)
  - dffits — mdffits (case-deletion) or GLS-LOO MDFFITS
  - residuals — conditional residuals
  - sigma — residual standard deviation √(scale)
Notes
Cook’s D uses the Demidenko & Stukel (2005) case-deletion formula:
D_i = (1/p) (β̂ − β̂₍₋ᵢ₎)ᵀ V_β⁻¹ (β̂ − β̂₍₋ᵢ₎)
mdffits (returned as dffits) uses the case-deletion covariance:
MDFFITS_i = (1/p) (β̂ − β̂₍₋ᵢ₎)ᵀ V_β₍₋ᵢ₎⁻¹ (β̂ − β̂₍₋ᵢ₎)
Both require O(n) model refits unless analytical=True. For large datasets, consider
analytical=True (exact match on n_influential) or show_progress=True to monitor
refit progress.
DFBETAS is computed analytically using the fixed-effects design matrix and
conditional residuals, matching R’s implementation.
Leverage flagging uses hat_overall (H1 + H2) for single-RE models and hat_fixef
(H1 only) for crossed multi-RE models, exactly as R does when HLMdiag cannot
compute overall leverage for crossed random effects.
Examples
>>> measures = interlace.lmer_influence_measures(result)
>>> measures["cooks"]
Returned keys¶

| Key | Description |
|---|---|
| cooks | Cook’s distance per observation (case-deletion or GLS-LOO, see below) |
| hat | Leverage used for threshold flagging (overall for single-RE, fixef for crossed multi-RE) |
| hat_overall | Full leverage H₁ + H₂ |
| hat_fixef | Fixed-effects-only leverage H₁ |
| dfbetas | DFBETAS matrix (n observations × p coefficients) |
| dffits | MDFFITS per observation |
| residuals | Conditional residuals |
| sigma | Residual standard deviation √(scale) |
analytical=True: GLS-LOO Woodbury fast path¶
By default (analytical=False) Cook’s D and MDFFITS are computed by O(n) REML refits
(case-deletion), which is exact but slow for large datasets.
With analytical=True, Cook’s D is computed via the GLS-LOO Woodbury identity,
fixing variance components at the full-model estimates:
β̂₍₋ᵢ₎ = β̂ − V_β · tᵢ · εᵢ / (pᵢᵢ · (1 − h̃ᵢ))
D_i = εᵢ² · h̃ᵢ / (p · pᵢᵢ · (1 − h̃ᵢ)²)
This is thousands of times faster than O(n) refits and matches n_influential
counts exactly on large datasets (n ≥ 200):
| Dataset | analytical=False | analytical=True | Speedup |
|---|---|---|---|
| crossed, n=2000 | 12.65 s | 0.003 s | ~3700× |
| large scale, n=1000 | 7.48 s | 0.000 s | ~15000× |
Requirements: the model must be a CrossedLMEResult (not statsmodels path), and must
have been fitted with the default L-BFGS-B optimizer so that _A11 and _W are populated
on the result object.
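Once the residual, leverage, and weight terms are available, the GLS-LOO Cook’s D formula above reduces to scalar arithmetic per observation. A minimal sketch, with hypothetical stand-in values:

```python
# Illustrative per-observation quantities (hypothetical values):
# eps_i — conditional residual εᵢ, h_i — GLS leverage h̃ᵢ,
# p_ii — diagonal weight term pᵢᵢ, p — number of fixed effects.
eps_i, h_i, p_ii, p = 0.8, 0.2, 1.0, 3

# D_i = εᵢ² · h̃ᵢ / (p · pᵢᵢ · (1 − h̃ᵢ)²)
# with variance components held at the full-model estimates
D_i = eps_i**2 * h_i / (p * p_ii * (1.0 - h_i)**2)
```

Because no refit happens, the entire vector of D_i values can be computed in one vectorized pass, which is where the speedups in the table above come from.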
Examples¶
Case-deletion (exact, default)¶
import interlace
result = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"])
measures = interlace.lmer_influence_measures(result)
# Flag observations with Cook's D > 4/n
n = result.nobs
flagged = measures["cooks"] > 4 / n
print(f"{flagged.sum()} influential observations")
GLS-LOO Woodbury (fast, recommended for n > 200)¶
measures = interlace.lmer_influence_measures(result, analytical=True)
Parallel case-deletion refits¶
# -1 = use all available CPUs
measures = interlace.lmer_influence_measures(result, n_jobs=-1, show_progress=True)
Accessing DFBETAS¶
import pandas as pd
dfbetas = pd.DataFrame(
measures["dfbetas"],
columns=result.fe_params.index,
)
print(dfbetas.abs().max()) # largest DFBETAS per coefficient
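A common follow-up is to flag observations using the conventional |DFBETAS| > 2/√n cutoff. The threshold rule and the simulated matrix below are illustrative, not part of the API:

```python
import numpy as np

# Simulated stand-in for measures["dfbetas"]: 400 observations, 3 coefficients
n = 400
rng = np.random.default_rng(0)
dfbetas = rng.normal(scale=0.05, size=(n, 3))

threshold = 2 / np.sqrt(n)                      # conventional 2/sqrt(n) rule
flagged_rows = (np.abs(dfbetas) > threshold).any(axis=1)
```

flagged_rows marks observations whose deletion shifts at least one fixed-effect coefficient by more than the cutoff.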
See also¶
Influence diagnostics — hlm_influence() for the lower-level case-deletion frame
Leverage — hat-matrix diagnostics
Augment — append Cook’s D and leverage to the original DataFrame