Influence diagnostics
Exact-deletion influence analysis following Demidenko & Stukel (2005).
All functions accept both CrossedLMEResult and statsmodels.MixedLMResults.
hlm_influence
- hlm_influence(model, level=1, vc_formula=None, optimizer='lbfgsb', n_jobs=1, show_progress=False)
Compute several influence diagnostics by exact deletion, refitting the model once with each observation (or group) removed.
- Parameters:
  - model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
  - level (int | str) – 1 for observation-level deletion, or a group column name for group-level deletion.
  - vc_formula (Any) – Variance-components formula passed through to statsmodels refits (3-level models only; ignored for CrossedLMEResult).
  - optimizer (str) – Optimizer used for each case-deletion refit. "lbfgsb" (default) uses L-BFGS-B via scipy. "bobyqa" uses pybobyqa and routes single-random-effect statsmodels refits through interlace REML. This is more robust near variance-parameter boundaries and reduces the Cook’s D gap relative to R/HLMdiag. Setting "bobyqa" requires the bobyqa optional extra.
  - n_jobs (int) – Number of parallel worker processes for case-deletion refits. 1 (default) runs sequentially, -1 uses all available CPUs (os.cpu_count()), and values > 1 are used as-is. Parallelism applies only to the CrossedLMEResult path; statsmodels refits always run sequentially. On Linux, workers are forked (fast startup). On macOS/Windows, they are spawned (slower startup, so parallelism helps mainly when n ≳ 500).
  - show_progress (bool) – Show a tqdm progress bar. Default: False.
- Return type:
  Any
- Returns:
  A native DataFrame (pandas, polars, …) of the same type as the model input. Its columns are cooksd, mdffits, covtrace, covratio, and rvc.<name> for each variance component.
Examples
>>> infl = interlace.hlm_influence(result)
>>> infl["cooksd"]
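Exact deletion refits the model once per deleted unit. With level=1 there is one refit per observation; when level names a grouping column there is one refit per level of that column. The sketch below illustrates that loop and the n_jobs convention from the parameter list. It is not interlace's implementation. It assumes an ordinary least-squares refit as a stand-in for the mixed-model refit. The helpers resolve_n_jobs, deletion_masks, refit_ols and exact_deletion are hypothetical names used only for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def resolve_n_jobs(n_jobs: int) -> int:
    """Map n_jobs to a worker count: -1 means all CPUs, values >= 1 are used as-is."""
    if n_jobs == -1:
        return os.cpu_count() or 1  # cpu_count() may return None
    if n_jobs < 1:
        # Assumption for this sketch: other values are simply rejected.
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


def deletion_masks(n, groups=None):
    """Yield one boolean keep-mask per deleted unit."""
    if groups is None:
        # Observation-level deletion (level=1): drop one row at a time.
        for i in range(n):
            keep = np.ones(n, dtype=bool)
            keep[i] = False
            yield keep
    else:
        # Group-level deletion: drop every row sharing one group label.
        groups = np.asarray(groups)
        for label in np.unique(groups):
            yield groups != label


def refit_ols(task):
    """Hypothetical stand-in refit: OLS coefficients on the retained rows."""
    X, y, keep = task
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta


def exact_deletion(X, y, groups=None, n_jobs=1):
    """Return refitted coefficients, one row per deleted unit."""
    tasks = [(X, y, keep) for keep in deletion_masks(len(y), groups)]
    workers = resolve_n_jobs(n_jobs)
    if workers == 1:
        betas = [refit_ols(t) for t in tasks]  # sequential path
    else:
        # Each task is pickled and sent to a worker process.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            betas = list(pool.map(refit_ols, tasks))
    return np.vstack(betas)
```

With n_jobs > 1 the sketch dispatches refits to a ProcessPoolExecutor. On platforms that spawn workers, call it under an `if __name__ == "__main__":` guard so the worker processes can import the module safely.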
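Returned columns

| Column | Description |
|---|---|
| cooksd | Cook’s distance |
| mdffits | MDFFITS (multivariate DFFITS) |
| covtrace | COVTRACE: trace(V⁻¹Vᵢ) − p |
| covratio | COVRATIO: det(Vᵢ) / det(V) |
| rvc.<name> | Relative variance change, one column per variance component |

Here V and Vᵢ are the covariance matrices of the fixed-effect estimates from the full fit and from the refit without unit i, and p is the number of fixed effects.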
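The first four columns need only two sets of quantities: the fixed-effect estimates and their covariance matrix from the full fit, and the same quantities from one deletion refit. From these they reduce to a few lines of linear algebra. The sketch below assumes the standard definitions used by R's HLMdiag. Cook's distance is scaled by the full-data covariance and MDFFITS by the deleted-case covariance, and both are divided by p. It also assumes the relative-variance-change form σ²₍ᵢ₎/σ² − 1. interlace's internals may differ in detail, and fixed_effect_influence and relative_variance_change are hypothetical helpers.

```python
import numpy as np


def fixed_effect_influence(beta, V, beta_i, V_i):
    """Influence of one deleted unit on the fixed effects.

    beta, V     : full-data estimates and their covariance matrix
    beta_i, V_i : the same quantities refitted without the deleted unit
    """
    p = len(beta)
    d = np.asarray(beta, dtype=float) - np.asarray(beta_i, dtype=float)
    cooksd = d @ np.linalg.solve(V, d) / p       # scaled by full-data V
    mdffits = d @ np.linalg.solve(V_i, d) / p    # scaled by deleted-case V_i
    covtrace = np.trace(np.linalg.solve(V, V_i)) - p  # tr(V^-1 V_i) - p
    covratio = np.linalg.det(V_i) / np.linalg.det(V)
    return {"cooksd": cooksd, "mdffits": mdffits,
            "covtrace": covtrace, "covratio": covratio}


def relative_variance_change(sigma2, sigma2_i):
    """Assumed rvc form for one variance component: sigma2_i / sigma2 - 1."""
    return sigma2_i / sigma2 - 1.0
```

If the refit leaves the covariance unchanged (Vᵢ = V), covtrace is 0 and covratio is 1. Departures from those baselines therefore indicate units that change the precision of the fixed-effect estimates.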
Convenience wrappers
- cooks_distance(model, optimizer='lbfgsb')
Return Cook’s distance for each observation.
- Parameters:
  - model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
  - optimizer (str) – Optimizer used for case-deletion refits. See hlm_influence().
- Return type:
  ndarray
- Returns:
  np.ndarray of shape (n,) – Cook’s distance for each of the n observations.
Examples
>>> cd = interlace.cooks_distance(result)
>>> cd.max()
- mdffits(model, optimizer='lbfgsb')
Return MDFFITS for each observation.
- Parameters:
  - model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
  - optimizer (str) – Optimizer used for case-deletion refits. See hlm_influence().
- Return type:
  ndarray
- Returns:
  np.ndarray of shape (n,) – MDFFITS value for each of the n observations.
Examples
>>> mdf = interlace.mdffits(result)
>>> mdf.max()
- n_influential(model, threshold=None, optimizer='lbfgsb')
Count observations whose Cook’s distance exceeds threshold.
- Parameters:
  - model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
  - threshold (float | None) – Cut-off value. Defaults to 4 / n, where n is the number of observations (the common rule of thumb).
  - optimizer (str) – Optimizer used for case-deletion refits. See hlm_influence().
- Return type:
  int
- Returns:
  int – Number of observations exceeding the threshold.
Examples
>>> interlace.n_influential(result)
>>> interlace.n_influential(result, threshold=0.1)
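The counting rule behind n_influential can be sketched on a plain NumPy array of Cook’s distances, such as the output of cooks_distance(). count_influential is a hypothetical helper, not part of interlace. It applies the 4 / n default when no threshold is given and uses a strict ">" comparison to match "exceeds".

```python
import numpy as np


def count_influential(cooksd, threshold=None):
    """Count Cook's distances strictly above threshold (default 4 / n)."""
    cooksd = np.asarray(cooksd, dtype=float)
    if threshold is None:
        threshold = 4.0 / cooksd.size  # the 4/n rule of thumb
    return int(np.count_nonzero(cooksd > threshold))
```

Checking `threshold is None`, rather than relying on truthiness, keeps an explicit threshold of 0.0 from being silently replaced by the default.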
Note
tau_gap() was removed from interlace in v0.2.9. It is a GPG-domain metric
and has moved to gpgap.diagnostics.tau_gap().
Example
```python
import interlace

# df: a DataFrame with rt, condition, subject and item columns
result = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"])

# Full influence frame
infl = interlace.hlm_influence(result)
print(infl.columns.tolist())
# ['cooksd', 'mdffits', 'covtrace', 'covratio', 'rvc.subject', 'rvc.item']

# Flag influential observations (Cook's D threshold: 4/n)
n = result.nobs
flagged = infl[infl["cooksd"] > 4 / n]
print(f"{len(flagged)} influential observations by Cook's D")

# Convenience: count influential at a given threshold
print(interlace.n_influential(result, threshold=4 / n))

# Better parity with R/HLMdiag (requires bobyqa extra)
infl_bobyqa = interlace.hlm_influence(result, optimizer="bobyqa")
```