Influence diagnostics

Exact-deletion influence analysis following Demidenko & Stukel (2005). All functions accept both CrossedLMEResult and statsmodels.MixedLMResults.
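As a rough illustration of what exact deletion means, the sketch below refits a model once per deleted observation and measures how far the coefficient estimates move. It uses ordinary least squares in NumPy as a stand-in, not interlace's mixed-model refits; ols_fit and cooks_d_by_deletion are hypothetical names introduced only for this example, and the data are simulated.

```python
import numpy as np

def ols_fit(X, y):
    """Return (beta_hat, residual variance s2) for ordinary least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    s2 = float(resid @ resid) / (n - p)
    return beta, s2

def cooks_d_by_deletion(X, y):
    """Cook's distance obtained by refitting once per deleted observation."""
    n, p = X.shape
    beta_full, s2 = ols_fit(X, y)
    xtx = X.T @ X  # inverse covariance of beta_hat, up to the factor s2
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        beta_i, _ = ols_fit(X[keep], y[keep])
        d = beta_full - beta_i            # shift in the estimates
        out[i] = float(d @ xtx @ d) / (p * s2)
    return out

# Simulated data (illustrative only): intercept plus one predictor.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=30)
y[0] += 8.0  # plant one gross outlier

cd = cooks_d_by_deletion(X, y)
```

For OLS the same values are available in closed form from the hat matrix, so the loop is redundant there. For mixed models the variance components also change when a case is removed, which is why each deletion involves a refit.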

hlm_influence

hlm_influence(model, level=1, vc_formula=None, optimizer='lbfgsb', n_jobs=1, show_progress=False)

Calculate multiple influence diagnostics via exact deletion.

Parameters:
  • model – A CrossedLMEResult or statsmodels MixedLMResults object.

  • level – 1 for observation-level deletion; a group column name for group-level deletion.

  • vc_formula – Variance-components formula passed through to statsmodels refits (3-level models only; ignored for CrossedLMEResult).

  • optimizer – Optimizer used for each case-deletion refit. "lbfgsb" (default) uses L-BFGS-B via scipy. "bobyqa" uses pybobyqa and routes single-RE statsmodels refits through interlace REML, which is more robust near variance-parameter boundaries and reduces the Cook’s D gap relative to R/HLMdiag. Requires the bobyqa optional extra when set to "bobyqa".

  • n_jobs – Number of parallel worker processes for case-deletion refits. 1 (default) runs sequentially. -1 uses all available CPUs (os.cpu_count()). Values > 1 are used as-is. Parallelism is only applied on the CrossedLMEResult path; statsmodels refits always run sequentially. On Linux, workers are forked (fast startup); on macOS/Windows, they are spawned (slower startup — parallelism helps mainly when n ≳ 500).

  • show_progress – Show a tqdm progress bar. Default: False.
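The n_jobs rules described above can be sketched as a small helper. This is a minimal illustration, not interlace's code: resolve_n_jobs is a hypothetical name, and raising ValueError for other values is an assumption, since the documentation does not say how they are handled.

```python
import os

def resolve_n_jobs(n_jobs: int) -> int:
    """Map an n_jobs argument to a worker count (hypothetical helper)."""
    if n_jobs == -1:
        # os.cpu_count() can return None, so fall back to one worker.
        return os.cpu_count() or 1
    if n_jobs >= 1:
        # 1 means sequential; larger values are used as-is.
        return n_jobs
    # Assumption: any other value is rejected.
    raise ValueError(f"n_jobs must be -1 or a positive integer, got {n_jobs}")
```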

Return type:

Any

Returns:

Native DataFrame (pandas, polars, …) of the same type as the model's input data. Columns: cooksd, mdffits, covtrace, covratio, and rvc.<name> for each variance component.

Parameters:
  • model (Any)

  • level (int | str)

  • vc_formula (Any)

  • optimizer (str)

  • n_jobs (int)

  • show_progress (bool)

Examples

>>> infl = interlace.hlm_influence(result)
>>> infl["cooksd"]
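To make the level parameter more concrete, the sketch below shows one way the deletion units could be enumerated: one unit per row for level=1, or one unit per distinct value of a group column. deletion_units is a hypothetical helper operating on plain dictionaries, not part of interlace.

```python
from collections import defaultdict

def deletion_units(rows, level=1):
    """Return {unit label: [row indices dropped together]} (hypothetical)."""
    if level == 1:
        # Observation-level deletion: each row is its own unit.
        return {i: [i] for i in range(len(rows))}
    # Group-level deletion: all rows sharing a group value form one unit.
    units = defaultdict(list)
    for i, row in enumerate(rows):
        units[row[level]].append(i)
    return dict(units)

rows = [
    {"subject": "s1", "item": "a"},
    {"subject": "s1", "item": "b"},
    {"subject": "s2", "item": "a"},
]
by_row = deletion_units(rows)                 # three units, one row each
by_subject = deletion_units(rows, "subject")  # two units, one per subject
```

Each unit corresponds to one refit, so group-level deletion on a column with few distinct values needs far fewer refits than observation-level deletion.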

Returned columns

| Column | Description |
| --- | --- |
| cooksd | Cook’s distance |
| mdffits | MDFFITS (Measures of Difference in Fixed Effects) |
| covtrace | COVTRACE (trace of V⁻¹Vᵢ) − p |
| covratio | COVRATIO (det(Vᵢ) / det(V)) |
| rvc.<name> | Relative variance change per variance component |
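The sketch below shows how these quantities could be computed for one deleted case, given the full-data and deleted-data estimates. It assumes the conventional definitions from the mixed-model influence literature: V and Vᵢ are the estimated covariance matrices of the fixed effects before and after deletion, and p is the number of fixed effects. The division by p in cooksd and mdffits and the exact form of rvc are assumptions about interlace's scaling, and deletion_measures is a hypothetical name.

```python
import numpy as np

def deletion_measures(beta, V, beta_i, V_i, theta, theta_i):
    """Influence measures for one deleted case (hypothetical helper).

    beta, V      : fixed effects and their covariance, full data
    beta_i, V_i  : the same after deleting the case
    theta, theta_i : variance components before and after deletion
    """
    p = beta.shape[0]
    d = beta - beta_i
    return {
        # Shift in fixed effects, scaled by the full-data covariance.
        "cooksd": float(d @ np.linalg.solve(V, d)) / p,
        # Same shift, scaled by the deleted-data covariance.
        "mdffits": float(d @ np.linalg.solve(V_i, d)) / p,
        # trace(V^-1 V_i) - p, as written in the table.
        "covtrace": float(np.trace(np.linalg.solve(V, V_i))) - p,
        # det(V_i) / det(V).
        "covratio": float(np.linalg.det(V_i) / np.linalg.det(V)),
        # Relative change in each variance component.
        "rvc": theta_i / theta - 1.0,
    }

m = deletion_measures(
    beta=np.array([1.0, 2.0]), V=0.5 * np.eye(2),
    beta_i=np.array([1.0, 1.0]), V_i=np.eye(2),
    theta=np.array([2.0, 4.0]), theta_i=np.array([1.0, 5.0]),
)
```

When deletion changes nothing, cooksd, mdffits, covtrace, and rvc are all 0 and covratio is 1, which gives a reference point for reading the columns.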

Convenience wrappers

cooks_distance(model, optimizer='lbfgsb')

Return Cook’s distance for each observation.

Parameters:
  • model – A CrossedLMEResult or statsmodels MixedLMResults object.

  • optimizer – Optimizer used for case-deletion refits. See hlm_influence().

Return type:

ndarray

Returns:

np.ndarray of shape (n,) – Cook’s distance for each of the n observations.

Parameters:
  • model (Any)

  • optimizer (str)

Examples

>>> cd = interlace.cooks_distance(result)
>>> cd.max()

mdffits(model, optimizer='lbfgsb')

Return MDFFITS for each observation.

Parameters:
  • model – A CrossedLMEResult or statsmodels MixedLMResults object.

  • optimizer – Optimizer used for case-deletion refits. See hlm_influence().

Return type:

ndarray

Returns:

np.ndarray of shape (n,) – MDFFITS value for each of the n observations.

Parameters:
  • model (Any)

  • optimizer (str)

Examples

>>> mdf = interlace.mdffits(result)
>>> mdf.max()

n_influential(model, threshold=None, optimizer='lbfgsb')

Count observations whose Cook’s distance exceeds threshold.

Parameters:
  • model – A CrossedLMEResult or statsmodels MixedLMResults object.

  • threshold – Cut-off value. Defaults to 4 / n, where n is the number of observations (a common rule of thumb for Cook’s distance).

  • optimizer – Optimizer used for case-deletion refits. See hlm_influence().

Return type:

int

Returns:

int – Number of observations exceeding the threshold.

Parameters:
  • model (Any)

  • threshold (float | None)

  • optimizer (str)

Examples

>>> interlace.n_influential(result)
>>> interlace.n_influential(result, threshold=0.1)
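The counting rule described for n_influential can be sketched directly on an array of Cook's distances. count_influential is a hypothetical stand-in that skips the refits and takes the distances as input; the strict inequality follows the word "exceeds" in the description.

```python
import numpy as np

def count_influential(cooksd, threshold=None):
    """Count Cook's distances above a cut-off (hypothetical helper)."""
    cooksd = np.asarray(cooksd, dtype=float)
    if threshold is None:
        threshold = 4.0 / cooksd.size  # default cut-off: 4 / n
    return int(np.sum(cooksd > threshold))

# Made-up distances for eight observations, so the default cut-off is 0.5.
cd_values = [0.01, 0.6, 0.02, 0.03, 0.9, 0.5, 0.1, 0.2]
n_default = count_influential(cd_values)
n_custom = count_influential(cd_values, threshold=0.1)
```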

Note

tau_gap() was removed from interlace in v0.2.9. It is a GPG-domain metric and has moved to gpgap.diagnostics.tau_gap().

Example

import interlace

# df: a DataFrame with columns rt, condition, subject, and item
result = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"])

# Full influence frame
infl = interlace.hlm_influence(result)
print(infl.columns.tolist())
# ['cooksd', 'mdffits', 'covtrace', 'covratio', 'rvc.subject', 'rvc.item']

# Flag influential observations (Cook's D threshold: 4/n).
# The boolean-mask indexing below is the pandas idiom.
n = result.nobs
flagged = infl[infl["cooksd"] > 4 / n]
print(f"{len(flagged)} influential observations by Cook's D")

# Convenience: count influential at a given threshold
print(interlace.n_influential(result, threshold=4 / n))

# Better parity with R/HLMdiag (requires bobyqa extra)
infl_bobyqa = interlace.hlm_influence(result, optimizer="bobyqa")

See also

  • Leverage — hat-matrix diagnostics (influence on fit, not estimates)

  • Augment — append Cook’s D + leverage to the original DataFrame in one call

  • Residuals — residual diagnostics