Influence diagnostics

Exact-deletion influence analysis following Demidenko & Stukel (2005). Every function accepts either a CrossedLMEResult or a statsmodels MixedLMResults object.

hlm_influence

hlm_influence(model, level=1, vc_formula=None, optimizer='lbfgsb', n_jobs=1, show_progress=False)

Compute Cook’s distance, MDFFITS, COVTRACE, COVRATIO, and relative variance change by exact deletion, refitting the model once per deleted case.

Parameters:

model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
level (int | str) – 1 for observation-level deletion; a group column name for group-level deletion.
vc_formula (Any) – Variance-components formula passed through to statsmodels refits (3-level models only; ignored for CrossedLMEResult).
optimizer (str) – Optimizer used for each case-deletion refit. "lbfgsb" (default) uses L-BFGS-B via scipy. "bobyqa" uses pybobyqa and routes single-random-effect statsmodels refits through interlace REML, which is more robust near variance-parameter boundaries and narrows the Cook’s distance gap relative to R/HLMdiag. "bobyqa" requires the bobyqa optional extra.
n_jobs (int) – Number of parallel worker processes for case-deletion refits. 1 (default) runs sequentially; -1 uses all available CPUs (os.cpu_count()); values greater than 1 are used as-is. Parallelism applies only on the CrossedLMEResult path; statsmodels refits always run sequentially. On Linux, workers are forked (fast startup); on macOS and Windows, they are spawned (slower startup), so parallelism helps mainly when n ≳ 500.
show_progress (bool) – Show a tqdm progress bar. Default: False.

Return type:

Any

Returns:

A DataFrame of the same native type (pandas, polars, …) as the model input, with columns cooksd, mdffits, covtrace, covratio, and rvc.<name> for each variance component.
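The n_jobs rules described above can be sketched as a small standalone helper. This is an illustration only: resolve_n_jobs is a hypothetical name, not part of interlace, and rejecting values such as 0 or -2 is an assumption, since the documentation does not say how those are handled.

```python
import os


def resolve_n_jobs(n_jobs: int) -> int:
    """Map an n_jobs argument to a concrete worker count.

    Hypothetical helper that mirrors the documented semantics:
    1 runs sequentially, -1 uses every available CPU, and values
    greater than 1 are used as-is.
    """
    if n_jobs == -1:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    if n_jobs < 1:
        # Assumption: any other non-positive value is treated as an error.
        raise ValueError(f"n_jobs must be -1 or a positive integer, got {n_jobs}")
    return n_jobs


if __name__ == "__main__":
    for requested in (1, 4, -1):
        print(requested, "->", resolve_n_jobs(requested))
```

Because each worker process has a startup cost, a resolved count above 1 pays off only when the number of case-deletion refits is large enough to amortize it.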

Examples

>>> infl = interlace.hlm_influence(result)
>>> infl["cooksd"]

Returned columns

Column	Description
`cooksd`	Cook’s distance
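`mdffits`	MDFFITS (multivariate DFFITS; measures the change in the fixed-effect estimates)
`covtrace`	COVTRACE: trace(V⁻¹Vᵢ) − p, where V is the estimated covariance matrix of the fixed effects, Vᵢ is its case-deleted counterpart, and p is the number of fixed effects
`covratio`	COVRATIO: det(Vᵢ) / det(V)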
`rvc.<name>`	Relative variance change per variance component
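To make the column definitions concrete, the sketch below computes each diagnostic for a single deleted case from the full-data and case-deleted estimates. It uses the textbook formulas (Cook’s distance and MDFFITS as quadratic forms in the change in fixed effects, scaled by p; COVTRACE and COVRATIO as in the table; relative variance change as a ratio minus one). The function name and argument names are hypothetical, and this is not necessarily how interlace computes the values internally. Some references also take the absolute value in COVTRACE; the code follows the table as written.

```python
import numpy as np


def deletion_diagnostics(beta, beta_del, V, V_del, theta, theta_del):
    """Influence diagnostics for one deleted case (illustrative sketch).

    beta, beta_del   : fixed-effect estimates from the full and case-deleted fits
    V, V_del         : estimated covariance matrices of those fixed effects
    theta, theta_del : variance-component estimates from the two fits
    """
    beta = np.asarray(beta, dtype=float)
    beta_del = np.asarray(beta_del, dtype=float)
    V = np.asarray(V, dtype=float)
    V_del = np.asarray(V_del, dtype=float)
    theta = np.asarray(theta, dtype=float)
    theta_del = np.asarray(theta_del, dtype=float)

    d = beta - beta_del          # change in the fixed effects
    p = d.size                   # number of fixed effects

    return {
        # Quadratic form in d, weighted by the full-data covariance.
        "cooksd": float(d @ np.linalg.solve(V, d) / p),
        # Same quadratic form, weighted by the case-deleted covariance.
        "mdffits": float(d @ np.linalg.solve(V_del, d) / p),
        # trace(V^-1 V_del) - p; zero when deletion leaves V unchanged.
        "covtrace": float(np.trace(np.linalg.solve(V, V_del)) - p),
        # det(V_del) / det(V); one when deletion leaves V unchanged.
        "covratio": float(np.linalg.det(V_del) / np.linalg.det(V)),
        # Relative change in each variance component.
        "rvc": theta_del / theta - 1.0,
    }


if __name__ == "__main__":
    # Made-up numbers, chosen only to exercise the formulas.
    out = deletion_diagnostics(
        beta=[1.0, 0.5], beta_del=[0.9, 0.55],
        V=[[0.04, 0.01], [0.01, 0.09]], V_del=[[0.05, 0.01], [0.01, 0.10]],
        theta=[0.8, 0.3], theta_del=[0.7, 0.33],
    )
    for name, value in out.items():
        print(name, value)
```

When a deleted case has no effect on the fit, the sketch returns 0 for cooksd, mdffits, covtrace, and rvc, and 1 for covratio, which is a useful reference point when reading the returned frame.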

Convenience wrappers

cooks_distance(model, optimizer='lbfgsb')

Return Cook’s distance for each observation.

Parameters:

model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
optimizer (str) – Optimizer used for case-deletion refits. See hlm_influence().

Return type:

ndarray

Returns:

np.ndarray of shape (n,) – Cook’s distance for each of the n observations.

Examples

>>> cd = interlace.cooks_distance(result)
>>> cd.max()

mdffits(model, optimizer='lbfgsb')

Return MDFFITS for each observation.

Parameters:

model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
optimizer (str) – Optimizer used for case-deletion refits. See hlm_influence().

Return type:

ndarray

Returns:

np.ndarray of shape (n,) – MDFFITS value for each of the n observations.

Examples

>>> mdf = interlace.mdffits(result)
>>> mdf.max()

n_influential(model, threshold=None, optimizer='lbfgsb')

Count observations whose Cook’s distance exceeds threshold.

Parameters:

model (Any) – A CrossedLMEResult or statsmodels MixedLMResults object.
threshold (float | None) – Cut-off value. Defaults to 4 / n, where n is the number of observations (a common rule of thumb).
optimizer (str) – Optimizer used for case-deletion refits. See hlm_influence().

Return type:

int

Returns:

int – Number of observations whose Cook’s distance exceeds the threshold.

Examples

>>> interlace.n_influential(result)
>>> interlace.n_influential(result, threshold=0.1)
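The counting rule is simple enough to sketch on a plain array of Cook’s distances. count_influential below is a hypothetical stand-in, not the interlace function: it takes precomputed distances instead of a fitted model, and it assumes "exceeds" means a strict greater-than comparison.

```python
import numpy as np


def count_influential(cooksd, threshold=None):
    """Count Cook's distances strictly above a cut-off (illustrative sketch)."""
    cooksd = np.asarray(cooksd, dtype=float)
    if threshold is None:
        # Documented default: 4 / n, with n the number of observations.
        threshold = 4 / cooksd.size
    return int(np.sum(cooksd > threshold))


if __name__ == "__main__":
    # Made-up distances for eight observations, so the default cut-off is 0.5.
    distances = [0.1, 0.6, 0.7, 0.2, 0.05, 0.5, 0.0, 0.3]
    print(count_influential(distances))
    print(count_influential(distances, threshold=0.1))
```

Because the default cut-off shrinks as n grows, large data sets will flag observations with small absolute Cook’s distances; passing an explicit threshold avoids that.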

Note

tau_gap() was removed from interlace in v0.2.9. It is a GPG-domain metric and has moved to gpgap.diagnostics.tau_gap().

Example

import interlace

# df: a DataFrame with columns rt, condition, subject, and item
result = interlace.fit("rt ~ condition", data=df, groups=["subject", "item"])

# Full influence frame
infl = interlace.hlm_influence(result)
print(infl.columns.tolist())
# ['cooksd', 'mdffits', 'covtrace', 'covratio', 'rvc.subject', 'rvc.item']

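# Flag influential observations (Cook's D threshold: 4/n).
# Boolean-mask indexing as written assumes a pandas frame; with polars,
# use infl.filter(pl.col("cooksd") > 4 / n) instead.
n = result.nobs
flagged = infl[infl["cooksd"] > 4 / n]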
print(f"{len(flagged)} influential observations by Cook's D")

# Convenience: count influential at a given threshold
print(interlace.n_influential(result, threshold=4 / n))

# Better parity with R/HLMdiag (requires bobyqa extra)
infl_bobyqa = interlace.hlm_influence(result, optimizer="bobyqa")
