glmer¶
Fit a generalised linear mixed model (GLMM) via Laplace approximation or
adaptive Gauss-Hermite quadrature (AGQ) with penalised iteratively reweighted
least squares (PIRLS) — targeting parity with R’s
lme4::glmer().
- glmer(formula, data, family, groups=None, random=None, weights=None, optimizer='lbfgsb', theta0=None, nAGQ=1, offset=None, dispformula=None)¶
Fit a generalized linear mixed model.
- Parameters:
formula – Fixed-effects formula, e.g.
"y ~ x1 + x2".data – DataFrame (pandas, polars, or any narwhals-compatible frame).
family – Family name (
"binomial","poisson","gaussian") or aGLMMFamilyinstance.groups – Column name(s) for random intercepts.
random – lme4-style random effect specs (takes precedence over
groups).weights – Prior weights. For binomial proportion response, pass trial counts. Defaults to ones.
optimizer –
"lbfgsb"(default) or"bobyqa".theta0 – Initial theta. Defaults to ones.
nAGQ – Number of adaptive Gauss-Hermite quadrature points.
1(default) uses the Laplace approximation. Values> 1use AGQ for a more accurate marginal-likelihood integral; this requires a single scalar random intercept (one grouping factor, intercept only).offset – Offset vector, shape
(n,). A known term added to the linear predictor that is not estimated. Common use:np.log(exposure)in Poisson rate models. Defaults to zero.dispformula – Formula for the dispersion sub-model with a log link, e.g.
"~1"for scalar dispersion or"~ z"for covariate-dependent dispersion. The dispersion for observation i isphi_i = exp(X_d[i] @ delta).None(default) fixes dispersion at 1.0. Cannot be combined withnAGQ > 1.
- Return type:
GLMMResult- Returns:
GLMMResult – Fitted GLMM result containing fixed-effect estimates (
fe_params,fe_bse), BLUPs (random_effects), variance components (variance_components), in-sample fitted values (fittedvalues), log-likelihood (llf), information criteria (aic,bic), and convergence status (converged).- Parameters:
formula (str)
data (Any)
family (str | GLMMFamily)
groups (str | list[str] | None)
random (list[str] | None)
weights (ndarray | None)
optimizer (str)
theta0 (ndarray | None)
nAGQ (int)
offset (ndarray | None)
dispformula (str | None)
Examples
>>> import numpy as np, pandas as pd, interlace >>> rng = np.random.default_rng(0) >>> df = pd.DataFrame({"y": rng.binomial(10, 0.4, 200), ... "n": np.full(200, 10), ... "x": rng.normal(size=200), ... "g": rng.integers(0, 20, 200)}) >>> result = interlace.glmer("cbind(y, n-y) ~ x", df, ... family="binomial", groups="g") >>> result.fe_params Intercept ... x ... dtype: float64
Key parameters¶
Parameter |
Type |
Description |
|---|---|---|
|
|
Fixed-effects formula in Wilkinson notation (e.g. |
|
|
Input data (pandas, polars, or any narwhals-compatible frame) |
|
|
Response distribution: |
|
|
Column name(s) for random intercepts (shorthand) |
|
|
lme4-style random-effect specs, e.g. |
|
|
Prior weights; for binomial proportion responses, pass trial counts |
|
|
Optimizer. |
|
|
Initial variance parameters; defaults to ones |
|
|
Number of adaptive Gauss-Hermite quadrature points (default |
|
|
Known term added to the linear predictor (e.g. |
|
|
Formula for the dispersion sub-model with log link (e.g. |
Supported families¶
String shorthand¶
Family |
Link function |
Use case |
|---|---|---|
|
logit |
Binary outcomes, proportions (pass trial counts via |
|
log |
Count data (events, errors, frequencies) |
|
identity |
Continuous outcomes (equivalent to |
|
log |
Overdispersed count data (variance = mu + mu^2/theta) |
Extended families (pass instances)¶
Family class |
Link |
Use case |
|---|---|---|
|
log |
Overdispersed counts (linear variance) |
|
logit |
Continuous proportions in (0, 1) |
|
log / inverse |
Positive continuous data |
|
log |
Counts with excess zeros |
|
log |
Overdispersed counts with excess zeros |
|
log |
Counts with structural zeros |
|
logit |
Proportions with exact 0s and 1s |
See GLMM families for constructor parameters, examples, and the
GLMMFamily protocol for building custom families.
Examples¶
Binomial GLMM (proportion response)¶
The classic CBPP dataset: disease incidence across cattle herds.
import interlace
import pandas as pd
df = pd.read_csv("cbpp.csv")
result = interlace.glmer(
formula="incidence / size ~ period",
data=df,
family="binomial",
groups="herd",
weights=df["size"].values,
)
print(result.fe_params) # fixed-effect log-odds
print(result.variance_components) # between-herd variance
print(result.aic, result.bic)
Poisson GLMM (count response)¶
result = interlace.glmer(
formula="count ~ x1 + x2",
data=df,
family="poisson",
groups=["site", "year"],
)
print(result.fe_params) # fixed-effect log-rates
print(result.random_effects["site"])
Negative Binomial GLMM (overdispersed counts)¶
For count data with overdispersion (variance > mean), use the negative binomial
family. Pass a NegativeBinomial2Family instance with the shape parameter
theta:
from interlace import NegativeBinomial2Family
result = interlace.glmer(
formula="count ~ x1 + x2",
data=df,
family=NegativeBinomial2Family(theta=2.0),
groups="site",
)
print(result.fe_params) # fixed-effect log-rates
print(result.variance_components)
You can also use the string shorthand "negativebinomial" which defaults to
theta=1.0:
result = interlace.glmer(
formula="count ~ x1 + x2",
data=df,
family="negativebinomial",
groups="site",
)
Using random= for lme4-style syntax¶
result = interlace.glmer(
formula="incidence / size ~ period",
data=df,
family="binomial",
random=["(1 | herd)"],
weights=df["size"].values,
)
Adaptive Gauss-Hermite quadrature (AGQ)¶
For models with a single scalar random intercept, set nAGQ > 1 for a more
accurate marginal-likelihood approximation than Laplace. This mirrors
lme4::glmer(nAGQ = ...).
result = interlace.glmer(
formula="incidence / size ~ period",
data=df,
family="binomial",
groups="herd",
weights=df["size"].values,
nAGQ=25, # 25 quadrature points
)
nAGQ=1 (the default) uses the Laplace approximation. Higher values are more
accurate but slower. nAGQ > 1 requires exactly one grouping factor with a
random intercept only (no random slopes). See the
GLMM quickstart AGQ section for guidance on when Laplace is sufficient
vs when AGQ improves estimation.
Dispersion modelling (dispformula)¶
By default, the dispersion (scale) parameter is fixed at 1.0 for binomial and
Poisson families. For Gaussian GLMMs, you can estimate the residual variance
or model heteroscedasticity using dispformula.
Scalar dispersion (~1): estimates a single residual variance
$\phi = \exp(\delta_0)$.
result = interlace.glmer(
formula="y ~ x1 + x2",
data=df,
family="gaussian",
groups="site",
dispformula="~1",
)
# Estimated residual variance
import numpy as np
phi_hat = np.exp(result.disp_params["Intercept"])
print(f"Residual variance: {phi_hat:.3f}")
Covariate-dependent dispersion (~ z): models observation-level variance as
$\phi_i = \exp(\delta_0 + \delta_1 z_i)$ (heteroscedastic model).
result = interlace.glmer(
formula="y ~ x1 + x2",
data=df,
family="gaussian",
groups="site",
dispformula="~ z",
)
print(result.disp_params) # log-scale dispersion coefficients
print(result.dispersion) # per-observation phi values
The dispersion sub-model coefficients are on the log scale (log link ensures
$\phi > 0$). They are stored in result.disp_params as a pd.Series. The
per-observation dispersion values are in result.dispersion.
dispformula cannot be combined with nAGQ > 1.
Prediction¶
# Response-scale predictions (probabilities for binomial, counts for Poisson)
preds = result.predict(newdata=df_new)
# Linear predictor (log-odds for binomial, log-rate for Poisson)
eta = result.predict(newdata=df_new, type="link")
# Population-level only (no random effects)
preds_marginal = result.predict(newdata=df_new, include_re=False)
Result object¶
glmer() returns a GLMMResult with the following attributes:
Attribute |
Type |
Description |
|---|---|---|
|
|
Fixed-effect coefficients |
|
|
Fixed-effect standard errors |
|
|
BLUPs per grouping factor |
|
|
Variance estimates per grouping factor |
|
|
Variance parameters (Cholesky factors) |
|
|
Whether both PIRLS and outer optimizer converged |
|
|
Number of observations |
|
|
Log-likelihood (Laplace or AGQ, depending on |
|
|
Akaike information criterion |
|
|
Bayesian information criterion |
|
|
Family object used for fitting |
|
|
Number of levels per grouping factor |
|
|
Dispersion parameter (1.0 for binomial and Poisson) |
|
|
Fitted values on the response scale |
|
|
Dispersion formula coefficients (log scale); |
|
|
Per-observation dispersion values; |
Custom families¶
Any object implementing the GLMMFamily protocol can be passed as the
family parameter:
from interlace import GLMMFamily
class MyFamily:
name: str = "custom"
def link(self, mu): ... # eta = g(mu)
def linkinv(self, eta): ... # mu = g^{-1}(eta)
def mu_eta(self, eta): ... # d(mu)/d(eta)
def variance(self, mu): ... # Var(Y | mu)
def dev_resids(self, y, mu, wt): ... # deviance residuals
See also¶
fit — linear mixed models (
lmer-equivalent)CrossedLMEResult — attributes on
CrossedLMEResult(LMM result)Prediction — generating predictions from a fitted model