glmer¶

Fit a generalised linear mixed model (GLMM) via Laplace approximation or adaptive Gauss-Hermite quadrature (AGQ) with penalised iteratively reweighted least squares (PIRLS) — targeting parity with R’s lme4::glmer().

glmer(formula, data, family, groups=None, random=None, weights=None, optimizer='lbfgsb', theta0=None, nAGQ=1, offset=None, dispformula=None)¶

Fit a generalized linear mixed model.

Parameters:

formula – Fixed-effects formula, e.g. "y ~ x1 + x2".
data – DataFrame (pandas, polars, or any narwhals-compatible frame).
family – Family name ("binomial", "poisson", "gaussian") or a GLMMFamily instance.
groups – Column name(s) for random intercepts.
random – lme4-style random effect specs (takes precedence over groups).
weights – Prior weights. For binomial proportion response, pass trial counts. Defaults to ones.
optimizer – "lbfgsb" (default) or "bobyqa".
theta0 – Initial theta. Defaults to ones.
nAGQ – Number of adaptive Gauss-Hermite quadrature points. 1 (default) uses the Laplace approximation. Values > 1 use AGQ for a more accurate marginal-likelihood integral; this requires a single scalar random intercept (one grouping factor, intercept only).
offset – Offset vector, shape (n,). A known term added to the linear predictor that is not estimated. Common use: np.log(exposure) in Poisson rate models. Defaults to zero.
dispformula – Formula for the dispersion sub-model with a log link, e.g. "~1" for scalar dispersion or "~ z" for covariate-dependent dispersion. The dispersion for observation i is phi_i = exp(X_d[i] @ delta). None (default) fixes dispersion at 1.0. Cannot be combined with nAGQ > 1.

Return type:

GLMMResult

Returns:

GLMMResult – Fitted GLMM result containing fixed-effect estimates (fe_params, fe_bse), BLUPs (random_effects), variance components (variance_components), in-sample fitted values (fittedvalues), log-likelihood (llf), information criteria (aic, bic), and convergence status (converged).

Parameters:

formula (str)
data (Any)
family (str | GLMMFamily)
groups (str | list[str] | None)
random (list[str] | None)
weights (ndarray | None)
optimizer (str)
theta0 (ndarray | None)
nAGQ (int)
offset (ndarray | None)
dispformula (str | None)

Examples

>>> import numpy as np, pandas as pd, interlace
>>> rng = np.random.default_rng(0)
>>> df = pd.DataFrame({"y": rng.binomial(10, 0.4, 200),
...                    "n": np.full(200, 10),
...                    "x": rng.normal(size=200),
...                    "g": rng.integers(0, 20, 200)})
>>> result = interlace.glmer("cbind(y, n-y) ~ x", df,
...                          family="binomial", groups="g")
>>> result.fe_params
Intercept    ...
x            ...
dtype: float64

Key parameters¶

Parameter	Type	Description
`formula`	`str`	Fixed-effects formula in Wilkinson notation (e.g. `"y ~ x + z"`)
`data`	`DataFrame`	Input data (pandas, polars, or any narwhals-compatible frame)
`family`	`str \| GLMMFamily`	Response distribution: `"binomial"`, `"poisson"`, `"gaussian"`, or `"negativebinomial"`
`groups`	`str \| list[str]`	Column name(s) for random intercepts (shorthand)
`random`	`list[str]`	lme4-style random-effect specs, e.g. `["(1 \| herd)"]`
`weights`	`ndarray`	Prior weights; for binomial proportion responses, pass trial counts
`optimizer`	`"lbfgsb"` or `"bobyqa"`	Optimizer. `"bobyqa"` is gradient-free and more robust
`theta0`	`ndarray`	Initial variance parameters; defaults to ones
`nAGQ`	`int`	Number of adaptive Gauss-Hermite quadrature points (default `1` = Laplace)
`offset`	`ndarray`	Known term added to the linear predictor (e.g. `np.log(exposure)` for Poisson rates)
`dispformula`	`str`	Formula for the dispersion sub-model with log link (e.g. `"~1"`, `"~ z"`)

Supported families¶

String shorthand¶

Family	Link function	Use case
`"binomial"`	logit	Binary outcomes, proportions (pass trial counts via `weights`)
`"poisson"`	log	Count data (events, errors, frequencies)
`"gaussian"`	identity	Continuous outcomes (equivalent to `fit()` with identity link)
`"negativebinomial"`	log	Overdispersed count data (variance = mu + mu^2/theta)

Extended families (pass instances)¶

Family class	Link	Use case
`NegativeBinomial1Family(alpha=)`	log	Overdispersed counts (linear variance)
`BetaFamily(phi=)`	logit	Continuous proportions in (0, 1)
`GammaFamily(link=)`	log / inverse	Positive continuous data
`ZeroInflatedPoissonFamily(pi=)`	log	Counts with excess zeros
`ZeroInflatedNB2Family(pi=, theta=)`	log	Overdispersed counts with excess zeros
`HurdlePoissonFamily()`	log	Counts with structural zeros
`ZeroOneInflatedBetaFamily(pi0=, pi1=, phi=)`	logit	Proportions with exact 0s and 1s

See GLMM families for constructor parameters, examples, and the GLMMFamily protocol for building custom families.

Examples¶

Binomial GLMM (proportion response)¶

The classic CBPP dataset: disease incidence across cattle herds.

import interlace
import pandas as pd

df = pd.read_csv("cbpp.csv")

result = interlace.glmer(
    formula="incidence / size ~ period",
    data=df,
    family="binomial",
    groups="herd",
    weights=df["size"].values,
)

print(result.fe_params)       # fixed-effect log-odds
print(result.variance_components)  # between-herd variance
print(result.aic, result.bic)

Poisson GLMM (count response)¶

result = interlace.glmer(
    formula="count ~ x1 + x2",
    data=df,
    family="poisson",
    groups=["site", "year"],
)

print(result.fe_params)       # fixed-effect log-rates
print(result.random_effects["site"])

Negative Binomial GLMM (overdispersed counts)¶

For count data with overdispersion (variance > mean), use the negative binomial family. Pass a NegativeBinomial2Family instance with the shape parameter theta:

from interlace import NegativeBinomial2Family

result = interlace.glmer(
    formula="count ~ x1 + x2",
    data=df,
    family=NegativeBinomial2Family(theta=2.0),
    groups="site",
)

print(result.fe_params)       # fixed-effect log-rates
print(result.variance_components)

You can also use the string shorthand "negativebinomial" which defaults to theta=1.0:

result = interlace.glmer(
    formula="count ~ x1 + x2",
    data=df,
    family="negativebinomial",
    groups="site",
)

Using `random=` for lme4-style syntax¶

result = interlace.glmer(
    formula="incidence / size ~ period",
    data=df,
    family="binomial",
    random=["(1 | herd)"],
    weights=df["size"].values,
)

Adaptive Gauss-Hermite quadrature (AGQ)¶

For models with a single scalar random intercept, set nAGQ > 1 for a more accurate marginal-likelihood approximation than Laplace. This mirrors lme4::glmer(nAGQ = ...).

result = interlace.glmer(
    formula="incidence / size ~ period",
    data=df,
    family="binomial",
    groups="herd",
    weights=df["size"].values,
    nAGQ=25,  # 25 quadrature points
)

nAGQ=1 (the default) uses the Laplace approximation. Higher values are more accurate but slower. nAGQ > 1 requires exactly one grouping factor with a random intercept only (no random slopes). See the GLMM quickstart AGQ section for guidance on when Laplace is sufficient vs when AGQ improves estimation.

Dispersion modelling (`dispformula`)¶

By default, the dispersion (scale) parameter is fixed at 1.0 for binomial and Poisson families. For Gaussian GLMMs, you can estimate the residual variance or model heteroscedasticity using dispformula.

Scalar dispersion (~1): estimates a single residual variance $\phi = \exp(\delta_0)$.

result = interlace.glmer(
    formula="y ~ x1 + x2",
    data=df,
    family="gaussian",
    groups="site",
    dispformula="~1",
)

# Estimated residual variance
import numpy as np
phi_hat = np.exp(result.disp_params["Intercept"])
print(f"Residual variance: {phi_hat:.3f}")

Covariate-dependent dispersion (~ z): models observation-level variance as $\phi_i = \exp(\delta_0 + \delta_1 z_i)$ (heteroscedastic model).

result = interlace.glmer(
    formula="y ~ x1 + x2",
    data=df,
    family="gaussian",
    groups="site",
    dispformula="~ z",
)

print(result.disp_params)   # log-scale dispersion coefficients
print(result.dispersion)    # per-observation phi values

The dispersion sub-model coefficients are on the log scale (log link ensures $\phi > 0$). They are stored in result.disp_params as a pd.Series. The per-observation dispersion values are in result.dispersion.

dispformula cannot be combined with nAGQ > 1.

Prediction¶

# Response-scale predictions (probabilities for binomial, counts for Poisson)
preds = result.predict(newdata=df_new)

# Linear predictor (log-odds for binomial, log-rate for Poisson)
eta = result.predict(newdata=df_new, type="link")

# Population-level only (no random effects)
preds_marginal = result.predict(newdata=df_new, include_re=False)

Result object¶

glmer() returns a GLMMResult with the following attributes:

Attribute	Type	Description
`fe_params`	`pd.Series`	Fixed-effect coefficients
`fe_bse`	`pd.Series`	Fixed-effect standard errors
`random_effects`	`dict`	BLUPs per grouping factor
`variance_components`	`dict`	Variance estimates per grouping factor
`theta`	`ndarray`	Variance parameters (Cholesky factors)
`converged`	`bool`	Whether both PIRLS and outer optimizer converged
`nobs`	`int`	Number of observations
`llf`	`float`	Log-likelihood (Laplace or AGQ, depending on `nAGQ`)
`aic`	`float`	Akaike information criterion
`bic`	`float`	Bayesian information criterion
`family`	`GLMMFamily`	Family object used for fitting
`ngroups`	`dict`	Number of levels per grouping factor
`scale`	`float`	Dispersion parameter (1.0 for binomial and Poisson)
`fittedvalues`	`ndarray`	Fitted values on the response scale
`disp_params`	`pd.Series \| None`	Dispersion formula coefficients (log scale); `None` when no `dispformula`
`dispersion`	`ndarray \| None`	Per-observation dispersion values; `None` when no `dispformula`

Custom families¶

Any object implementing the GLMMFamily protocol can be passed as the family parameter:

from interlace import GLMMFamily

class MyFamily:
    name: str = "custom"

    def link(self, mu):       ...  # eta = g(mu)
    def linkinv(self, eta):   ...  # mu = g^{-1}(eta)
    def mu_eta(self, eta):    ...  # d(mu)/d(eta)
    def variance(self, mu):   ...  # Var(Y | mu)
    def dev_resids(self, y, mu, wt): ...  # deviance residuals