Longitudinal Data Guide¶
Longitudinal (repeated-measures) data have multiple observations per subject over time. A standard random-intercept LMM accounts for between-subject differences, but assumes that within-subject residuals are independent. When measurements close in time are more correlated than measurements far apart, that assumption fails — and the fix is a residual correlation structure.
interlace provides two structures via the correlation parameter on
fit(): AR(1) and compound symmetry (CS). Both match R’s nlme
package (corAR1 and corCompSymm).
When do you need a correlation structure?¶
A random intercept already captures some within-subject dependence — all observations from the same subject share a common shift. But it implies that any two observations from the same subject are equally correlated, regardless of how far apart in time they are.
Signs that you need more:
Residual autocorrelation — after fitting a random-intercept model, plot the residuals by time within subject. If early residuals predict later ones, AR(1) may help.
Model comparison favours it — AIC/BIC drops when you add a correlation structure.
Domain knowledge — the outcome has temporal inertia (e.g. blood pressure, growth curves, daily mood ratings).
If within-subject observations are exchangeable (e.g. test items within a session, raters scoring the same stimulus), a random intercept is usually sufficient.
AR(1) — first-order autoregressive¶
What it models¶
AR(1) assumes that residuals within a group decay in correlation exponentially with the time gap:
Cor(e_i, e_j) = rho^|t_i - t_j|
rho close to 1 — strong serial dependence; nearby measurements are highly correlated
rho close to 0 — no serial dependence; residuals are nearly independent
rho is always estimated in (-1, 1)
This is equivalent to a continuous-time AR(1) (Ornstein-Uhlenbeck) process, so it handles unequally spaced time points naturally.
Example¶
import interlace
from interlace import AR1
result = interlace.fit(
"y ~ time + treatment",
data=df,
groups="subject",
correlation=AR1(time="time"),
)
print(result.correlation_params)
# {'rho': 0.72}
print(result.variance_components)
print(result.fe_params)
When to use AR(1)¶
Measurements are taken at multiple time points per subject
The correlation between measurements decays with temporal distance
Data may have unequal spacing between time points (AR(1) handles this; CS does not distinguish time gaps)
Compound symmetry (exchangeable)¶
What it models¶
Compound symmetry (CS) assumes that all pairs of residuals within a group share the same correlation, regardless of time gap:
Cor(e_i, e_j) = rho for all i != j within the same group
The within-group covariance matrix is:
R_g = sigma^2 * [(1 - rho) I + rho J]
where J is the all-ones matrix.
Example¶
from interlace import CompoundSymmetry
result = interlace.fit(
"y ~ time + treatment",
data=df,
groups="subject",
correlation=CompoundSymmetry(time="time"),
)
print(result.correlation_params)
# {'rho': 0.45}
When to use CS¶
Repeated measures are approximately exchangeable — the within-subject correlation does not depend on how far apart measurements are
You want to model a residual intraclass correlation beyond what the random intercept captures
As a reference model to compare against AR(1) via AIC/BIC
AR(1) vs compound symmetry vs random slopes¶
Three approaches to within-subject dependence, in increasing complexity:
Approach |
What it captures |
When to use |
|---|---|---|
Random intercept only ( |
Subjects differ in their baseline |
Default starting point |
Compound symmetry ( |
Subjects have equal residual correlation for all measurement pairs |
Exchangeable repeated measures |
AR(1) ( |
Residual correlation decays with time gap |
Temporal dependence |
Random slope ( |
Subjects differ in their trajectory over time |
Subject-specific growth curves |
These are not mutually exclusive — you can combine a random intercept with AR(1) correlation:
# Random intercept + AR(1) residual correlation
result = interlace.fit(
"y ~ time + treatment",
data=df,
groups="subject",
correlation=AR1(time="time"),
)
The random intercept captures between-subject variability; the AR(1) captures within-subject serial dependence in the residuals.
Choosing between them¶
Step 1: Fit a random-intercept model (no correlation structure).
Step 2: Check residual plots for autocorrelation. If residuals within subjects show temporal patterns, proceed.
Step 3: Fit AR(1) and CS models and compare:
from interlace import AR1, CompoundSymmetry
m_base = interlace.fit("y ~ time + treatment", data=df, groups="subject")
m_ar1 = interlace.fit("y ~ time + treatment", data=df, groups="subject",
correlation=AR1(time="time"))
m_cs = interlace.fit("y ~ time + treatment", data=df, groups="subject",
correlation=CompoundSymmetry(time="time"))
print(f"Base AIC: {m_base.aic:.1f}")
print(f"AR(1) AIC: {m_ar1.aic:.1f}")
print(f"CS AIC: {m_cs.aic:.1f}")
Pick the model with the lowest AIC. If AR(1) and CS give similar AIC, prefer the simpler model (CS has the same number of parameters as AR(1), so compare directly).
Step 4: If neither helps much, consider a random slope instead — it models between-subject differences in trajectory, which is a different (and sometimes more important) source of dependence.
How it works internally¶
The correlation structure operates by whitening the data before the standard profiled-REML machinery runs. For a given rho:
Within each group, compute the Cholesky factor L of the correlation matrix R_g
Transform y, X, and Z: multiply by L^{-1} within each group
Run profiled REML on the whitened data (residuals are now independent)
The correlation parameter rho is estimated jointly with the variance parameters (theta) in the outer optimisation loop.
Comparison with R (nlme)¶
interlace |
R (nlme) |
|---|---|
|
|
|
|
|
|
Parity: fixed effects and variance components match R’s nlme::lme()
within 1e-3 (absolute) and 5% (relative). The correlation parameter (rho)
matches within 2% (relative) on the AR(1) and CS benchmark datasets.
Tips¶
Sort your data by time within subject before fitting.
interlacedoes this internally, but sorting yourself makes residual plots easier to interpret.AR(1) with unequal spacing works correctly — the time gaps between consecutive observations are computed from the
timecolumn, not assumed to be 1.Negative rho is allowed but rare in practice. A negative AR(1) parameter means consecutive measurements tend to alternate above and below the mean.
CS + random intercept — compound symmetry adds residual correlation on top of the random intercept. If both are positive, the total within-subject correlation is ICC + rho*(1 - ICC), which can be substantial.
See also¶
Correlation structures — full API reference for AR1 and CompoundSymmetry
Quickstart — basic
fit()workflowRandom Slopes Guide — random slopes as an alternative to correlation structures
Model Comparison — comparing models with different structures via AIC/LRT
fit — the
correlationparameter onfit()