Longitudinal Data Guide

Longitudinal (repeated-measures) data have multiple observations per subject over time. A standard random-intercept LMM accounts for between-subject differences, but assumes that within-subject residuals are independent. When measurements close in time are more correlated than measurements far apart, that assumption fails — and the fix is a residual correlation structure.

interlace provides two structures via the correlation parameter on fit(): AR(1) and compound symmetry (CS). Both match R’s nlme package (corAR1 and corCompSymm).


When do you need a correlation structure?

A random intercept already captures some within-subject dependence — all observations from the same subject share a common shift. But it implies that any two observations from the same subject are equally correlated, regardless of how far apart in time they are.

Signs that you need more:

  • Residual autocorrelation — after fitting a random-intercept model, plot the residuals by time within subject. If early residuals predict later ones, AR(1) may help.

  • Model comparison favours it — AIC/BIC drops when you add a correlation structure.

  • Domain knowledge — the outcome has temporal inertia (e.g. blood pressure, growth curves, daily mood ratings).

If within-subject observations are exchangeable (e.g. test items within a session, raters scoring the same stimulus), a random intercept is usually sufficient.
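The residual-autocorrelation check above can be done with a few lines of pandas/NumPy. The helper below is a hypothetical sketch (not part of interlace): given a data frame with a column of residuals from the base random-intercept model, it pools lag-1 residual pairs across subjects and correlates them. A clearly positive value suggests AR(1) is worth trying.

```python
import numpy as np
import pandas as pd

def lag1_residual_autocorr(df, resid_col="resid", time_col="time", group_col="subject"):
    """Pool consecutive (lag-1) residual pairs within each subject and
    return their correlation. Values well above 0 indicate serial dependence."""
    pairs = []
    for _, g in df.sort_values(time_col).groupby(group_col):
        r = g[resid_col].to_numpy()
        pairs.append(np.column_stack([r[:-1], r[1:]]))  # (earlier, later) pairs
    stacked = np.vstack(pairs)
    return np.corrcoef(stacked[:, 0], stacked[:, 1])[0, 1]
```

Column names (`resid`, `time`, `subject`) are assumptions for illustration; pass your own. This is a quick screen, not a formal test: use it to decide whether fitting an AR(1) model is worth the effort, then confirm with AIC/BIC.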


AR(1) — first-order autoregressive

What it models

AR(1) assumes that residuals within a group decay in correlation exponentially with the time gap:

Cor(e_i, e_j) = rho^|t_i - t_j|
  • rho close to 1 — strong serial dependence; nearby measurements are highly correlated

  • rho close to 0 — no serial dependence; residuals are nearly independent

  • rho is constrained to the open interval (-1, 1) during estimation

This is equivalent to a continuous-time AR(1) (Ornstein-Uhlenbeck) process, so it handles unequally spaced time points naturally.
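To make the unequal-spacing point concrete, a short NumPy sketch (purely illustrative, not part of interlace) builds the AR(1) correlation matrix directly from the formula above:

```python
import numpy as np

def ar1_corr(times, rho):
    """AR(1) correlation matrix: entry (i, j) is rho ** |t_i - t_j|."""
    t = np.asarray(times, dtype=float)
    return rho ** np.abs(t[:, None] - t[None, :])

# Unequally spaced visits: the gap of 3 between t=2 and t=5
# yields correlation rho**3 for that pair, not rho**1.
R = ar1_corr([0, 1, 2, 5], rho=0.7)
```

Adjacent visits one unit apart get correlation 0.7, while the pair spanning the gap of 3 gets 0.7**3 = 0.343; correlation falls off with distance rather than with position in the sequence.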

Example

import interlace
from interlace import AR1

result = interlace.fit(
    "y ~ time + treatment",
    data=df,
    groups="subject",
    correlation=AR1(time="time"),
)

print(result.correlation_params)
# {'rho': 0.72}

print(result.variance_components)
print(result.fe_params)

When to use AR(1)

  • Measurements are taken at multiple time points per subject

  • The correlation between measurements decays with temporal distance

  • Data may have unequal spacing between time points (AR(1) handles this; CS does not distinguish time gaps)


Compound symmetry (exchangeable)

What it models

Compound symmetry (CS) assumes that all pairs of residuals within a group share the same correlation, regardless of time gap:

Cor(e_i, e_j) = rho   for all i != j within the same group

The within-group covariance matrix is:

R_g = sigma^2 * [(1 - rho) I + rho J]

where J is the all-ones matrix.
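The covariance formula can be written out directly; this NumPy sketch (illustrative only, not interlace internals) constructs R_g for a group of n observations:

```python
import numpy as np

def cs_cov(n, sigma2, rho):
    """Compound-symmetry covariance: sigma^2 * ((1 - rho) * I + rho * J),
    where J is the n x n all-ones matrix."""
    return sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))

R = cs_cov(4, sigma2=2.0, rho=0.45)
```

Every diagonal entry is sigma^2 and every off-diagonal entry is sigma^2 * rho, regardless of how far apart in time the two observations are; that is exactly the exchangeability assumption.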

Example

from interlace import CompoundSymmetry

result = interlace.fit(
    "y ~ time + treatment",
    data=df,
    groups="subject",
    correlation=CompoundSymmetry(time="time"),
)

print(result.correlation_params)
# {'rho': 0.45}

When to use CS

  • Repeated measures are approximately exchangeable — the within-subject correlation does not depend on how far apart measurements are

  • You want to model a residual intraclass correlation beyond what the random intercept captures

  • As a reference model to compare against AR(1) via AIC/BIC


AR(1) vs compound symmetry vs random slopes

Four approaches to within-subject dependence, in roughly increasing complexity:

  • Random intercept only (groups=) — subjects differ in their baseline. Use as the default starting point.

  • Compound symmetry (CompoundSymmetry) — all measurement pairs within a subject share one residual correlation. Use for exchangeable repeated measures.

  • AR(1) (AR1) — residual correlation decays with the time gap. Use when there is temporal dependence.

  • Random slope (random=["(1+time|subject)"]) — subjects differ in their trajectory over time. Use for subject-specific growth curves.

These are not mutually exclusive — you can combine a random intercept with AR(1) correlation:

# Random intercept + AR(1) residual correlation
result = interlace.fit(
    "y ~ time + treatment",
    data=df,
    groups="subject",
    correlation=AR1(time="time"),
)

The random intercept captures between-subject variability; the AR(1) captures within-subject serial dependence in the residuals.

Choosing between them

Step 1: Fit a random-intercept model (no correlation structure).

Step 2: Check residual plots for autocorrelation. If residuals within subjects show temporal patterns, proceed.

Step 3: Fit AR(1) and CS models and compare:

from interlace import AR1, CompoundSymmetry

m_base = interlace.fit("y ~ time + treatment", data=df, groups="subject")
m_ar1  = interlace.fit("y ~ time + treatment", data=df, groups="subject",
                        correlation=AR1(time="time"))
m_cs   = interlace.fit("y ~ time + treatment", data=df, groups="subject",
                        correlation=CompoundSymmetry(time="time"))

print(f"Base AIC: {m_base.aic:.1f}")
print(f"AR(1) AIC: {m_ar1.aic:.1f}")
print(f"CS AIC:   {m_cs.aic:.1f}")

Pick the model with the lowest AIC. AR(1) and CS each add exactly one parameter (rho), so their AICs are directly comparable; if they are close, let the residual pattern and domain knowledge decide — correlation that decays with the time gap favours AR(1), while a flat within-subject correlation favours CS.

Step 4: If neither helps much, consider a random slope instead — it models between-subject differences in trajectory, which is a different (and sometimes more important) source of dependence.


How it works internally

The correlation structure operates by whitening the data before the standard profiled-REML machinery runs. For a given rho:

  1. Within each group, compute the Cholesky factor L of the correlation matrix R_g

  2. Transform y, X, and Z: multiply by L^{-1} within each group

  3. Run profiled REML on the whitened data (residuals are now independent)

The correlation parameter rho is estimated jointly with the variance parameters (theta) in the outer optimisation loop.
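The whitening step can be sketched in a few lines of NumPy. This is an illustration of the technique, not interlace's actual implementation; for one group with correlation matrix R_g, it computes the Cholesky factor and applies L^{-1} to the response and design:

```python
import numpy as np

def whiten_group(R_g, y_g, X_g):
    """Whiten one group's response and design matrix: multiply by L^{-1},
    where L is the lower Cholesky factor of the correlation matrix R_g.
    After this transform the group's residuals are uncorrelated."""
    L = np.linalg.cholesky(R_g)
    y_w = np.linalg.solve(L, y_g)  # L^{-1} y, via triangular solve
    X_w = np.linalg.solve(L, X_g)  # L^{-1} X
    return y_w, X_w
```

Because Cov(e_g) = sigma^2 * R_g = sigma^2 * L L^T, the transformed residuals L^{-1} e_g have covariance sigma^2 * I, so the standard profiled-REML machinery applies unchanged to the whitened data.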


Comparison with R (nlme)

interlace:  fit(..., correlation=AR1(time="time"))
R (nlme):   lme(..., correlation=corAR1(form=~time|subject))

interlace:  fit(..., correlation=CompoundSymmetry(time="time"))
R (nlme):   lme(..., correlation=corCompSymm(form=~time|subject))

interlace:  result.correlation_params["rho"]
R (nlme):   coef(fit$modelStruct$corStruct, unconstrained=FALSE)

Parity: fixed effects and variance components match R’s nlme::lme() within 1e-3 (absolute) and 5% (relative). The correlation parameter (rho) matches within 2% (relative) on the AR(1) and CS benchmark datasets.


Tips

  • Sort your data by time within subject before fitting. interlace does this internally, but sorting yourself makes residual plots easier to interpret.

  • AR(1) with unequal spacing works correctly — the time gaps between consecutive observations are computed from the time column, not assumed to be 1.

  • Negative rho is allowed but rare in practice. A negative AR(1) parameter means consecutive measurements tend to alternate above and below the mean.

  • CS + random intercept — compound symmetry adds residual correlation on top of the random intercept. If both are positive, the total within-subject correlation is ICC + rho*(1 - ICC), which can be substantial.
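To make the last tip's arithmetic concrete (example values are assumptions, not outputs of any fit):

```python
# Random intercept gives an intraclass correlation (ICC); CS adds a
# residual correlation rho on top. Total within-subject correlation:
icc, rho = 0.5, 0.45
total = icc + rho * (1 - icc)  # 0.5 + 0.45 * 0.5 = 0.725
```

Even a moderate residual rho pushes the implied within-subject correlation well above the ICC alone, which is worth checking against what is plausible for your outcome.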


See also