{ "cells": [ { "cell_type": "markdown", "id": "a0000001", "metadata": {}, "source": [ "# Interpreting model results\n", "\n", "After calling `interlace.fit()`, you receive a `CrossedLMEResult` object. This page\n", "explains what each part of the output means and how to use it to draw conclusions from\n", "your model.\n", "\n", "Throughout this page we use a running example: exam scores modelled as a function of\n", "hours studied and prior GPA, with students and schools as crossed grouping factors." ] }, { "cell_type": "code", "execution_count": null, "id": "a0000002", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import interlace\n", "\n", "rng = np.random.default_rng(42)\n", "n = 300\n", "\n", "df = pd.DataFrame({\n", " \"score\": 60 + rng.normal(0, 8, n),\n", " \"hours_studied\": rng.uniform(0, 10, n),\n", " \"prior_gpa\": rng.uniform(2.0, 4.0, n),\n", " \"student_id\": [f\"s{i}\" for i in rng.integers(1, 31, n)],\n", " \"school_id\": [f\"sch{i}\" for i in rng.integers(1, 11, n)],\n", "})\n", "\n", "result = interlace.fit(\n", " \"score ~ hours_studied + prior_gpa\",\n", " data=df,\n", " groups=[\"student_id\", \"school_id\"],\n", ")" ] }, { "cell_type": "markdown", "id": "a0000003", "metadata": {}, "source": [ "---\n", "\n", "## Fixed effects\n", "\n", "Fixed effects capture the population-level relationships between predictors and the\n", "outcome, controlling for the grouping structure.\n", "\n", "The result exposes four attributes for fixed-effect inference:\n", "\n", "| Attribute | Contents |\n", "|---|---|\n", "| `fe_params` | Coefficient estimates (`pd.Series`) |\n", "| `fe_bse` | Standard errors |\n", "| `fe_pvalues` | Two-sided p-values (normal approximation) |\n", "| `fe_conf_int` | 95% confidence intervals (`pd.DataFrame`, columns `[0.025, 0.975]`) |\n", "\n", "### Reading the coefficients\n", "\n", "`fe_params` is a `pandas.Series` indexed by term name. 
Each coefficient is the\n", "expected change in the outcome per unit increase in the predictor, holding the other\n", "predictors fixed and **adjusting for the grouping structure**: the random effects\n", "absorb stable differences between students and between schools." ] }, { "cell_type": "code", "execution_count": null, "id": "a0000004", "metadata": {}, "outputs": [], "source": [ "result.fe_params" ] }, { "cell_type": "markdown", "id": "a0000005", "metadata": {}, "source": [ "### Standard errors and inference\n", "\n", "Standard errors in `fe_bse` account for the clustering: they are derived from the\n", "fixed-effect covariance matrix `fe_cov = scale * (X'\u03a9\u207b\u00b9X)\u207b\u00b9`, where X is the\n", "fixed-effect design matrix (one row per observation, one column per predictor including\n", "the intercept) and \u03a9 is the marginal covariance structure of the response (normalised\n", "by `scale`) that encodes the random effects. This is why they are typically larger\n", "than OLS standard errors on the same data \u2014 the effective sample size is reduced by\n", "the grouping.\n", "\n", "A convenient summary table:" ] }, { "cell_type": "code", "execution_count": null, "id": "a0000006", "metadata": {}, "outputs": [], "source": [ "pd.DataFrame({\n", "    \"estimate\": result.fe_params,\n", "    \"se\": result.fe_bse,\n", "    \"p\": result.fe_pvalues,\n", "    \"ci_lo\": result.fe_conf_int.iloc[:, 0],\n", "    \"ci_hi\": result.fe_conf_int.iloc[:, 1],\n", "})" ] }, { "cell_type": "markdown", "id": "a0000007", "metadata": {}, "source": [ "P-values use a normal approximation (z-test), which is appropriate for large samples.\n", "For small samples with few groups, consider likelihood-ratio tests or a\n", "parametric bootstrap instead.\n", "\n", "### Comparing models with different fixed structures\n", "\n", "To test whether a predictor improves fit, refit with `method=\"ML\"` (not REML) and\n", "compare log-likelihoods. Use REML (the default) for your final reported estimates;\n", "use ML only for likelihood-ratio tests comparing fixed structures. 
See\n", "[Concepts: REML](concepts.md#reml-restricted-maximum-likelihood) for the reason." ] }, { "cell_type": "code", "execution_count": null, "id": "a0000008", "metadata": {}, "outputs": [], "source": [ "from scipy.stats import chi2\n", "\n", "m1 = interlace.fit(\"score ~ hours_studied\", data=df, groups=[\"student_id\", \"school_id\"], method=\"ML\")\n", "m2 = interlace.fit(\"score ~ hours_studied + prior_gpa\", data=df, groups=[\"student_id\", \"school_id\"], method=\"ML\")\n", "\n", "lr_stat = 2 * (m2.llf - m1.llf)\n", "df_diff = 1 # one extra parameter in m2\n", "p_value = chi2.sf(lr_stat, df=df_diff)\n", "\n", "print(f\"LR statistic: {lr_stat:.3f} (df={df_diff})\")\n", "print(f\"p-value: {p_value:.4f}\")" ] }, { "cell_type": "markdown", "id": "a0000009", "metadata": {}, "source": [ "---\n", "\n", "## Variance components\n", "\n", "Variance components quantify how much variability in the outcome is attributable to\n", "each grouping factor and to residual within-group noise.\n", "\n", "### Reading variance components" ] }, { "cell_type": "code", "execution_count": null, "id": "a0000010", "metadata": {}, "outputs": [], "source": [ "print(\"Between-group variances:\")\n", "for g, v in result.variance_components.items():\n", " print(f\" {g}: \u03c3\u00b2 = {v:.3f} (\u03c3 \u2248 {v**0.5:.2f} score points)\")\n", "\n", "print(f\"\\nResidual variance (scale): \u03c3\u00b2 = {result.scale:.3f} (\u03c3 \u2248 {result.scale**0.5:.2f} score points)\")" ] }, { "cell_type": "markdown", "id": "a0000011", "metadata": {}, "source": [ "### Intraclass correlation (ICC)\n", "\n", "The ICC tells you what fraction of total variance each grouping factor explains.\n", "An ICC below ~0.05 suggests the grouping factor contributes little and may not need\n", "a random intercept." 
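] }, { "cell_type": "markdown", "id": "b0000001", "metadata": {}, "source": [ "As a concept check before reading it off the fitted model: the ICC of a grouping\n", "factor equals the expected correlation between two observations drawn from the same\n", "group. The sketch below verifies this by simulation, using hypothetical variance\n", "components (9 between groups, 16 residual, so ICC = 9/25 = 0.36); it does not use\n", "`interlace` at all.\n", "\n", "```python\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(0)\n", "\n", "sigma2_group, sigma2_resid = 9.0, 16.0  # hypothetical variance components\n", "expected_icc = sigma2_group / (sigma2_group + sigma2_resid)\n", "\n", "n_groups = 100_000\n", "group_effect = rng.normal(0, sigma2_group ** 0.5, n_groups)\n", "\n", "# two observations per group share the same group effect\n", "y1 = group_effect + rng.normal(0, sigma2_resid ** 0.5, n_groups)\n", "y2 = group_effect + rng.normal(0, sigma2_resid ** 0.5, n_groups)\n", "\n", "empirical_icc = np.corrcoef(y1, y2)[0, 1]\n", "print(f'expected ICC: {expected_icc:.3f}, empirical: {empirical_icc:.3f}')\n", "```\n", "\n", "With this many simulated groups the empirical correlation lands close to 0.36,\n", "matching sigma2_group / (sigma2_group + sigma2_resid)."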
] }, { "cell_type": "code", "execution_count": null, "id": "a0000012", "metadata": {}, "outputs": [], "source": [ "total = sum(result.variance_components.values()) + result.scale\n", "\n", "icc = {g: v / total for g, v in result.variance_components.items()}\n", "\n", "for g, v in icc.items():\n", " print(f\"ICC({g}): {v:.3f} \u2192 {v*100:.1f}% of variance explained by {g}\")" ] }, { "cell_type": "markdown", "id": "a0000013", "metadata": {}, "source": [ "### What if a variance component is near zero?\n", "\n", "A variance component close to zero (sometimes reported as exactly 0 at the boundary of\n", "the parameter space) means the corresponding grouping factor explains almost no\n", "variance in the outcome after conditioning on the fixed effects. This is not an error \u2014\n", "it is a valid result indicating that the groups in that factor are essentially\n", "homogeneous. You may choose to drop that grouping factor and refit, or retain it for\n", "theoretical reasons.\n", "\n", "---\n", "\n", "## Model fit" ] }, { "cell_type": "code", "execution_count": null, "id": "a0000014", "metadata": {}, "outputs": [], "source": [ "print(f\"AIC: {result.aic:.2f}\")\n", "print(f\"BIC: {result.bic:.2f}\")\n", "print(f\"log-lik: {result.llf:.2f}\")\n", "print(f\"converged: {result.converged}\")" ] }, { "cell_type": "markdown", "id": "a0000015", "metadata": {}, "source": [ "### AIC and BIC\n", "\n", "AIC and BIC are used to compare models with the **same fixed structure** fitted with\n", "REML, or models with **different fixed structures** fitted with ML. 
Both penalise\n", "model complexity; BIC penalises each parameter by log *n* rather than 2, so it is the\n", "stricter criterion for all but the smallest samples.\n", "\n", "AIC is defined as \u22122 log *L* + 2*k* (Akaike, 1974); BIC as \u22122 log *L* + *k* log *n*\n", "(Schwarz, 1978), where *k* is the number of estimated parameters and *n* the number of\n", "observations.\n", "\n", "A difference of \u2265 4 AIC points between models is often treated as meaningful; a\n", "difference of 1\u20133 is weak evidence. These are heuristics \u2014 use them alongside\n", "subject-matter reasoning, not as automatic decision rules.\n", "\n", "**References**\n", "\n", "Akaike, H. (1974). A new look at the statistical model identification.\n", "*IEEE Transactions on Automatic Control*, 19(6), 716\u2013723.\n", "[doi:10.1109/TAC.1974.1100705](https://doi.org/10.1109/TAC.1974.1100705)\n", "\n", "Schwarz, G. (1978). Estimating the dimension of a model.\n", "*The Annals of Statistics*, 6(2), 461\u2013464.\n", "[doi:10.1214/aos/1176344136](https://doi.org/10.1214/aos/1176344136)\n", "\n", "### Checking convergence\n", "\n", "Always verify convergence before reporting results. Non-convergence typically occurs\n", "with very small datasets, a degenerate grouping structure (a single observation per\n", "group), or near-zero variance components." ] }, { "cell_type": "code", "execution_count": null, "id": "a0000016", "metadata": {}, "outputs": [], "source": [ "if not result.converged:\n", "    print(\"Warning: optimiser did not converge \u2014 estimates may be unreliable\")\n", "else:\n", "    print(\"Converged successfully.\")" ] }, { "cell_type": "markdown", "id": "a0000017", "metadata": {}, "source": [ "---\n", "\n", "## Random effects (BLUPs)\n", "\n", "BLUPs (Best Linear Unbiased Predictors) are the estimated deviations of each group\n", "from the population mean. 
They are accessible per grouping factor as a dict of\n", "`pd.Series` (intercept-only models) or `pd.DataFrame` (random slopes models).\n", "\n", "### Reading BLUPs" ] }, { "cell_type": "code", "execution_count": null, "id": "a0000018", "metadata": {}, "outputs": [], "source": [ "student_blups = result.random_effects[\"student_id\"]\n", "school_blups = result.random_effects[\"school_id\"]\n", "\n", "student_blups.sort_values()" ] }, { "cell_type": "markdown", "id": "a0000019", "metadata": {}, "source": [ "Each value is the estimated deviation of that group from the population mean, in the\n", "same units as the outcome (exam score points). A positive BLUP means the group scores\n", "above average after accounting for the fixed predictors and other grouping factors.\n", "\n", "For random slopes models, `random_effects[group]` is a `pd.DataFrame` with one column\n", "per term (intercept + slope).\n", "\n", "### Shrinkage\n", "\n", "BLUPs are **shrunk toward zero** relative to the group's raw deviation. Groups with\n", "fewer observations or groups whose variance component is small are shrunk more. This\n", "is intentional: a student with only one observation should not be assigned an extreme\n", "intercept \u2014 the BLUP borrows strength from the population distribution.\n", "\n", "The plot below makes shrinkage visible: each point is one student. Points fall\n", "between the dashed diagonal and the horizontal axis (each BLUP is smaller in\n", "magnitude than the corresponding raw group mean); the further a point sits from the\n", "diagonal, the stronger the shrinkage."
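] }, { "cell_type": "markdown", "id": "b0000002", "metadata": {}, "source": [ "For a single-factor random-intercept model the degree of shrinkage has a simple closed\n", "form: a group's BLUP is approximately lambda_j times its raw mean deviation, where\n", "lambda_j = sigma2_group / (sigma2_group + sigma2_resid / n_j) and n_j is the number of\n", "observations in group j. (With crossed factors this is only an approximation.) The\n", "sketch below uses hypothetical variances, not values taken from `result`:\n", "\n", "```python\n", "sigma2_group, sigma2_resid = 4.0, 64.0  # hypothetical variance components\n", "\n", "for n_j in (1, 5, 20, 100):\n", "    lam = sigma2_group / (sigma2_group + sigma2_resid / n_j)\n", "    print(f'n_j = {n_j:>3}: BLUP is about {lam:.2f} x the raw deviation')\n", "```\n", "\n", "A student observed once keeps only about 6% of their raw deviation, while one observed\n", "100 times keeps about 86%."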
] }, { "cell_type": "code", "execution_count": null, "id": "a0000020", "metadata": {}, "outputs": [], "source": [ "from plotnine import ggplot, aes, geom_point, geom_abline, labs, theme_bw\n", "\n", "raw = df.assign(resid=result.resid).groupby(\"student_id\")[\"resid\"].mean()\n", "\n", "shrinkage_df = pd.DataFrame({\n", " \"raw_mean_resid\": raw,\n", " \"blup\": student_blups.reindex(raw.index),\n", "}).reset_index()\n", "\n", "(\n", " ggplot(shrinkage_df, aes(x=\"raw_mean_resid\", y=\"blup\"))\n", " + geom_abline(slope=1, intercept=0, linetype=\"dashed\", color=\"grey\")\n", " + geom_point(alpha=0.7)\n", " + labs(\n", " x=\"Raw group mean residual (no shrinkage)\",\n", " y=\"BLUP (shrunk estimate)\",\n", " title=\"Shrinkage: BLUPs vs raw group means\",\n", " )\n", " + theme_bw()\n", ")" ] }, { "cell_type": "markdown", "id": "a0000021", "metadata": {}, "source": [ "### BLUPs vs fixed group dummies\n", "\n", "BLUPs are not the same as including a fixed dummy variable for each group. Dummies\n", "use only the data from that group; BLUPs partially pool across groups. For groups with\n", "very many observations, BLUPs converge to the dummy-variable estimate. For small or\n", "sparse groups, BLUPs are much more stable.\n", "\n", "---\n", "\n", "## Predictions\n", "\n", "`result.predict()` returns fitted values, optionally including random effects.\n", "\n", "| | Conditional (`include_re=True`) | Marginal (`include_re=False`) |\n", "|---|---|---|\n", "| Includes BLUPs | Yes | No |\n", "| Use case | In-sample fit, known groups | New groups, population-level |\n", "| Unseen group levels | BLUP = 0 (shrinks to mean) | Identical to marginal |\n", "\n", "**Conditional predictions** are appropriate when you know which group an observation\n", "belongs to and want the best estimate for that specific group. 
They are used to\n", "compute residuals (`result.resid = observed \u2212 conditional prediction`).\n", "\n", "**Marginal predictions** give the expected outcome for a hypothetical average group\n", "\u2014 useful for reporting population-level effects or when the new data includes groups\n", "not seen during training." ] }, { "cell_type": "code", "execution_count": null, "id": "a0000022", "metadata": {}, "outputs": [], "source": [ "# Conditional predictions (uses BLUPs, default)\n", "y_conditional = result.predict()\n", "\n", "# Marginal predictions (fixed effects only)\n", "y_marginal = result.predict(include_re=False)\n", "\n", "print(f\"Conditional predictions \u2014 mean: {y_conditional.mean():.2f}, std: {y_conditional.std():.2f}\")\n", "print(f\"Marginal predictions \u2014 mean: {y_marginal.mean():.2f}, std: {y_marginal.std():.2f}\")" ] }, { "cell_type": "markdown", "id": "a0000023", "metadata": {}, "source": [ "### Predicting on new data\n", "\n", "Unseen group levels automatically receive a BLUP of zero \u2014 the population mean for\n", "that factor. This is the optimal prediction under the model's assumptions: with no\n", "data on a group, the posterior mean of its deviation is the prior mean, which is zero."
] }, { "cell_type": "code", "execution_count": null, "id": "a0000024", "metadata": {}, "outputs": [], "source": [ "df_new = pd.DataFrame({\n", " \"hours_studied\": [5.0, 8.0, 5.0],\n", " \"prior_gpa\": [3.2, 3.8, 3.2],\n", " \"student_id\": [\"s1\", \"s2\", \"s_new\"], # s_new is unseen \u2192 BLUP = 0\n", " \"school_id\": [\"sch1\", \"sch1\", \"sch1\"],\n", "})\n", "\n", "preds = result.predict(newdata=df_new)\n", "\n", "df_new.assign(predicted_score=preds.round(2))" ] }, { "cell_type": "markdown", "id": "a0000025", "metadata": {}, "source": [ "---\n", "\n", "## Bootstrap standard error\n", "\n", "For scalar statistics of the outcome (currently: median), `bootstrap_se()` computes\n", "a cluster-bootstrap standard error that respects the grouping structure.\n", "\n", "Ordinary bootstrap resamples individual observations, which underestimates the SE when\n", "group variance is substantial. Cluster bootstrap resamples **group levels** with\n", "replacement, then includes all observations from the sampled groups \u2014 preserving\n", "within-group correlation.\n", "\n", "Use `resample_level=\"observation\"` only if you have verified that the ICC is negligible." 
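] }, { "cell_type": "markdown", "id": "b0000003", "metadata": {}, "source": [ "To make the resampling scheme concrete, here is a hand-rolled sketch on a small\n", "synthetic dataset (toy data and loop are illustrative only; `bootstrap_se` performs\n", "the equivalent resampling internally and may differ in detail):\n", "\n", "```python\n", "import numpy as np\n", "import pandas as pd\n", "\n", "rng = np.random.default_rng(0)\n", "\n", "# toy grouped data: 20 schools, 15 pupils each, with a school-level effect\n", "groups = np.repeat([f'sch{i}' for i in range(20)], 15)\n", "values = rng.normal(60, 8, groups.size) + np.repeat(rng.normal(0, 4, 20), 15)\n", "toy = pd.DataFrame({'school_id': groups, 'score': values})\n", "\n", "levels = toy['school_id'].unique()\n", "medians = []\n", "for _ in range(500):\n", "    # resample group LEVELS with replacement, keep every row of each sampled group\n", "    sampled = rng.choice(levels, size=levels.size, replace=True)\n", "    boot = pd.concat([toy[toy['school_id'] == lvl] for lvl in sampled])\n", "    medians.append(boot['score'].median())\n", "\n", "print(f'cluster-bootstrap SE of the median: {np.std(medians):.3f}')\n", "```\n", "\n", "Because whole schools are resampled, the within-school correlation is preserved in\n", "every bootstrap replicate, which is what keeps the SE honest when the ICC is\n", "non-negligible."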
] }, { "cell_type": "code", "execution_count": null, "id": "a0000026", "metadata": {}, "outputs": [], "source": [ "se_cluster = result.bootstrap_se(statistic=\"median\", n_bootstrap=500, resample_level=\"group\", seed=42)\n", "se_observation = result.bootstrap_se(statistic=\"median\", n_bootstrap=500, resample_level=\"observation\", seed=42)\n", "\n", "print(f\"Median: {np.median(df['score']):.2f}\")\n", "print(f\"Cluster-bootstrap SE: {se_cluster:.4f}\")\n", "print(f\"Observation-bootstrap SE: {se_observation:.4f} (underestimates when ICC > 0)\")" ] }, { "cell_type": "markdown", "id": "a0000027", "metadata": {}, "source": [ "---\n", "\n", "## Quick reference\n", "\n", "| Attribute | Type | What it tells you |\n", "|---|---|---|\n", "| `fe_params` | `pd.Series` | Fixed-effect coefficient estimates |\n", "| `fe_bse` | `pd.Series` | Standard errors of fixed effects |\n", "| `fe_pvalues` | `pd.Series` | Two-sided p-values (z-test) |\n", "| `fe_conf_int` | `pd.DataFrame` | 95% confidence intervals |\n", "| `fe_cov` | `np.ndarray` | Fixed-effect covariance matrix |\n", "| `variance_components` | `dict` | Between-group variance per factor |\n", "| `scale` | `float` | Within-group (residual) variance \u03c3\u00b2 |\n", "| `random_effects` | `dict` | BLUPs per grouping factor |\n", "| `fittedvalues` | `np.ndarray` | Conditional fitted values |\n", "| `resid` | `np.ndarray` | Conditional residuals |\n", "| `aic` | `float` | Akaike Information Criterion |\n", "| `bic` | `float` | Bayesian Information Criterion |\n", "| `llf` | `float` | Log-likelihood at optimum |\n", "| `converged` | `bool` | Whether the optimiser converged |\n", "| `nobs` | `int` | Number of observations |\n", "| `ngroups` | `dict` | Number of levels per grouping factor |\n", "\n", "---\n", "\n", "## Where to go next\n", "\n", "- [Diagnostics guide](diagnostics.ipynb) \u2014 residuals, leverage, Cook's D, influence plots\n", "- [API: CrossedLMEResult](api/result.md) \u2014 full attribute and method 
reference\n", "- [API: Prediction](api/predict.md) \u2014 `predict()` parameter details\n", "- [Concepts](concepts.md) \u2014 background on REML, BLUPs, and variance components" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.13.0" } }, "nbformat": 4, "nbformat_minor": 5 }