Contributing¶
Contributions are welcome — whether that’s bug reports, documentation improvements, or new features. To get started, open an issue on the GitHub issue tracker or submit a pull request.
Development setup¶
```shell
make install     # create venv and install all dev deps via uv
make test        # run pytest
make lint        # ruff format + ruff check --fix
make typecheck   # mypy
make check       # lint + typecheck + test (full CI gate)
```
Workflow¶
This project follows strict TDD. Before writing any implementation code:
1. Pick an open issue from the issue tracker
2. Write a failing test in `tests/` that captures the acceptance criteria (a sketch follows the diagram below)
3. Run `make test` and confirm it fails for the right reason
4. Write the minimum implementation to make it pass
5. Run `make check` before submitting a pull request
Never write implementation code without a corresponding failing test driving it.
```mermaid
flowchart TD
    A[Pick an open issue] --> B[Write failing test]
    B --> C{make test\nfails?}
    C -- No --> B
    C -- Yes --> D[Write minimum implementation]
    D --> E{make check\npasses?}
    E -- No --> D
    E -- Yes --> F[Open pull request]
```
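As an illustration of step 2, a first failing test might look like the sketch below. The module path, function name, and expected value are hypothetical placeholders, not this project's API:

```python
# tests/test_new_feature.py -- illustrative only; all names are placeholders.
import pytest


def test_new_feature_meets_acceptance_criteria():
    # Importing inside the test keeps collection working before the
    # implementation exists: the ImportError is the "right reason" to fail.
    from mypackage.new_module import new_feature  # hypothetical

    assert new_feature([1.0, 2.0, 3.0]) == pytest.approx(2.0)
```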
Internal contributors use `bd` (the beads issue tracker) to manage tasks: `bd ready` lists unblocked issues, and `bd update <id> --claim` claims one.
Validation against lme4¶
Numerical parity with R’s lme4::lmer() is enforced by the test suite on every CI run.
Three scenarios are tested, each with a pre-computed oracle file generated from R:
| Scenario | Test file | Groups | Obs |
|---|---|---|---|
| Single random intercept | | 1 (vs statsmodels) | 200 |
| Two crossed intercepts | | 2 (firm × dept) | 2 000 |
| Three crossed intercepts | | 3 (firm × dept × region) | 3 000 |
The oracle fixtures (tests/fixtures/*.json) are generated by R scripts that run
lme4::lmer() and serialise the results. The Python tests load these and assert the
following tolerances:
| Quantity | Tolerance | How measured |
|---|---|---|
| Fixed effects | abs diff < 1e-4 | per coefficient |
| Variance components | rel diff < 5% | per grouping factor + residual |
| BLUPs (conditional modes) | Pearson r > 0.99 | per grouping factor |
| Conditional residuals | Pearson r > 0.999 | across all observations |
When adding a new feature that touches estimation, extend the relevant parity test or add a new fixture to keep these tolerances enforced.
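For orientation, a parity assertion might look roughly like the sketch below. This is not the actual test code: the fixture schema (JSON keys) and the attributes on the fitted-model object are assumptions.

```python
# Illustrative parity helper -- fixture keys and result attributes are assumed,
# not copied from the real test suite.
import json

import numpy as np
from scipy.stats import pearsonr


def assert_lme4_parity(result, fixture_path):
    """Compare a fitted model (duck-typed) against an lme4 oracle fixture."""
    with open(fixture_path) as f:
        oracle = json.load(f)

    # Fixed effects: absolute difference < 1e-4, per coefficient.
    np.testing.assert_allclose(
        result.fixed_effects, oracle["fixed_effects"], rtol=0, atol=1e-4
    )

    # Variance components: relative difference < 5%, per grouping factor + residual.
    np.testing.assert_allclose(
        result.variance_components, oracle["variance_components"], rtol=0.05
    )

    # BLUPs (conditional modes): Pearson r > 0.99, per grouping factor.
    for factor, modes in oracle["blups"].items():
        r, _ = pearsonr(result.blups[factor], modes)
        assert r > 0.99, f"BLUP correlation for {factor} too low: {r:.4f}"

    # Conditional residuals: Pearson r > 0.999 across all observations.
    r, _ = pearsonr(result.residuals, oracle["residuals"])
    assert r > 0.999, f"residual correlation too low: {r:.5f}"
```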
Notebook execution policy¶
All Jupyter notebooks in docs/source/ are pre-executed: outputs are committed to git and Sphinx/MyST-NB renders them as static documents (no kernel required at build time).
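One plausible way this policy is enforced in the Sphinx build (an assumption about this repo's configuration; the actual conf.py may differ) is MyST-NB's execution mode:

```python
# docs/source/conf.py (sketch -- assumed wiring, verify against the real file)
extensions = ["myst_nb"]

# "off" tells MyST-NB to render the outputs committed to git and never
# start a kernel during the Sphinx build.
nb_execution_mode = "off"
```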
Rationale: fit() uses BOBYQA optimisation and bootstrapping, whose numerical outputs vary across environments and scipy versions. Executing notebooks at build time would require the full scipy/numpy/pandas/sparse stack in CI, slow builds significantly, and risk non-deterministic diffs. Pre-executed notebooks serve as stable documentation artifacts showing canonical outputs.
Trade-off: Notebooks must be re-executed manually before each release. This is a discipline requirement, not a technical one.
Release checklist addition: Before tagging a release, re-execute every notebook and commit the fresh outputs:
```shell
# From the repo root, with the dev environment active:
jupyter nbconvert --to notebook --execute --inplace docs/source/*.ipynb
git add docs/source/*.ipynb
git commit -m "docs: re-execute notebooks for vX.Y.Z"
```
Releasing a new version¶
Releases are fully automated once a version tag is pushed.
Steps:

1. Bump `version` in `pyproject.toml`.
2. Commit the bump:
   ```shell
   git commit -am "chore: bump to vX.Y.Z"
   git push
   ```
3. Tag and push:
   ```shell
   git tag vX.Y.Z
   git push origin vX.Y.Z
   ```
The publish workflow triggers automatically on `v*` tags and:
1. Runs the full CI gate (lint + typecheck + tests)
2. Publishes the wheel and sdist to PyPI (silently skips if the version already exists)
3. Creates a GitHub Release with auto-generated release notes
Note: Do not push a tag without first bumping the version in `pyproject.toml`; PyPI will reject a duplicate version and the workflow will skip the publish step.
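A pre-tag sanity check along these lines can catch a forgotten bump. This is a sketch, not part of the repo's tooling; it assumes Python 3.11+ for `tomllib` and a standard `[project]` table in pyproject.toml:

```python
# check_version.py -- hypothetical pre-tag helper, not shipped with the project.
import subprocess
import sys
import tomllib

with open("pyproject.toml", "rb") as f:
    version = tomllib.load(f)["project"]["version"]

# If a tag for this version already exists, the bump was forgotten.
existing = subprocess.run(
    ["git", "tag", "--list", f"v{version}"],
    capture_output=True,
    text=True,
    check=True,
).stdout.split()

if existing:
    sys.exit(f"v{version} is already tagged; bump pyproject.toml first")
print(f"OK: v{version} is not yet tagged")
```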