Contributing¶
Contributions are welcome — whether that’s bug reports, documentation improvements, or new features. To get started, open an issue on the GitHub issue tracker or submit a pull request.
Development setup¶
```shell
make install     # create venv and install all dev deps via uv
make test        # run pytest
make lint        # ruff format + ruff check --fix
make typecheck   # mypy
make check       # lint + typecheck + test (full CI gate)
```
Workflow¶
This project follows strict TDD. Before writing any implementation code:
1. Pick an open issue from the issue tracker
2. Write a failing test in `tests/` that captures the acceptance criteria (a sketch follows the diagram below)
3. Run `make test` and confirm it fails for the right reason
4. Write the minimum implementation to make it pass
5. Run `make check` before submitting a pull request
Never write implementation code without a corresponding failing test driving it.
```mermaid
flowchart TD
    A[Pick an open issue] --> B[Write failing test]
    B --> C{make test\nfails?}
    C -- No --> B
    C -- Yes --> D[Write minimum implementation]
    D --> E{make check\npasses?}
    E -- No --> D
    E -- Yes --> F[Open pull request]
```
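As an illustration of step 2, a first failing test might look like the sketch below. The module path, function name, and expected value are hypothetical placeholders, not this project's API:

```python
# tests/test_new_feature.py -- illustrative only; all names are placeholders.
import pytest


def test_new_feature_meets_acceptance_criteria():
    # Importing inside the test keeps collection working before the
    # implementation exists: the ImportError is the "right reason" to fail.
    from mypackage.new_module import new_feature  # hypothetical

    assert new_feature([1.0, 2.0, 3.0]) == pytest.approx(2.0)
```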
Internal contributors use `bd` (the beads issue tracker) to manage tasks: `bd ready` lists unblocked issues, and `bd update <id> --claim` claims one.
Validation against lme4¶
Numerical parity with R’s lme4::lmer() is enforced by the test suite on every CI run.
Three scenarios are tested, each with a pre-computed oracle file generated from R:
| Scenario | Test file | Groups | Obs |
|---|---|---|---|
| Single random intercept | | 1 (vs statsmodels) | 200 |
| Two crossed intercepts | | 2 (firm × dept) | 2 000 |
| Three crossed intercepts | | 3 (firm × dept × region) | 3 000 |
The oracle fixtures (tests/fixtures/*.json) are generated by R scripts that run
lme4::lmer() and serialise the results. The Python tests load these and assert the
following tolerances:
| Quantity | Tolerance | How measured |
|---|---|---|
| Fixed effects | abs diff < 1e-4 | per coefficient |
| Variance components | rel diff < 5% | per grouping factor + residual |
| BLUPs (conditional modes) | Pearson r > 0.99 | per grouping factor |
| Conditional residuals | Pearson r > 0.999 | across all observations |
When adding a new feature that touches estimation, extend the relevant parity test or add a new fixture to keep these tolerances enforced.
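For orientation, a parity assertion might look roughly like the sketch below. This is not the actual test code: the fixture schema (JSON keys) and the attributes on the fitted-model object are assumptions.

```python
# Illustrative parity helper -- fixture keys and result attributes are assumed,
# not copied from the real test suite.
import json

import numpy as np
from scipy.stats import pearsonr


def assert_lme4_parity(result, fixture_path):
    """Compare a fitted model (duck-typed) against an lme4 oracle fixture."""
    with open(fixture_path) as f:
        oracle = json.load(f)

    # Fixed effects: absolute difference < 1e-4, per coefficient.
    np.testing.assert_allclose(
        result.fixed_effects, oracle["fixed_effects"], rtol=0, atol=1e-4
    )

    # Variance components: relative difference < 5%, per grouping factor + residual.
    np.testing.assert_allclose(
        result.variance_components, oracle["variance_components"], rtol=0.05
    )

    # BLUPs (conditional modes): Pearson r > 0.99, per grouping factor.
    for factor, modes in oracle["blups"].items():
        r, _ = pearsonr(result.blups[factor], modes)
        assert r > 0.99, f"BLUP correlation for {factor} too low: {r:.4f}"

    # Conditional residuals: Pearson r > 0.999 across all observations.
    r, _ = pearsonr(result.residuals, oracle["residuals"])
    assert r > 0.999, f"residual correlation too low: {r:.5f}"
```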
Notebook execution policy¶
All Jupyter notebooks in docs/source/ are pre-executed: outputs are committed to git and Sphinx/MyST-NB renders them as static documents (no kernel required at build time).
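One plausible way this policy is enforced in the Sphinx build (an assumption about this repo's configuration; the actual conf.py may differ) is MyST-NB's execution mode:

```python
# docs/source/conf.py (sketch -- assumed wiring, verify against the real file)
extensions = ["myst_nb"]

# "off" tells MyST-NB to render the outputs committed to git and never
# start a kernel during the Sphinx build.
nb_execution_mode = "off"
```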
Rationale: fit() uses BOBYQA optimisation and bootstrapping, whose numerical outputs vary across environments and scipy versions. Executing notebooks at build time would require the full scipy/numpy/pandas/sparse stack in CI, slow builds significantly, and risk non-deterministic diffs. Pre-executed notebooks serve as stable documentation artifacts showing canonical outputs.
Trade-off: Notebooks must be re-executed manually before each release. This is a discipline requirement, not a technical one.
Release checklist addition: Before tagging a release, re-execute every notebook and commit the fresh outputs:
```shell
# From the repo root, with the dev environment active:
jupyter nbconvert --to notebook --execute --inplace docs/source/*.ipynb
git add docs/source/*.ipynb
git commit -m "docs: re-execute notebooks for vX.Y.Z"
```
Releasing a new version¶
Releases are fully automated once a version tag is pushed.
Steps:

1. Bump `version` in `pyproject.toml`.
2. Commit the bump:
   ```shell
   git commit -am "chore: bump to vX.Y.Z"
   git push
   ```
3. Tag and push:
   ```shell
   git tag vX.Y.Z
   git push origin vX.Y.Z
   ```
The publish workflow triggers automatically on `v*` tags and:
1. Runs the full CI gate (lint + typecheck + tests)
2. Publishes the wheel and sdist to PyPI (silently skips if the version already exists)
3. Creates a GitHub Release with auto-generated release notes
Note: Do not push a tag without first bumping the version in `pyproject.toml`; PyPI will reject a duplicate version and the workflow will skip the publish step.
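A pre-tag sanity check along these lines can catch a forgotten bump. This is a sketch, not part of the repo's tooling; it assumes Python 3.11+ for `tomllib` and a standard `[project]` table in pyproject.toml:

```python
# check_version.py -- hypothetical pre-tag helper, not shipped with the project.
import subprocess
import sys
import tomllib

with open("pyproject.toml", "rb") as f:
    version = tomllib.load(f)["project"]["version"]

# If a tag for this version already exists, the bump was forgotten.
existing = subprocess.run(
    ["git", "tag", "--list", f"v{version}"],
    capture_output=True,
    text=True,
    check=True,
).stdout.split()

if existing:
    sys.exit(f"v{version} is already tagged; bump pyproject.toml first")
print(f"OK: v{version} is not yet tagged")
```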