🧪 Testing
Running the suite
# Default: offline only, fast (under 15 s).
pytest -m "not azure and not local_model"
# Opt-in: AAD smoke (needs `az login` + GPT-5.1 deployment).
pytest -m azure
# Opt-in: Qwen MPS smoke (needs ~15 GB model on disk + .[local] extras).
pytest -m local_model
Marker key
| Marker | What it gates | Default state |
| --- | --- | --- |
| none | Pure-Python, scripted-backend tests | runs |
| azure | Hits Azure GPT-5.1 via AAD | skipped |
| local_model | Loads Qwen2.5-7B on MPS | skipped |
| slow | Anything >30 s | skipped unless requested |
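One way the "Default state" column could be enforced is with a small `conftest.py` hook that skips opt-in tests unless their marker is named in `-m`. The sketch below is an assumption about how such gating might look, not the repo's actual configuration: the marker names come from the table, while `OPT_IN_MARKERS`, `requested_markers`, and `skip_reason` are hypothetical helpers.

```python
# conftest.py (sketch). Marker names come from the table above; the helper
# names and the gating logic are assumptions, not the repo's real setup.

OPT_IN_MARKERS = {
    "azure": "hits Azure GPT-5.1 via AAD",
    "local_model": "loads Qwen2.5-7B on MPS",
    "slow": "anything over 30 s",
}


def requested_markers(markexpr):
    """Return marker names that appear un-negated in a simple -m expression.

    Deliberately naive: handles "a", "not a", "a and not b", but not parentheses.
    """
    requested = set()
    negated = False
    for token in markexpr.split():
        if token == "not":
            negated = True
        elif token in ("and", "or"):
            negated = False
        else:
            if not negated:
                requested.add(token)
            negated = False
    return requested


def skip_reason(test_markers, markexpr=""):
    """Return a skip reason for an opt-in test, or None if it should run."""
    requested = requested_markers(markexpr)
    for name in sorted(OPT_IN_MARKERS):
        if name in test_markers and name not in requested:
            return f"opt-in marker '{name}' not requested (use -m {name})"
    return None


def pytest_configure(config):
    # Register the markers so --strict-markers accepts them.
    for name, description in OPT_IN_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")


def pytest_collection_modifyitems(config, items):
    # Imported here so the pure helpers above stay usable without pytest.
    import pytest

    markexpr = config.getoption("markexpr", default="") or ""
    for item in items:
        names = {mark.name for mark in item.iter_markers()}
        reason = skip_reason(names, markexpr)
        if reason is not None:
            item.add_marker(pytest.mark.skip(reason=reason))
```

With a hook like this, a plain `pytest` run would skip `azure`, `local_model`, and `slow` tests, and `pytest -m azure` would run only the Azure smoke tests. The explicit `-m "not azure and not local_model"` form keeps working because pytest deselects those tests before the hook's skips matter.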
Coverage map (61 offline tests, plus 5 opt-in)
| Module | Tests |
| --- | --- |
| config/ | 3 |
| logging/ (run manifest) | 2 |
| llm/router.py | 4 |
| schemas/ + stores/retrieval.py | 7 |
| stores/ (SkillBank, MemoryStore, ProfileStore) | 8 |
| stores/distill.py | 4 |
| orchestrator/ | 8 |
| literature/client.py | 4 |
| agents/stage1_* (Ideation + Planning) | 5 |
| agents/stage2_* + stage3_* + sandbox.py + narrator | 9 |
| api/ (HTTP routes) | 7 |
| planner/sdpo.py | 3 (local-model opt-in) |
| Smoke | 2 (opt-in) |
Writing new tests
- Fakes over mocks. The ScriptedBackend LLM (see tests/test_router.py) is reused across the suite; it just pops pre-canned strings off a list.
- Stages: provide a fresh tmp_path for ProfileStore so tests don't collide on disk; inject scripted LLMs through LLMRouter.
- API: fastapi.testclient.TestClient works with our in-process RunManager. Inject backends through create_app(router=..).
- SDPO: use the hf-internal-testing/tiny-random-LlamaForCausalLM fixture, a ~5 MB model that runs LoRA training on CPU in seconds.
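To make the "fakes over mocks" and `tmp_path` advice concrete, here is a minimal sketch of a scripted fake and a test that uses it. The real ScriptedBackend lives in tests/test_router.py and its interface may differ: the `complete` method, the `prompts` attribute, and the `run_stage` helper are all hypothetical stand-ins used only for illustration.

```python
from pathlib import Path


class ScriptedBackend:
    """Fake LLM backend that returns pre-canned replies in order.

    Sketch only: the method name `complete` is an assumption.
    """

    def __init__(self, replies):
        self._replies = list(replies)
        self.prompts = []  # every prompt seen, so tests can assert on them

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            raise AssertionError("ScriptedBackend ran out of scripted replies")
        return self._replies.pop(0)


def run_stage(backend, store_dir: Path) -> str:
    """Hypothetical stage: asks the LLM once and persists the reply to disk."""
    reply = backend.complete("Propose an idea")
    (store_dir / "profile.txt").write_text(reply, encoding="utf-8")
    return reply


def test_stage_persists_reply(tmp_path):
    # tmp_path is pytest's per-test temporary directory, so nothing collides on disk.
    backend = ScriptedBackend(["idea: test with fakes"])

    assert run_stage(backend, tmp_path) == "idea: test with fakes"
    assert (tmp_path / "profile.txt").read_text(encoding="utf-8") == "idea: test with fakes"
    assert backend.prompts == ["Propose an idea"]
```

A fake like this fails loudly when a stage makes more LLM calls than the test scripted, which tends to catch accidental extra round-trips that a permissive mock would hide.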
Continuous integration
The repo is wired for GitHub Actions in .github/workflows/ci.yml: it runs the default (offline) suite plus npm run typecheck && npm run build on every push.