📜 Paper → code mapping
This page maps every concept in arXiv:2605.10813 to the file (or files) that implement it. Numbers refer to sections / equations in the paper.
| Paper symbol | Concept | Location |
|---|---|---|
| 𝒯 | User-specified research topic | RunSnapshot.topic |
| 𝒰 | User profile | schemas.UserProfile |
| 𝒮 | Skill Bank | stores.SkillBank |
| ℳ | Memory Module | stores.MemoryStore |
| 𝒪 | Orchestrator | orchestrator.Orchestrator |
| π_θ | Planner | planner.Planner (Qwen2.5-7B + LoRA) |
| ℱ | Free-form user feedback | RunManager._wait_for_feedback |
| ℬ | Experiment blueprint | agents.Blueprint |
| 𝒲 | Generated workspace / project | agents.GeneratedProject |
| 𝒜 | Analysis report | agents.AnalysisReport |
| 𝒫 | Final paper PDF | agents.CompiledPaper |
| h* | Selected hypothesis | IdeationArtefacts.chosen_hypothesis_id |
| c_ℬ | Reviewer critique on blueprint | agents.BlueprintCritique |
| f_R | Reviewer critique on paper | agents.PaperCritique |
Equation 1 — Stage I Ideation retrieval
$\mathcal{S}_I, \mathcal{M}_I = \mathrm{Retrieve}(\mathcal{S}, \mathcal{M} \mid \mathcal{T}, \mathcal{U})$, $\quad P_I = \mathrm{Plan}(\mathcal{T}, \mathcal{U} \mid \mathcal{S}_I, \mathcal{M}_I)$
→ Orchestrator.retrieve + IdeationStage.run
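As a minimal sketch of what a Retrieve step like Equation 1 does, the snippet below scores stored skill/memory entries against the topic and keeps the top-k. All names here are hypothetical; the real Orchestrator.retrieve may well use embeddings rather than the plain token overlap shown.

```python
# Hypothetical Retrieve(S, M | T, U) sketch: rank stored entries by how
# many topic tokens they share, then keep the k best matches.
def retrieve(entries: list[str], topic: str, k: int = 3) -> list[str]:
    topic_tokens = set(topic.lower().split())

    def overlap(entry: str) -> int:
        # crude relevance score: shared lowercase tokens with the topic
        return len(topic_tokens & set(entry.lower().split()))

    return sorted(entries, key=overlap, reverse=True)[:k]
```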
Equation 2 — Stage I Planning retrieval
Same shape, conditioned on h* instead of 𝒯.
→ PlanningStage._initial_blueprint
Equation 3 — Peer-review correction loop
$\mathcal{B}^{(t+1)} = \mathrm{Refine}(\mathcal{B}^{(t)}, c_\mathcal{B}^{(t)}, P_P, E)$
→ PlanningStage._refine_blueprint — runs up to max_review_iterations (default 3).
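The shape of that loop can be sketched as below, with hypothetical critique/refine callables standing in for the actual PlanningStage internals: refine until the reviewer raises no issues or the iteration cap is hit.

```python
# Equation 3 as a bounded fixed-point loop (names hypothetical):
# B^(t+1) = Refine(B^(t), c_B^(t)), stopping early on an empty critique.
def review_loop(blueprint, critique_fn, refine_fn, max_review_iterations: int = 3):
    for _ in range(max_review_iterations):
        critique = critique_fn(blueprint)
        if not critique:  # reviewer accepts: stop early
            break
        blueprint = refine_fn(blueprint, critique)
    return blueprint
```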
Equation 4 — Skill/Memory distillation
$\mathcal{S}, \mathcal{M} \leftarrow \mathrm{Update}(\mathcal{S}, \mathcal{M} \mid h^*, \mathcal{B}, c_\mathcal{B})$
→ stores.distill, called from Orchestrator.run_stage.
Equation 6 — Autonomous debug loop (Stage II)
$\mathcal{W}^{(t+1)} = \mathrm{Debug}(\mathcal{W}^{(t)} \mid \mathcal{S}_C, \mathcal{M}_C)$
→ CodingStage.run → _request_patch + _apply_patch. Capped at max_debug_iterations (default 3).
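A toy version of that debug loop, with hypothetical run/patch callables rather than the real _request_patch / _apply_patch pair: run the workspace, and on failure ask for a patch, up to the cap.

```python
# Equation 6 sketch: W^(t+1) = Debug(W^(t) | ...), capped at
# max_debug_iterations. run_fn returns (success, error_message).
def debug_loop(workspace, run_fn, patch_fn, max_debug_iterations: int = 3):
    for _ in range(max_debug_iterations):
        ok, error = run_fn(workspace)
        if ok:  # code executed cleanly, no patch needed
            return workspace, True
        workspace = patch_fn(workspace, error)  # apply the requested fix
    return workspace, False  # budget exhausted, still failing
```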
Equation 7 — Analysis report
$\mathcal{A} = \mathrm{Analyze}(R_{\mathrm{raw}}, \mathcal{B}, \mathcal{T})$
→ AnalysisStage. First tries to recover a RESULT_JSON: line printed by the generated project; falls back to LLM extraction.
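The structured-result recovery described above amounts to scanning stdout for a RESULT_JSON: line and parsing its payload; a sketch (function name hypothetical), where returning None signals the LLM-extraction fallback:

```python
import json

# Scan the generated project's stdout for "RESULT_JSON: {...}".
# Returns the parsed dict, or None to trigger the LLM fallback path.
def recover_result(stdout: str):
    for line in stdout.splitlines():
        if line.startswith("RESULT_JSON:"):
            try:
                return json.loads(line[len("RESULT_JSON:"):])
            except json.JSONDecodeError:
                continue  # malformed payload; keep scanning
    return None
```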
Equation 10 — Paper revision loop (Stage III)
$\mathrm{Draft}^{(t+1)} = \mathrm{Revise}(\mathrm{Draft}^{(t)}, f_R^{(t)})$
→ WritingStage._revise_draft. Rewrites only the sections whose names appear in the reviewer’s issues, leaving the remaining sections untouched for stability.
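That section-targeting policy can be illustrated as follows (rewrite_fn is a hypothetical stand-in for the LLM call the real WritingStage makes):

```python
# Equation 10 sketch: only sections named in a reviewer issue are
# rewritten; everything else passes through unchanged.
def revise_draft(sections: dict[str, str], issues: list[str], rewrite_fn):
    flagged = {name for name in sections
               for issue in issues if name.lower() in issue.lower()}
    return {name: rewrite_fn(text) if name in flagged else text
            for name, text in sections.items()}
```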
Equations 14–15 — SDPO (planner training)
$\nabla_\theta \mathcal{L}_{\mathrm{SDPO}} = -\mathbb{E}_y \left[ \sum_t \mathbb{E}_{\hat{y}_t} A_t^{\mathrm{SDPO}}(\hat{y}_t) \nabla_\theta \log \pi_\theta(\hat{y}_t \mid x, y_{<t}) \right]$
$A_t^{\mathrm{SDPO}}(\hat{y}_t) = \log \pi_\theta(\hat{y}_t \mid x, \mathcal{F}, y_{<t}) - \log \pi_\theta(\hat{y}_t \mid x, y_{<t})$
→ planner.sdpo.sdpo_loss. Two forward passes (with vs. without feedback ℱ), stop-grad on teacher log-probs, advantage clipping at ±5, LoRA-only gradient flow.
See sdpo.html for the line-by-line derivation.
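For intuition, the per-token advantage reduces to a clipped log-probability ratio. The sketch below assumes scalar token probabilities from the two forward passes (with and without feedback ℱ) rather than real model outputs; treating the feedback-conditioned term as a constant mirrors the stop-grad on teacher log-probs, and the ±5 clip matches the setting noted above.

```python
import math

# Equation 15, one token at a time: advantage = log p(with feedback F)
# - log p(without feedback), clipped to [-clip, clip]. The first term
# plays the stop-grad "teacher" role; gradients would flow only through
# the second (student) term in the full loss.
def sdpo_advantage(p_with_feedback: float, p_without_feedback: float,
                   clip: float = 5.0) -> float:
    adv = math.log(p_with_feedback) - math.log(p_without_feedback)
    return max(-clip, min(clip, adv))
```

A positive advantage marks tokens the feedback made more likely, so the loss pushes the feedback-free policy toward them.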
What this implementation doesn’t have (yet)
| Paper concept | Status |
|---|---|
| Compliance / Novelty / Writing judges (§ 8–10) | ⬜ |
| 20-topic benchmark harness (§ 4.2) | ⬜ |
| Simulated-scientist persona runner (§ 4.2.3) | ⬜ |
| Cross-round skill / memory growth tracking (Table 4) | ⬜ |
| Per-round efficiency / cost reporting (Table 3) | ⬜ |
| SLURM submission scripts | n/a (we run locally) |
| Figure-image generation via Gemini | n/a (we keep figures schematic) |