📜 Paper → code mapping
This page maps every concept in arXiv:2605.10813 to the file (or files) that implement it. Numbers refer to sections / equations in the paper.
| Paper symbol | Concept | Location |
|---|---|---|
| 𝒯 | User-specified research topic | RunSnapshot.topic |
| 𝒰 | User profile | schemas.UserProfile |
| 𝒮 | Skill Bank | stores.SkillBank |
| ℳ | Memory Module | stores.MemoryStore |
| 𝒪 | Orchestrator | orchestrator.Orchestrator |
| π_θ | Planner | planner.Planner (Qwen2.5-7B + LoRA) |
| ℱ | Free-form user feedback | RunManager._wait_for_feedback |
| ℬ | Experiment blueprint | agents.Blueprint |
| 𝒲 | Generated workspace / project | agents.GeneratedProject |
| 𝒜 | Analysis report | agents.AnalysisReport |
| 𝒫 | Final paper PDF | agents.CompiledPaper |
| h* | Selected hypothesis | IdeationArtefacts.chosen_hypothesis_id |
| c_ℬ | Reviewer critique on blueprint | agents.BlueprintCritique |
| f_R | Reviewer critique on paper | agents.PaperCritique |
Equation 1 — Stage I Ideation retrieval
$\mathcal{S}_I, \mathcal{M}_I = \mathrm{Retrieve}(\mathcal{S}, \mathcal{M} \mid \mathcal{T}, \mathcal{U})$, $\quad P_I = \mathrm{Plan}(\mathcal{T}, \mathcal{U} \mid \mathcal{S}_I, \mathcal{M}_I)$
→ Orchestrator.retrieve + IdeationStage.run
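As a minimal sketch of what a Retrieve step like Equation 1 does, the snippet below scores stored skill/memory entries against the topic and keeps the top-k. All names here are hypothetical; the real Orchestrator.retrieve may well use embeddings rather than the plain token overlap shown.

```python
# Hypothetical Retrieve(S, M | T, U) sketch: rank stored entries by how
# many topic tokens they share, then keep the k best matches.
def retrieve(entries: list[str], topic: str, k: int = 3) -> list[str]:
    topic_tokens = set(topic.lower().split())

    def overlap(entry: str) -> int:
        # crude relevance score: shared lowercase tokens with the topic
        return len(topic_tokens & set(entry.lower().split()))

    return sorted(entries, key=overlap, reverse=True)[:k]
```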
Equation 2 — Stage I Planning retrieval
Same shape, conditioned on h* instead of 𝒯.
→ PlanningStage._initial_blueprint
Equation 3 — Peer-review correction loop
$\mathcal{B}^{(t+1)} = \mathrm{Refine}(\mathcal{B}^{(t)}, c_\mathcal{B}^{(t)}, P_P, E)$
→ PlanningStage._refine_blueprint — runs up to max_review_iterations (default 3).
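The shape of that loop can be sketched as below, with hypothetical critique/refine callables standing in for the actual PlanningStage internals: refine until the reviewer raises no issues or the iteration cap is hit.

```python
# Equation 3 as a bounded fixed-point loop (names hypothetical):
# B^(t+1) = Refine(B^(t), c_B^(t)), stopping early on an empty critique.
def review_loop(blueprint, critique_fn, refine_fn, max_review_iterations: int = 3):
    for _ in range(max_review_iterations):
        critique = critique_fn(blueprint)
        if not critique:  # reviewer accepts: stop early
            break
        blueprint = refine_fn(blueprint, critique)
    return blueprint
```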
Equation 4 — Skill/Memory distillation
$\mathcal{S}, \mathcal{M} \leftarrow \mathrm{Update}(\mathcal{S}, \mathcal{M} \mid h^*, \mathcal{B}, c_\mathcal{B})$
→ stores.distill, called from Orchestrator.run_stage.
Equation 6 — Autonomous debug loop (Stage II)
$\mathcal{W}^{(t+1)} = \mathrm{Debug}(\mathcal{W}^{(t)} \mid \mathcal{S}_C, \mathcal{M}_C)$
→ CodingStage.run → _request_patch + _apply_patch. Capped at max_debug_iterations (default 3).
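A toy version of that debug loop, with hypothetical run/patch callables rather than the real _request_patch / _apply_patch pair: run the workspace, and on failure ask for a patch, up to the cap.

```python
# Equation 6 sketch: W^(t+1) = Debug(W^(t) | ...), capped at
# max_debug_iterations. run_fn returns (success, error_message).
def debug_loop(workspace, run_fn, patch_fn, max_debug_iterations: int = 3):
    for _ in range(max_debug_iterations):
        ok, error = run_fn(workspace)
        if ok:  # code executed cleanly, no patch needed
            return workspace, True
        workspace = patch_fn(workspace, error)  # apply the requested fix
    return workspace, False  # budget exhausted, still failing
```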
Equation 7 — Analysis report
$\mathcal{A} = \mathrm{Analyze}(R_{\mathrm{raw}}, \mathcal{B}, \mathcal{T})$
→ AnalysisStage. First tries to recover a RESULT_JSON: line printed by the generated project; falls back to LLM extraction.
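The structured-result recovery described above amounts to scanning stdout for a RESULT_JSON: line and parsing its payload; a sketch (function name hypothetical), where returning None signals the LLM-extraction fallback:

```python
import json

# Scan the generated project's stdout for "RESULT_JSON: {...}".
# Returns the parsed dict, or None to trigger the LLM fallback path.
def recover_result(stdout: str):
    for line in stdout.splitlines():
        if line.startswith("RESULT_JSON:"):
            try:
                return json.loads(line[len("RESULT_JSON:"):])
            except json.JSONDecodeError:
                continue  # malformed payload; keep scanning
    return None
```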
Equation 10 — Paper revision loop (Stage III)
$\mathrm{Draft}^{(t+1)} = \mathrm{Revise}(\mathrm{Draft}^{(t)}, f_R^{(t)})$
→ WritingStage._revise_draft. Rewrites only the sections whose names appear in the reviewer’s issues, leaving the remaining sections untouched for stability.
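That section-targeting policy can be illustrated as follows (rewrite_fn is a hypothetical stand-in for the LLM call the real WritingStage makes):

```python
# Equation 10 sketch: only sections named in a reviewer issue are
# rewritten; everything else passes through unchanged.
def revise_draft(sections: dict[str, str], issues: list[str], rewrite_fn):
    flagged = {name for name in sections
               for issue in issues if name.lower() in issue.lower()}
    return {name: rewrite_fn(text) if name in flagged else text
            for name, text in sections.items()}
```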
Equations 14–15 — SDPO (planner training)
$\nabla_\theta \mathcal{L}_{\mathrm{SDPO}} = -\mathbb{E}_y \left[ \sum_t \mathbb{E}_{\hat{y}_t} A_t^{\mathrm{SDPO}}(\hat{y}_t) \nabla_\theta \log \pi_\theta(\hat{y}_t \mid x, y_{<t}) \right]$
$A_t^{\mathrm{SDPO}}(\hat{y}_t) = \log \pi_\theta(\hat{y}_t \mid x, \mathcal{F}, y_{<t}) - \log \pi_\theta(\hat{y}_t \mid x, y_{<t})$
→ planner.sdpo.sdpo_loss. Two forward passes (with vs. without feedback ℱ), stop-grad on teacher log-probs, advantage clipping at ±5, LoRA-only gradient flow.
See sdpo.html for the line-by-line derivation.
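For intuition, the per-token advantage reduces to a clipped log-probability ratio. The sketch below assumes scalar token probabilities from the two forward passes (with and without feedback ℱ) rather than real model outputs; treating the feedback-conditioned term as a constant mirrors the stop-grad on teacher log-probs, and the ±5 clip matches the setting noted above.

```python
import math

# Equation 15, one token at a time: advantage = log p(with feedback F)
# - log p(without feedback), clipped to [-clip, clip]. The first term
# plays the stop-grad "teacher" role; gradients would flow only through
# the second (student) term in the full loss.
def sdpo_advantage(p_with_feedback: float, p_without_feedback: float,
                   clip: float = 5.0) -> float:
    adv = math.log(p_with_feedback) - math.log(p_without_feedback)
    return max(-clip, min(clip, adv))
```

A positive advantage marks tokens the feedback made more likely, so the loss pushes the feedback-free policy toward them.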
What this implementation doesn’t have (yet)
| Paper concept | Status |
|---|---|
| Compliance / Novelty / Writing judges (§ 8–10) | ⬜ |
| 20-topic benchmark harness (§ 4.2) | ⬜ |
| Simulated-scientist persona runner (§ 4.2.3) | ⬜ |
| Cross-round skill / memory growth tracking (Table 4) | ⬜ |
| Per-round efficiency / cost reporting (Table 3) | ⬜ |
| SLURM submission scripts | n/a (we run locally) |
| Figure-image generation via Gemini | n/a (we keep figures schematic) |