📜 Paper → code mapping

This page maps every concept in arXiv:2605.10813 to the file (or files) that implement it. Numbers refer to sections / equations in the paper.

| Paper symbol | Concept | Location |
| --- | --- | --- |
| 𝒯 | User-specified research topic | RunSnapshot.topic |
| 𝒰 | User profile | schemas.UserProfile |
| 𝒮 | Skill Bank | stores.SkillBank |
| ℳ | Memory Module | stores.MemoryStore |
| 𝒪 | Orchestrator | orchestrator.Orchestrator |
| π_θ | Planner | planner.Planner (Qwen2.5-7B + LoRA) |
| ℱ | Free-form user feedback | RunManager._wait_for_feedback |
| ℬ | Experiment blueprint | agents.Blueprint |
| 𝒲 | Generated workspace / project | agents.GeneratedProject |
| 𝒜 | Analysis report | agents.AnalysisReport |
| 𝒫 | Final paper PDF | agents.CompiledPaper |
| h* | Selected hypothesis | IdeationArtefacts.chosen_hypothesis_id |
| c_ℬ | Reviewer critique on blueprint | agents.BlueprintCritique |
| f_R | Reviewer critique on paper | agents.PaperCritique |

Equation 1 — Stage I Ideation retrieval

$\mathcal{S}_I, \mathcal{M}_I = \mathrm{Retrieve}(\mathcal{S}, \mathcal{M} \mid \mathcal{T}, \mathcal{U})$, $\quad P_I = \mathrm{Plan}(\mathcal{T}, \mathcal{U} \mid \mathcal{S}_I, \mathcal{M}_I)$

Orchestrator.retrieve + IdeationStage.run

Equation 2 — Stage I Planning retrieval

Same shape, conditioned on h* instead of 𝒯.
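
Written out by substituting $h^*$ for $\mathcal{T}$ in Equation 1, the planning-stage retrieval would read roughly as follows. The subscript $P$ on the retrieved subsets is an assumption, chosen to match $P_P$ in Equation 3; the paper's exact notation may differ.

$\mathcal{S}_P, \mathcal{M}_P = \mathrm{Retrieve}(\mathcal{S}, \mathcal{M} \mid h^*, \mathcal{U})$, $\quad P_P = \mathrm{Plan}(h^*, \mathcal{U} \mid \mathcal{S}_P, \mathcal{M}_P)$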

PlanningStage._initial_blueprint

Equation 3 — Peer-review correction loop

$\mathcal{B}^{(t+1)} = \mathrm{Refine}(\mathcal{B}^{(t)}, c_\mathcal{B}^{(t)}, P_P, E)$

PlanningStage._refine_blueprint — runs up to max_review_iterations (default 3).
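
The capped loop in Equation 3 can be sketched as below. This is a minimal illustration, not the repository's code: `Critique`, `review`, `refine`, and the toy reviewer are hypothetical stand-ins, the blueprint is reduced to a string, and the extra inputs $P_P$ and $E$ are omitted. Stopping early once the reviewer raises no issues is an assumption; only the default cap of 3 comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Critique:
    """Hypothetical stand-in for agents.BlueprintCritique."""
    issues: List[str] = field(default_factory=list)


def refine_until_accepted(
    blueprint: str,
    review: Callable[[str], Critique],
    refine: Callable[[str, Critique], str],
    max_review_iterations: int = 3,
) -> str:
    """Apply B(t+1) = Refine(B(t), c_B(t)) at most max_review_iterations times."""
    for _ in range(max_review_iterations):
        critique = review(blueprint)
        if not critique.issues:  # assumed stopping rule: nothing left to fix
            break
        blueprint = refine(blueprint, critique)
    return blueprint


def toy_review(bp: str) -> Critique:
    """Toy reviewer: complains until a baseline and a metric are mentioned."""
    missing = [w for w in ("baseline", "metric") if w not in bp]
    return Critique(issues=[f"missing {w}" for w in missing])


def toy_refine(bp: str, critique: Critique) -> str:
    """Toy refiner: addresses only the first issue per round."""
    return bp + " + " + critique.issues[0].split(" ", 1)[1]


final_blueprint = refine_until_accepted("train model", toy_review, toy_refine)
capped_blueprint = refine_until_accepted(
    "train model", toy_review, toy_refine, max_review_iterations=1
)
```

With the toy reviewer, two refinement rounds resolve both issues and the third review accepts the blueprint; with a cap of 1 the loop returns after a single refinement, whether or not issues remain.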

Equation 4 — Skill/Memory distillation

$\mathcal{S}, \mathcal{M} \leftarrow \mathrm{Update}(\mathcal{S}, \mathcal{M} \mid h^*, \mathcal{B}, c_\mathcal{B})$

stores.distill called from Orchestrator.run_stage.

Equation 6 — Autonomous debug loop (Stage II)

$\mathcal{W}^{(t+1)} = \mathrm{Debug}(\mathcal{W}^{(t)} \mid \mathcal{S}_C, \mathcal{M}_C)$

CodingStage.run_request_patch + _apply_patch. Capped at max_debug_iterations (default 3).
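
A rough sketch of the run, patch, retry cycle in Equation 6 follows. Everything here is illustrative: the workspace is modelled as a dict of file contents, `run` and `request_patch` are hypothetical callables, and retrieval of $\mathcal{S}_C, \mathcal{M}_C$ is left out. Counting the cap in patches (with one final check run afterwards) is also an assumption.

```python
from typing import Callable, Dict, Optional, Tuple

Workspace = Dict[str, str]  # file path -> file contents (hypothetical model)


def debug_loop(
    workspace: Workspace,
    run: Callable[[Workspace], Optional[str]],
    request_patch: Callable[[Workspace, str], Workspace],
    max_debug_iterations: int = 3,
) -> Tuple[Workspace, bool]:
    """Run the project; on failure request a patch and apply it, up to the cap."""
    for _ in range(max_debug_iterations):
        error = run(workspace)  # None means the run succeeded
        if error is None:
            return workspace, True
        patch = request_patch(workspace, error)
        workspace = {**workspace, **patch}  # apply: patched files overwrite old ones
    # Cap reached: report whether the last patch happened to fix the run.
    return workspace, run(workspace) is None


def toy_run(ws: Workspace) -> Optional[str]:
    """Toy runner: 'fails' unless main.py imports json first."""
    if ws["main.py"].startswith("import json"):
        return None
    return "NameError: name 'json' is not defined"


def toy_request_patch(ws: Workspace, error: str) -> Workspace:
    """Toy patcher: prepends the missing import."""
    return {"main.py": "import json\n" + ws["main.py"]}


fixed, ok = debug_loop(
    {"main.py": "print(json.dumps({}))\n"}, toy_run, toy_request_patch
)
```

Returning a success flag alongside the workspace lets the caller distinguish a repaired project from one that exhausted its debug budget.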

Equation 7 — Analysis report

$\mathcal{A} = \mathrm{Analyze}(R_{\mathrm{raw}}, \mathcal{B}, \mathcal{T})$

AnalysisStage. First tries to recover a RESULT_JSON: line printed by the generated project; falls back to LLM extraction.
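
The parse-then-fall-back behaviour could look something like this. It is a sketch under assumptions: the function names, the choice to take the last well-formed line, the requirement that the payload be a JSON object, and the `llm_extract` callable are all hypothetical; only the `RESULT_JSON:` prefix and the fallback order come from the description above.

```python
import json
from typing import Callable, Optional

PREFIX = "RESULT_JSON:"


def parse_result_line(stdout: str) -> Optional[dict]:
    """Return the payload of the last well-formed RESULT_JSON: line, if any."""
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        try:
            payload = json.loads(line[len(PREFIX):])
        except json.JSONDecodeError:
            continue  # malformed line: keep scanning earlier output
        if isinstance(payload, dict):
            return payload
    return None


def extract_results(stdout: str, llm_extract: Callable[[str], dict]) -> dict:
    """Prefer the structured line; call the (hypothetical) LLM extractor otherwise."""
    parsed = parse_result_line(stdout)
    return parsed if parsed is not None else llm_extract(stdout)


log = 'epoch 1 done\nRESULT_JSON: {"accuracy": 0.91, "loss": 0.27}\n'
results = extract_results(log, llm_extract=lambda text: {})
```

Trying the structured line first keeps the common case deterministic and free of model calls; the LLM path is only exercised when the generated project did not print a usable line.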

Equation 10 — Paper revision loop (Stage III)

$\mathrm{Draft}^{(t+1)} = \mathrm{Revise}(\mathrm{Draft}^{(t)}, f_R^{(t)})$

WritingStage._revise_draft. Rewrites only the sections whose names appear in the reviewer’s issues; all other sections are left unchanged.
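
One way to picture this targeted revision is the sketch below. It assumes, purely for illustration, that a draft is a dict from section name to text, that issues are plain strings, and that a section is matched by a case-insensitive substring test; `revise_draft` and `rewrite` here are hypothetical and not the signatures used in the repository.

```python
from typing import Callable, Dict, List


def revise_draft(
    draft: Dict[str, str],
    issues: List[str],
    rewrite: Callable[[str, str, List[str]], str],
) -> Dict[str, str]:
    """Rewrite only sections named in at least one issue; copy the rest as-is."""
    revised = {}
    for name, text in draft.items():
        relevant = [issue for issue in issues if name.lower() in issue.lower()]
        revised[name] = rewrite(name, text, relevant) if relevant else text
    return revised


draft = {
    "Introduction": "intro v1",
    "Method": "method v1",
    "Results": "results v1",
}
issues = ["Method section omits the learning rate."]

# Toy rewriter standing in for an LLM call.
new_draft = revise_draft(
    draft, issues, rewrite=lambda name, text, rel: text.replace("v1", "v2")
)
```

Here only the Method section changes; Introduction and Results are returned untouched, which keeps already-accepted text from drifting between revision rounds.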

Equations 14–15 — SDPO (planner training)

$\nabla_\theta \mathcal{L}_{\mathrm{SDPO}} = -\mathbb{E}_y \left[ \sum_t \mathbb{E}_{\hat{y}_t} A_t^{\mathrm{SDPO}}(\hat{y}_t) \nabla_\theta \log \pi_\theta(\hat{y}_t \mid x, y_{<t}) \right]$

$A_t^{\mathrm{SDPO}}(\hat{y}_t) = \log \pi_\theta(\hat{y}_t \mid x, \mathcal{F}, y_{<t}) - \log \pi_\theta(\hat{y}_t \mid x, y_{<t})$

planner.sdpo.sdpo_loss. Two forward passes (with vs. without feedback ℱ), stop-grad on teacher log-probs, advantage clipping at ±5, LoRA-only gradient flow.
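
To make the advantage and its clipping concrete, here is a framework-free sketch on per-token log-probabilities. It is not `planner.sdpo.sdpo_loss`: the function names are hypothetical, the inner expectation over $\hat{y}_t$ is replaced by one sampled token per position, and the numbers are made up. In an autograd framework the feedback-conditioned (teacher) log-probs and the advantage would presumably be detached so that gradients flow only through the feedback-free pass.

```python
from typing import List


def sdpo_advantages(
    teacher_logp: List[float], student_logp: List[float], clip: float = 5.0
) -> List[float]:
    """A_t = log pi(y_t | x, F, y_<t) - log pi(y_t | x, y_<t), clipped to +/- clip."""
    return [
        max(-clip, min(clip, t - s)) for t, s in zip(teacher_logp, student_logp)
    ]


def sdpo_surrogate(
    teacher_logp: List[float], student_logp: List[float], clip: float = 5.0
) -> float:
    """Scalar -sum_t A_t * log pi(y_t | x, y_<t), with A_t treated as a constant."""
    adv = sdpo_advantages(teacher_logp, student_logp, clip)
    return -sum(a * s for a, s in zip(adv, student_logp))


# Per-token log-probs from the two forward passes (illustrative values).
with_feedback = [-0.5, -1.0, -9.0]     # conditioned on x, F, y_<t
without_feedback = [-1.5, -1.0, -2.0]  # conditioned on x, y_<t

advantages = sdpo_advantages(with_feedback, without_feedback)
loss_value = sdpo_surrogate(with_feedback, without_feedback)
```

The first token becomes more likely once feedback is visible (advantage 1.0), the second is unchanged (0.0), and the third has a raw advantage of -7.0 that is clipped to -5.0, which bounds how much a single token can contribute to the update.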

See sdpo.html for the line-by-line derivation.

What this implementation doesn’t have (yet)

| Paper concept | Status |
| --- | --- |
| Compliance / Novelty / Writing judges (§ 8–10) | |
| 20-topic benchmark harness (§ 4.2) | |
| Simulated-scientist persona runner (§ 4.2.3) | |
| Cross-round skill / memory growth tracking (Table 4) | |
| Per-round efficiency / cost reporting (Table 3) | |
| SLURM submission scripts | n/a (we run locally) |
| Figure-image generation via Gemini | n/a (we keep figures schematic) |