Jr. AI Scientist
Autonomous AI scientist that mimics a novice student’s workflow — analyzing a baseline paper’s limitations, formulating improvement hypotheses, iterating experiments via modern coding agents, and writing the resulting paper.
| Affiliation | The University of Tokyo (Aizawa lab) and Tokyo University of Science (Miyai, Toyooka, Otonari, Zhao, Aizawa) |
| First introduced | 2026-02 (published in TMLR; arXiv:2511.04583) |
| Lifecycle stages | Multi-stage (limitation analysis → idea generation → multi-stage experimentation → manuscript creation), plus Writing |
| Autonomy level | Fully autonomous within a baseline-paper-anchored workflow (human mentor supplies the baseline paper) |
| Domain focus | Machine learning research, evaluated on NeurIPS, IJCV, and ICLR baseline papers |
| Availability | Open source (github.com/Agent4Science-UTokyo/Jr.AI-Scientist) |
Approach
The system takes a baseline paper (PDF + LaTeX source + codebase) from a human mentor and executes a four-phase workflow: Baseline Resources ingestion and limitation analysis; Idea Generation with novelty checks; three-stage Experimentation (implement the proposed method, improve it, then run ablation studies with LLM Review and feedback reflection); and Paper Write-Up (manuscript creation, review reflection, page adjustment). The architecture leans on modern coding agents to handle complex, multi-file implementations — explicitly addressing the prior limitation of AI-scientist systems being restricted to small-scale code experiments.
Validation
Evaluated on three real baseline papers — a NeurIPS 2023 paper, an IJCV 2025 paper, and an ICLR 2025 paper — under three regimes: (1) automated DeepReviewer assessment comparing Jr. AI Scientist’s outputs to other AI-generated papers; (2) author-led evaluation of papers built on the authors’ own prior work; and (3) submission to the Agents4Science conference, a venue dedicated to AI-driven scientific contributions. The companion risk report comprehensively catalogues failure modes encountered during development.
Notable results
- Generated papers receive higher DeepReviewer scores than existing fully automated AI-scientist systems.
- Successfully built on real NeurIPS / IJCV / ICLR papers by proposing and implementing novel methods (rather than only running small-scale toy experiments).
- The published risk inventory documents concrete failure modes — useful prior art for any group deploying autoresearch systems.
Primary paper
Miyai, Toyooka, Otonari, Zhao, Aizawa, “Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper,” TMLR (2026) (arXiv:2511.04583).
Other references
None yet.