The autonomous AI scientist landscape
An autonomous AI scientist is a named software system that takes meaningful initiative across one or more of three primary stages: hypothesis generation (proposing novel, testable scientific claims rather than retrieving existing ones), experiment design (choosing experiments that discriminate between hypotheses and optimizing protocols), and analysis (interpreting experimental data, fitting models, drawing inferences). Closing the full loop (hypothesize, design, analyze) is the explicit ambition described in the Gao et al. Cell perspective (2024).
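To make the three stages concrete, here is a toy sketch of one closed loop. It assumes hypotheses arrive as a fixed set of candidate functions (so hypothesis generation is stubbed out), that experiment design means picking the input where the surviving hypotheses' predictions spread the most, and that analysis means discarding hypotheses inconsistent with the observation. All names (`prediction_spread`, `design_experiment`, `analyze`, `run_loop`) are hypothetical; none of the tracked systems is claimed to work this way.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a hypothesis is a candidate model y = f(x).
Hypothesis = Callable[[float], float]


def prediction_spread(hypotheses: Dict[str, Hypothesis], x: float) -> float:
    """How strongly the surviving hypotheses disagree about input x."""
    predictions = [h(x) for h in hypotheses.values()]
    return max(predictions) - min(predictions)


def design_experiment(hypotheses: Dict[str, Hypothesis], candidates: List[float]) -> float:
    """Experiment design: choose the candidate input that best discriminates."""
    return max(candidates, key=lambda x: prediction_spread(hypotheses, x))


def analyze(hypotheses: Dict[str, Hypothesis], x: float, observed: float,
            tol: float = 1e-6) -> Dict[str, Hypothesis]:
    """Analysis: keep only hypotheses whose prediction matches the observation."""
    return {name: h for name, h in hypotheses.items() if abs(h(x) - observed) <= tol}


def run_loop(hypotheses: Dict[str, Hypothesis],
             run_experiment: Callable[[float], float],
             candidates: List[float],
             max_rounds: int = 10) -> Dict[str, Hypothesis]:
    """Alternate design, experiment, and analysis until one hypothesis survives."""
    for _ in range(max_rounds):
        if len(hypotheses) <= 1:
            break
        x = design_experiment(hypotheses, candidates)
        if prediction_spread(hypotheses, x) == 0:
            break  # no candidate experiment can tell the survivors apart
        hypotheses = analyze(hypotheses, x, run_experiment(x))
    return hypotheses


if __name__ == "__main__":
    hypotheses = {
        "linear": lambda x: 2 * x,
        "quadratic": lambda x: x * x,
        "constant": lambda x: 4.0,
    }
    truth = lambda x: x * x  # stands in for the physical experiment
    survivors = run_loop(hypotheses, truth, candidates=[0.0, 1.0, 2.0, 3.0])
    print(sorted(survivors))  # ['quadratic']
```

The input 3.0 is chosen first because the three models predict 6, 9, and 4 there, so a single experiment settles the question; an input of 2.0 would be useless because every model predicts 4.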
How the landscape breaks down
Across the 65 systems tracked here, a few cross-cutting patterns matter more than any single entry:
Chemistry and materials are the most fully closed-loop. These systems run genuine closed loops on physical hardware: CMU’s Coscientist and SIAT’s MARS drive robotic synthesis, AMASE and MAD couple robotic characterization instruments to active-learning policies, and LEAP and CatDT pair domain-tuned models with wet-lab or microkinetic validation. This is where “autonomous” most fully means hands-on-the-apparatus.
Biology and medicine carry the strongest evidence. Wet-lab and clinical validation — Co-Scientist’s in vitro oncology hits, Robin’s confirmed dry-AMD drug candidates, SPARK’s prospective pathology study across 18 cohorts, CRISPR-GPT’s non-expert gene-editing case study, OriGene’s agent-nominated cancer targets confirmed in patient-derived organoid models — set the highest evidence tier in the field.
ML and scientific computing is a large, fast-moving cluster, but all 14 systems tracked in it are benchmark-validated rather than physically grounded. The recent design trends are architectural: self-improving systems that accumulate methodological memory across problems (GRAFT-ATHENA, EvoScientist, AutoSci, AutoScientists), and a turn toward verifiability through adversarial cross-model review (ARIS), evidence-chain auditing (ScientistOne), and numeric-registry gating (AutoResearchClaw).
Physical sciences and embodied systems are the newest frontier, moving AI past simulators onto real apparatus: Qumus fabricates 2D-material devices, the Qiushi engine operates a free-space optical platform, Dr.Sai runs end-to-end analyses on real data from the BESIII detector, and BioProVLA runs wet-lab robotics on an ~$800 rig.
A long tail of single-domain pioneers now spans mathematics (AI co-mathematician, Ax-Prover’s formal Lean proving), symbolic equation discovery (MCI), spatial data science (NORA), plant science (Aleks), and scientific visualization (VIS Co-Scientist).
Evaluation is shifting from one-off demos toward process-level scoring and verifiability audits, and several shared failure modes (reference hallucination, reproducibility, originality versus retrieval) remain open. See Evaluation and open problems.
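Numeric-registry gating, listed above among the verifiability mechanisms (AutoResearchClaw), can be illustrated with a toy check: every number that appears in a draft must match a value recorded by an actual experiment run. The sketch below is a hypothetical illustration of that general idea, not AutoResearchClaw's implementation; the registry format, the regular expression, the tolerance, and the names `ungrounded_numbers` and `gate` are all assumptions, and it naively treats every numeral in the text as a reported result.

```python
import re
from typing import Dict, List

# Assumption: any unsigned integer or decimal in the draft is a reported result.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def ungrounded_numbers(draft: str, registry: Dict[str, float],
                       rel_tol: float = 1e-3) -> List[str]:
    """Return numerals in the draft that match no value in the results registry."""
    values = list(registry.values())
    missing = []
    for token in NUMBER.findall(draft):
        x = float(token)
        # Relative tolerance (absolute below magnitude 1) lets rounded values match.
        if not any(abs(x - v) <= rel_tol * max(abs(v), 1.0) for v in values):
            missing.append(token)
    return missing


def gate(draft: str, registry: Dict[str, float]) -> bool:
    """Pass the draft only if every number it reports is backed by the registry."""
    return not ungrounded_numbers(draft, registry)


if __name__ == "__main__":
    # Hypothetical registry written by experiment runs, not by the writing agent.
    registry = {"baseline_acc": 0.712, "ours_acc": 0.748}
    print(gate("Accuracy rose from 0.712 to 0.748.", registry))  # True
    print(ungrounded_numbers("Accuracy rose from 0.712 to 0.791.", registry))  # ['0.791']
```

The key design choice is that the registry is filled by the execution side, so a writing agent cannot make a number pass the gate by asserting it; a practical version would also need to tell reported results apart from incidental numerals such as table or section numbers.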
All tracked systems
The full set is below, grouped by domain. Each row links to a per-system page with architecture, validation detail, headline metrics, and citation. Loop stage is the part of the scientific loop the system drives (Multi-stage = closes hypothesize → design → analyze; Writing = also drafts the paper or report). Validation is the strongest evidence tier the system has demonstrated (Wet-lab > Mixed > Benchmark > Design-only). Access is code availability (Open source, Code on request, Closed, or Unknown).
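The legend amounts to a small data model: an ordered set of evidence tiers and a rule that collapses the three core stages into Multi-stage. A minimal encoding is sketched below, assuming exactly the tier order given in the legend and assuming that extra labels such as Writing (which appears in the tables) are appended after the stage label; the names `Validation`, `strongest_tier`, and `loop_stage_label` are hypothetical.

```python
from enum import IntEnum
from typing import Iterable


class Validation(IntEnum):
    """Evidence tiers, weakest to strongest (Design-only < Benchmark < Mixed < Wet-lab)."""
    DESIGN_ONLY = 0
    BENCHMARK = 1
    MIXED = 2
    WET_LAB = 3


# The three core stages, in loop order.
CORE_STAGES = ("Hypothesis", "Experiment design", "Analysis")


def strongest_tier(demonstrated: Iterable[Validation]) -> Validation:
    """A system is listed under the strongest tier it has demonstrated."""
    return max(demonstrated)


def loop_stage_label(stages: Iterable[str]) -> str:
    """Render a stage set; all three core stages collapse to 'Multi-stage'."""
    stage_set = set(stages)
    core = [s for s in CORE_STAGES if s in stage_set]  # keep loop order
    extras = sorted(stage_set - set(CORE_STAGES))      # e.g. 'Writing'
    head = ["Multi-stage"] if len(core) == len(CORE_STAGES) else core
    return ", ".join(head + extras)


if __name__ == "__main__":
    print(strongest_tier([Validation.BENCHMARK, Validation.WET_LAB]).name)  # WET_LAB
    print(loop_stage_label({"Analysis", "Experiment design"}))  # Experiment design, Analysis
    print(loop_stage_label({"Hypothesis", "Experiment design", "Analysis", "Writing"}))  # Multi-stage, Writing
```

`IntEnum` is used so that tiers compare with ordinary `<` and `max`, which is all the "strongest tier" rule needs.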
Biology & medicine (17)
System | Org | What it is | Loop stage | Validation | Access
--- | --- | --- | --- | --- | ---
AgentPLM | Bedford College | Agentic protein language model that interleaves sequence generation with structure and docking tool calls. | Experiment design | Benchmark | Closed
BioProVLA-Agent | ECUST | Affordable embodied multi-agent system using Vision-Language-Action models for wet-lab manipulation. | Experiment design | Wet-lab | Unknown
Biomni | Stanford | General-purpose biomedical AI agent pairing a large tool environment with a code-executing planner. | Multi-stage | Benchmark | Open source
CRISPR-GPT | Stanford | Four-agent LLM planner for CRISPR-Cas gene-editing experiments across knockout, base, prime, and epigenetic editing. | Experiment design, Analysis | Wet-lab | Closed
Deep Research (BioAgents) | BioAgents | Open-source interactive multi-agent system for biomedical research, running in minutes per cycle. | Multi-stage | Benchmark | Open source
LabOS | Stanford / Princeton | AI-XR co-scientist pairing self-evolving agents with smart glasses to reason about and act in the physical lab. | Multi-stage | Mixed | Open source
Latent-Y | Latent Labs | Lab-validated autonomous agent that runs complete de novo antibody design campaigns from text prompts. | Multi-stage | Wet-lab | Closed
NeuroClaw | CUHK | Domain-specialized multi-agent assistant for reproducible neuroimaging research on raw MRI/fMRI/EEG data. | Experiment design, Analysis | Benchmark | Open source
OpenScientist | WashU | Open-source AI co-scientist for biomedical discovery, built on Claude Code with domain-specific Agent Skills. | Multi-stage | Wet-lab | Open source
OriGene | SJTU | Self-evolving multi-agent “virtual disease biologist” that nominates mechanism-grounded drug targets. | Hypothesis, Analysis | Wet-lab | Open source
PantheonOS | Stanford | Privacy-preserving multi-agent framework for single-cell and multi-omics analysis that evolves its own code. | Multi-stage | Wet-lab | Open source
PerTurboAgent | Genentech | Self-planning Genentech LLM agent that designs iterative Perturb-seq screens, round by round. | Experiment design, Analysis | Benchmark | Unknown
PharmaSwarm | UAB | Multi-agent LLM swarm for hypothesis-driven drug discovery with a central Evaluator ranking targets and compounds. | Hypothesis, Analysis | Design-only | Unknown
Robin (FutureHouse) | FutureHouse | Multi-agent system integrating hypothesis generation with data analysis in a lab-in-the-loop workflow. | Multi-stage | Wet-lab | Open source
SPARK | Cologne | System of pathology agents that turns biological concepts into tools applied to H&E and spatial-biology data. | Multi-stage | Wet-lab | Open source
Talk2QSP | Sanofi | Agent framework converting clinical scenarios from the literature into executable QSP/SBML model interventions. | Experiment design | Benchmark | Closed
The Virtual Biotech | Stanford | Multi-agent framework mirroring a biotech org, with a CSO agent delegating computational drug discovery to specialists. | Multi-stage | Benchmark | Unknown
Chemistry & materials (13)
System | Org | What it is | Loop stage | Validation | Access
--- | --- | --- | --- | --- | ---
AILA | IIT Delhi | Multi-agent LLM framework that autonomously operates an atomic force microscope, with the AFMBench suite. | Experiment design, Analysis | Wet-lab | Open source
AMASE | Maryland | Autonomous Materials Search Engine pairing robotic thin-film XRD with CALPHAD phase-diagram prediction. | Multi-stage | Wet-lab | Open source
AtomisticSkills | MIT | Open-source skill library that lets coding agents run atomistic research in materials, chemistry, and drug discovery. | Multi-stage | Benchmark | Open source
BORA | Liverpool | Language-based assistant coupling an LLM with Bayesian optimization for literature-grounded experiment design. | Hypothesis, Experiment design | Benchmark | Open source
CatDT | HKUST | Self-evolving multi-agent system that builds a condition-aware digital twin of a heterogeneous catalyst. | Multi-stage | Benchmark | Code on request
CategoryScienceClaw | MIT | Discovery framework adding a category-theoretic, proof-carrying layer to ScienceClaw for verifiable transitions. | Multi-stage | Benchmark | Open source
ChemCrow | EPFL | GPT-4-driven chemistry agent combining reasoning with expert tools for retrosynthesis and reaction execution. | Experiment design, Analysis | Wet-lab | Open source
Coscientist (CMU) | CMU | GPT-4 planner driving a robotic chemistry platform fully autonomously across the experimental loop. | Experiment design, Analysis | Wet-lab | Code on request
DKPL | Oak Ridge | Autonomous-microscopy framework that learns a utility from expert pairwise judgements to plan nanoscale experiments. | Experiment design, Analysis | Wet-lab | Unknown
LEAP | Renmin U | Expert-in-the-loop framework coupling a domain LLM with Bayesian optimization for perovskite additive discovery. | Multi-stage | Wet-lab | Unknown
MAD | Maryland | Closed-loop framework orchestrating multiple characterization instruments for thin-film materials discovery. | Experiment design, Analysis | Wet-lab | Code on request
MARS | SIAT | Hierarchical multi-agent and robotic framework that designs, synthesizes, and optimizes perovskites. | Multi-stage | Wet-lab | Open source
Qumus | Princeton | Embodied multi-agent system that fabricates 2D quantum materials and vdW device stacks in a robotic mini-lab. | Multi-stage | Wet-lab | Code on request
ML & scientific computing (14)
System | Org | What it is | Loop stage | Validation | Access
--- | --- | --- | --- | --- | ---
AI CFD Scientist | RPI | Open-source AI scientist for computational fluid dynamics, from ideation to solver runs to figure-grounded writing. | Multi-stage, Writing | Benchmark | Open source
AI Scientist (Sakana) | Sakana AI | End-to-end LLM agent that ideates, codes, runs ML experiments, and writes its own reviewed paper. | Multi-stage, Writing | Benchmark | Open source
AIRA (AIRA-Compose and AIRA-Design) | Meta FAIR | Pair of Meta FAIR LLM-agent frameworks that autonomously discover novel foundation-model architectures. | Multi-stage | Benchmark | Closed
ARIS | SJTU | Autonomous ML research harness using cross-model adversarial collaboration between executor and reviewer. | Multi-stage, Writing | Benchmark | Open source
ATLAS | DeepMind | DeepMind active-learning loop that designs experiments to discover interpretable mechanistic models of behavior. | Multi-stage | Benchmark | Unknown
AgenticSciML | Brown | Multi-agent LLM system that debates and evolutionarily refines scientific-machine-learning architectures. | Hypothesis, Experiment design | Benchmark | Code on request
AutoLLMResearch | Notre Dame | RL-trained agent that extrapolates from cheap LLM runs to configure expensive large-scale experiments. | Experiment design, Analysis | Benchmark | Open source
AutoResearchClaw | UNC-Chapel Hill | Multi-agent ML research pipeline with structured debate, a self-healing executor, and human-in-the-loop modes. | Multi-stage, Writing | Benchmark | Open source
AutoTTS | Maryland | Agentic framework that autonomously discovers test-time-scaling controllers for LLM reasoning. | Multi-stage | Benchmark | Open source
Deep Researcher Agent | U Tokyo | Open-source framework running LLM agents 24/7 through full deep-learning experiment cycles. | Multi-stage | Benchmark | Open source
GRAFT-ATHENA | Brown | Self-improving agentic framework that expands its own action space to discover numerical algorithms. | Multi-stage | Benchmark | Code on request
Jr. AI Scientist | U Tokyo | Autonomous AI scientist mimicking a novice student improving on a baseline paper through iterated experiments. | Multi-stage, Writing | Benchmark | Open source
MLEvolve | Shanghai AI Lab | Self-evolving multi-agent framework that discovers end-to-end ML solutions via Progressive Monte Carlo Graph Search. | Multi-stage | Benchmark | Open source
POISE | Fudan | Closed-loop Fudan framework that autonomously discovers policy-optimization algorithms for LLM reinforcement learning. | Multi-stage | Benchmark | Unknown
Physical sciences (5)
System | Org | What it is | Loop stage | Validation | Access
--- | --- | --- | --- | --- | ---
CMBEvolve and CosmoEvolve | Cambridge | Pair of cosmology agentic systems for code evolution and virtual-research-lab simulation. | Multi-stage | Benchmark | Code on request
CVEvolve | Argonne | Argonne agentic harness that discovers data-processing algorithms for experimental images via zero-code search. | Analysis | Benchmark | Unknown
DarkAgents | U. Bologna / INFN | Language-driven multi-agent system that takes a particle-physics model to a fit against astroparticle data, with explicit assumption and prior auditing. | Multi-stage | Benchmark | Open source
Dr.Sai | IHEP | Multi-agent LLM system for autonomous end-to-end high-energy-physics data analysis on the BESIII detector. | Experiment design, Analysis | Wet-lab | Unknown
Qiushi Discovery Engine | Zhejiang U | Agentic system performing end-to-end autonomous discovery on a real free-space optical platform. | Multi-stage | Wet-lab | Unknown
Math & symbolic (3)
System | Org | What it is | Loop stage | Validation | Access
--- | --- | --- | --- | --- | ---
AI co-mathematician | Google DeepMind | Google DeepMind agentic workbench for open-ended mathematics research with parallel ideation and proving. | Multi-stage, Writing | Mixed | Closed
Ax-Prover | Axiomatic AI | Multi-agent Lean theorem prover that gives general-purpose LLMs Lean tools to verify proofs across math and quantum physics. | Analysis | Benchmark | Open source
MCI | KRICT | Multi-agent framework blending symbolism and metaheuristics to discover explainable governing equations. | Hypothesis, Analysis | Benchmark | Unknown
General / multi-domain (10)
System | Org | What it is | Loop stage | Validation | Access
--- | --- | --- | --- | --- | ---
AutoSci | Peking U | Memory-centric agent that runs the full research lifecycle over persistent memory and evolves its own skills. | Multi-stage, Writing | Benchmark | Open source
AutoScientists | Harvard | Decentralized team of AI agents that self-organize around hypotheses and share results across teams. | Multi-stage | Benchmark | Open source
Co-Scientist (Google) | Google | Gemini-based multi-agent reasoning engine for biomedical hypothesis generation. | Hypothesis, Experiment design | Wet-lab | Closed
EOS AI agent | UNC-Chapel Hill | UNC AI agent that creates, runs, and analyzes lab protocols and optimization campaigns from text requests. | Experiment design, Analysis | Wet-lab | Unknown
EvoMaster | SJTU | Domain-agnostic harness for building self-evolving scientific agents; underpins the SciMaster ecosystem. | Multi-stage | Benchmark | Open source
EvoScientist | Huawei | Evolving multi-agent AI scientist whose agents share memories that distill past runs into reusable strategies. | Multi-stage | Benchmark | Open source
Kosmos | Edison Scientific | AI scientist automating data-driven discovery via long cycles of parallel analysis against a world model. | Multi-stage, Writing | Mixed | Closed
NovelSeek | Shanghai AI Lab | Closed-loop multi-agent framework evaluated across 12 AI-for-Science research tasks. | Multi-stage | Benchmark | Open source
SAGA | Cornell + multi-institution | Bi-level agent that autonomously evolves the objective functions of a scientific design problem rather than treating them as fixed. | Multi-stage | Wet-lab | Open source
ScientistOne | Google | End-to-end autonomous research system maintaining verifiable evidence chains for every claim. | Multi-stage, Writing | Benchmark | Closed
Other (3)
System | Org | What it is | Loop stage | Validation | Access
--- | --- | --- | --- | --- | ---
Aleks | Cornell | Cornell multi-agent system that turns a plant-science question and dataset into interpretable ML models. | Multi-stage | Benchmark | Unknown
NORA | UT Knoxville | Multi-agent autonomous research system purpose-built for GIScience and spatial data science. | Multi-stage, Writing | Benchmark | Unknown
VIS Co-Scientist | LLNL | Agentic harness that autonomously designs custom visual-analysis apps from a dataset and task description. | Analysis | Benchmark | Code on request
Other systems being tracked for inclusion: Virtual Lab (Stanford / CZ Biohub, Nature 2025; designed novel SARS-CoV-2 nanobodies), CORAL (multi-agent evolutionary discovery, arXiv:2604.01658), STORM, Aviary, and AutoBa.
Sources
Gottweis et al., “Accelerating scientific discovery with Co-Scientist,” Nature
Ghareeb et al., “A multi-agent system for automating scientific discovery” (Robin), Nature
Kazemeini et al., “Talk2QSP,” bioRxiv 2026.05.06.723244
Gao et al., “Empowering biomedical discovery with AI agents,” Cell 187 (2024)
Nature news, “Human scientists beat the best AI agents…” (AI Index Report 2026 coverage)
Boiko et al., “Autonomous chemical research with large language models,” Nature 624 (2023)
Bran et al., “Augmenting large language models with chemistry tools,” Nat. Mach. Intell. (2024)
Lu et al., “The AI Scientist,” arXiv:2408.06292 / Nature 651, 914–919 (2026)
Yamada et al., “The AI Scientist-v2,” arXiv:2504.08066
Huang et al., “Biomni: A General-Purpose Biomedical AI Agent,” bioRxiv 2025.05.30.656746
Qu et al., “CRISPR-GPT for agentic automation of gene-editing experiments,” Nat. Biomed. Eng. 10, 245–258 (2026)
Roberts et al., “OpenScientist: evaluating an open agentic AI co-scientist to accelerate biomedical discovery,” medRxiv 2026.03.15.26348338
NovelSeek Team, arXiv:2505.16938
Mitchener et al., “Kosmos: An AI Scientist for Autonomous Discovery,” arXiv:2511.02824
Jiang & Karniadakis, “AgenticSciML,” arXiv:2511.07262
Mandal et al., “Evaluating large language model agents for automation of atomic force microscopy” (AILA / AFMBench), Nat. Commun. 16:9104 (2025)
Shi et al., “Knowledge-driven autonomous materials research via collaborative multi-agent and robotic system” (MARS), Matter 9, 102577 (2026)
Shi et al., “Qumus: Realization of An Embodied AI Quantum Material Experimentalist,” arXiv:2605.18407
Yang et al., “End-to-end autonomous scientific discovery on a real optical platform” (Qiushi Discovery Engine), arXiv:2604.27092
He et al., “Dr.Sai: An agentic AI for real-world physics analysis at BESIII,” arXiv:2604.22541
Toscano et al., “GRAFT-ATHENA: Self-Improving Agentic Teams for Autonomous Discovery and Evolutionary Numerical Algorithms,” arXiv:2605.11117
Somasekharan et al., “AI CFD Scientist,” arXiv:2605.06607
Qu et al., “BiomniBench: Process-level Evaluation of LLM Agents for Real-world Biomedical Research,” bioRxiv 2026.05.12.724604
Trost, Zhang, Aring et al., “An agentic framework for autonomous scientific discovery in cancer pathology” (SPARK), Nature Medicine (2026)
Lyu et al., “EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery,” arXiv:2603.08127
Weidener et al., “Rethinking the AI Scientist: Interactive Multi-Agent Workflows for Scientific Discovery” (Deep Research / BioAgents), arXiv:2601.12542
Miyai et al., “Jr. AI Scientist and Its Risk Report,” TMLR 2026; arXiv:2511.04583
Yang, Li, Li, “ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration,” arXiv:2605.03042
Pepe et al., “Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design,” arXiv:2605.15871
Zheng et al., “LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling” (AutoTTS), arXiv:2605.08083
Xu & Borrett, “Beyond AI as Assistants: Toward Autonomous Discovery in Cosmology” (CMBEvolve / CosmoEvolve), arXiv:2605.14791
Zheng et al., “AI co-mathematician: Accelerating mathematicians with agentic AI,” arXiv:2605.06651
Liu et al., “AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration,” arXiv:2605.20025
Du et al., “CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing,” arXiv:2605.11359
Angelopoulos, Cahoon, Alterovitz, “From Prompts to Protocols: An AI Agent for Laboratory Automation” (EOS AI agent), arXiv:2605.16552
Zhang, “Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring,” arXiv:2604.05854
Xia, Zhang et al., “From AI Assistant to AI Scientist: Autonomous Discovery of LLM-RL Algorithms with LLM Agents” (POISE), arXiv:2603.23951
Song, Trotter, Chen, “LLM Agent Swarm for Hypothesis-Driven Drug Discovery” (PharmaSwarm), arXiv:2504.17967
Jin et al., “Aleks: AI powered Multi Agent System for Autonomous Scientific Discovery via Data-Driven Approaches in Plant Science,” arXiv:2508.19383
Hao, Lee, Wang, Scalia, Regev, “PerTurboAgent: An LLM-based Agent for Designing Iterative Perturb-Seq Experiments,” PMLR v311 / bioRxiv 2025.05.25.656020
Zhu, Cai, Liu et al., “EvoMaster: A Foundational Agent Framework for Building Evolving Autonomous Scientific Agents at Scale,” arXiv:2604.17406
Wang, He, Peng et al., “NeuroClaw Technical Report: Closed-Loop Agentic AI for Executable and Reproducible Neuroimaging Research,” arXiv:2604.24696
Latent Labs Team, “Latent-Y: A Lab-Validated Autonomous Agent for De Novo Drug Design,” arXiv:2603.29727
Xu, Poussi, Zhong et al. (Qiu group), “PantheonOS: An Evolvable Multi-Agent Framework for Automatic Genomics Discovery,” bioRxiv 2026.02.26.707870
Zhang, Eckmann, Miao, Mahon, Zou, “The Virtual Biotech: A Multi-Agent AI Framework for Therapeutic Discovery and Development,” bioRxiv 2026.02.23.707551
Liu et al., “AMASE: Autonomous Materials Search Engine for Closed-Loop Phase-Mapping of Sn-Bi Thin Films,” arXiv:2410.17430
Hickman et al., “BORA: A Language-Based Bayesian Optimization Research Assistant,” arXiv:2501.16224 / IJCAI 2025
Deng et al., “Harnessing AtomisticSkills for Agentic Atomistic Research,” arXiv:2605.24002
Meng et al., “ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence,” arXiv:2605.26340
Zhou et al., “NORA: A Harness-Engineered Autonomous Research Agent for Spatial Data Science,” arXiv:2605.02092
Na & Park, “Machine Collective Intelligence for Explainable Scientific Discovery” (MCI), arXiv:2604.27297
Du et al., “BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System for Biological Laboratory Manipulation,” arXiv:2605.07306
Wang, Chen, Gao et al., “LEAP: A closed-loop framework for perovskite precursor additive discovery,” arXiv:2605.20242
Gao, Fang, Zitnik, “AUTOSCIENTISTS: Self-Organizing Agent Teams for Long-Running Scientific Experimentation,” arXiv:2605.28655
Guo, Chawla, Wiest, Zhang, “AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration,” arXiv:2605.11518
Qian, Xu, Xie et al., “AutoSci: A Memory-Centric Agentic System for the Full Scientific Research Lifecycle,” arXiv:2605.31468
Miao, Li, Ai, Tang, Wang, Bremer, Liu, “Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks,” arXiv:2605.21825
Bulanadi, Baxter, Biswas et al., “Beyond Scalar Objectives: Expert-Feedback-Driven Autonomous Experimentation for Scientific Discovery at the Nanoscale” (DKPL), arXiv:2605.21820
Lee, Liang, Kim et al., “Real-time Multi-instrument Autonomous Discovery of Novel Phase-change Memory Materials” (MAD), arXiv:2605.18033
Song, Zhang, Cheng, “Autonomous heterogeneous catalyst discovery with a self-evolving multi-agent digital twin” (CatDT), arXiv:2606.05050
Wang & Buehler, “Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence” (CategoryScienceClaw), arXiv:2606.01444
Rahman & Rahman, “AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design,” arXiv:2606.02386 / ICML 2026
Du, Yu, Liu, Shen, Chen et al., “Accelerating Scientific Discovery with Autonomous Goal-evolving Agents” (SAGA), arXiv:2512.21782
Cong, Smerkous, Wang et al., “LabOS: The AI-XR Co-Scientist That Sees and Works With Humans,” arXiv:2510.14861
Lucente, Pascoli, Sala, Zandi, “DarkAgents: towards an agentic system for theoretical astroparticle physics,” arXiv:2606.11157
Éltető, Daw, Stachenfeld, Miller, “ATLAS: Active Theory Learning for Automated Science,” arXiv:2606.12386