The autonomous AI scientist landscape

An autonomous AI scientist is a named software system that takes meaningful initiative across one or more of three primary stages: hypothesis generation (proposing novel, testable scientific claims rather than retrieving existing ones), experiment design (choosing experiments that discriminate between hypotheses, optimizing protocols), and analysis (interpreting experimental data, fitting models, drawing inferences). Closing the full loop — hypothesize, design, analyze — is the explicit ambition described in the Gao et al. Cell perspective (2024).

How the landscape breaks down

Across the 65 systems tracked here, a few cross-cutting patterns matter more than any single entry:

Chemistry and materials are the most loop-closed. Many of these systems run genuine closed loops on physical hardware: Coscientist (CMU) and MARS drive robotic synthesis, AMASE and MAD couple multiple characterization instruments to active-learning policies, and LEAP and CatDT pair domain-tuned models with wet-lab or microkinetic validation. This is where “autonomous” most fully means hands-on-the-apparatus.
Biology and medicine carry the strongest evidence. The highest evidence tier in the field is set by wet-lab and clinical validation: Co-Scientist’s in vitro oncology hits, Robin’s confirmed dry-AMD drug candidates, SPARK’s prospective pathology study across 18 cohorts, CRISPR-GPT’s non-expert gene-editing case study, and OriGene’s agent-nominated cancer targets confirmed in patient-derived organoid models.
ML and scientific computing is a large, fast-moving cluster, but mostly benchmark-validated rather than physically grounded. The recent design trend is architectural: self-improving systems that accumulate methodological memory across problems (GRAFT-ATHENA, EvoScientist, AutoSci, AutoScientists), and a turn toward verifiability — adversarial cross-model review (ARIS), evidence-chain auditing (ScientistOne), and numeric-registry gating (AutoResearchClaw).
Physical sciences and embodied systems are the newest frontier, moving AI past simulators onto real apparatus: Qumus fabricates 2D-material devices, the Qiushi engine runs a free-space optical platform, Dr.Sai operates inside the BESIII collider collaboration, and BioProVLA-Agent runs wet-lab robotics on an ~$800 rig.
A long tail of single-domain pioneers now spans mathematics (AI co-mathematician, Ax-Prover’s formal Lean proving), symbolic equation discovery (MCI), spatial data science (NORA), plant science (Aleks), and scientific visualization (VIS Co-Scientist).

Evaluation is shifting from one-off demos toward process-level scoring and verifiability audits, and several shared problems (reference hallucination, poor reproducibility, and originality versus retrieval) remain open. See Evaluation and open problems.

All tracked systems

The full set is below, grouped by domain. Each row links to a per-system page with architecture, validation detail, headline metrics, and citation. Loop stage is the part of the scientific loop the system drives: Hypothesis, Experiment design, Analysis, or Multi-stage (closes hypothesize → design → analyze), with Writing listed where the system also writes up its results. Validation is the strongest evidence tier the system has demonstrated, ranked Wet-lab > Mixed > Benchmark > Design-only.
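The column scheme described above can be sketched as a small data model. This is a minimal illustration, not the tracker's actual implementation: the names `Validation`, `SystemRow`, `parse_row`, and `rank_by_evidence` are hypothetical, the summaries in the sample rows are abbreviated, and the only thing taken from the text is the tier ordering Wet-lab > Mixed > Benchmark > Design-only and the meaning of Multi-stage.

```python
from dataclasses import dataclass
from enum import IntEnum


class Validation(IntEnum):
    """Evidence tiers from the table legend; a higher value is stronger evidence."""
    DESIGN_ONLY = 0
    BENCHMARK = 1
    MIXED = 2
    WET_LAB = 3


# Hypothetical mapping from the labels used in the tables to the enum.
TIER_BY_LABEL = {
    "Design-only": Validation.DESIGN_ONLY,
    "Benchmark": Validation.BENCHMARK,
    "Mixed": Validation.MIXED,
    "Wet-lab": Validation.WET_LAB,
}


@dataclass(frozen=True)
class SystemRow:
    name: str
    org: str
    summary: str
    loop_stages: tuple[str, ...]
    validation: Validation
    access: str

    @property
    def closes_loop(self) -> bool:
        # "Multi-stage" means the system closes hypothesize -> design -> analyze.
        return "Multi-stage" in self.loop_stages


def parse_row(line: str) -> SystemRow:
    """Parse one tab-separated table row with the six columns used here."""
    name, org, summary, stages, validation, access = line.rstrip("\n").split("\t")
    return SystemRow(
        name=name,
        org=org,
        summary=summary,
        loop_stages=tuple(s.strip() for s in stages.split(",")),
        validation=TIER_BY_LABEL[validation],
        access=access,
    )


def rank_by_evidence(rows):
    """Sort strongest evidence tier first, breaking ties alphabetically by name."""
    return sorted(rows, key=lambda r: (-r.validation, r.name))


# Four rows from the biology table, with abbreviated summaries.
SAMPLE = [
    "\t".join(["PharmaSwarm", "UAB", "Multi-agent LLM swarm.", "Hypothesis, Analysis", "Design-only", "Unknown"]),
    "\t".join(["Biomni", "Stanford", "General-purpose biomedical agent.", "Multi-stage", "Benchmark", "Open source"]),
    "\t".join(["LabOS", "Stanford / Princeton", "AI-XR co-scientist.", "Multi-stage", "Mixed", "Open source"]),
    "\t".join(["Robin (FutureHouse)", "FutureHouse", "Lab-in-the-loop multi-agent system.", "Multi-stage", "Wet-lab", "Open source"]),
]

if __name__ == "__main__":
    for row in rank_by_evidence(parse_row(line) for line in SAMPLE):
        print(row.name, row.validation.name, row.closes_loop)
```

Using an integer-valued enum keeps the tier comparison explicit, so filters such as "at least Mixed" reduce to `row.validation >= Validation.MIXED`.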

Biology & medicine (17)

System	Org	What it is	Loop stage	Validation	Access
AgentPLM	Bedford College	Agentic protein language model that interleaves sequence generation with structure and docking tool calls.	Experiment design	Benchmark	Closed
BioProVLA-Agent	ECUST	Affordable embodied multi-agent system using Vision-Language-Action models for wet-lab manipulation.	Experiment design	Wet-lab	Unknown
Biomni	Stanford	General-purpose biomedical AI agent pairing a large tool environment with a code-executing planner.	Multi-stage	Benchmark	Open source
CRISPR-GPT	Stanford	Four-agent LLM planner for CRISPR-Cas gene-editing experiments across knockout, base, prime, and epigenetic editing.	Experiment design, Analysis	Wet-lab	Closed
Deep Research (BioAgents)	BioAgents	Open-source interactive multi-agent system for biomedical research, running in minutes per cycle.	Multi-stage	Benchmark	Open source
LabOS	Stanford / Princeton	AI-XR co-scientist pairing self-evolving agents with smart glasses to reason about and act in the physical lab.	Multi-stage	Mixed	Open source
Latent-Y	Latent Labs	Lab-validated autonomous agent that runs complete de novo antibody design campaigns from text prompts.	Multi-stage	Wet-lab	Closed
NeuroClaw	CUHK	Domain-specialized multi-agent assistant for reproducible neuroimaging research on raw MRI/fMRI/EEG data.	Experiment design, Analysis	Benchmark	Open source
OpenScientist	WashU	Open-source AI co-scientist for biomedical discovery, built on Claude Code with domain-specific Agent Skills.	Multi-stage	Wet-lab	Open source
OriGene	SJTU	Self-evolving multi-agent “virtual disease biologist” that nominates mechanism-grounded drug targets.	Hypothesis, Analysis	Wet-lab	Open source
PantheonOS	Stanford	Privacy-preserving multi-agent framework for single-cell and multi-omics analysis that evolves its own code.	Multi-stage	Wet-lab	Open source
PerTurboAgent	Genentech	Self-planning LLM agent that designs iterative Perturb-seq screens, round by round.	Experiment design, Analysis	Benchmark	Unknown
PharmaSwarm	UAB	Multi-agent LLM swarm for hypothesis-driven drug discovery with a central Evaluator ranking targets and compounds.	Hypothesis, Analysis	Design-only	Unknown
Robin (FutureHouse)	FutureHouse	Multi-agent system integrating hypothesis generation with data analysis in a lab-in-the-loop workflow.	Multi-stage	Wet-lab	Open source
SPARK	Cologne	System of pathology agents that turns biological concepts into tools applied to H&E and spatial-biology data.	Multi-stage	Wet-lab	Open source
Talk2QSP	Sanofi	Agent framework converting literature clinical scenarios into executable QSP/SBML model interventions.	Experiment design	Benchmark	Closed
The Virtual Biotech	Stanford	Multi-agent framework mirroring a biotech org, with a CSO agent delegating computational drug discovery to specialists.	Multi-stage	Benchmark	Unknown

Chemistry & materials (13)

System	Org	What it is	Loop stage	Validation	Access
AILA	IIT Delhi	Multi-agent LLM framework that autonomously operates an atomic force microscope, with the AFMBench suite.	Experiment design, Analysis	Wet-lab	Open source
AMASE	Maryland	Autonomous Materials Search Engine pairing robotic thin-film XRD with CALPHAD phase-diagram prediction.	Multi-stage	Wet-lab	Open source
AtomisticSkills	MIT	Open-source skill library that lets coding agents run atomistic research in materials, chemistry, and drug discovery.	Multi-stage	Benchmark	Open source
BORA	Liverpool	Language-based assistant coupling an LLM with Bayesian optimization for literature-grounded experiment design.	Hypothesis, Experiment design	Benchmark	Open source
CatDT	HKUST	Self-evolving multi-agent system that builds a condition-aware digital twin of a heterogeneous catalyst.	Multi-stage	Benchmark	Code on request
CategoryScienceClaw	MIT	Discovery framework adding a category-theoretic, proof-carrying layer to ScienceClaw for verifiable transitions.	Multi-stage	Benchmark	Open source
ChemCrow	EPFL	GPT-4-driven chemistry agent combining reasoning with expert tools for retrosynthesis and reaction execution.	Experiment design, Analysis	Wet-lab	Open source
Coscientist (CMU)	CMU	GPT-4 planner driving a robotic chemistry platform fully autonomously across the experimental loop.	Experiment design, Analysis	Wet-lab	Code on request
DKPL	Oak Ridge	Autonomous-microscopy framework that learns a utility from expert pairwise judgements to plan nanoscale experiments.	Experiment design, Analysis	Wet-lab	Unknown
LEAP	Renmin U	Expert-in-the-loop framework coupling a domain LLM with Bayesian optimization for perovskite additive discovery.	Multi-stage	Wet-lab	Unknown
MAD	Maryland	Closed-loop framework orchestrating multiple characterization instruments for thin-film materials discovery.	Experiment design, Analysis	Wet-lab	Code on request
MARS	SIAT	Hierarchical multi-agent and robotic framework that designs, synthesizes, and optimizes perovskites.	Multi-stage	Wet-lab	Open source
Qumus	Princeton	Embodied multi-agent system that fabricates 2D quantum materials and vdW device stacks in a robotic mini-lab.	Multi-stage	Wet-lab	Code on request

ML & scientific computing (14)

System	Org	What it is	Loop stage	Validation	Access
AI CFD Scientist	RPI	Open-source AI scientist for computational fluid dynamics, from ideation to solver runs to figure-grounded writing.	Multi-stage, Writing	Benchmark	Open source
AI Scientist (Sakana)	Sakana AI	End-to-end LLM agent that ideates, codes, runs ML experiments, and writes its own reviewed paper.	Multi-stage, Writing	Benchmark	Open source
AIRA (AIRA-Compose and AIRA-Design)	Meta FAIR	Pair of LLM-agent frameworks that autonomously discover novel foundation-model architectures.	Multi-stage	Benchmark	Closed
ARIS	SJTU	Autonomous ML research harness using cross-model adversarial collaboration between executor and reviewer.	Multi-stage, Writing	Benchmark	Open source
ATLAS	DeepMind	Active-learning loop that designs experiments to discover interpretable mechanistic models of behavior.	Multi-stage	Benchmark	Unknown
AgenticSciML	Brown	Multi-agent LLM system that debates and evolutionarily refines scientific-machine-learning architectures.	Hypothesis, Experiment design	Benchmark	Code on request
AutoLLMResearch	Notre Dame	RL-trained agent that extrapolates from cheap LLM runs to configure expensive scalable experiments.	Experiment design, Analysis	Benchmark	Open source
AutoResearchClaw	UNC-Chapel Hill	Multi-agent ML research pipeline with structured debate, a self-healing executor, and human-in-the-loop modes.	Multi-stage, Writing	Benchmark	Open source
AutoTTS	Maryland	Agentic framework that autonomously discovers test-time-scaling controllers for LLM reasoning.	Hypothesis, Experiment design, Analysis	Benchmark	Open source
Deep Researcher Agent	U Tokyo	Open-source framework running LLM agents 24/7 through full deep-learning experiment cycles.	Multi-stage	Benchmark	Open source
GRAFT-ATHENA	Brown	Self-improving agentic framework that expands its own action space to discover numerical algorithms.	Multi-stage	Benchmark	Code on request
Jr. AI Scientist	U Tokyo	Autonomous AI scientist mimicking a novice student improving on a baseline paper through iterated experiments.	Multi-stage, Writing	Benchmark	Open source
MLEvolve	Shanghai AI Lab	Self-evolving multi-agent framework that discovers end-to-end ML solutions via Progressive Monte Carlo Graph Search.	Multi-stage	Benchmark	Open source
POISE	Fudan	Closed-loop framework that autonomously discovers policy-optimization algorithms for LLM reinforcement learning.	Multi-stage	Benchmark	Unknown

Physical sciences (5)

System	Org	What it is	Loop stage	Validation	Access
CMBEvolve and CosmoEvolve	Cambridge	Pair of agentic cosmology systems for code evolution and virtual-research-lab simulation.	Multi-stage	Benchmark	Code on request
CVEvolve	Argonne	Agentic harness that discovers data-processing algorithms for experimental images via zero-code search.	Analysis	Benchmark	Unknown
DarkAgents	U. Bologna / INFN	Language-driven multi-agent system that takes a particle-physics model to a fit against astroparticle data, with explicit assumption and prior auditing.	Multi-stage	Benchmark	Open source
Dr.Sai	IHEP	Multi-agent LLM system for autonomous end-to-end high-energy-physics data analysis on the BESIII detector.	Experiment design, Analysis	Wet-lab	Unknown
Qiushi Discovery Engine	Zhejiang U	Agentic system performing end-to-end autonomous discovery on a real free-space optical platform.	Multi-stage	Wet-lab	Unknown

Math & symbolic (3)

System	Org	What it is	Loop stage	Validation	Access
AI co-mathematician	Google DeepMind	Agentic workbench for open-ended mathematics research with parallel ideation and proving.	Multi-stage, Writing	Mixed	Closed
Ax-Prover	Axiomatic AI	Multi-agent Lean theorem prover that gives general-purpose LLMs Lean tools to verify proofs across math and quantum physics.	Analysis	Benchmark	Open source
MCI	KRICT	Multi-agent framework blending symbolism and metaheuristics to discover explainable governing equations.	Hypothesis, Analysis	Benchmark	Unknown

General / multi-domain (10)

System	Org	What it is	Loop stage	Validation	Access
AutoSci	Peking U	Memory-centric agent that runs the full research lifecycle over persistent memory and evolves its own skills.	Multi-stage, Writing	Benchmark	Open source
AutoScientists	Harvard	Decentralized team of AI agents that self-organize around hypotheses and share results across teams.	Multi-stage	Benchmark	Open source
Co-Scientist (Google)	Google	Gemini-based multi-agent reasoning engine for biomedical hypothesis generation.	Hypothesis, Experiment design	Wet-lab	Closed
EOS AI agent	UNC-Chapel Hill	AI agent that creates, runs, and analyzes lab protocols and optimization campaigns from text requests.	Experiment design, Analysis	Wet-lab	Unknown
EvoMaster	SJTU	Domain-agnostic harness for building self-evolving scientific agents; underpins the SciMaster ecosystem.	Multi-stage	Benchmark	Open source
EvoScientist	Huawei	Evolving multi-agent AI scientist whose agents share memories distilling past runs into reusable strategies.	Multi-stage	Benchmark	Open source
Kosmos	Edison Scientific	AI scientist automating data-driven discovery via long cycles of parallel analysis against a world model.	Multi-stage, Writing	Mixed	Closed
NovelSeek	Shanghai AI Lab	Closed-loop multi-agent framework evaluated across 12 AI-for-Science research tasks.	Multi-stage	Benchmark	Open source
SAGA	Cornell + multi	Bi-level agent that autonomously evolves the objective functions of a scientific design problem rather than treating them as fixed.	Multi-stage	Wet-lab	Open source
ScientistOne	Google	End-to-end autonomous research system maintaining verifiable evidence chains for every claim.	Multi-stage, Writing	Benchmark	Closed

Other (3)

System	Org	What it is	Loop stage	Validation	Access
Aleks	Cornell	Multi-agent system that turns a plant-science question and dataset into interpretable ML models.	Multi-stage	Benchmark	Unknown
NORA	UT Knoxville	Multi-agent autonomous research system purpose-built for GIScience and spatial data science.	Multi-stage, Writing	Benchmark	Unknown
VIS Co-Scientist	LLNL	Agentic harness that autonomously designs custom visual-analysis apps from a dataset and task description.	Analysis	Benchmark	Code on request

Other systems being tracked for inclusion: Virtual Lab (Stanford / CZ Biohub, Nature 2025; designed novel SARS-CoV-2 nanobodies), CORAL (multi-agent evolutionary discovery, arXiv:2604.01658), STORM, Aviary, and AutoBA.