Biomni
General-purpose biomedical AI agent coupling a 150-tool, 105-package, 59-database environment with a code-executing planner. Performs on par with human experts on LAB-Bench DbQA and above them on SeqQA.
| Affiliation | Stanford University (SNAP group, Leskovec lab), with Genentech, Arc Institute, Princeton, University of Washington, UCSF (Biomni) |
| First introduced | 2025-05 (bioRxiv preprint 2025.05.30.656746) |
| Lifecycle stages | Multi-stage (experiment design, analysis, hypothesis generation) |
| Autonomy level | Semi-autonomous (closed-loop with checkpoints; LLM-generated code is executed with system privileges by default) |
| Domain focus | Biomedicine (general purpose across 25 biomedical subfields) |
| Availability | Open source (Apache-2.0 for Biomni itself; integrated tools/databases may carry separate licenses); free no-code web UI at biomni.stanford.edu |
Approach
Two-component agent:
- Biomni-E1 is an environment mined from tens of thousands of biomedical publications, exposing 150 specialized tools, 105 software packages, and 59 databases as a unified action space.
- Biomni-A1 is an LLM agent that dynamically selects tools, plans tasks, and writes and executes Python code that composes loops, parallelization, and conditional steps; an adaptive planning loop iteratively refines the plan during execution.
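The select, plan, execute, and refine cycle described above can be illustrated with a toy sketch. This is not Biomni's implementation: the `Tool` class, the keyword-based `select_tools`, the `run_agent` loop, and the two stand-in tools with made-up counts are all hypothetical names introduced here, and a real agent would use an LLM rather than keyword matching to choose tools and revise the plan.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Tool:
    """Hypothetical tool record: a name, trigger keywords, and a callable."""
    name: str
    keywords: Set[str]
    run: Callable[[dict], dict]


def load_counts(state: dict) -> dict:
    # Stand-in for a database or file lookup; the values are made up.
    return {**state, "counts": {"TP53": 30, "BRCA1": 10}}


def normalize(state: dict) -> dict:
    counts = state["counts"]  # raises KeyError if nothing has been loaded yet
    total = sum(counts.values())
    return {**state, "normalized": {g: c / total for g, c in counts.items()}}


def select_tools(task: str, registry: List[Tool]) -> List[Tool]:
    """Pick tools whose keywords appear in the task (assumed stand-in for LLM selection)."""
    words = set(task.lower().split())
    return [t for t in registry if t.keywords & words]


def run_agent(task: str, registry: List[Tool], max_rounds: int = 3) -> Tuple[dict, List[str]]:
    """Execute the plan; steps that fail on a missing input are retried next round."""
    by_name: Dict[str, Tool] = {t.name: t for t in registry}
    plan = [t.name for t in select_tools(task, registry)]
    state: dict = {}
    log: List[str] = []
    for _ in range(max_rounds):
        if not plan:
            break
        deferred = []
        for step in plan:
            try:
                state = by_name[step].run(state)
                log.append(f"ok:{step}")
            except KeyError:
                # Plan refinement: push the step back until its input exists.
                deferred.append(step)
                log.append(f"deferred:{step}")
        plan = deferred
    return state, log


# The registry is deliberately listed in the wrong order to force one refinement.
registry = [
    Tool("normalize", {"normalize"}, normalize),
    Tool("load_counts", {"load", "counts"}, load_counts),
]
state, log = run_agent("load counts then normalize expression", registry)
print(log)
print(state["normalized"])
```

In this sketch the first round defers `normalize` because no counts exist yet, runs `load_counts`, and the second round completes the deferred step. That is the simplest form of revising a plan during execution; it says nothing about how Biomni-A1 actually orders or repairs its steps.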
Validation
On LAB-Bench: 74.4% accuracy on DbQA (vs. 74.7% human experts) and 81.9% on SeqQA (vs. 78.8% human experts). On a 52-question HLE subset spanning 14 biomedical subfields: 17.3% accuracy (reported as a 4× improvement over base LLMs).
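As a quick arithmetic check on the figures above, the short script below computes the agent-versus-expert gaps on LAB-Bench and the number of correct HLE answers implied by 17.3% of 52 questions. The count of correct answers is an inference from the rounded percentage, not a number stated in the source.

```python
# (agent accuracy %, human expert accuracy %) as quoted above.
lab_bench = {"DbQA": (74.4, 74.7), "SeqQA": (81.9, 78.8)}

# Gap in percentage points; negative means the agent trails the experts.
gaps = {name: round(agent - human, 1) for name, (agent, human) in lab_bench.items()}

# HLE subset: 52 questions at a reported 17.3% accuracy.
hle_questions = 52
hle_accuracy_pct = 17.3
implied_correct = round(hle_accuracy_pct / 100 * hle_questions)  # inferred, not reported
recomputed_pct = round(implied_correct / hle_questions * 100, 1)

print(gaps)
print(implied_correct, recomputed_pct)
```

The gaps come out to -0.3 points on DbQA and +3.1 points on SeqQA, and 9 correct answers out of 52 reproduces the reported 17.3%.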
Real-world case studies: a 10-step wearable-sensor pipeline analyzing 458 files and 227 nights of sleep data; multi-omics processing of >336,000 single-nucleus RNA-seq / ATAC-seq profiles.
Notable results
Came within 0.3 points of human experts on LAB-Bench DbQA and exceeded them by 3.1 points on SeqQA. Autonomously authored multi-stage bioinformatics pipelines, with figures and reports, across heterogeneous biomedical data types.
Primary paper
Huang et al., “Biomni: A General-Purpose Biomedical AI Agent,” bioRxiv 2025.05.30.656746.
Other references
- Biomni project page
- GitHub README and DETAILS.md
- Qu et al., “BiomniBench: Process-level Evaluation of LLM Agents for Real-world Biomedical Research,” bioRxiv 2026.05.12.724604. A process-level benchmark from a sibling lab, with a task taxonomy derived from 32,014 queries on the Biomni platform. The best published configuration (Claude Code + Opus 4.7) scores 73.3/100, and the choice of agent harness shifts scores more than the model generation does.