AI co-mathematician
Google DeepMind agentic workbench for open-ended mathematics research that orchestrates a project-coordinator agent and parallel workstreams across ideation, literature search, computational exploration, theorem proving, and theory building, with stateful tracking of failed hypotheses.
| Affiliation | Google DeepMind (paper) |
| First introduced | 2026-05 (arXiv:2605.06651, v2 2026-05-13) |
| Lifecycle stages | Multi-stage (ideation → computational exploration → proof → theory building), with native authored “working paper” artifact |
| Autonomy level | Semi-autonomous (interactive, asynchronous; user can intervene at any time, agents request human help when stalled) |
| Domain focus | Mathematics research |
| Availability | Closed — described as a “limited initial release” at preprint time |
Approach
A workbench layered on top of Gemini language models. A top-level project-coordinator agent opens an interactive onboarding dialogue with the user to refine the research question and approve high-level goals before any work is delegated. Per-goal workstream coordinators run linear sequences of actions and dispatch specialized sub-agents (including Gemini Deep Think) for ideation, literature search, computational exploration, and proof attempts; agents communicate asynchronously over an internal messaging system and write to a shared workspace file system. The system centers its outputs on a living “working paper” rather than chat logs, with inline highlights and margin notes that record provenance, contentious claims, and stalled reasoning. Uncertainty is treated as a first-class state to be managed (version history, continuous review loops, systematic citation checking); failed explorations are preserved as durable first-class outcomes rather than discarded. The harness is designed to be model-agnostic and to host external engines such as AlphaEvolve, AlphaProof, and Aletheia inside its interactive loop.
Validation
Two complementary regimes. (1) Qualitative early-access deployment with practicing mathematicians, who used the system to steer open-ended research on problems including upper bounds for variants of the moving-sofa problem; the paper reports examples of solving open problems, surfacing new research directions, and uncovering overlooked literature. (2) Independent benchmark evaluation on FrontierMath: Epoch AI administered Tier 4 to the system in final-answer mode.
Notable results
- 48% on FrontierMath Tier 4 (Epoch AI evaluation), reported in the paper as a new high score among AI systems evaluated on this tier.
- Documented walkthrough of programmatic constraints and adversarial review loops that prevent the system from taking the easy path on intractable problems while running multi-day parallel workstreams.
Primary paper
Zheng et al., “AI co-mathematician: Accelerating mathematicians with agentic AI,” arXiv:2605.06651.
Other references
None yet.
Code
Not released — the paper describes the system as subject to a “limited initial release,” with broader productization stated as a future goal.