PharmaSwarm

Multi-agent LLM swarm for hypothesis-driven drug discovery that orchestrates specialized agents over omics, knowledge-graph, and literature data, with a central Evaluator LLM ranking proposed targets and compounds by plausibility, novelty, in-silico efficacy, and safety.


Affiliation	Systems Pharmacology AI Research Center, University of Alabama at Birmingham (paper)
First introduced	2025-04 (arXiv:2504.17967, dated 2025-04-24)
Lifecycle stages	Hypothesis (target / compound proposals), Analysis (mechanistic simulation, scoring, ranking)
Autonomy level	Semi-autonomous — closed-loop iteration is automated, but the system is described as an AI copilot with human review at each cycle’s prioritized output
Domain focus	Drug discovery, including target identification, lead-compound suggestion, and repurposing
Availability	Unknown — paper describes the architecture and validation roadmap; no code release is referenced

Approach

PharmaSwarm is a three-layer architecture orchestrated via low-code workflow engines (n8n, Airflow, Prefect) or Kubernetes microservices (Argo Workflows / Kubeflow):

Data & Knowledge Layer. The getGPT module assembles GWAS variants, DEGs, and known drug targets from Open Targets, Open Targets Genetics, and GEO. ChEMBL, DrugBank, KEGG, Reactome, the PAGER API, and a proprietary PharmAlchemy knowledge graph supply chemical, pathway, and network context. GeneTerrain Knowledge Maps (GTKMs) render expression and interaction topography.
LLM Agent Swarm Layer. Three containerized agents access shared knowledge:
- Terrain2Drug — omics-driven discovery, projects seed gene lists onto GTKMs, identifies network hubs.
- Paper2Drug — LLM-templated literature mining for explicit and implicit target–compound pairs, validated by multi-hop traversals in PharmAlchemy.
- Market2Drug — ingests FDA bulletins, ClinicalTrials.gov updates, financial feeds, and social-media sentiment to surface repurposing candidates.
Validation & Evaluation Layer. A Pharmacological Efficacy and Toxicity Simulation (PETS) engine performs multiscale network propagation; an Interpretable Binding Affinity Map (iBAM) module cross-attends ESM2 protein embeddings and ChemBERTa molecular embeddings to produce affinity estimates and residue–substructure attention maps. A central TxGemma-based Evaluator scores proposals on empirical support, mechanistic coherence, novelty, safety, and interpretability, and sends structured feedback back to each agent for the next iteration.

A shared vector-database memory captures inter-agent context; agent submodels can be fine-tuned over time on accumulated validated insights.

Validation

The paper is a design + retrospective work and does not report wet-lab validation. A four-tier validation pipeline is proposed but not executed in this preprint:

Retrospective benchmarking on classic discovery cases (idiopathic pulmonary fibrosis, triple-negative breast cancer) measured by Recall@K, Precision@K, Kendall’s Tau, and MAP.
Prospective in-silico assessment with AutoDock Vina / Glide docking, 50–100 ns molecular dynamics, and ADMET prediction.
Experimental evaluation with SPR/ITC binding (Kd), cellular IC50 assays, kinase/receptor off-target panels, and rodent pilot studies.
Expert user studies measuring time-to-hypothesis and plausibility ratings versus conventional workflows.

An iBAM case study on the HSP90α–ligand complex is shown: predicted pKd 6.83 vs. experimental 6.05.

Notable results

Provides one of the first explicitly modular drug-discovery multi-agent designs combining omics analysis, knowledge-graph reasoning, market signals, and binding-affinity prediction under a single Evaluator-coordinated loop.
Distinct from existing biomedical agentic systems in the catalogue (Robin, Biomni, CRISPR-GPT, PerTurboAgent) by targeting target/compound hypothesis generation across heterogeneous biomedical data.
No wet-lab or prospective validation reported in this preprint — the four-tier pipeline is a roadmap.

Primary paper

Song, Trotter, Chen, “LLM Agent Swarm for Hypothesis-Driven Drug Discovery,” arXiv:2504.17967.

Other references

None yet.

Code

Not released at preprint time.