Run functional enrichment on a gene list
Hand Claude Code a list of gene symbols (typically the significant hits from a DE analysis or a CRISPR screen); get back an Enrichr-backed enrichment table across GO Biological Process, KEGG, Reactome, and disease libraries, with a short natural-language synthesis of the dominant pathways and citations to the database accessions.
| Problem class | Data analysis |
| Subject areas | Molecular and Cellular Biology, Immunology and Microbiology, Drug Repurposing and Discovery |
| Evidence level | Reported |
| Complexity | One skill or MCP |
| Availability | Fully open |
| Compute | Laptop |
Problem
Functional enrichment is the canonical “so what does this gene list mean” step that follows almost every transcriptomics, proteomics, or screen experiment. The mechanics are unglamorous: hit Enrichr (or g:Profiler, or DAVID) with the gene symbols, sort the term tables by adjusted p-value, and translate the top hits into a paragraph a wet-lab collaborator can act on. The cost is the swivel-chairing — exporting the DE table, pasting symbols into a web form, copying tables back into the analysis notebook — and the interpretation pass at the end is where naïve LLM use most often goes wrong: vanilla GPT-4 will name plausible-sounding pathways for almost any gene list, including random ones. Solved looks like: paste a gene-symbol list and the species, get back a single Markdown report with the top enriched terms per library, each cited to its database accession, and a written summary anchored to those tabulated terms — not to the model’s prior.
Recommended approach
-
Install the gget Claude Skill. From the K-Dense marketplace:
    /plugin marketplace add K-Dense-AI/claude-scientific-skills
    /plugin install gget@claude-scientific-skills
    pip install gget
-
Hand Claude the gene list and the report contract. A minimal prompt:
    I have 187 upregulated genes from a DE run (human, padj<0.05, log2FC>1).
    Use the gget skill to run gget.enrichr on this list against the following Enrichr libraries:
    - GO_Biological_Process_2023
    - KEGG_2021_Human
    - Reactome_2022
    - MSigDB_Hallmark_2020
    - DisGeNET
    For each library, return the top 10 terms by adjusted p-value.
    Save each table to results/enrichment/<library>.csv.
    Then write results/enrichment/SUMMARY.md with one paragraph per library citing only terms that actually appear in the saved CSVs (term name + adjusted p-value + database accession).
    Gene list: <paste symbols, one per line>
gget enrichr calls the Enrichr REST API directly; no API key is needed for typical interactive use.
-
Verify before believing. Ask Claude to cross-check every claim in SUMMARY.md against the saved CSVs:
    Re-read results/enrichment/SUMMARY.md. For every pathway or disease named, confirm it appears in the corresponding CSV with adjusted p-value < 0.05. Flag any sentence whose claim cannot be grounded in the tables and rewrite it.
This explicit verification pass is the recipe’s main hallucination mitigation; it mirrors the design pattern GeneAgent benchmarks (see Evidence).
-
Run a negative-control gene set. Sample ~200 random protein-coding genes and re-run the same prompt. Any “enrichment” that survives BH correction on the random set is signalling background bias in the library (e.g., long-gene bias in GO) rather than biology. Worth doing once per project, not once per analysis.
-
For ranked enrichment (GSEA), drop to GSEApy.
gget enrichr is over-representation analysis on an unranked hit list. If you have a full ranked DE table and want gene-set enrichment proper (running enrichment score, leading-edge analysis), call GSEApy from the same Claude session: it ships with Enrichr-compatible gene-set libraries via gp.prerank().
-
Hand off to downstream interpretation. The CSV tables are the audit trail. Pair with the build-target-dossier recipe when an enriched pathway points back to a specific target you want to characterise next.
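What steps 2 and 4 ask Claude to do can be sketched in plain Python. This is a minimal sketch, not the skill's actual implementation. It assumes that `gget.enrichr(..., json=True)` returns a list of row dictionaries and that each row carries an `adj_p_val` field; the helper names `save_top_terms` and `sample_negative_control` are hypothetical, and the background gene list for the negative control has to come from your own annotation source.

```python
import csv
import random
from pathlib import Path

# Library names copied from the prompt in step 2.
LIBRARIES = (
    "GO_Biological_Process_2023",
    "KEGG_2021_Human",
    "Reactome_2022",
    "MSigDB_Hallmark_2020",
    "DisGeNET",
)


def gget_enrichr_rows(genes, library):
    """Query Enrichr through gget and return one dict per enriched term."""
    import gget  # lazy import: the helpers below also work without gget installed

    # Assumption: json=True returns a list of row dicts instead of a DataFrame.
    return gget.enrichr(genes, database=library, json=True)


def save_top_terms(genes, libraries=LIBRARIES, out_dir="results/enrichment",
                   top_n=10, enrich_fn=gget_enrichr_rows):
    """Write the top_n terms per library (by adjusted p-value) to <out_dir>/<library>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for library in libraries:
        rows = sorted(enrich_fn(genes, library), key=lambda r: r["adj_p_val"])[:top_n]
        if not rows:
            continue  # nothing returned for this library, so no table is written
        path = out / f"{library}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        written[library] = path
    return written


def sample_negative_control(background_genes, n=200, seed=0):
    """Draw a reproducible random gene set for the step-4 negative control."""
    pool = sorted(set(background_genes))
    return random.Random(seed).sample(pool, min(n, len(pool)))
```

Running `save_top_terms` once on the real hit list and once on `sample_negative_control(...)` with a different `out_dir` gives two sets of tables to compare. Fixing the seed keeps the control reproducible across sessions.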
Why this assembly
Rung 2 of the simplicity ladder. The gget skill wraps gget enrichr so Claude calls a single function with the right library names and gets back a pandas DataFrame; the tables are populated by the API call rather than by the model, so an invented pathway can only enter at the summary stage. Plain Claude Code (rung 1) would either confabulate Enrichr results or write a one-off requests block against the Enrichr endpoint, which is fine occasionally but error-prone and not reproducible. A toolbelt (rung 3) buys nothing because the enrichment + interpretation flow is single-source. Rung 4 (autonomous systems) is the wrong tier for a step measured in seconds. The verification pass in step 3 is the explicit hallucination-mitigation pattern for that summary stage; it is the same self-grounding loop that GeneAgent (NIH/Nature Methods 2025) benchmarks, with 84% of generated claims supported by database evidence across 1,106 gene sets.
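The verification pass can also be backed by a mechanical check that does not rely on the model re-reading its own output. The sketch below is one possible approach, under several assumptions: the saved CSVs use gget's `path_name` and `adj_p_val` column names, accessions appear inside the term name (as in `Inflammatory Response (GO:0006954)`), and only GO and Reactome human identifiers are matched. Libraries whose term names carry no accession would need a name-based comparison instead. Both function names are hypothetical.

```python
import csv
import re
from pathlib import Path

# Accession patterns for GO terms and Reactome human pathway stable IDs.
ACCESSION_RE = re.compile(r"GO:\d{7}|R-HSA-\d+")


def load_significant_accessions(csv_dir, alpha=0.05):
    """Collect accessions from saved tables whose adjusted p-value is below alpha."""
    supported = set()
    for path in Path(csv_dir).glob("*.csv"):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Assumption: tables use the column names path_name / adj_p_val.
                if float(row["adj_p_val"]) < alpha:
                    supported.update(ACCESSION_RE.findall(row["path_name"]))
    return supported


def ungrounded_accessions(summary_text, supported):
    """Return accessions cited in the summary that no significant table row supports."""
    cited = set(ACCESSION_RE.findall(summary_text))
    return sorted(cited - supported)
```

An empty list from `ungrounded_accessions` means every accession cited in SUMMARY.md maps to a row with adjusted p-value below the threshold. It says nothing about whether the surrounding prose interprets the term correctly, so it complements the prompt-based pass and does not replace it.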
Availability
Fully open. The gget skill is OSS in K-Dense-AI/scientific-agent-skills; the underlying pachterlab/gget library is BSD-2-Clause; Enrichr is free for academic and non-profit use. No subscription, no institutional licence. Commercial users should review the Enrichr terms of service — the API is free to call but the underlying libraries have their own licences (some KEGG mirrors are commercial-restricted).
Compute requirements
Laptop. The whole workflow is HTTP requests against Enrichr; a 200-gene list across five libraries returns in 10–30 seconds end-to-end. No GPU. Disk is trivial (the CSV tables are a few hundred KB). Rate limits are generous for interactive use but if you batch over hundreds of gene sets, throttle to ~1 request/second to stay within Enrichr’s etiquette.
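For the batch case, the roughly one-request-per-second spacing can be enforced with a small wrapper. This is a minimal sketch: `run_throttled` is a hypothetical helper, and `enrich_fn` stands in for whatever function issues the Enrichr call. The `sleep` and `clock` parameters exist only so the pacing can be tested without waiting.

```python
import time


def run_throttled(gene_sets, enrich_fn, min_interval=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call enrich_fn once per gene set, starting calls at least min_interval seconds apart."""
    results = {}
    last_call = None
    for name, genes in gene_sets.items():
        if last_call is not None:
            # Wait out whatever remains of the interval since the previous call started.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[name] = enrich_fn(genes)
    return results
```

Because the interval is measured from the start of the previous call, a slow response already counts toward the wait, so the loop does not add a full second on top of each request.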
Evidence
Reported. The strongest reference for the assembly class is GeneAgent (Wang et al., Nature Methods 22:1677, 2025, DOI:10.1038/s41592-025-02748-6; PMID 40721871), a self-verification LLM agent that queries Enrichr and other curated databases to ground gene-set claims; across 1,106 gene sets it lifted ROUGE-L on MSigDB from 0.239 ± 0.038 (GPT-4 alone) to 0.310 ± 0.047 (GeneAgent), with 84% of 15,848 generated claims supported by database evidence and 92% of self-verification decisions judged correct by two human experts on a 132-claim sample. The recipe here is a smaller-grain composition (one skill, no autonomous loop) but the underlying database-grounding pattern is the same. Complementary anchors: Hu et al., Nature Methods 21:2353, 2024 — “Evaluation of large language models for discovery of gene set function” (DOI:10.1038/s41592-024-02525-x) shows GPT-4 names common gene-set functions with high specificity but only when grounded against a database; Joshi et al., llm2geneset preprint 2024-11 (DOI:10.1101/2024.11.11.621189) shows LLM-generated gene sets can be used as Enrichr-compatible inputs. No peer-reviewed benchmark of “Claude + gget skill + Enrichr” against hand-written gget enrichr code is known — the agent loop adds reproducibility and the verification pass, not new statistics.
Alternatives considered
- Plain Claude Code with the Enrichr REST API. Works for one-off analyses but the model has to re-derive the right userListId ⇒ enrich flow and the right library shortnames each time, and is less likely to retain the per-library result tables in a reproducible form. Reach for it only when the gget skill isn’t installed and the analysis is throwaway.
- GSEApy directly. GSEApy covers Enrichr over-representation, ranked GSEA, single-sample GSEA, and Biomart conversions in one Python package. It is the right answer when the analysis is ranked-list GSEA (step 5) or batched across hundreds of gene sets. It is not yet wrapped as a Claude skill in the catalog, so calls fall back to plain Claude + Python.
- g:Profiler / WebGestalt. Both are stronger than Enrichr on some libraries (g:Profiler has tighter multi-test correction; WebGestalt has better term-grouping). Neither has a Claude skill in the catalog; if you need them, drop to plain Claude + the respective REST APIs. Consider proposing skill wrappers if this becomes routine.
- GeneAgent directly. ncbi-nlp/GeneAgent is publicly available; reach for it when the deliverable is gene-set naming with verified evidence claims (the use case the paper benchmarks) rather than a tabular enrichment report. It is not yet catalogued as a Claude skill.
- An autonomous-science system (Biomni). Overkill for a single enrichment step. Biomni includes Enrichr-class tools alongside dozens of others; reach for it only when enrichment is one node in a larger autonomous loop.
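For the GSEApy route (step 5 and the "GSEApy directly" alternative), a ranked-list run might look like the sketch below. It is written under assumptions: the DE table columns `gene` and `log2FoldChange` are hypothetical names, ranking by log2 fold change is only one of several reasonable metrics, and `write_rnk` and `run_prerank` are illustrative helpers. The `gp.prerank` call uses the `rnk`, `gene_sets`, `outdir`, `permutation_num`, and `seed` parameters and fetches the named library from Enrichr, so it needs network access.

```python
import csv
from pathlib import Path


def write_rnk(de_rows, path, gene_key="gene", score_key="log2FoldChange"):
    """Write a two-column, tab-separated .rnk file sorted by descending score."""
    best = {}
    for row in de_rows:
        gene, score = row[gene_key], float(row[score_key])
        # Keep the largest-magnitude score when a symbol appears more than once.
        if gene not in best or abs(score) > abs(best[gene]):
            best[gene] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    path = Path(path)
    with path.open("w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(ranked)
    return path


def run_prerank(rnk_path, gene_sets="KEGG_2021_Human", seed=0):
    """Run preranked GSEA with GSEApy and return its results table."""
    import gseapy as gp  # lazy import: write_rnk works without gseapy installed

    result = gp.prerank(
        rnk=str(rnk_path),
        gene_sets=gene_sets,   # an Enrichr library name
        outdir=None,           # keep results in memory rather than writing plots
        permutation_num=1000,
        seed=seed,
    )
    return result.res2d
```

Writing the ranking to a `.rnk` file first leaves an on-disk record of exactly what was scored, which fits the recipe's habit of treating saved tables as the audit trail.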
See also
- gget (Claude Skill)
- Run bulk RNA-seq differential expression from a counts matrix — the upstream step that produces the gene list this recipe interprets.
- Infer a gene-regulatory network from single-cell RNA-seq — alternative downstream interpretation of a DE result.
- Build a target dossier from gene name to structure to cancer dependency — drill into one enriched gene at a time.
- Biomni — autonomous-system alternative.
Sources
- Wang Z. et al., “GeneAgent: self-verification language agent for gene-set analysis using domain databases,” Nature Methods 22:1677–1685, 2025 (DOI:10.1038/s41592-025-02748-6); published 2025-07; verified 2026-06-04 (this run).
- Hu M. et al., “Evaluation of large language models for discovery of gene set function,” Nature Methods 21:2353–2360, 2024 (DOI:10.1038/s41592-024-02525-x); published 2024-12.
- Joshi M. et al., “llm2geneset: leveraging LLMs to dynamically generate gene sets,” bioRxiv 2024-11-12 (DOI:10.1101/2024.11.11.621189); posted 2024-11-12.
- gget documentation, gget enrichr; verified 2026-06-04 (this run).
- Enrichr help and terms; verified 2026-06-04 (this run).
- K-Dense-AI/scientific-agent-skills, gget skill; verified 2026-06-04 (this run).