Recipes updates

Reverse-chronological log of changes to the recipes cookbook. Newest at the top.

2026-06-14

Added

Screen a polypharmacy medication list for drug-drug interactions (Problem class: Knowledge synthesis; Evidence: Reported): rung-2 DDInter skill recipe taking a medication list through per-drug ID resolution → pairwise DDInter queries → a cited severity/mechanism/management table with explicit “clean” lines, plus an optional rung-3 DailyMed + ClinPGx overlay on the major pairs. Drug Repurposing and Discovery focus-day recipe; cookbook’s first DDI-screening recipe. Reported: Domián et al., Explor. Res. Clin. Soc. Pharm. 2025 documents that ungrounded LLMs over-flag and hallucinate DDIs (Copilot reported 1,813 interactions against a 204-interaction reference on 57 real patients), establishing that screening must be anchored to a curated DDI database, which is the assembly this recipe recommends.
Run a GWAS on case-control genotype data (Problem class: Data analysis; Evidence: Proposed) — rung-2 PLINK2 skill recipe taking a PLINK/VCF genotype set through sample + variant QC (call rate, MAF, HWE-in-controls) → LD pruning → genotype PCA → PCA-adjusted logistic-regression --glm association with a lambda_GC inflation check, handing genome-wide-significant loci to the GWAS Catalog skill for annotation. Translational Medicine focus-day recipe; cookbook’s first GWAS recipe. Proposed — no documented LLM-driven PLINK2 assembly; grounded in Chang et al., GigaScience 4:7 (2015) and the canonical QC tutorial (Marees et al., Int. J. Methods Psychiatr. Res. 27:e1608 (2018)).

Build a pharmacogenomic dosing report from a patient’s diplotypes (Problem class: Knowledge synthesis; Evidence: Proposed): rung-2 ClinPGx skill recipe taking star-allele diplotypes plus a medication list through diplotype→metabolizer-phenotype translation (CPIC PostgREST API) → per-drug CPIC/DPWG dosing recommendation lookup → a cited drug/gene/phenotype/recommendation table, with explicit “no actionable guidance” flagging and a DDInter phenoconversion overlay noted. Translational Medicine focus-day recipe; cookbook’s first pharmacogenomic-dosing recipe, distinct from the germline-pathogenicity variant-interpretation recipe. Proposed: no documented LLM-driven ClinPGx/CPIC assembly; grounded in the CPIC guideline corpus (Amstutz et al., Clin. Pharmacol. Ther. 2018; Molden & Jukić, Front. Pharmacol. 2021).

Profile a cancer cohort’s genomics with cBioPortal (Problem class: Knowledge synthesis; Evidence: Reported): rung-2 cBioPortal skill recipe taking a study + gene set through study/profile lookup → per-gene mutation+CNA alteration frequency and co-occurrence/mutual-exclusivity → TMB summary → a Kaplan-Meier overall-survival split by mutation status, with cohort-denominator caveats enforced. Translational Medicine focus-day recipe; cookbook’s first cohort-level cancer-genomics recipe, cross-linked to the gene-centric target dossier, single-variant variant-interpretation, and adjusted-modelling survival recipes. Reported: the cBioPortal-backed AI-HOPE conversational-agent family documents the assembly class (AI-HOPE-WNT, Front. Artif. Intell. 2025, recapitulating WNT-EOCRC survival p=0.0167/0.0007; AI-HOPE-TP53, Cancers 2025).
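
The cohort-denominator caveat in the cBioPortal entry above can be illustrated with a minimal sketch. It assumes the altered and profiled sample IDs for a gene have already been retrieved; the function name and the toy sample IDs are hypothetical, and no cBioPortal API call is shown.

```python
def alteration_frequency(altered, profiled):
    """Fraction of profiled samples that carry an alteration in one gene.

    The denominator is the set of samples actually profiled for the gene,
    not the whole study, so unprofiled samples are not counted as wild-type.
    """
    profiled = set(profiled)
    if not profiled:
        return None  # gene not profiled anywhere: frequency is undefined
    # Only altered samples that were actually profiled are counted.
    return len(set(altered) & profiled) / len(profiled)


# Hypothetical toy cohort: 5 samples in the study, 4 profiled for the gene.
study_samples = ["S1", "S2", "S3", "S4", "S5"]
profiled_for_gene = ["S1", "S2", "S3", "S4"]
altered_in_gene = ["S1", "S3"]

freq = alteration_frequency(altered_in_gene, profiled_for_gene)
naive = len(altered_in_gene) / len(study_samples)
print(f"profiled-denominator frequency: {freq:.2f}")
print(f"naive whole-study frequency:    {naive:.2f}")
```

The two numbers differ whenever some samples were not profiled for the gene, which is the situation the caveat guards against.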

Verified (no changes)

Build a target dossier and Draft a Phase 2/3 clinical-trial protocol — linked catalog tools and key sources re-checked, last_verified bumped to 2026-06-14.
Assemble a tissue reference atlas from the CELLxGENE Census — linked catalog tools (cellxgene-census, scvi-tools, scanpy, anndata) and Census/scvi-hub sources re-checked, last_verified bumped to 2026-06-14.

2026-06-13

Added

Infer cell-cell communication from single-cell RNA-seq (Problem class: Data analysis; Evidence: Proposed) — rung-2 LIANA-MCP recipe taking an annotated AnnData object through ls_ccc_method → multi-method communicate (CellPhoneDB/Connectome/NATMI/SingleCellSignalR) → rank_aggregate consensus ligand-receptor tetrads → circle_plot/ccc_dotplot, consuming the annotated object from the scRNA-seq QC recipe. Molecular and Cellular Biology focus-day recipe; cookbook’s first cell-cell-communication recipe. Proposed — no documented LLM-driven LIANA-MCP assembly; grounded in Dimitrov et al., Nat. Commun. 13:3735 (2022), a 2026 consensus-LIANA application (Wei et al., PLOS ONE 2026), and the method-disagreement benchmark (Xie et al., Biomolecules 13:1211 (2023)).
Call peaks and find enriched motifs from ChIP-seq or ATAC-seq (Problem class: Data analysis; Evidence: Proposed) — rung-3 toolbelt chaining the MACS3 skill (callpeak, narrow/broad mode → narrowPeak BED) into the HOMER skill (annotatePeaks.pl nearest-gene context + findMotifsGenome.pl de-novo/known motif enrichment). Molecular and Cellular Biology focus-day recipe; the binding-site/motif companion to the deepTools signal-profiling recipe, which deliberately stops before peak calling. Proposed — no documented LLM-driven MACS3→HOMER assembly; grounded in the field-standard pipeline (Zhang et al., Genome Biol. 9:R137 (2008); Heinz et al., Mol. Cell 38:576 (2010)).
Analyze an existing MD trajectory for stability, flexibility, and contacts (Problem class: Data analysis; Evidence: Proposed) — rung-2 MDAnalysis skill recipe taking a finished GROMACS/AMBER/NAMD trajectory through a load-and-sanity-check → aligned RMSD/RMSF/Rg → interface contact map + H-bond occupancy → backbone PCA battery, with the MDTraj skill as the DSSP/Ramachandran fallback. Integrative Structural and Computational Biology focus-day recipe; the post-simulation-analysis companion to the GROMACS setup recipe. Proposed — no documented LLM-driven MDAnalysis-skill assembly; grounded in Michaud-Agrawal et al., J. Comput. Chem. 32:2319 (2011), McGibbon et al., Biophys. J. 109:1528 (2015), and class-level agentic-MD evidence (MDCrow, Mach. Learn. Sci. Technol. 2025).
Scan a therapeutic antibody for glycosylation sites (Problem class: Experimental design; Evidence: Proposed) — rung-2 Glycoengineering skill recipe taking heavy/light-chain sequences through N-X-S/T sequon detection (flagging Fc Asn-297 vs unintended variable-domain sites) → O-glycosylation hotspot prediction → a parent-vs-variant sequon diff, with optional minimal site-knockout edit suggestions. Immunology and Microbiology focus-day recipe; cookbook’s first antibody-developability / glycosylation recipe. Proposed — no documented LLM-driven glycoengineering-skill assembly; grounded in 2026 Fc-glycan/ADCC literature (Shuang et al., mAbs 2026; Illés 2026) and the galactosylation-as-CQA reference (Klingler et al., Biotechnol. Bioeng. 2024).
Compute a bacterial pan-genome from a set of genome assemblies (Problem class: Data analysis; Evidence: Proposed) — rung-3 toolbelt chaining the Bakta skill (identical per-genome annotation → GFF3) into the Roary skill (CD-HIT/BLAST/MCL clustering → core/soft-core/shell/cloud partition, gene_presence_absence.csv, and a core_gene_alignment.aln that feeds the phylogenetics recipe). Immunology and Microbiology focus-day recipe; cookbook’s first comparative-genomics / pan-genome recipe. Proposed — no documented LLM-driven Bakta→Roary assembly; grounded in the field-standard pipeline (Page et al., Bioinformatics 2015; Schwengers et al., Microb. Genom. 2021) and a 2025 27,884-genome application (Sholeh et al., Mol. Genet. Genomics 2025).
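
The core/soft-core/shell/cloud partition named in the pan-genome entry can be sketched from a presence/absence table. This is an illustration only: the cut-offs (99%, 95%, 15% of genomes) are assumed to follow Roary's usual defaults and should be checked against the Roary documentation, and the function name, gene names, and genome IDs are hypothetical.

```python
def partition_pangenome(presence, n_genomes):
    """Assign each gene cluster to core / soft_core / shell / cloud.

    presence maps a gene-cluster name to the set of genome IDs carrying it.
    Thresholds are assumed Roary-style defaults, expressed as fractions.
    """
    bins = {"core": [], "soft_core": [], "shell": [], "cloud": []}
    for gene, genomes in sorted(presence.items()):
        frac = len(genomes) / n_genomes
        if frac >= 0.99:
            bins["core"].append(gene)
        elif frac >= 0.95:
            bins["soft_core"].append(gene)
        elif frac >= 0.15:
            bins["shell"].append(gene)
        else:
            bins["cloud"].append(gene)
    return bins


# Hypothetical toy input: 40 genomes and four gene clusters.
genomes = [f"G{i}" for i in range(40)]
presence = {
    "dnaA": set(genomes),        # 40/40 genomes
    "recA": set(genomes[:39]),   # 39/40 genomes
    "blaTEM": set(genomes[:10]), # 10/40 genomes
    "hypX": set(genomes[:2]),    # 2/40 genomes
}
result = partition_pangenome(presence, len(genomes))
for category, genes in result.items():
    print(category, genes)
```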

Verified (no changes)

35 recipes spot-checked; all last_verified dates within the 30-day window, no aging recipes due.

2026-06-11

Added

Profile ChIP-seq or ATAC-seq signal around genomic features (Problem class: Data analysis; Evidence: Proposed) — rung-2 deepTools skill recipe taking aligned ChIP-seq/ATAC-seq BAMs through bamCoverage BPM-normalized bigWig generation → multiBamSummary + plotCorrelation replicate QC → computeMatrix + plotHeatmap/plotProfile TSS/peak-centered visualization, with upstream BAM handling via the pysam skill. Molecular and Cellular Biology focus-day recipe; cookbook’s first ChIP-seq/ATAC-seq coverage-profiling recipe. Proposed — no documented LLM-driven deepTools workflow; grounded in Ramírez et al., NAR 44:W160 (2016) plus class-level Biomni.
Predict gene-knockout phenotypes with flux balance analysis (Problem class: Data analysis; Evidence: Proposed) — rung-2 COBRApy skill recipe taking a genome-scale SBML model through baseline FBA sanity-check → genome-wide single_gene_deletion essentiality ranking → focused double_gene_deletion synthetic-lethality screen, with an explicit growth-ratio essentiality threshold. Molecular and Cellular Biology focus-day recipe; cookbook’s first constraint-based metabolic-modelling recipe. Proposed — no documented LLM-driven COBRApy workflow; grounded in Ebrahim et al., BMC Syst. Biol. 7:74 (2013) and Orth et al., Nat. Biotechnol. 28:245 (2010), plus class-level Biomni.
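
The growth-ratio essentiality threshold mentioned in the flux-balance entry can be sketched as follows. The sketch assumes knockout growth rates have already been computed elsewhere (for example by a single-gene deletion run); the 0.05 cut-off, the function name, and the gene IDs are hypothetical choices for illustration, not values taken from the recipe.

```python
def classify_essential(knockout_growth, wild_type_growth, threshold=0.05):
    """Return {gene: (growth_ratio, is_essential)}.

    A gene is called essential when knockout growth divided by wild-type
    growth falls below the threshold (a hypothetical default of 0.05).
    """
    if wild_type_growth <= 0:
        raise ValueError("wild-type growth must be positive")
    labels = {}
    for gene, growth in knockout_growth.items():
        # Infeasible knockouts may arrive as None or NaN; treat as no growth.
        if growth is None or growth != growth:
            growth = 0.0
        ratio = growth / wild_type_growth
        labels[gene] = (ratio, ratio < threshold)
    return labels


# Hypothetical toy results for four genes.
wild_type = 0.80
knockouts = {"geneA": 0.0, "geneB": 0.78, "geneC": None, "geneD": 0.02}
calls = classify_essential(knockouts, wild_type)
for gene, (ratio, essential) in calls.items():
    print(f"{gene}: ratio={ratio:.3f} essential={essential}")
```

Stating the threshold as a named parameter keeps the essentiality call reproducible when the ranking is re-run on a different model.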

Verified (no changes)

33 recipes spot-checked; all last_verified dates within the 30-day window, no aging recipes due.

2026-06-10

Added

Score point mutations for functional impact with a protein language model (Problem class: Data analysis; Evidence: Proposed) — rung-2 ESM skill recipe taking a wild-type protein sequence (optionally fetched by UniProt accession via the gget skill) and a list of substitutions through masked-marginal log-likelihood-ratio scoring → a ranked tolerated/deleterious CSV, with a wt-marginal one-pass variant for full single-mutation landscapes. Integrative Structural and Computational Biology focus-day recipe; cookbook’s first zero-shot variant-effect / protein-fitness recipe and the database-free complement to the clinical-variant interpretation recipe. Proposed — no documented LLM-driven ESM-skill scoring assembly; grounded in the canonical zero-shot method Meier et al., NeurIPS 2021, the ProteinGym benchmark, and 2025 directed-evolution use Zhang et al., Nat. Commun. 2025.
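
The masked-marginal log-likelihood-ratio score in the entry above can be sketched in a few lines. The sketch assumes some callable returns per-residue log-probabilities at a masked position; `toy_masked_log_probs` below is a hypothetical stand-in for a protein language model, not the ESM API, and the sequence is made up.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def parse_substitution(sub):
    """Split 'A2V' into (wild-type residue, 1-based position, mutant residue)."""
    return sub[0], int(sub[1:-1]), sub[-1]


def masked_marginal_score(sequence, sub, masked_log_probs):
    """log p(mutant | masked context) - log p(wild-type | masked context)."""
    wt, pos, mut = parse_substitution(sub)
    if sequence[pos - 1] != wt:
        raise ValueError(f"{sub}: sequence has {sequence[pos - 1]} at {pos}")
    log_probs = masked_log_probs(sequence, pos - 1)
    return log_probs[mut] - log_probs[wt]


def toy_masked_log_probs(sequence, index):
    """Hypothetical model stand-in: puts probability 0.5 on the wild-type
    residue and spreads the rest evenly over the other 19 residues."""
    wt = sequence[index]
    return {
        aa: math.log(0.5 if aa == wt else 0.5 / 19) for aa in ALPHABET
    }


sequence = "MAKT"  # made-up sequence
scores = {
    sub: masked_marginal_score(sequence, sub, toy_masked_log_probs)
    for sub in ["A2V", "K3E"]
}
# More negative scores are read as less tolerated substitutions.
ranked = sorted(scores.items(), key=lambda item: item[1])
print(ranked)
```

With a real model, the stand-in would be replaced by one forward pass per masked position; the scoring arithmetic stays the same.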

Verified (no changes)

31 recipes spot-checked; all last_verified dates within the 30-day window, no aging recipes due.

2026-06-09

Added

Build a phylogenetic tree from a set of sequences (Problem class: Data analysis; Evidence: Proposed) — rung-2 Phylogenetics skill recipe taking a FASTA of homologous sequences (viral genomes, microbial marker genes, protein families) through MAFFT --auto alignment → gap-column trimming → IQ-TREE 2 ModelFinder + ultrafast-bootstrap maximum-likelihood inference → midpoint/outgroup rooting → an ETE3-annotated tree figure, handing the Newick off to the ETE Toolkit and the 16S diversity recipe (which consumes the rooted tree for UniFrac). Immunology and Microbiology focus-day recipe; cookbook’s first phylogenetics / tree-building recipe. Proposed — no documented LLM-driven phylogenetics workflow; grounded in the field-standard tool references Katoh & Standley, MBE 30:772 (2013), Minh et al., MBE 37:1530 (2020), Kalyaanamoorthy et al., Nat. Methods 14:587 (2017), and Hoang et al., MBE 35:518 (2018), plus class-level Biomni.

Updated

Estimate pharmacokinetic properties of a small molecule — promoted Proposed → Reported on the first field report (issue #12). A user ran the full three-layer assembly through to a finished PK card and captured it in a standalone pk_card.py, verified across caffeine, ibuprofen, quercetin, and terfenadine. Added a Field reports subsection under Evidence and refreshed last_verified to 2026-06-09.

Verified (no changes)

3 recipes spot-checked (oldest last_verified first), all current; last_verified bumped to 2026-06-09: Scan approved drugs for repurposing candidates against a disease, Profile a compound’s polypharmacology from ChEMBL bioactivity data, Triage an AlphaFold model for structure-based drug design. All linked catalog pages resolve and are unflagged; source DOIs stable.

User requests

#12 @goodb: resolved. This entry had been stuck open since 2026-05-27 because the responder emitted no machine-readable trailer, so the request content lived only in the GitHub issue body. The sandboxed curator agent (no gh/shell) could not read that body, which left the request “un-actionable” on every retry. Fixed at the source: the recipes.yml / curate.yml workflows now pre-fetch open user-request issue bodies into .request-bodies/<NN>.md before the agent runs, the responder fallback now rebuilds a structured queue entry from the issue-form fields, and RECIPE_AGENT.md / AGENT.md point the agent at the pre-fetched files instead of a gh issue view command it can’t run.

2026-06-08

Added

Identify an unknown compound from an MS/MS spectrum (Problem class: Data analysis; Evidence: Proposed) — rung-2 matchms skill recipe taking experimental tandem-MS spectra plus a reference library (GNPS / MassBank / in-house .msp) through format import → peak cleaning and metadata harmonization → modified-cosine scoring with precursor-m/z gating → a ranked candidate-identity CSV, handing confirmed InChIKeys off to the PubChem MCP and the polypharmacology recipe. Chemistry focus-day recipe; cookbook’s first metabolomics / spectral-library-matching recipe. Proposed — no documented LLM-driven matchms workflow; grounded in the canonical library paper Huber et al., JOSS 5(52):2411 (2020) plus methodological anchors Onoprishvili et al., Bioinformatics (2025) (SimMS) and Xing et al., Anal. Chem. (2025) (enhanced reverse spectral search).

Verified (no changes)

Aging-recipe sweep: oldest last_verified is 2026-05-24 (15 days), within the 30-day window — no recipes due for re-verification this run.

User requests

#12 (@goodb) — still no gh permission to read the issue body from this run; left open for next-run retry.

2026-06-07

Added

Enumerate analogs around a lead compound for SAR expansion (Problem class: Hypothesis generation; Evidence: Proposed) — rung-2 Datamol skill recipe taking a lead SMILES through standardization → tautomer / stereoisomer enumeration → single-point fragment-substitution scan → ECFP4 Tanimoto + QED scoring → a deduplicated SAR-expansion CSV, with explicit handoff to the VS-hit-filtering developability gate and the polypharmacology bioactivity lookup. Drug Repurposing and Discovery focus-day recipe; cookbook’s first dedicated analog-enumeration / lead-optimisation recipe and the natural upstream of the existing hit-filtering recipe; cookbook’s second Hypothesis generation recipe. Proposed — no documented LLM-driven Datamol enumeration workflow; closest grounding is the K-Dense rdkit→datamol→medchem lead-optimisation workflow plus the underlying primitives Rogers & Hahn, JCIM 50:742 (2010) (ECFP/Tanimoto), Bickerton et al., Nat. Chem. 4:90 (2012) (QED), and Griffen et al., J. Med. Chem. 54:7739 (2011) (matched molecular pairs).
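
The Tanimoto scoring and deduplication steps in the analog-enumeration entry can be sketched without a cheminformatics dependency. The sketch assumes each fingerprint is available as a set of on-bit indices; the bit sets below are made up rather than real ECFP4 output, and the function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as on-bit sets."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rank_analogs(lead_bits, analogs):
    """Deduplicate analogs by SMILES, then sort by similarity to the lead."""
    unique = {}
    for smiles, bits in analogs:
        unique.setdefault(smiles, bits)  # keep the first copy of each SMILES
    scored = [(s, tanimoto(lead_bits, b)) for s, b in unique.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up on-bit sets; a real run would derive them from ECFP4 fingerprints.
lead_bits = {1, 2, 3, 4}
analogs = [
    ("CCO", {1, 2, 3, 5}),
    ("CCN", {1, 2}),
    ("CCO", {1, 2, 3, 5}),  # duplicate entry, dropped by rank_analogs
]
ranking = rank_analogs(lead_bits, analogs)
print(ranking)
```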

Updated

Nav orders rebalanced to keep alphabetical title ordering after the new addition. “Enumerate analogs…” inserted at 10; everything from “Estimate pharmacokinetic properties” downward shifted +1 (Estimate → 11, Filter VS hits → 12, Infer GRN → 13, Integrate single-cell → 14, Interpret variant → 15, Match patient → 16, Organize DICOM → 17, Parse FCS → 18, Prioritize targets → 19, Profile polypharmacology → 20, Run bulk RNA-seq → 21, Run first-pass QC → 22, Run functional enrichment → 23, Scan repurposing → 24, Set up MD → 25, Sort spikes → 26, Triage preprints → 27, Triage AlphaFold → 28, Fit survival → 29, Scan adverse events → 30).

Verified (no changes)

29 existing recipes spot-checked; none past the 30-day last_verified window (oldest is 2026-05-24, profile-compound-polypharmacology), so no re-verification was due this run.

2026-06-06

Added

Fit a survival model to censored clinical outcomes (Problem class: Data analysis; Evidence: Proposed) — rung-2 scikit-survival skill recipe taking a tidy covariate table plus a (time, event) outcome through structured-Surv encoding → Kaplan-Meier + log-rank → Cox PH (with a proportional-hazards check) → Random Survival Forest → cross-validated Harrell’s c-index → risk-group stratification. First Translational Medicine focus-day recipe of this run; cookbook’s first dedicated time-to-event / prognosis recipe. Proposed — no documented end-to-end LLM-driven sksurv workflow; closest grounding is the library reference Pölsterl, JMLR 21(212):1–6 (2020) and recent RSF-vs-nomogram prognosis studies Zhang et al., Transl. Cancer Res. (2026) and Liu et al., Medicine (2026).
Scan adverse-event reports for a drug-safety signal (Problem class: Knowledge synthesis; Evidence: Proposed) — rung-2 OpenFDA MCP recipe taking a drug name through generic-name resolution → FAERS top-reaction ranking → structured label / warning pull → label-vs-FAERS cross-check → an honest “reports, not rates” framing. Second Translational Medicine focus-day recipe of this run; promoted from the Deferred — next-run priority list; cookbook’s first pharmacovigilance recipe. Proposed — no documented attempt of this exact MCP assembly; openFDA/FAERS is the canonical public pharmacovigilance source and the server wraps it faithfully.

Verified (no changes)

27 existing recipes spot-checked; none past the 30-day last_verified window (oldest is 2026-05-24), so no re-verification was due this run.

2026-06-05

Added

Organize a raw DICOM dataset into a BIDS layout (Problem class: Workflow automation; Evidence: Proposed) — rung-2 BIDS Claude Skill recipe taking a directory of vendor DICOMs through series-level inventory → HeuDiConv heuristic (or dcm2bids config) drafting → single-subject --dry-run audit → cohort conversion via dcm2niix → top-level dataset_description.json / participants.tsv / sidecar authoring → bids-validator triage → PyBIDS post-conversion query, with explicit IntendedFor cross-link logic for fieldmaps. First Neuroscience focus-day recipe of this run; promoted from the Deferred — next-run priority list. Cookbook’s first imaging-side data-organization recipe — counterpart to the existing Discover NWB recordings on DANDI electrophysiology discovery recipe. Proposed because no documented end-to-end LLM-driven DICOM→BIDS workflow exists in last-24-months peer-reviewed or preprint literature; closest component-level grounding is Gorgolewski et al., Sci. Data 3:160044 (2016) and Poldrack et al., Imaging Neuroscience 2:1–19 (2024) (BIDS spec evolution); Yarkoni et al., JOSS 4(40):1294 (2019) (PyBIDS); Zwiers, Moia, Oostenveld, Front. Neuroinform. 15:770608 (2022) (BIDScoin); and Wulms et al., Sci. Data 10:673 (2023) (BIDSconvertR).

Updated

Nav orders rebalanced to keep alphabetical title ordering after the new addition and to fix a stale collision between Run first-pass QC and Run functional enrichment (both stamped 20). “Organize a raw DICOM dataset…” inserted at 16; everything from “Parse FCS…” downward shifted by +1, with Run first-pass QC at 21 and Run functional enrichment at 22: Parse FCS flow-cytometry files → 17, Prioritize targets → 18, Profile polypharmacology → 19, Run bulk RNA-seq DE → 20, Run first-pass QC → 21, Run functional enrichment → 22, Scan repurposing → 23, Set up protein MD → 24, Sort spikes → 25, Triage preprints → 26, Triage AlphaFold → 27.

Verified (no changes)

No aging recipes due — every last_verified date is within the 30-day window. The verification floor sits at 2026-05-24 (scan-drug-repurposing-candidates); next aging boundary is 2026-06-23.
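
The verification floor and aging boundary quoted above follow from simple date arithmetic, sketched here. The 30-day window and the 2026-05-24 floor come from this log; the function names, the second date in the example list, and the choice to treat a recipe as due once the boundary date is reached are assumptions.

```python
from datetime import date, timedelta

WINDOW_DAYS = 30  # re-verification window used throughout this log


def aging_boundary(last_verified_dates, window_days=WINDOW_DAYS):
    """Return (floor, boundary): the oldest last_verified date and the
    date on which that recipe reaches the end of the window."""
    floor = min(last_verified_dates)
    return floor, floor + timedelta(days=window_days)


def is_due(last_verified, today, window_days=WINDOW_DAYS):
    """Assumption: a recipe is due once its age reaches the window."""
    return (today - last_verified).days >= window_days


# Example list: the floor from this entry plus one hypothetical newer date.
floor, boundary = aging_boundary([date(2026, 5, 24), date(2026, 6, 4)])
print(floor.isoformat(), boundary.isoformat())
print(is_due(floor, date(2026, 6, 5)))
```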

User requests

#12 @goodb — still cannot access the issue body (no gh permission for the repo in this run); leaving open in recipes/curator-state.md for the next run with gh access.

2026-06-04

Added

Run functional enrichment on a gene list (Problem class: Data analysis; Evidence: Reported) — rung-2 gget skill recipe taking a list of gene symbols through gget enrichr against GO BP, KEGG, Reactome, MSigDB Hallmark, and DisGeNET → per-library CSV → grounded natural-language summary with explicit verification pass against the saved tables and a random-gene negative-control step. First Molecular and Cellular Biology focus-day recipe of this run; the cookbook’s first dedicated functional-enrichment / pathway-interpretation recipe and the natural downstream step after bulk RNA-seq DE. Reported evidence anchored in Wang et al., GeneAgent, Nature Methods 22:1677, 2025 — self-verification against Enrichr and curated databases lifts ROUGE-L on MSigDB from 0.239±0.038 (GPT-4) to 0.310±0.047 (GeneAgent) across 1,106 gene sets, with 84% of 15,848 claims database-supported and 92% of self-verification decisions correct on a 132-claim expert-judged sample; complementary anchors Hu et al., Nat. Methods 21:2353, 2024 and Joshi et al., llm2geneset (bioRxiv 2024-11-12).
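
The random-gene negative control in the enrichment entry rests on an over-representation test, which can be sketched with the standard library. This is a plain hypergeometric tail probability offered as an illustration; it is not Enrichr's own scoring, and the function name and all counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap, list_size, set_size, universe):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe that contains set_size annotated genes."""
    upper = min(list_size, set_size)
    total = comb(universe, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: a 100-gene query list, a 200-gene pathway,
# and a 20,000-gene background.
p_query = hypergeom_enrichment_p(12, 100, 200, 20000)
# A random list of the same size would be expected to overlap by about 1.
p_random = hypergeom_enrichment_p(1, 100, 200, 20000)
print(f"query list:  p = {p_query:.3g}")
print(f"random list: p = {p_random:.3g}")
```

If a random list scored as strongly as the query list, the summary claim for that library would be suspect, which is the point of the control.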

Verified (no changes)

5 recipes spot-checked, last_verified bumped to 2026-06-04 — every linked catalog page resolves, every source URL still loads: Sort spikes from a Neuropixels recording end-to-end, Integrate multiple single-cell RNA-seq datasets across batches, Interpret a clinical variant from a natural-language query, Match a patient summary to recruiting clinical trials, Filter a virtual screening hit list with drug-likeness rules and structural alerts. Fixed one stale .md link → .html in the filter-virtual-screening recipe (RDKit-MCP cross-reference).

User requests

#12 @goodb — still cannot access the issue body (no gh permission in this run); leaving open in recipes/curator-state.md for the next run with gh access.

2026-06-03

Added

Dock a ligand library into a target structure with DiffDock (Problem class: Data analysis; Evidence: Proposed) — rung-2 DiffDock skill recipe taking a PDB or AlphaFold target + ligand SMILES CSV through batch-CSV prep → diffusion sampling (20–40 samples/complex) → confidence-thresholded filtering (> 0 trustworthy, −1.5–0 inspect, < −1.5 drop) → top-K SDF export, with explicit handoffs to MedChem / DeepChem / molecular-dynamics downstream. First Integrative Structural and Computational Biology focus-day recipe of this run; cookbook’s first dedicated docking recipe and natural downstream of the existing AlphaFold triage recipe. Proposed because no documented end-to-end LLM-orchestrated DiffDock virtual screen exists; closest component-level evidence is Corso et al., DiffDock-L (ICLR 2024, arXiv:2402.18396) (38%→80% RMSD<2Å on top one-third by confidence), Buttenschoen et al., PoseBusters (Chem. Sci. 15:3130, 2024), and Karelina et al., AF2-target docking (JCIM 63:6219, 2023) (~21% RMSD<2Å on AF2 models, motivating the upstream-triage gate in step 2).
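
The confidence-thresholded filtering step in the DiffDock entry can be sketched as a small triage function. The three bands (> 0, −1.5 to 0, < −1.5) are the ones stated above; the function names, the ligand IDs, and the scores are hypothetical, and no DiffDock call is shown.

```python
def triage_confidence(score):
    """Map a confidence score to the bands listed in the entry."""
    if score > 0:
        return "trustworthy"
    if score >= -1.5:
        return "inspect"
    return "drop"


def top_k_poses(poses, k):
    """Discard 'drop' poses, then keep the k highest-confidence ones."""
    kept = [p for p in poses if triage_confidence(p[1]) != "drop"]
    return sorted(kept, key=lambda p: p[1], reverse=True)[:k]


# Hypothetical (ligand_id, confidence) pairs.
poses = [("lig1", 0.8), ("lig2", -0.4), ("lig3", -2.1), ("lig4", 0.0)]
labels = {lig: triage_confidence(score) for lig, score in poses}
best = top_k_poses(poses, k=2)
print(labels)
print(best)
```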

Updated

Nav orders rebalanced to keep alphabetical title ordering after the new addition. “Dock a ligand library…” inserted at 8; everything from “Draft Phase 2/3…” downward shifted by +1: Draft Phase 2/3 clinical-trial protocol → 9, Estimate PK → 10, Filter virtual screening → 11, Infer GRN → 12, Integrate single-cell → 13, Interpret clinical variant → 14, Match patient to trials → 15, Parse FCS flow-cytometry files → 16, Prioritize targets → 17, Profile polypharmacology → 18, Run bulk RNA-seq DE → 19, Run first-pass QC → 20, Scan repurposing → 21, Set up protein MD → 22, Sort spikes → 23, Triage preprints → 24, Triage AlphaFold → 25.
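
The rebalancing described above (insert at a position, shift everything at or below it by +1) can be sketched as follows, together with a check for the kind of nav-order collision reported elsewhere in this log. The function names are hypothetical; the three starting values match the 2026-06-02 entry.

```python
def insert_nav(nav, title, position):
    """Return a new {title: order} map with title placed at position and
    every entry at or after that position shifted down by one."""
    updated = {
        t: (order + 1 if order >= position else order)
        for t, order in nav.items()
    }
    updated[title] = position
    return updated


def find_collisions(nav):
    """Return {order: [titles]} for any order used by more than one title."""
    by_order = {}
    for t, order in nav.items():
        by_order.setdefault(order, []).append(t)
    return {o: ts for o, ts in by_order.items() if len(ts) > 1}


before = {
    "Discover NWB on DANDI": 7,
    "Draft Phase 2/3 clinical-trial protocol": 8,
    "Estimate PK": 9,
}
after = insert_nav(before, "Dock a ligand library", 8)
print(sorted(after.items(), key=lambda item: item[1]))
print(find_collisions(after))
```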

Verified (no changes)

No aging recipes due — every last_verified date is within the 30-day window. The recipe set’s verification floor sits at 2026-05-22 (integrate-single-cell-datasets, sort-spikes-from-neuropixels-recording); next aging boundary is 2026-06-21.

User requests

#12 @goodb — still cannot access the issue body (no gh permission for the repo in this run); leaving the request open in recipes/curator-state.md for the next run with gh access.

2026-06-02

Added

Compute 16S microbiome alpha/beta diversity from a BIOM table (Problem class: Data analysis; Evidence: Proposed) — rung-2 scikit-bio skill recipe taking a BIOM feature table + sample metadata + Newick tree through rarefaction → Shannon/Simpson/Faith’s PD → weighted/unweighted UniFrac → PCoA → PERMANOVA with explicit grouping-column and permutation-count flags. First Immunology and Microbiology focus-day recipe of this run; cookbook’s first dedicated microbiome / community-ecology recipe. Proposed because no documented end-to-end attempt of this exact assembly exists; closest class-level evidence is Huang et al. Biomni (bioRxiv 2025.05.30.656746) whose published benchmark includes microbiome disease-taxa bioinformatics across five datasets (HMP, MetaPhlAn2 human metagenomics, drinking-water OTU matrices) at ~4× over base-LLM accuracy.
Parse FCS flow-cytometry files for downstream immunophenotyping (Problem class: Data analysis; Evidence: Proposed) — rung-2 FlowIO skill recipe taking a directory of vendor-emitted FCS 2.0/3.0/3.1 files through FlowData parsing → per-file metadata harvest → scatter/fluorescence/time channel categorisation → optional log/gain transforms → concatenated long-format events Parquet, with explicit failure surfacing for partial-acquisition files. Second Immunology and Microbiology focus-day recipe; cookbook’s first cytometry / FCS recipe. Proposed because no documented end-to-end attempt of this exact assembly exists; closest class-level evidence is “Enhancing Clinical Workflow Efficiency in Flow Cytometry Reporting with LLMs” (PMC13053331, J. Clin. Immunol. 2026), which demonstrates pathologist-level accuracy of fine-tuned LLMs on the downstream report-generation step the parsed-events output feeds into.
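
The scatter/fluorescence/time channel categorisation in the FCS entry can be sketched as a name-based heuristic. This is an assumption-laden illustration: it does not use the FlowIO API, the prefix rules (FSC/SSC for scatter, "Time" for the time channel) are a common naming convention rather than something the recipe specifies, and the channel names are hypothetical.

```python
def categorise_channel(name):
    """Bucket a channel name as 'scatter', 'time', or 'fluorescence'."""
    key = name.strip().upper()
    if key == "TIME":
        return "time"
    if key.startswith(("FSC", "SSC")):
        return "scatter"
    # Anything else is assumed to be a fluorescence detector.
    return "fluorescence"


# Hypothetical channel names as they might appear in FCS metadata.
channels = ["FSC-A", "SSC-H", "FITC-A", "PE-A", "Time"]
categories = {name: categorise_channel(name) for name in channels}
print(categories)
```

Files whose channel names do not follow this convention would need an explicit mapping instead of the prefix rules.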

Updated

Nav orders rebalanced to keep alphabetical title ordering after the two additions: Assemble Census atlas → 1, Benchmark ADMET → 2, Build target dossier → 3, Compute 16S microbiome diversity → 4 (new), Compute HRV → 5, Convert instrument data → 6, Discover NWB on DANDI → 7, Draft Phase 2/3 clinical-trial protocol → 8, Estimate PK → 9, Filter virtual screening → 10, Infer GRN → 11, Integrate single-cell → 12, Interpret clinical variant → 13, Match patient to trials → 14, Parse FCS flow-cytometry files → 15 (new), Prioritize targets → 16, Profile polypharmacology → 17, Run bulk RNA-seq DE → 18, Run first-pass QC → 19, Scan repurposing → 20, Set up protein MD → 21, Sort spikes → 22, Triage preprints → 23, Triage AlphaFold → 24.

Verified (no changes)

No aging recipes due — every last_verified date is within the 30-day window. The recipe set’s verification floor sits at 2026-05-22 (integrate-single-cell-datasets, sort-spikes-from-neuropixels-recording); next aging boundary is 2026-06-21.

User requests

#12 (claude:recipe-feedback) — remains in ## User requests (open); gh CLI is still not available in this run’s environment so the issue body cannot be inspected. Retry next run with gh access.

2026-06-01

Added

Convert raw analytical instrument data to Allotrope ASM JSON (Problem class: Workflow automation; Evidence: Reported) — rung-2 instrument-data-to-allotrope skill recipe taking a vendor-format file (cell counter, plate reader, HPLC, MS, qPCR) through auto-detect → allotropy native parse → ASM JSON-LD + flattened CSV + exportable Python parser, with strict-validation of the raw-vs-derived split before LIMS / data-lake handoff. First Chemistry focus-day recipe of this run; cookbook’s first workflow-automation recipe spanning the Anthropic life-sciences plugin family. Anchored in the Claude for Life Sciences launch (October 2025), the Anthropic Vi-CELL tutorial, and the underlying Benchling-Open-Source/allotropy reference parser.
Set up a protein molecular dynamics simulation in GROMACS from a PDB ID (Problem class: Experimental design; Evidence: Proposed) — rung-2 molecule-mcp recipe driving the GROMACS Copilot server end-to-end (topology → solvation → ion neutralisation → minimisation → NVT/NPT → 50 ns production → RMSD/RMSF/Rg) with explicit force-field / water-model / GPU-offload flags. Second Chemistry focus-day recipe; first cookbook entry exercising the GROMACS path of the molecule-mcp bundle. Proposed because no documented end-to-end attempt of this exact assembly exists; closest peer-reviewed class-level evidence is MDCrow (Campbell et al., Mach. Learn. Sci. Technol. 2025, DOI:10.1088/2632-2153/ae4b07) — OpenMM rather than GROMACS but same architecture — plus GROMACS-supporting follow-ons DynaMate (arXiv:2512.10034) and NAMD-Agent (arXiv:2507.07887), and the MDGym benchmark (arXiv:2605.08941) as a reality check (Claude Code / Codex / OpenHands all solve <21% of easy GROMACS/LAMMPS tasks).

Updated

Nav orders rebalanced to restore strict alphabetical title ordering after the two additions and to correct two prior off-by-many drifts (Benchmark ADMET was at 20 instead of 2; Prioritize targets was at 19 instead of 14): Assemble Census atlas → 1, Benchmark ADMET → 2, Build target dossier → 3, Compute HRV → 4, Convert instrument data → 5 (new), Discover NWB on DANDI → 6, Draft a Phase 2/3 clinical-trial protocol → 7, Estimate PK → 8, Filter virtual screening → 9, Infer GRN → 10, Integrate single-cell → 11, Interpret clinical variant → 12, Match patient to trials → 13, Prioritize targets → 14, Profile polypharmacology → 15, Run bulk RNA-seq DE → 16, QC single-cell → 17, Scan repurposing → 18, Set up protein MD in GROMACS → 19 (new), Sort spikes → 20, Triage preprints → 21, Triage AlphaFold → 22.
recipes/curator-state.md — ## Missing components entry for “DeepChem (K-Dense Skill)” removed; DeepChem is now catalogued at catalog/tools/deepchem.md.

Verified (no changes)

No aging recipes due — every last_verified date is within the 30-day window. The recipe set’s verification floor sits at 2026-05-22 (integrate-single-cell-datasets, sort-spikes-from-neuropixels-recording); next aging boundary is 2026-06-21.

User requests

#12 (claude:recipe-feedback) — remains in ## User requests (open); gh CLI is still not available in this run’s environment so the issue body cannot be inspected. Retry next run with gh access.

2026-05-31

Added

Prioritize targets within a disease via Open Targets (Problem class: Knowledge synthesis; Evidence: Reported) — rung-2 Open Targets plugin recipe taking a disease (EFO/MONDO) to a ranked target shortlist across the four prioritisation pillars (precedence, tractability, doability, safety) with cited GraphQL fields per cell. First DR&D focus-day recipe of this run; complements the existing gene-in Build a target dossier and disease-in/drug-out Scan approved drugs for repurposing candidates recipes. Anchored in Buniello et al. NAR 53(D1):D1467–D1475 (2025) and Minikel et al. Nature 629:624–629 (2024); closest LLM-driven application: Zunzunegui Sanz et al. bioRxiv 2025-06-13 and More et al. npj Precision Oncology 10:95 (2025).
Benchmark an ADMET property with PyTDC (Problem class: Data analysis; Evidence: Reported) — rung-2 PyTDC skill recipe driving the official TDC ADMET_Group benchmark (frozen scaffold splits, canonical metric per task, 5-seed leaderboard row format) so a new model gets a directly comparable number. Second DR&D focus-day recipe; first cookbook entry that produces leaderboard-comparable ADMET metrics. Anchored in Huang et al. NeurIPS Datasets and Benchmarks (2021), the published TDC-2 framework Velez-Arce et al. NeurIPS 2024, and recent LLM-driven workflows (Hao et al. Scientific Data 11:864 (2024); Yuan et al. arXiv:2406.06316 (2024)).

Verified (no changes)

No aging recipes due — every last_verified date is within the 30-day window. The recipe set’s verification floor sits at 2026-05-22 (integrate-single-cell-datasets, sort-spikes-from-neuropixels-recording); next aging boundary is 2026-06-21.

User requests

#12 (claude:recipe-feedback) — remains in ## User requests (open); gh CLI is still not available in this run’s environment so the issue body cannot be inspected. Retry next run with gh access.

2026-05-30

Added

Draft a Phase 2/3 clinical-trial protocol from an indication brief (Problem class: Manuscript prep; Evidence: Reported) — rung-2 clinical-trial-protocol Anthropic Healthcare plugin recipe that walks an indication / endpoint paragraph through the four-waypoint flow — regulatory classification, ClinicalTrials.gov competitive landscape, sample-size calculation, FDA/NIH-template drafting — emerging with a reviewable draft Phase 2/3 protocol scaffold. First Translational Medicine focus-day recipe of the new run; resolves a previously deferred candidate. Evidence anchored in the Anthropic plugin tutorial (Claude for Healthcare launch, January 2026) and class-level validation in Markey et al. Clinical Trials 2025 (80% content relevance, >99% terminology accuracy with RAG), Shin et al. Clinical Pharmacology & Therapeutics 2026 (100% accuracy on disease/intervention/comparator extraction, 14/15 trials for sample-size identification), Hauptman et al. JMIR Dermatology 2026, and Maleki, arXiv 2404.05044 (2024).

Updated

Nav orders rebalanced across the recipe set to keep alphabetical ordering after the addition: Assemble Census atlas → 1, Build target dossier → 2, Compute HRV → 3, Discover NWB on DANDI → 4, Draft a Phase 2/3 clinical-trial protocol → 5 (new), Estimate PK → 6, Filter virtual screening → 7, Infer GRN → 8, Integrate single-cell → 9, Interpret clinical variant → 10, Match patient to trials → 11, Profile polypharmacology → 12, Run bulk RNA-seq DE → 13, QC single-cell → 14, Scan repurposing → 15, Sort spikes → 16, Triage preprints → 17, Triage AlphaFold → 18.
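The rebalancing above amounts to sorting the full recipe titles alphabetically and numbering them from 1. A minimal sketch, assuming case-insensitive ordering on the full title; the function name is hypothetical and the titles are the ones recorded in this log:

```python
def rebalance_nav_orders(titles):
    """Map each recipe title to its 1-based alphabetical nav_order."""
    ordered = sorted(titles, key=str.casefold)
    return {title: position for position, title in enumerate(ordered, start=1)}

titles = [
    "Triage a stack of new preprints in your field",
    "Triage an AlphaFold model for structure-based drug design",
    "Sort spikes from a Neuropixels recording end-to-end",
    "Scan approved drugs for repurposing candidates against a disease",
    "Run first-pass QC on a single-cell RNA-seq dataset",
    "Run bulk RNA-seq differential expression from a counts matrix",
    "Profile a compound’s polypharmacology from ChEMBL bioactivity data",
    "Match a patient summary to recruiting clinical trials",
    "Interpret a clinical variant from a natural-language query",
    "Integrate multiple single-cell RNA-seq datasets across batches",
    "Infer a gene-regulatory network from single-cell RNA-seq",
    "Filter a virtual screening hit list with drug-likeness rules and structural alerts",
    "Estimate pharmacokinetic properties of a small molecule",
    "Draft a Phase 2/3 clinical-trial protocol from an indication brief",
    "Discover NWB recordings on DANDI and prepare them for sorting",
    "Compute HRV from an ECG recording",
    "Build a target dossier from gene name to structure to cancer dependency",
    "Assemble a tissue reference atlas from the CELLxGENE Census",
]

nav_order = rebalance_nav_orders(titles)
for title, position in sorted(nav_order.items(), key=lambda item: item[1]):
    print(position, title)
```

Sorting on the full title rather than the short label explains why "Run bulk RNA-seq DE" precedes "QC single-cell" in the list: the latter's full title begins "Run first-pass QC".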

Verified (no changes)

No aging recipes due — every last_verified date is within the 30-day window. The recipe set’s verification floor sits at 2026-05-22 (integrate-single-cell-datasets, sort-spikes-from-neuropixels-recording); next aging boundary is 2026-06-21.

User requests

#12 (claude:recipe-feedback) — remains in ## User requests (open); gh CLI is still not available in this run’s environment so the issue body cannot be inspected. Retry next run with gh access.

2026-05-29 (second pass — Neuroscience directed)

Added

Discover NWB recordings on DANDI and prepare them for sorting (Problem class: Knowledge synthesis; Evidence: Reported) — rung-3 Neurosift Tools MCP + neuropixels-analysis skill toolbelt taking a semantic query about extracellular recordings to a filtered list of DANDI assets. Claude calls dandi_semantic_search, dandi_search_by_neurodata_type, dandiset_assets, and nwb_file_info over the public DANDI API, applies user-supplied hypothesis constraints (probe model, session duration, presence of a Units table), and emits dandi download / pynwb streaming snippets ready for the Sort spikes from a Neuropixels recording recipe. Third Neuroscience-primary recipe; resolves a previously deferred candidate. Evidence anchored in Magland, Ly, Rübel, Dichter, Scientific Data 12:1988 (2025), doi:10.1038/s41597-025-06285-x, which documents an LLM-driven agentic chat assistant and notebook-generation pipeline for DANDI exploration from the same Flatiron lab that ships the Neurosift Tools MCP; neurophysiology specialists reviewed the generated notebooks and rated most of them “very helpful.” Canonical Neurosift citation: Magland, Soules, Baker, Dichter, JOSS 9(97):6590 (2024), doi:10.21105/joss.06590.

Updated

Nav orders rebalanced across the recipe set to keep alphabetical ordering after the addition: Assemble Census atlas → 1, Build target dossier → 2, Compute HRV → 3, Discover NWB on DANDI → 4, Estimate PK → 5, Filter virtual screening → 6, Infer GRN → 7, Integrate single-cell → 8, Interpret clinical variant → 9, Match patient to trials → 10, Profile polypharmacology → 11, Run bulk RNA-seq DE → 12, QC single-cell → 13, Scan repurposing → 14, Sort spikes → 15, Triage preprints → 16, Triage AlphaFold → 17.

Verified (no changes)

No aging recipes this run — every last_verified date is within the 30-day window. The recipe set’s verification floor sits at 2026-05-22 (integrate-single-cell-datasets, sort-spikes-from-neuropixels-recording); next aging boundary is 2026-06-21.

User requests

#12 (claude:recipe-feedback) — remains in ## User requests (open); gh CLI still unavailable in this run’s environment so the issue body cannot be inspected. Retry next run with gh access.

2026-05-29

Added

Compute HRV from an ECG recording (Problem class: Data analysis; Evidence: Proposed) — rung-2 NeuroKit2 Claude skill recipe taking a single-lead ECG to validated R-peaks plus time-domain, frequency-domain, and non-linear HRV indices, with nk.signal_quality-driven epoch exclusion. Second Neuroscience-primary recipe in the cookbook (joins the Neuropixels spike-sorting recipe). Component evidence: Makowski et al., Behavior Research Methods 2021 (NeuroKit2 reference) and Pham et al., Sensors 2021 (HRV indices tutorial). Closest LLM-orchestrated analogue: EEGAgent (Yan et al., arXiv:2511.09947, 2025-11-12; AAAI-26), which targets a different signal modality with a custom toolbox rather than NeuroKit2.
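For orientation, the time-domain indices named in this entry can be computed directly from the intervals between successive R-peaks. A minimal standard-library sketch of three common indices (MeanNN, SDNN, RMSSD); it does not call NeuroKit2, and the function name and example intervals are illustrative assumptions:

```python
from math import sqrt
from statistics import fmean, stdev

def time_domain_hrv(rr_ms):
    """Time-domain HRV indices from successive RR intervals in milliseconds."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Differences between each interval and the one before it.
    successive_diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "MeanNN": fmean(rr_ms),
        # Sample standard deviation of the intervals.
        "SDNN": stdev(rr_ms),
        # Root mean square of successive differences.
        "RMSSD": sqrt(fmean([d * d for d in successive_diffs])),
    }

# Hypothetical RR intervals (ms) from an artefact-free epoch.
indices = time_domain_hrv([800, 810, 790, 805, 795])
print(indices)
```

SDNN reflects overall variability across the epoch, while RMSSD weights beat-to-beat changes, which is why noisy epochs need excluding before either is trusted.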

Updated

Nav orders rebalanced across the recipe set to keep alphabetical ordering after the addition: Assemble Census atlas → 1, Build target dossier → 2, Compute HRV → 3, Estimate PK → 4, Filter virtual screening → 5, Infer GRN → 6, Integrate single-cell → 7, Interpret clinical variant → 8, Match patient to trials → 9, Profile polypharmacology → 10, Run bulk RNA-seq DE → 11, QC single-cell → 12, Scan repurposing → 13, Sort spikes → 14, Triage preprints → 15, Triage AlphaFold → 16.

Verified (no changes)

4 recipes spot-checked at the 30-day boundary and bumped to last_verified: 2026-05-29 — Triage preprints, QC single-cell, Build target dossier, Run bulk RNA-seq DE. All linked catalog tools (bio-research, pubmed, single-cell-rna-qc, pydeseq2, open-targets, uniprot, alphafold, depmap) remain present and unflagged.

User requests

#12 (claude:recipe-feedback) — remains in ## User requests (open); gh CLI is not available in this run’s environment so the issue body still cannot be inspected. Retry on the next run that has gh access.

2026-05-28

Added

Assemble a tissue reference atlas from the CELLxGENE Census (Problem class: Data analysis; Evidence: Reported) — rung-2 cellxgene-census skill recipe pulling a versioned AnnData slice from the CZ CELLxGENE Discover Census with the CZ-trained scVI embedding attached for reference mapping. First Molecular and Cellular Biology focus-day recipe to consume the Census. Evidence anchored in the Census team’s comp_bio_data_integration_scvi notebook, the scvi-hub paper (Ergen et al., Nature Methods 2025), and the integrated human lung atlas (Sikkema et al., Nature Medicine 2023).
Infer a gene-regulatory network from single-cell RNA-seq (Problem class: Data analysis; Evidence: Reported) — rung-2 Arboreto skill recipe running GRNBoost2 on a QC’d / integrated AnnData with a TF-restricted regressor and seed-stabilised reruns; produces the ranked TF–target edge table that pySCENIC consumes downstream. Evidence anchored in Moerman et al. Bioinformatics 2019 (GRNBoost2), Van de Sande et al. Nature Protocols 2020 (SCENIC workflow), and Bravo González-Blas et al. Nature Methods 2023 (SCENIC+).

Updated

Nav orders rebalanced across the recipe set to keep alphabetical ordering after the two additions: Assemble Census atlas → 1, Build target dossier → 2, Estimate PK → 3, Filter virtual screening → 4, Infer GRN → 5, Integrate single-cell → 6, Interpret clinical variant → 7, Match patient to trials → 8, Profile polypharmacology → 9, Run bulk RNA-seq DE → 10, QC single-cell → 11, Scan repurposing → 12, Sort spikes → 13, Triage preprints → 14, Triage AlphaFold → 15.

Missing components flagged to the catalog curator

pySCENIC wrapper (cisTarget + AUCell) — would unlock the full SCENIC pipeline downstream of the new GRN-inference recipe (motif filtering against cisTarget databases, per-cell regulon AUCell scoring).

Verified (no changes)

All 13 pre-existing recipes have last_verified within the 30-day window (oldest 2026-05-21); no aging verifications were due this run.

2026-05-27

Added

Estimate pharmacokinetic properties of a small molecule (Problem class: Knowledge synthesis; Evidence: Proposed) — rung-3 RDKit + MedChem + ChEMBL assembly producing a descriptor / rule-based / analog-anchored PK card for a single SMILES. Ships in response to user request #8. Closest documented analogues: ChemCrow (Bran et al., Nature Machine Intelligence 2024) and PharmaBench (Niu et al., Scientific Data 2024).
Triage an AlphaFold model for structure-based drug design (Problem class: Knowledge synthesis; Evidence: Proposed) — rung-2 AlphaFold MCP recipe producing a pLDDT-anchored go/refine/fall-back-to-PDB verdict on a UniProt accession. First Integrative Structural and Computational Biology-primary recipe. Evidence grounded in the EBI AlphaFold DB papers (Varadi 2022, Varadi 2024), the interface-pLDDT benchmark (Bryant 2022), and the AlphaFold-for-docking assessment (Karelina 2023).

Updated

Nav orders rebalanced across the recipe set to keep alphabetical ordering after the two additions: Estimate PK properties → 2, Filter virtual screening hits → 3, Integrate single-cell datasets → 4, Interpret clinical variant → 5, Match patient to trials → 6, Profile polypharmacology → 7, Run bulk RNA-seq DE → 8, QC single-cell RNA-seq → 9, Scan repurposing candidates → 10, Sort spikes → 11, Triage preprints → 12, Triage AlphaFold model → 13.

Missing components flagged to the catalog curator

ADMET-AI / AdmetLab 3.0 / Deep-PK wrapper — would let the new PK-properties recipe move from descriptor-and-analog estimation to defensible ML prediction for CYP / hERG / microsomal endpoints.
DeepChem (K-Dense Skill) — already flagged in the catalog curator’s state; would also strengthen the PK-properties recipe.
Co-folding / AlphaFold-Multimer / Boltz-2 wrapper — would unlock a complex-modelling companion to the AlphaFold triage recipe.

Verified (no changes)

All recipes have last_verified within the 30-day window; no aging verifications were due this run.

2026-05-25

Added

Filter a virtual screening hit list with drug-likeness rules and structural alerts (Problem class: Data analysis; Evidence: Reported) — rung-2 MedChem + Datamol cascade for Lipinski → Veber → PAINS → BRENK triage of SMILES hit lists. First Chemistry-primary recipe in the cookbook. Evidence anchored in the K-Dense lead-optimisation workflow and the foundational filter papers (Baell & Holloway PAINS 2010, Brenk 2008, Lipinski 2001, Veber 2002).
Profile a compound’s polypharmacology from ChEMBL bioactivity data (Problem class: Knowledge synthesis; Evidence: Reported) — rung-2 single-tool recipe over the ChEMBL connector. Second Chemistry-primary recipe and the compound-centric mirror of the existing target-dossier recipe. Evidence grounded in the Anthropic ChEMBL Connector tutorial and the ChEMBL curation paper (Mendez et al., NAR 2019).
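The first two stages of the filter cascade in the entry above are threshold checks on a handful of molecular descriptors. A minimal sketch, assuming the descriptors were already computed by a cheminformatics toolkit; the class, function names, example values, and the one-violation Lipinski tolerance are illustrative assumptions, and the PAINS and BRENK stages (substructure matching) are not shown:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Descriptors:
    mol_weight: float      # Da
    logp: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int
    tpsa: float            # topological polar surface area, square angstroms

def lipinski_violations(d):
    """Names of the rule-of-five criteria that the compound breaks."""
    checks = {
        "MW > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "H-bond donors > 5": d.h_donors > 5,
        "H-bond acceptors > 10": d.h_acceptors > 10,
    }
    return [name for name, failed in checks.items() if failed]

def veber_violations(d):
    """Names of the Veber criteria that the compound breaks."""
    checks = {
        "rotatable bonds > 10": d.rotatable_bonds > 10,
        "TPSA > 140": d.tpsa > 140,
    }
    return [name for name, failed in checks.items() if failed]

def passes_cascade(d, max_lipinski_violations=1):
    # Assumption: tolerate one Lipinski violation, none for Veber.
    if len(lipinski_violations(d)) > max_lipinski_violations:
        return False
    return not veber_violations(d)

# Hypothetical descriptor values for two screening hits.
small_hit = Descriptors(310.4, 2.1, 1, 4, 3, 63.6)
greasy_hit = Descriptors(720.9, 6.5, 2, 8, 14, 95.0)
print(passes_cascade(small_hit), passes_cascade(greasy_hit))
```

Running the cheap descriptor rules before the structural-alert stages keeps the more expensive substructure searches for compounds that are still in contention.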

Updated

Integrate multiple single-cell RNA-seq datasets across batches — nav_order 2 → 3 for alphabetical position after the new Filter recipe.
Interpret a clinical variant from a natural-language query — nav_order 3 → 4.
Match a patient summary to recruiting clinical trials — nav_order 4 → 5.
Run bulk RNA-seq differential expression from a counts matrix — nav_order 5 → 7 (after the new Profile recipe).
Run first-pass QC on a single-cell RNA-seq dataset — nav_order 6 → 8.
Scan approved drugs for repurposing candidates against a disease — nav_order 7 → 9.
Sort spikes from a Neuropixels recording end-to-end — nav_order 8 → 10.
Triage a stack of new preprints in your field — nav_order 9 → 11.

Verified (no changes)

9 existing recipes spot-checked; all last_verified dates within the 30-day window, all linked catalog pages resolve.

2026-05-24

Added

Scan approved drugs for repurposing candidates against a disease (Problem class: Knowledge synthesis; Evidence: Proposed) — rung-3 toolbelt composing the Open Targets plugin, ChEMBL connector, and DrugBank MCP; first focused Drug Repurposing and Discovery recipe in the cookbook. Evidence anchors: DeepDrug Alzheimer’s repurposing graph (Li et al., Scientific Reports 2025), Robin / ripasudil dAMD discovery (Ghareeb et al., Nature 2026), and DREBIOP LLM-validation benchmark (Zunzunegui Sanz et al., bioRxiv 2025-06-13).

Updated

Sort spikes from a Neuropixels recording end-to-end — nav_order 7 → 8 for alphabetical position.
Triage a stack of new preprints in your field — nav_order 8 → 9 for alphabetical position.

Verified (no changes)

8 existing recipes spot-checked; all last_verified dates within the 30-day window, all linked catalog pages resolve.

2026-05-23

Added

Match a patient summary to recruiting clinical trials (Problem class: Knowledge synthesis; Evidence: Reported) — rung-2 BioMCP / cyanheads-ClinicalTrials.gov-MCP recipe; first Translational-Medicine-focused recipe in the cookbook. Evidence grounded in TrialGPT (Jin et al., Nature Communications 2024, 87.3% criterion-matching accuracy).
Interpret a clinical variant from a natural-language query (Problem class: Knowledge synthesis; Evidence: Proposed) — rung-2 BioMCP recipe; pairs with the trial-matching recipe for variant-driven enrollment. Closest analogous benchmark is MARRVEL-MCP (bioRxiv 2025-11).

Updated

Run bulk RNA-seq differential expression from a counts matrix — nav_order 3 → 5 for alphabetical position after the two new TM recipes.
Run first-pass QC on a single-cell RNA-seq dataset — nav_order 4 → 6 for alphabetical position.
Sort spikes from a Neuropixels recording end-to-end — nav_order 5 → 7 for alphabetical position.
Triage a stack of new preprints in your field — nav_order 6 → 8 for alphabetical position.

Verified (no changes)

5 existing recipes spot-checked; all last_verified dates within the 30-day window, all linked catalog pages resolve.

2026-05-22

Added

Integrate multiple single-cell RNA-seq datasets across batches (Problem class: Data analysis; Evidence: Reported) — rung-2 recipe wrapping the Anthropic scvi-tools skill for scVI / scANVI batch integration; written in response to user request #7; evidence grounded in Hrovatin 2025 and scIB-E 2025 (source).
Sort spikes from a Neuropixels recording end-to-end (Problem class: Data analysis; Evidence: Reported) — rung-2 recipe wrapping the K-Dense neuropixels-analysis skill (SpikeInterface + Kilosort4); first Neuroscience-only recipe in the cookbook (source).

Updated

Run bulk RNA-seq differential expression from a counts matrix — nav_order shifted 2 → 3 for alphabetical position.
Run first-pass QC on a single-cell RNA-seq dataset — nav_order shifted 3 → 4 for alphabetical position.
Triage a stack of new preprints in your field — nav_order shifted 4 → 6 for alphabetical position.

Verified (no changes)

4 existing recipes spot-checked (all linked catalog pages resolve; last_verified 2026-05-21 still within the 30-day window so no bumps).

2026-05-21

Added

Run first-pass QC on a single-cell RNA-seq dataset (Problem class: Data analysis; Evidence: Reported) — rung-2 recipe wrapping Anthropic’s single-cell-rna-qc skill for canonical scverse MAD-based filtering of 10x .h5 / AnnData .h5ad inputs (source).
Run bulk RNA-seq differential expression from a counts matrix (Problem class: Data analysis; Evidence: Reported) — rung-2 recipe wrapping the K-Dense PyDESeq2 skill for negative-binomial GLM differential expression, including pseudobulk single-cell handoff guidance (source).
Build a target dossier from gene name to structure to cancer dependency (Problem class: Knowledge synthesis; Evidence: Proposed) — first rung-3 toolbelt recipe composing Open Targets, UniProt, AlphaFold, and DepMap into a one-page target dossier; first Proposed-evidence entry in the cookbook (closest analogue).
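The MAD-based filtering mentioned in the single-cell QC entry above flags cells whose QC metric sits too many median absolute deviations from the median. A minimal standard-library sketch; the function name, the cutoff of 5 MADs, and the example counts are assumptions for illustration, not the skill's actual defaults:

```python
from statistics import median

def mad_outliers(values, n_mads=5.0):
    """Flag values more than n_mads median absolute deviations from the median."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    return [abs(v - centre) > n_mads * mad for v in values]

# Hypothetical total counts per cell; the last cell looks like a doublet.
total_counts = [1000, 1100, 950, 1050, 1020, 980, 20000]
flags = mad_outliers(total_counts)
kept = [c for c, is_outlier in zip(total_counts, flags) if not is_outlier]
print(flags, kept)
```

Because the median and MAD are robust statistics, one extreme cell does not inflate the threshold the way it would with a mean and standard deviation.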

Updated

Triage a stack of new preprints in your field — nav_order shifted from 1 to 4 to reflect alphabetical ordering after the three new Mol/Cell Bio additions; no content changes.

Verified (no changes)

1 recipe spot-checked, current (triage-new-preprints, last_verified 2026-05-21).

2026-05-21 (initial seed)

Added

Section bootstrap — recipes/ section created with landing page, landscape page, and the all-recipes index; recipes/curator-state.md initialized; RECIPES_CHANGELOG.md (this file) created. Curator prompt and daily workflow added at RECIPE_AGENT.md and .github/workflows/recipes.yml.
Triage a stack of new preprints in your field (Problem class: Literature triage; Evidence: Reported) — first seed recipe demonstrating the schema and the lowest rung of the simplicity ladder (Claude Code alone + bioRxiv MCP) (source).