PyTDC (Claude Skill)
Claude skill that drives PyTDC, the Python client for Therapeutics Data Commons — a curated benchmark suite of drug-discovery ML datasets spanning ADMET prediction, drug-target interaction, drug-drug interaction, drug-response prediction, molecular generation, and retrosynthesis.
| Type | Claude Skill |
| Supplier | K-Dense Inc. (community OSS); PyTDC by Harvard (mims-harvard/TDC) |
| Availability | GA — actively maintained 2025–2026 |
| Pricing | Free / OSS skill (MIT collection); PyTDC itself is MIT-licensed; TDC datasets follow per-dataset licenses |
| Capabilities | Read/Write — Claude executes PyTDC via Python/Bash to load datasets, run benchmarks, and call generation oracles |
How to install
- Claude Code / Claude.ai, Skills CLI (recommended):
  `npx skills add K-Dense-AI/scientific-agent-skills`
  Installs the K-Dense collection; enable the `pytdc` skill when prompted (also works in Cursor/Codex via the Agent Skills spec; requires Node ≥ 18).
- Claude Code / Claude Desktop, manual clone:
  `git clone https://github.com/K-Dense-AI/scientific-agent-skills`
  `cp -r scientific-agent-skills/skills/pytdc ~/.claude/skills/`
  `pip install pytdc`
  Project-scoped alternative: copy into `.claude/skills/` instead of `~/.claude/skills/`.
- Also packaged in the SciAgent-Skills collection (jaechang-hits; community OSS, CC BY 4.0): clone `jaechang-hits/SciAgent-Skills` and run `/plugin install sciagent-skills` in Claude Code (or copy `skills/structural-biology-drug-discovery/pytdc-therapeutics-data-commons` into `~/.claude/skills/`).
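After installing, a quick check can confirm that both halves are in place: the skill folder and the PyTDC package itself. The following is a minimal sketch using only the standard library. The helper name `check_pytdc_skill` is hypothetical, and the default folder name `pytdc` assumes the K-Dense manual-clone layout; the SciAgent-Skills route uses a different folder name, so pass it explicitly in that case.

```python
import importlib.util
from pathlib import Path


def check_pytdc_skill(skills_root: Path, skill_name: str = "pytdc") -> dict:
    """Report whether the skill folder and the PyTDC package are present.

    `skill_name` is an assumption based on the K-Dense manual-clone layout.
    """
    skill_dir = skills_root / skill_name
    return {
        "skill_dir_exists": skill_dir.is_dir(),
        "skill_md_exists": (skill_dir / "SKILL.md").is_file(),
        # The PyPI package `pytdc` is imported under the module name `tdc`.
        "pytdc_importable": importlib.util.find_spec("tdc") is not None,
    }


# User-scoped location; use Path(".claude/skills") for a project-scoped install.
report = check_pytdc_skill(Path.home() / ".claude" / "skills")
for key, ok in report.items():
    print(f"{key}: {'ok' if ok else 'missing'}")
```

If `pytdc_importable` reports missing, `pip install pytdc` has probably been run in a different Python environment from the one Claude uses.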
What it does
SKILL.md with recipes for:
- Single-instance prediction (`single_pred`): ADMET, toxicity, quantum properties, paratope, epitope
- Multi-instance prediction (`multi_pred`): drug-target interaction (DTI), drug-drug interaction (DDI), GDA, drug response, PPI
- Generation (`generation`): de-novo molecular generation, retrosynthesis, reaction yield, paired generation
- Bundled helper scripts: `load_and_split_data.py`, `benchmark_evaluation.py`, `molecular_generation.py`
- Reference docs: `datasets.md`, `oracles.md` (17+ molecule-generation oracles incl. QED, SA, DRD2, GSK3B, JNK3), `utilities.md`
- Standard data splits (random, scaffold, cold-start), leaderboard metrics, and TDC benchmark suites (`ADMET_Group`, `DTI_DG_Group`, `Drug_Response_Group`)
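In practice these recipes come down to a small number of PyTDC calls. The sketch below shows two of them: loading a single-instance dataset with a scaffold split, and scoring SMILES strings with a generation oracle. It assumes `pytdc` is installed and that the data can be downloaded on first use. `Caco2_Wang` is just one ADME dataset picked for illustration, and the wrapper names `load_scaffold_split` and `score_smiles` are hypothetical helpers, not the skill's bundled scripts.

```python
from typing import Sequence


def load_scaffold_split(name: str = "Caco2_Wang", seed: int = 42) -> dict:
    """Load a single-instance ADME dataset and return a scaffold split."""
    # Imported lazily so this module loads even where PyTDC is not installed.
    from tdc.single_pred import ADME

    data = ADME(name=name)  # downloads the dataset on first use
    # Returns a dict with "train", "valid", and "test" pandas DataFrames.
    return data.get_split(method="scaffold", seed=seed, frac=[0.7, 0.1, 0.2])


def score_smiles(smiles: Sequence[str], oracle_name: str = "QED") -> list:
    """Score SMILES strings with a TDC oracle (QED additionally needs RDKit)."""
    from tdc import Oracle

    oracle = Oracle(name=oracle_name)
    # Calling an oracle with a list of SMILES returns one score per molecule.
    return list(oracle(list(smiles)))
```

A typical call would be `split = load_scaffold_split()` followed by `split["train"].head()`, which for single-instance datasets shows the `Drug_ID`, `Drug`, and `Y` columns. The imports are kept inside the functions so that a missing dependency only fails at the point of use.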
Primary use cases: Benchmarking ML models for drug discovery, ADMET property prediction, drug-target interaction screening, molecular generation with property oracles, retrosynthesis evaluation.
Notes
Pairs with the deepchem, medchem, datamol, rdkit-skill, and molfeat entries: PyTDC supplies labelled benchmark splits, while those skills supply featurizers and models. Some TDC datasets auto-download on first use (a few GB in total across the full suite), so allow for disk space and network access. The skill itself is documentation plus Python recipes; Claude calls PyTDC locally via Bash/Python.
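To illustrate the division of labour between PyTDC and a modelling skill, here is a hedged sketch of a benchmark-group loop. PyTDC provides the splits and the evaluation; the model is passed in as a callable. `run_admet_benchmark`, `fit_predict`, and `mean_baseline` are hypothetical names, and `mean_baseline` is a deliberately trivial stand-in for a real featurizer plus model from one of the paired skills. The five seeds mirror the pattern in TDC's own benchmark-group examples.

```python
from typing import Callable, Sequence


def mean_baseline(train, valid, test) -> list:
    """Trivial stand-in model: predict the mean training label for every test row."""
    labels = list(train["Y"])
    train_mean = float(sum(labels) / len(labels))
    return [train_mean] * len(test["Drug"])


def run_admet_benchmark(
    dataset_name: str,
    fit_predict: Callable,
    data_dir: str = "data/",
    seeds: Sequence[int] = (1, 2, 3, 4, 5),
):
    """Run one ADMET benchmark over several seeds and return TDC's summary."""
    # Imported lazily; requires `pip install pytdc` and network access on first use.
    from tdc.benchmark_group import admet_group

    group = admet_group(path=data_dir)
    predictions_list = []
    for seed in seeds:
        benchmark = group.get(dataset_name)
        name = benchmark["name"]
        # PyTDC supplies the train/valid split; the test set is fixed per benchmark.
        train, valid = group.get_train_valid_split(
            benchmark=name, split_type="default", seed=seed
        )
        # The caller supplies the model: featurize, fit, and predict on the test set.
        y_pred = fit_predict(train, valid, benchmark["test"])
        predictions_list.append({name: y_pred})
    # Aggregates the benchmark's metric across all runs.
    return group.evaluate_many(predictions_list)
```

Swapping `mean_baseline` for a function that featurizes the `Drug` column (SMILES) and fits a regressor or classifier is where the paired skills would come in; the loop itself stays unchanged.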
Sources
- K-Dense-AI/scientific-agent-skills: skills/pytdc/SKILL.md
- Therapeutics Data Commons: mims-harvard/TDC
- Huang et al., NeurIPS Datasets and Benchmarks 2021