Analyze an existing MD trajectory for stability, flexibility, and contacts

You already have a trajectory — from GROMACS, AMBER, NAMD, CHARMM, or a collaborator — and need the standard post-simulation readout: did it equilibrate, which residues move, what contacts persist, and what the dominant motions are. This recipe drives the MDAnalysis skill to do all of it from one prompt.

   
Problem class Data analysis
Subject areas Chemistry, Integrative Structural and Computational Biology
Evidence level Proposed
Complexity One skill or MCP
Availability Fully open
Compute Workstation with GPU

Problem

The simulation finished overnight. You have a topology and a multi-gigabyte .xtc/.dcd/.nc trajectory, and now you need the analysis that decides whether the run is usable: backbone RMSD to confirm the structure equilibrated, per-residue RMSF to find the flexible loops, radius of gyration to check for unfolding, a residue-residue contact map to see which interactions survived, hydrogen-bond occupancy across a binding interface, and PCA to extract the dominant collective motions. None of this requires running MD — it requires getting the atom selections right, aligning frames before RMSF, skipping the equilibration window, and not running out of RAM on a long trajectory. Writing that script from scratch each time, looking up MDAnalysis’s selection grammar and the align.AlignTraj-before-RMSF gotcha, is exactly the boilerplate an agent should absorb.

Solved looks like: point Claude at the topology and trajectory, state the selections and the analysis window, and get back labelled plots plus a numeric summary you can paste into a figure or a methods section.

  1. Install the MDAnalysis skill from the SciAgent-Skills collection (clone the repo and /plugin install sciagent-skills, or copy the mdanalysis-trajectory skill into ~/.claude/skills/). The skill declares its own Python deps (MDAnalysis, numpy, matplotlib); let it install them on first use. Keep the MDTraj skill installed too as a fallback for DSSP secondary structure and Ramachandran torsions.

  2. Load the trajectory and sanity-check it before any analysis. Prompt:

    Load topology md/prod.gro and trajectory md/prod.xtc with MDAnalysis.
    Report: number of frames, time per frame, total simulated time,
    number of atoms, number of protein residues, and whether the box
    is present. Do NOT analyze yet — just confirm the Universe loaded
    and tell me the frame stride so I can choose an analysis window.
    

    This catches the common mismatch (topology atom count != trajectory atom count) before you waste a pass over a large file.

  3. RMSD / RMSF / Rg over the production window. Specify the equilibration cutoff explicitly:

    On the loaded Universe, using frames from t = 10 ns onward
    (drop equilibration):
      - Backbone RMSD vs the first analyzed frame, after RMSD-fitting
        on backbone atoms (use align.AlignTraj first).
      - Per-residue RMSF on CA atoms, computed on the aligned trajectory.
      - Radius of gyration time series for the protein selection.
    Save plots to md/analysis/{rmsd,rmsf,rg}.png and write the mean
    RMSD over the window, mean Rg, and the top-5 residues by RMSF to
    md/analysis/summary.md. Flag any CA RMSF > 0.3 nm as a flexible
    region.
    

    The align.AlignTraj-before-RMSF instruction matters: RMSF on an un-aligned trajectory measures overall tumbling, not internal flexibility.

  4. Contacts and hydrogen bonds across the interface you care about. Name the two selections:

    Selection A = "protein and resid 50-70" (the loop)
    Selection B = "protein and resid 120-140" (the partner helix)
    Over the same t >= 10 ns window:
      - Residue-residue minimum-distance contact frequency map between
        A and B (contact if any heavy-atom pair < 4.5 A), save as a
        heatmap.
      - Hydrogen bonds between A and B with the HydrogenBondAnalysis
        module; report each donor-acceptor pair and its % occupancy,
        sorted descending.
    Save to md/analysis/contacts.png and md/analysis/hbonds.csv.
    
  5. PCA on the backbone for the dominant motions. Closing prompt:

    Run PCA on backbone coordinates over t >= 10 ns:
      - Scree plot of the first 10 PCs (variance explained).
      - Project the trajectory onto PC1 and PC2; save the 2D
        projection scatter coloured by time.
      - Report the cumulative variance captured by PC1-3.
    Save to md/analysis/pca_*.png. If PC1-3 capture < 60% of variance,
    note that the motion is diffuse rather than dominated by a single mode.
    

    For 8-state DSSP secondary-structure timelines or Gly/Pro-aware Ramachandran plots, switch to the MDTraj skill — it wraps DSSP and the torsion helpers directly.

Why this assembly

Rung 2 of the simplicity ladder, and it stops there. The analysis is pure post-processing of a file you already have, and the MDAnalysis skill already encodes the selection grammar, the multi-format readers (GROMACS/AMBER/NAMD/CHARMM/LAMMPS), and the analysis modules (RMSD/RMSF/Rg/contacts/H-bonds/PCA). One skill covers every step above. Plain Claude Code with no skill can import MDAnalysis via Bash, but it re-derives the align-before-RMSF idiom and the selection syntax each session and tends to compute RMSF on an un-fitted trajectory. A multi-tool harness adds nothing — there is a single library doing the work; MDTraj is a complementary fallback (DSSP, Ramachandran), not a third coordinating layer. Rung 4 (an autonomous MD agent like MDCrow) is for planning across simulation and analysis, not for running a fixed analysis battery on a finished trajectory.

Availability

Fully open. The MDAnalysis skill is community OSS (skill content CC BY 4.0; MDAnalysis itself GPL-2.0). The MDTraj skill fallback is LGPL-2.1. No subscription, account, or institutional licence is required. The only cost is the Claude inference that drives the skill.

Compute requirements

Workstation, GPU optional. The analysis itself is CPU-bound NumPy, not GPU work — the “GPU” tier reflects that the trajectories you analyze typically come from GPU MD runs, not that analysis needs one. A 50 ns trajectory of a ~300-residue protein at 10 ps stride (~5000 frames) runs the full RMSD/RMSF/Rg/contacts/PCA battery in 2–10 minutes on a laptop with 16 GB RAM. The memory ceiling is the binding constraint: MDAnalysis streams frames so it stays modest, but PCA and contact maps that materialise full coordinate arrays can spike — for trajectories above ~50k frames, analyze in a transformations/in-memory slice of the production window rather than loading everything. Disk: outputs are a handful of PNGs and CSVs, <10 MB.

Evidence

Proposed. No documented end-to-end attempt of “Claude Code + the MDAnalysis skill” on a published benchmark is known. The closest evidence is at the component and the agent-class level:

Alternatives considered

  • Plain Claude Code, no skill. Workable for a one-off if you are happy to inspect every MDAnalysis call. The skill earns its place by encoding the align-before-RMSF idiom and the selection grammar so the first pass is correct; reach for bare Claude only when you want to hand-write the analysis yourself.
  • Do the analysis inside the GROMACS Copilot. The Set up a protein MD simulation in GROMACS recipe already produces RMSD/RMSF/Rg as a closing step. Use that when you are running the simulation in the same session. Use this recipe when the trajectory already exists, came from a non-GROMACS engine, or needs analysis the Copilot’s closing step does not cover (contact maps, H-bond occupancy, interface-specific selections).
  • MDCrow (autonomous rung). Not currently wrapped in autonomous-science/systems/ as a recommendable component. Reach for it directly from its repo when you need an agent that decides which analyses to run from a scientific question, rather than running the fixed battery above.

See also

Sources


Tried this recipe?

Share feedback — what worked, what didn’t, what you’d change. The form opens with this recipe pre-selected and a link back to this page.