rdkit
Cheminformatics toolkit for fine-grained molecular control: SMILES/SDF parsing, descriptors (MW, LogP, TPSA), fingerprints, substructure search, 2D/3D coordinate generation, and molecular similarity, for cases that need advanced control or custom algorithms.
Fine-grained cheminformatics and molecular analysis with RDKit
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
- parse SMILES
- compute molecular descriptors
- substructure search
- generate 3D conformer
- molecular fingerprints
What it does
rdkit is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into an RDKit expert covering the full cheminformatics toolkit: SMILES and SDF parsing, molecular descriptor computation (MW, LogP, TPSA, HBD/HBA, rotatable bonds), fingerprint generation (Morgan/ECFP, MACCS, RDKit), substructure search with SMARTS patterns, 2D coordinate and 3D conformer generation (ETKDG), molecular similarity (Tanimoto, Dice), and reaction handling.
A session produces Python code that takes molecular inputs (SMILES strings, SDF files) and returns the requested chemical properties, filtered compound sets, or visualization files.
When to use it
Reach for it when:
- You need advanced molecular control — custom sanitization, non-standard valences, or specialized fingerprint parameters
- You’re implementing a custom cheminformatics algorithm that requires access to the RDKit C++ layer through Python
- You’re doing substructure searches with complex SMARTS patterns that need precise control over the matching behavior
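As an illustration of the custom sanitization case, here is a minimal sketch that parses a SMILES without RDKit's default checks and then runs every sanitization step except the valence check. The helper name `parse_with_relaxed_valence` and the pentavalent-nitrogen input are assumptions made for this example; they do not come from the skill itself.

```python
from rdkit import Chem


def parse_with_relaxed_valence(smiles):
    """Parse a SMILES while skipping the valence check (hypothetical helper)."""
    # sanitize=False defers all chemistry checks, so only syntax errors give None.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None
    # Compute implicit valences without enforcing the valence rules.
    mol.UpdatePropertyCache(strict=False)
    # Run every sanitization operation except the property (valence) check.
    ops = Chem.SanitizeFlags.SANITIZE_ALL ^ Chem.SanitizeFlags.SANITIZE_PROPERTIES
    failed_op = Chem.SanitizeMol(mol, sanitizeOps=ops, catchErrors=True)
    if failed_op != Chem.SanitizeFlags.SANITIZE_NONE:
        return None  # some other sanitization step failed
    return mol


# A neutral nitrogen with five bonds is rejected by the default parser.
default_result = Chem.MolFromSmiles("CN(C)(C)(C)C")
relaxed_result = parse_with_relaxed_valence("CN(C)(C)(C)C")
```

Skipping the valence check means downstream descriptors may be computed on chemically questionable structures, so a relaxed parse like this is best limited to inputs you have reason to trust.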
When not to reach for it:
- Standard drug discovery workflows with sensible defaults: use datamol (a Pythonic RDKit wrapper)
- Molecular ML with diverse featurization and MoleculeNet benchmarks: use deepchem
Install
Copy the SKILL.md from K-Dense AI’s rdkit folder into .claude/skills/rdkit/ in your project. RDKit is best installed via conda: conda install -c conda-forge rdkit.
What a session looks like
A typical session has three phases:
- Input specification. Provide SMILES strings, an SDF file path, or a compound library. Claude sets up the molecule loading with appropriate sanitization flags and handles invalid SMILES gracefully.
- Computation. Claude generates the RDKit code for the requested operation: descriptor calculation via Descriptors, fingerprint generation via AllChem, substructure search via HasSubstructMatch, or 3D conformer embedding via AllChem.EmbedMolecule(mol, AllChem.ETKDGv3()).
- Output. Results are returned as a pandas DataFrame, an SDF file, or a molecular visualization (2D depiction via Draw.MolToImage). Invalid molecules are flagged in the output rather than silently dropped.
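The three phases can be sketched roughly as follows. This is an illustrative outline, not the code the skill emits: the function name `screen_library`, the row layout, and the example SMARTS are assumptions. It also uses the `rdFingerprintGenerator` module for Morgan fingerprints instead of the `AllChem` route mentioned above, since newer RDKit releases mark the older fingerprint functions as deprecated.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator


def screen_library(query_smiles, library_smiles, smarts):
    """Score a library against a query and a SMARTS pattern (hypothetical helper)."""
    # Input: parse the query and the pattern up front and fail loudly if either is bad.
    query = Chem.MolFromSmiles(query_smiles)
    pattern = Chem.MolFromSmarts(smarts)
    if query is None or pattern is None:
        raise ValueError("query SMILES or SMARTS pattern could not be parsed")

    generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
    query_fp = generator.GetFingerprint(query)

    rows = []
    for smiles in library_smiles:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            # Output: keep the invalid entry and flag it instead of dropping it.
            rows.append({"smiles": smiles, "valid": False,
                         "tanimoto": None, "has_match": None})
            continue
        # Computation: fingerprint similarity plus a substructure test.
        fp = generator.GetFingerprint(mol)
        rows.append({
            "smiles": smiles,
            "valid": True,
            "tanimoto": DataStructs.TanimotoSimilarity(query_fp, fp),
            "has_match": mol.HasSubstructMatch(pattern),
        })
    return rows


rows = screen_library("CCO", ["CCO", "CC(=O)O", "not_smiles"], "C(=O)[OH]")
```

The list of dictionaries can be passed straight to `pandas.DataFrame(rows)` when a tabular result is wanted.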
Receipts
Where it works well:
- Lipinski Rule of Five filtering on a compound library — descriptor computation across thousands of molecules is fast and the filtering logic is clean
- SMARTS-based substructure search for functional group identification — RDKit’s SMARTS matching is comprehensive and Claude knows the common SMARTS patterns for standard pharmacophore features
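A Rule of Five filter of the kind described above might look like the sketch below. The helper names, the returned dictionary layout, and the choice to tolerate one violation (a common convention, set through the hypothetical `max_violations` parameter) are assumptions for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski


def lipinski_violations(mol):
    """Count how many of the four Rule of Five limits a molecule exceeds."""
    checks = [
        Descriptors.MolWt(mol) > 500,       # molecular weight
        Descriptors.MolLogP(mol) > 5,       # Crippen LogP estimate
        Lipinski.NumHDonors(mol) > 5,       # hydrogen-bond donors
        Lipinski.NumHAcceptors(mol) > 10,   # hydrogen-bond acceptors
    ]
    return sum(checks)


def filter_rule_of_five(smiles_list, max_violations=1):
    """Split a library into passed, failed, and unparseable SMILES."""
    result = {"passed": [], "failed": [], "invalid": []}
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            result["invalid"].append(smiles)
        elif lipinski_violations(mol) <= max_violations:
            result["passed"].append(smiles)
        else:
            result["failed"].append(smiles)
    return result


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
long_alkane = "C" * 40  # heavy and very lipophilic, so it breaks two limits
result = filter_rule_of_five([aspirin, long_alkane, "C1CC"])
```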
Where it backfires:
- 3D conformer generation quality degrades for highly flexible molecules (>10 rotatable bonds) — ETKDG produces geometries but the ensemble may not represent the true conformational distribution
- Some exotic tautomers and charged species require manual sanitization overrides that are non-obvious from the error messages
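For the conformer caveat, one hedged approach is to embed several conformers instead of one and to flag molecules above the rotatable-bond threshold so their ensembles are treated with caution. The function name `embed_ensemble`, the default conformer count, and the fixed random seed are assumptions for this sketch; the cutoff of 10 is the figure quoted above.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolDescriptors

FLEXIBLE_THRESHOLD = 10  # rotatable-bond cutoff quoted in the text above


def embed_ensemble(smiles, num_confs=10, seed=42):
    """Embed several ETKDGv3 conformers and flag highly flexible molecules."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    n_rotatable = rdMolDescriptors.CalcNumRotatableBonds(mol)

    # Explicit hydrogens are needed for sensible 3D geometries.
    mol_h = Chem.AddHs(mol)
    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # fixed seed so repeated runs give the same ensemble
    conf_ids = list(AllChem.EmbedMultipleConfs(mol_h, num_confs, params))

    is_flexible = n_rotatable > FLEXIBLE_THRESHOLD
    return mol_h, conf_ids, is_flexible


butane_mol, butane_confs, butane_flexible = embed_ensemble("CCCC", num_confs=3)
_, _, hexadecane_flexible = embed_ensemble("C" * 16, num_confs=2)
```

A flagged molecule still gets an ensemble; the flag only signals that more conformers or a dedicated sampling method may be needed before trusting the result.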
Pattern that works: always check mol is not None after parsing SMILES — RDKit returns None for invalid SMILES rather than raising an exception, and downstream operations on None produce cryptic errors rather than informative ones.
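A minimal sketch of that check, assuming a small wrapper (the name `parse_smiles` is hypothetical) that turns the silent None into an explicit error:

```python
from rdkit import Chem


def parse_smiles(smiles):
    """Return an RDKit molecule or raise a clear error for invalid SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # MolFromSmiles returns None on failure; raise here so the problem
        # surfaces at the parsing step rather than in a later call.
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    return mol


benzene = parse_smiles("c1ccccc1")

try:
    parse_smiles("not_a_smiles")
    invalid_raised = False
except ValueError:
    invalid_raised = True
```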
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the rdkit folder of their public scientific-agent-skills repository.
License: MIT. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.