# rdkit

> Cheminformatics toolkit for fine-grained molecular control: SMILES/SDF parsing, descriptors (MW, LogP, TPSA), fingerprints, substructure search, 2D/3D coordinate generation, and molecular similarity, for work that needs low-level access or custom algorithms.

**Use case**: Fine-grained cheminformatics and molecular analysis with RDKit

**Canonical URL**: https://agentcookbooks.com/skills/rdkit/

**Topics**: claude-code, skills, science, cheminformatics

**Trigger phrases**: "parse SMILES", "compute molecular descriptors", "substructure search", "generate 3D conformer", "molecular fingerprints"

**Source**: [K-Dense AI](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/rdkit)

**License**: MIT

---

## What it does

`rdkit` is a Claude Code skill from K-Dense AI's [scientific-agent-skills repo](https://github.com/K-Dense-AI/scientific-agent-skills). It turns Claude into an RDKit expert covering the full cheminformatics toolkit: SMILES and SDF parsing, molecular descriptor computation (MW, LogP, TPSA, HBD/HBA, rotatable bonds), fingerprint generation (Morgan/ECFP-style, MACCS keys, RDKit topological), substructure search with SMARTS patterns, 2D coordinate generation for depiction, 3D conformer generation (ETKDG), molecular similarity (Tanimoto, Dice), and reaction handling.

A session produces Python code that takes molecular inputs (SMILES strings, SDF files) and returns the requested chemical properties, filtered compound sets, or visualization files.
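As a rough illustration of the kind of code such a session might produce, the sketch below computes a few descriptors and compares two molecules by fingerprint similarity. The example molecules and the fingerprint settings (radius 2, 2048 bits) are assumptions chosen for illustration, not values taken from the skill. It also assumes a reasonably recent RDKit release that ships `rdFingerprintGenerator`.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator

# Example inputs, chosen for illustration only.
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
salicylic_acid = Chem.MolFromSmiles("O=C(O)c1ccccc1O")

# A few common descriptors for one molecule.
props = {
    "MolWt": Descriptors.MolWt(aspirin),
    "LogP": Descriptors.MolLogP(aspirin),
    "TPSA": Descriptors.TPSA(aspirin),
}

# Morgan fingerprints; radius 2 is the setting usually compared to ECFP4.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp_a = generator.GetFingerprint(aspirin)
fp_b = generator.GetFingerprint(salicylic_acid)

# Two similarity measures over the same pair of bit vectors.
tanimoto = DataStructs.TanimotoSimilarity(fp_a, fp_b)
dice = DataStructs.DiceSimilarity(fp_a, fp_b)

print(props)
print(f"Tanimoto={tanimoto:.3f} Dice={dice:.3f}")
```

Both similarity values fall between 0 and 1, and identical molecules score 1.0 under either measure.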

## When to use it

Reach for it when:

- You need advanced molecular control — custom sanitization, non-standard valences, or specialized fingerprint parameters
- You're implementing a custom cheminformatics algorithm that needs direct access to RDKit's C++ core through its Python bindings
- You're doing substructure searches with complex SMARTS patterns that need precise control over the matching behavior
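Two of these cases can be sketched briefly. The first part parses a molecule with a non-standard valence without sanitizing it, then asks RDKit what it would have rejected. The second part shows how one matching option (`useChirality`) changes a substructure result. The SMILES strings are illustrative assumptions, and this is a minimal sketch, not a full sanitization workflow.

```python
from rdkit import Chem

# --- Custom sanitization -------------------------------------------------
smiles = "CN(C)(C)C"  # neutral nitrogen with four bonds

# Default parsing sanitizes the molecule and returns None on failure.
strict = Chem.MolFromSmiles(smiles)

# Skipping sanitization keeps the molecule so it can be inspected.
raw = Chem.MolFromSmiles(smiles, sanitize=False)
problems = Chem.DetectChemistryProblems(raw)
for problem in problems:
    print(problem.GetType(), "-", problem.Message())

# Compute valence-related properties without enforcing valence rules.
raw.UpdatePropertyCache(strict=False)

# --- Controlling substructure matching -----------------------------------
l_alanine = Chem.MolFromSmiles("C[C@H](N)C(=O)O")
d_alanine_query = Chem.MolFromSmiles("C[C@@H](N)C(=O)O")

# By default stereochemistry is ignored, so the enantiomer matches.
loose_match = l_alanine.HasSubstructMatch(d_alanine_query)
# With useChirality=True the opposite stereocenter no longer matches.
chiral_match = l_alanine.HasSubstructMatch(d_alanine_query, useChirality=True)

print(loose_match, chiral_match)
```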

When *not* to reach for it:

- Standard drug discovery workflows with sensible defaults — use `datamol` (a Pythonic RDKit wrapper)
- Molecular ML with diverse featurization and MoleculeNet benchmarks — use `deepchem`

## Install

Copy the `SKILL.md` from K-Dense AI's [rdkit folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/rdkit) into `.claude/skills/rdkit/` in your project. RDKit is best installed via conda: `conda install -c conda-forge rdkit`.


## What a session looks like

A typical session has three phases:

1. **Input specification.** Provide SMILES strings, an SDF file path, or a compound library. Claude sets up the molecule loading with appropriate sanitization flags and handles invalid SMILES gracefully.
2. **Computation.** Claude generates the RDKit code for the requested operation: descriptor calculation via the `Descriptors` module, fingerprint generation via `AllChem`, substructure search via `mol.HasSubstructMatch(pattern)`, or 3D conformer embedding via `AllChem.EmbedMolecule(mol, AllChem.ETKDGv3())`.
3. **Output.** Results are returned as a pandas DataFrame, an SDF file, or a molecular visualization (2D depiction via `Draw.MolToImage`). Invalid molecules are flagged in the output rather than silently dropped.
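The three phases can be sketched as one small function. This is an assumed shape for the generated code, not the skill's actual output: the `describe` helper, the record keys, and the benzene pattern are hypothetical names introduced for illustration. Records are plain dictionaries so the sketch has no dependency beyond RDKit.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Hypothetical substructure query used for illustration.
BENZENE = Chem.MolFromSmarts("c1ccccc1")


def describe(smiles_list):
    """Return one record per input, flagging SMILES that fail to parse."""
    records = []
    for smi in smiles_list:
        # Phase 1: input. MolFromSmiles sanitizes by default.
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Keep the row and mark it invalid instead of dropping it.
            records.append({"smiles": smi, "valid": False})
            continue
        # Phase 2: computation.
        records.append({
            "smiles": smi,
            "valid": True,
            "MolWt": Descriptors.MolWt(mol),
            "LogP": Descriptors.MolLogP(mol),
            "TPSA": Descriptors.TPSA(mol),
            "has_benzene": mol.HasSubstructMatch(BENZENE),
        })
    # Phase 3: output, one dict per input row.
    return records


records = describe(["CCO", "c1ccccc1O", "C1CC"])  # last entry is malformed
for row in records:
    print(row)
```

If pandas is available, `pandas.DataFrame(records)` turns the list into a table in which invalid rows have empty descriptor columns.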

## Receipts

**Where it works well:**
- Lipinski Rule of Five filtering on a compound library — descriptor computation across thousands of molecules is fast and the filtering logic is clean
- SMARTS-based substructure search for functional group identification — RDKit's SMARTS matching is comprehensive and Claude knows the common SMARTS patterns for standard pharmacophore features
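Both cases above can be sketched in a few lines. The thresholds are the usual Rule of Five limits. Allowing at most one violation is a common convention assumed here, and the `lipinski_violations` helper, the carboxylic acid SMARTS, and the example molecules are illustrative choices rather than anything prescribed by the skill.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

# Illustrative SMARTS for a carboxylic acid group.
CARBOXYLIC_ACID = Chem.MolFromSmarts("[CX3](=O)[OX2H1]")


def lipinski_violations(mol):
    """Count how many Rule of Five limits the molecule exceeds."""
    checks = [
        Descriptors.MolWt(mol) > 500,
        Descriptors.MolLogP(mol) > 5,
        Lipinski.NumHDonors(mol) > 5,
        Lipinski.NumHAcceptors(mol) > 10,
    ]
    return sum(checks)


def passes_rule_of_five(mol, max_violations=1):
    """Assumed convention: tolerate at most one violation."""
    return lipinski_violations(mol) <= max_violations


aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
long_alkane = Chem.MolFromSmiles("C" * 40)  # heavy and very lipophilic

for name, mol in [("aspirin", aspirin), ("C40 alkane", long_alkane)]:
    print(name, lipinski_violations(mol), passes_rule_of_five(mol))

# Each match is a tuple of atom indices in the molecule.
acid_matches = aspirin.GetSubstructMatches(CARBOXYLIC_ACID)
print("carboxylic acid groups in aspirin:", len(acid_matches))
```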

**Where it backfires:**
- 3D conformer generation quality degrades for highly flexible molecules (>10 rotatable bonds) — ETKDG produces geometries but the ensemble may not represent the true conformational distribution
- Some exotic tautomers and charged species require manual sanitization overrides that are non-obvious from the error messages
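For flexible molecules, one common mitigation is to embed several conformers instead of one and compare their force-field energies. The sketch below assumes an arbitrary ensemble size of 10 and a fixed random seed for reproducibility. It does not guarantee that the ensemble covers the true conformational distribution.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors

mol = Chem.MolFromSmiles("CCCCCCCCCCCCCC")  # tetradecane, a flexible chain

# Count rotatable bonds on the heavy-atom graph, before adding hydrogens.
n_rot = Descriptors.NumRotatableBonds(mol)
print("rotatable bonds:", n_rot)

# Explicit hydrogens are recommended before 3D embedding.
mol_h = Chem.AddHs(mol)

params = AllChem.ETKDGv3()
params.randomSeed = 42  # fixed seed, assumed here for reproducibility

# Embed an ensemble; 10 conformers is an arbitrary illustrative choice.
conf_ids = list(AllChem.EmbedMultipleConfs(mol_h, 10, params))

# Optimize every conformer with MMFF; each result is (status, energy),
# where status 0 means the optimization converged.
results = AllChem.MMFFOptimizeMoleculeConfs(mol_h, maxIters=500)
energies = [energy for _status, energy in results]

# Pick the lowest-energy conformer of the ensemble.
best_energy, best_conf_id = min(zip(energies, conf_ids))
print("conformers:", len(conf_ids), "lowest-energy id:", best_conf_id)
```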

**Pattern that works:** always check `mol is not None` after parsing SMILES. RDKit returns `None` for invalid SMILES rather than raising an exception, and passing `None` to downstream functions fails with cryptic argument-type errors that say nothing about the bad input.
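One way to apply this pattern is a small wrapper that turns the `None` return into an explicit exception. The `parse_smiles` name is a hypothetical helper introduced here for illustration.

```python
from rdkit import Chem


def parse_smiles(smiles):
    """Parse a SMILES string, raising instead of returning None."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # Fail early with a message that names the offending input.
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    return mol


ethanol = parse_smiles("CCO")
print(ethanol.GetNumAtoms())

try:
    parse_smiles("C1CC")  # unclosed ring, not valid SMILES
except ValueError as err:
    print(err)
```

Raising at the parsing step keeps the error next to the bad input. When processing a whole library, recording the failure and continuing is usually preferable to stopping on the first invalid entry.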

## Source and attribution

Originally authored by [K-Dense Inc.](https://github.com/K-Dense-AI). The canonical SKILL.md lives in the [`rdkit` folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/rdkit) of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and any updates, defer to the source repo.