datamol
Pythonic wrapper around RDKit with simplified interface and sensible defaults — preferred for standard drug discovery including SMILES parsing, standardization, descriptors, fingerprints, clustering, 3D conformers, and parallel processing.
Drug discovery cheminformatics with a clean Pythonic RDKit wrapper
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
- standardize SMILES
- datamol fingerprints
- molecular clustering
- drug-like properties
- batch molecular processing
What it does
datamol is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a datamol expert — the Pythonic wrapper around RDKit that makes standard drug discovery workflows cleaner and less error-prone: SMILES parsing with automatic sanitization, molecular standardization, descriptor computation, fingerprint generation, Tanimoto similarity clustering, 3D conformer generation, and parallel processing across compound libraries.
A session produces concise Python code for the common cheminformatics operations that otherwise require verbose RDKit boilerplate; datamol reduces them to one-liners with sensible defaults while still returning native rdkit.Chem.Mol objects for compatibility.
When to use it
Reach for it when:
- You’re doing standard drug discovery cheminformatics (SMILES parsing, standardization, descriptor computation, clustering) and want clean, Pythonic code rather than verbose RDKit boilerplate
- You need to process a compound library in parallel: datamol's `dm.parallelized()` wraps joblib for easy multiprocessing
- You want automatic error handling: datamol returns `None` for failed molecules and continues processing rather than crashing on the first invalid SMILES
When not to reach for it:
- Non-standard sanitization, unusual valence handling, or algorithm-level control: use `rdkit` directly
- Molecular ML with featurization and training: use `deepchem` or `molfeat`
Install
Copy the SKILL.md from K-Dense AI's datamol folder into `.claude/skills/datamol/` in your project. Install via `pip install datamol` or `conda install -c conda-forge datamol`.
What a session looks like
A typical session has three phases:
- Input loading. Provide a list of SMILES strings or an SDF file path. Claude uses `dm.to_mol()` to parse all inputs, with `dm.standardize_mol()` applied to handle tautomers, salts, and charge normalization.
- Core operation. The requested computation runs: `dm.descriptors.batch_compute_many_descriptors()` for descriptor DataFrames, `dm.to_fp()` for fingerprint arrays, `dm.cluster_mols()` for Butina clustering, or `dm.conformers.generate()` for 3D structures, all through datamol's clean API.
- Output. Results are returned as pandas DataFrames or numpy arrays compatible with scikit-learn and other downstream tools. Invalid molecules are flagged in the output with a `mol_valid` boolean column.
Receipts
Where it works well:
- Library standardization before any ML or analysis: `dm.standardize_mol()` handles the fragment selection, charge neutralization, and isotope removal that manual RDKit code handles inconsistently
- Parallel descriptor computation on large compound libraries: `dm.parallelized()` gives multicore speedup with one additional argument and proper error collection
Where it backfires:
- datamol's high-level interface sometimes makes silent failures harder to debug; `None` entries in the output require explicit filtering
- Less flexible than RDKit for any operation that requires access to the low-level C++ API or non-default algorithm parameters
Pattern that works: always run `dm.sanitize_mol()` before `dm.to_fp()` on external compound data; unsanitized molecules produce fingerprints that look valid but are computed on a corrupt molecular graph.
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the datamol folder of their public scientific-agent-skills repository.
License: Apache-2.0. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.