deepchem
Molecular ML with diverse featurizers and pre-built datasets — use for property prediction (ADMET, toxicity) with traditional ML or GNNs when you want extensive featurization options and MoleculeNet benchmarks with pre-trained models.
Predict molecular properties with GNNs and MoleculeNet benchmarks
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
“predict ADMET properties”, “molecular property prediction”, “MoleculeNet benchmark”, “DeepChem model”, “toxicity prediction”
What it does
deepchem is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a DeepChem expert for molecular machine learning — covering MoleculeNet dataset loading, diverse molecular featurization (circular fingerprints, graph convolution features, Weave, MPNN), model training (GraphConv, AttentiveFP, MPNN, random forest, XGBoost), ADMET property prediction, and benchmarking against MoleculeNet baselines.
A session produces a complete molecular ML pipeline: dataset loading, featurization, model training with cross-validation, and benchmark comparison against literature baselines on standard MoleculeNet tasks.
When to use it
Reach for it when:
- You want to quickly benchmark a new compound set against MoleculeNet ADMET tasks using pre-trained models or fast-to-train baselines
- You need a range of molecular featurization options in one library without assembling them from separate packages
- You’re running graph-based property prediction where DeepChem’s built-in GNN implementations are sufficient
When not to reach for it:
- Graph-first PyTorch workflows with custom architectures — use torch-geometric
- Standard cheminformatics without ML — use rdkit or datamol
Install
Copy the SKILL.md from K-Dense AI’s deepchem folder into .claude/skills/deepchem/ in your project. DeepChem installation requires careful dependency management — use the conda or pip install instructions from deepchem.io rather than a bare pip install deepchem.
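A minimal clean-environment setup looks like the following. The environment name and the Python and TensorFlow versions are illustrative assumptions — check deepchem.io for the currently supported pins.

```shell
# Isolated environment avoids TensorFlow/PyTorch/RDKit version conflicts
conda create -n deepchem-env python=3.10 -y
conda activate deepchem-env
pip install deepchem            # core library (pulls in RDKit)
pip install "tensorflow>=2.12"  # only if you need TF-backed models such as GraphConv
```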
Trigger phrases: “predict ADMET properties”, “molecular property prediction”, “MoleculeNet benchmark”, “toxicity prediction”.
What a session looks like
A typical session has three phases:
- Dataset and task specification. Specify the dataset (from MoleculeNet or a custom SMILES+label CSV) and the prediction task type (regression, binary classification, multi-task). Claude loads the appropriate MoleculeNet dataset or sets up the custom data loader.
- Featurization and model selection. Claude selects the featurizer and model appropriate to the task — circular fingerprints + random forest for baseline, GraphConv or AttentiveFP for graph-based models — and constructs the training pipeline.
- Training and evaluation. The model trains with a standard splitter (scaffold split for realistic generalization assessment), evaluation uses the task-appropriate metric (AUROC for classification, RMSE for regression), and results are compared to MoleculeNet leaderboard baselines.
Receipts
Where it works well:
- Rapid ADMET baseline models — DeepChem’s MoleculeNet integration means you go from SMILES list to cross-validated AUC in under 30 lines of code for standard toxicity endpoints
- Scaffold-split evaluation — the built-in scaffold splitter is correct and produces more realistic generalization estimates than random splits for drug discovery tasks
Where it backfires:
- DeepChem’s dependency installation is brittle — version conflicts with TensorFlow, PyTorch, and RDKit are common and require careful environment management
- Custom GNN architectures are harder to implement cleanly in DeepChem than in PyG; DeepChem’s model API is higher-level but less flexible
Pattern that works: use scaffold split instead of random split for all MoleculeNet evaluations — random split inflates performance by allowing near-duplicate molecules in train and test sets, producing results that don’t generalize to novel chemical space.
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the deepchem folder of their public scientific-agent-skills repository.
License: MIT. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.