# deepchem

> Molecular ML with diverse featurizers and pre-built datasets — use for property prediction (ADMET, toxicity) with traditional ML or GNNs when you want extensive featurization options and MoleculeNet benchmarks with pre-trained models.

**Use case**: Predict molecular properties with GNNs and MoleculeNet benchmarks

**Canonical URL**: https://agentcookbooks.com/skills/deepchem/

**Topics**: claude-code, skills, science, cheminformatics

**Trigger phrases**: "predict ADMET properties", "molecular property prediction", "MoleculeNet benchmark", "DeepChem model", "toxicity prediction"

**Source**: [K-Dense AI](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/deepchem)

**License**: MIT

---

## What it does

`deepchem` is a Claude Code skill from K-Dense AI's [scientific-agent-skills repo](https://github.com/K-Dense-AI/scientific-agent-skills). It turns Claude into a DeepChem expert for molecular machine learning — covering MoleculeNet dataset loading, diverse molecular featurization (circular fingerprints, graph convolution features, Weave, MPNN), model training (GraphConv, AttentiveFP, MPNN, random forest, XGBoost), ADMET property prediction, and benchmarking against MoleculeNet baselines.

A session produces a complete molecular ML pipeline: dataset loading, featurization, model training with cross-validation, and benchmark comparison against literature baselines on standard MoleculeNet tasks.

## When to use it

Reach for it when:

- You want to quickly benchmark a new compound set against MoleculeNet ADMET tasks using pre-trained or quickly trained models
- You need a range of molecular featurization options in one library without assembling them from separate packages
- You're running graph-based property prediction where DeepChem's built-in GNN implementations are sufficient

When *not* to reach for it:

- Graph-first PyTorch workflows with custom architectures — use `torch-geometric`
- Standard cheminformatics without ML — use `rdkit` or `datamol`

## Install

Copy the `SKILL.md` from K-Dense AI's [deepchem folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/deepchem) into `.claude/skills/deepchem/` in your project. DeepChem installation requires careful dependency management — use the conda or pip install instructions from [deepchem.io](https://deepchem.io) rather than a bare `pip install deepchem`.

Once installed, the skill activates on phrases like "predict ADMET properties", "molecular property prediction", "MoleculeNet benchmark", and "toxicity prediction".

## What a session looks like

A typical session has three phases:

1. **Dataset and task specification.** Specify the dataset (from MoleculeNet or a custom SMILES+label CSV) and the prediction task type (regression, binary classification, multi-task). Claude loads the appropriate MoleculeNet dataset or sets up the custom data loader.
2. **Featurization and model selection.** Claude selects the featurizer and model appropriate to the task — circular fingerprints + random forest for baseline, GraphConv or AttentiveFP for graph-based models — and constructs the training pipeline.
3. **Training and evaluation.** The model trains against a standard splitter (scaffold split for realistic generalization assessment) and is evaluated with the task-appropriate metric (AUROC for classification, RMSE for regression); results are then compared to MoleculeNet leaderboard baselines.

## Receipts

**Where it works well:**
- Rapid ADMET baseline models — DeepChem's MoleculeNet integration means you go from SMILES list to cross-validated AUC in under 30 lines of code for standard toxicity endpoints
- Scaffold-split evaluation — the built-in scaffold splitter is correct and produces more realistic generalization estimates than random splits for drug discovery tasks

**Where it backfires:**
- DeepChem's dependency installation is brittle — version conflicts with TensorFlow, PyTorch, and RDKit are common and require careful environment management
- Custom GNN architectures are harder to implement cleanly in DeepChem than in PyG; DeepChem's model API is higher-level but less flexible

**Pattern that works:** use scaffold split instead of random split for all MoleculeNet evaluations — random split inflates performance by allowing near-duplicate molecules in train and test sets, producing results that don't generalize to novel chemical space.

## Source and attribution

Originally authored by [K-Dense Inc.](https://github.com/K-Dense-AI). The canonical SKILL.md lives in the [`deepchem` folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/deepchem) of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and any updates, defer to the source repo.