deepchem

Molecular ML with diverse featurizers and pre-built datasets — use for property prediction (ADMET, toxicity) with traditional ML or GNNs when you want extensive featurization options and MoleculeNet benchmarks with pre-trained models.

Predict molecular properties with GNNs and MoleculeNet benchmarks

Source K-Dense AI
License MIT
First documented

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

  • predict ADMET properties
  • molecular property prediction
  • MoleculeNet benchmark
  • DeepChem model
  • toxicity prediction

What it does

deepchem is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a DeepChem expert for molecular machine learning — covering MoleculeNet dataset loading, diverse molecular featurization (circular fingerprints, graph convolution features, Weave, MPNN), model training (GraphConv, AttentiveFP, MPNN, random forest, XGBoost), ADMET property prediction, and benchmarking against MoleculeNet baselines.

A session produces a complete molecular ML pipeline: dataset loading, featurization, model training with cross-validation, and benchmark comparison against literature baselines on standard MoleculeNet tasks.

When to use it

Reach for it when:

  • You want to quickly benchmark a new compound set against MoleculeNet ADMET tasks using pre-trained or quickly trained models
  • You need a range of molecular featurization options in one library without assembling them from separate packages
  • You’re running graph-based property prediction where DeepChem’s built-in GNN implementations are sufficient

When not to reach for it:

  • Graph-first PyTorch workflows with custom architectures — use torch-geometric
  • Standard cheminformatics without ML — use rdkit or datamol

Install

Copy the SKILL.md from K-Dense AI’s deepchem folder into .claude/skills/deepchem/ in your project. DeepChem installation requires careful dependency management — use the conda or pip install instructions from deepchem.io rather than a bare pip install deepchem.


What a session looks like

A typical session has three phases:

  1. Dataset and task specification. Specify the dataset (from MoleculeNet or a custom SMILES+label CSV) and the prediction task type (regression, binary classification, multi-task). Claude loads the appropriate MoleculeNet dataset or sets up the custom data loader.
  2. Featurization and model selection. Claude selects the featurizer and model appropriate to the task — circular fingerprints + random forest for baseline, GraphConv or AttentiveFP for graph-based models — and constructs the training pipeline.
  3. Training and evaluation. The model is trained with a standard splitter (scaffold split for a realistic generalization assessment), evaluated with the task-appropriate metric (AUROC for classification, RMSE for regression), and compared against MoleculeNet leaderboard baselines.

Receipts

Where it works well:

  • Rapid ADMET baseline models — DeepChem’s MoleculeNet integration means you go from SMILES list to cross-validated AUC in under 30 lines of code for standard toxicity endpoints
  • Scaffold-split evaluation — the built-in scaffold splitter is correct and produces more realistic generalization estimates than random splits for drug discovery tasks

Where it backfires:

  • DeepChem’s dependency installation is brittle — version conflicts with TensorFlow, PyTorch, and RDKit are common and require careful environment management
  • Custom GNN architectures are harder to implement cleanly in DeepChem than in PyG; DeepChem’s model API is higher-level but less flexible

Pattern that works: use scaffold split instead of random split for all MoleculeNet evaluations — random split inflates performance by allowing near-duplicate molecules in train and test sets, producing results that don’t generalize to novel chemical space.

Source and attribution

Originally authored by K-Dense Inc. The canonical SKILL.md lives in the deepchem folder of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.