scanpy
Standard single-cell RNA-seq analysis pipeline — use for QC, normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression, and visualization. Best for exploratory scRNA-seq analysis with established workflows.
Run standard single-cell RNA-seq analysis pipelines
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
“analyze scRNA-seq data”, “single-cell analysis”, “UMAP clustering”, “find marker genes”, “scanpy workflow”
What it does
scanpy is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a Scanpy workflow expert for single-cell RNA-seq analysis, covering quality control (filtering cells by mitochondrial read fraction, doublet detection), normalization, highly variable gene selection, PCA, neighborhood graph construction, UMAP/t-SNE, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, logistic regression), and cell type annotation.
A session produces a complete scRNA-seq analysis pipeline: an AnnData object processed from raw counts to annotated cell clusters, with publication-quality UMAP plots and a marker gene table for each cluster.
When to use it
Reach for it when:
- You have 10x Genomics, Smart-seq2, or other scRNA-seq data in MTX, H5, or H5AD format and need the standard analysis pipeline
- You want to identify cell types from clustering results and need marker gene analysis
- You’re running trajectory inference or pseudotime analysis on a differentiation dataset
When not to reach for it:
- Deep probabilistic models for batch integration or latent space learning: use scvi-tools
- Data format manipulation and H5AD file operations without analysis: use anndata
Install
Copy SKILL.md from the scanpy folder of K-Dense AI’s scientific-agent-skills repository into .claude/skills/scanpy/ in your project.
What a session looks like
A typical session has three phases:
- Data loading and QC. Claude loads the data, computes per-cell QC metrics (number of genes detected, total counts, and the fraction of counts from mitochondrial genes), and filters cells and genes using quality thresholds appropriate to the protocol.
- Normalization and dimensionality reduction. Standard preprocessing runs: library-size normalization, log1p transformation, highly variable gene selection, PCA, and neighbor graph construction. UMAP is computed for visualization.
- Clustering and annotation. Claude runs Leiden clustering with resolution tuning, tests each cluster for differential expression, and produces a ranked marker gene table. Where requested, it adds cell type annotation using known markers or automated tools.
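The per-cell QC metrics and the normalization arithmetic from the first two phases are simple enough to sketch without Scanpy. This is a minimal NumPy illustration on a small dense cells × genes count matrix, not Scanpy's implementation; in Scanpy these steps correspond to sc.pp.calculate_qc_metrics, sc.pp.normalize_total, and sc.pp.log1p. The function names, the "MT-" mitochondrial prefix (human gene symbols), and the 10,000-count target are assumptions chosen for illustration; Scanpy's normalize_total uses the median library size when no target_sum is given.

```python
import numpy as np


def qc_metrics(counts, gene_names, mt_prefix="MT-"):
    """Per-cell QC metrics from a dense cells x genes count matrix.

    Mitochondrial genes are matched by a case-insensitive name prefix
    ("MT-" for human symbols; this also matches mouse "mt-").
    """
    counts = np.asarray(counts, dtype=float)
    is_mt = np.array([g.upper().startswith(mt_prefix.upper()) for g in gene_names])
    total_counts = counts.sum(axis=1)        # library size per cell
    n_genes = (counts > 0).sum(axis=1)       # genes detected per cell
    mt_counts = counts[:, is_mt].sum(axis=1)
    # Percent of counts from mitochondrial genes; empty cells get 0 instead of NaN.
    pct_mt = np.divide(100.0 * mt_counts, total_counts,
                       out=np.zeros_like(total_counts), where=total_counts > 0)
    return {"total_counts": total_counts, "n_genes": n_genes, "pct_counts_mt": pct_mt}


def normalize_log1p(counts, target_sum=1e4):
    """Library-size normalization to target_sum counts per cell, then log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Scale each row so it sums to target_sum; empty cells stay all zero.
    scaled = np.divide(counts * target_sum, totals,
                       out=np.zeros_like(counts), where=totals > 0)
    return np.log1p(scaled)


# Hypothetical toy data: 2 cells x 4 genes.
counts = np.array([[10, 0, 5, 5],
                   [0, 2, 0, 8]])
genes = ["CD3E", "MT-CO1", "LYZ", "MT-ND1"]
metrics = qc_metrics(counts, genes)
# total_counts [20, 10], n_genes [3, 2], pct_counts_mt [25, 100]
logged = normalize_log1p(counts)
```

Normalizing before the log keeps cells with different sequencing depth comparable; the log step then damps the influence of a few very highly expressed genes on PCA.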
Receipts
Where it works well:
- 10x Genomics PBMC and tissue datasets where the established QC thresholds and workflow parameters are well-calibrated: the standard pipeline produces sensible results with minimal tuning
- Generating publication-quality UMAP plots with cluster annotations — Scanpy’s matplotlib integration handles this cleanly
Where it backfires:
- Multi-sample datasets with strong batch effects require integration (Harmony, scVI) before clustering; without it, the standard Scanpy pipeline produces batch-driven clusters that look like biology
- Very large datasets (>500k cells) strain Scanpy’s in-memory model; consider subsampling or dask-backed AnnData for the preprocessing steps
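A quick way to spot batch-driven clusters is to tabulate each cluster's batch composition: in a multi-sample design, a cluster drawn almost entirely from one sample deserves scrutiny before it is called a cell type. The sketch below assumes per-cell cluster and batch labels are available as arrays (in Scanpy, typically columns of adata.obs such as the Leiden labels and a sample column); the function name and the 0.9 cutoff are illustrative choices, not a Scanpy API or an established threshold. A flagged cluster is not proof of a batch artifact, since a genuinely sample-specific population looks the same.

```python
import numpy as np


def batch_dominated_clusters(clusters, batches, max_fraction=0.9):
    """Flag clusters in which a single batch supplies more than max_fraction of cells.

    Returns {cluster_label: (dominant_batch, fraction)}. The 0.9 default is an
    illustrative cutoff, not an established standard.
    """
    clusters = np.asarray(clusters)
    batches = np.asarray(batches)
    flagged = {}
    # tolist() turns NumPy scalars into plain Python values for clean dict keys.
    for cluster in np.unique(clusters).tolist():
        # Batch labels of the cells assigned to this cluster.
        members = batches[clusters == cluster]
        labels, counts = np.unique(members, return_counts=True)
        top = int(counts.argmax())
        fraction = float(counts[top] / counts.sum())
        if fraction > max_fraction:
            flagged[cluster] = (labels.tolist()[top], fraction)
    return flagged


# Hypothetical labels: cluster "0" comes entirely from sample "a".
clusters = ["0", "0", "0", "1", "1", "1", "1"]
batches = ["a", "a", "a", "a", "b", "a", "b"]
print(batch_dominated_clusters(clusters, batches))  # {'0': ('a', 1.0)}
```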
Pattern that works: always inspect QC metric distributions before applying cutoffs — the right mitochondrial gene threshold varies substantially across tissue types and protocols, and hardcoded defaults (20%) often cut too many or too few cells.
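One way to act on this advice is to derive the cutoff from the observed distribution instead of hardcoding it, for example by flagging cells whose mitochondrial percentage lies more than a few median absolute deviations (MADs) above the median, and then comparing how many cells each rule would remove. The NumPy sketch below illustrates that idea; it is not something the skill or Scanpy prescribes, and the function names, toy values, and choice of 3 MADs are assumptions to tune after looking at the actual distribution (for example with sc.pl.violin).

```python
import numpy as np


def mad_upper_cutoff(values, n_mads=3.0):
    """Upper cutoff at median + n_mads * MAD (median absolute deviation).

    If most values are identical the MAD is 0 and the cutoff collapses to the
    median, so inspect the distribution before trusting the result.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return float(median + n_mads * mad)


def fraction_removed(values, cutoff):
    """Fraction of cells strictly above the cutoff, i.e. those a filter would drop."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(values > cutoff))


# Hypothetical per-cell mitochondrial percentages from a tissue with high baseline mt content.
pct_mt = [15, 18, 20, 22, 25, 60]
cutoff = mad_upper_cutoff(pct_mt)                 # median 21, MAD 3.5 -> cutoff 31.5
print(cutoff, fraction_removed(pct_mt, cutoff))   # only the 60% cell exceeds it
print(fraction_removed(pct_mt, 20.0))             # a fixed 20% cutoff drops half the cells
```

On this toy distribution the fixed cutoff removes three of six cells while the MAD rule removes only the clear outlier, which is the kind of discrepancy the pattern above warns about.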
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the scanpy folder of their public scientific-agent-skills repository.
License: MIT. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.