# scanpy

> Standard single-cell RNA-seq analysis pipeline — use for QC, normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression, and visualization. Best for exploratory scRNA-seq analysis with established workflows.

**Use case**: Run standard single-cell RNA-seq analysis pipelines

**Canonical URL**: https://agentcookbooks.com/skills/scanpy/

**Topics**: claude-code, skills, science, bioinformatics

**Trigger phrases**: "analyze scRNA-seq data", "single-cell analysis", "UMAP clustering", "find marker genes", "scanpy workflow"

**Source**: [K-Dense AI](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/scanpy)

**License**: MIT

---

## What it does

`scanpy` is a Claude Code skill from K-Dense AI's [scientific-agent-skills repo](https://github.com/K-Dense-AI/scientific-agent-skills). It turns Claude into a Scanpy workflow expert for single-cell RNA-seq analysis — covering quality control (mitochondrial gene filtering, doublet detection), normalization, highly variable gene selection, PCA, neighborhood graph construction, UMAP/t-SNE, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, logistic regression), and cell type annotation.

A session produces a complete scRNA-seq analysis pipeline: an AnnData object processed from raw counts to annotated cell clusters, with publication-quality UMAP plots and a marker gene table for each cluster.

## When to use it

Reach for it when:

- You have 10X Genomics, Smart-seq2, or other scRNA-seq data in MTX, H5, or H5AD format and need the standard analysis pipeline
- You want to identify cell types from clustering results and need marker gene analysis
- You're running trajectory inference or pseudotime analysis on a differentiation dataset

When *not* to reach for it:

- Deep probabilistic models for batch integration or latent space learning — use `scvi-tools`
- Data format manipulation and H5AD file operations without analysis — use `anndata`

## Install

Copy the `SKILL.md` from K-Dense AI's [scanpy folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/scanpy) into `.claude/skills/scanpy/` in your project.

Trigger phrases: "analyze scRNA-seq data", "single-cell analysis", "UMAP clustering", "find marker genes", "scanpy workflow".

## What a session looks like

A typical session has three phases:

1. **Data loading and QC.** Claude loads the data, computes per-cell QC metrics (genes detected, total counts, mitochondrial gene fraction), and filters cells and genes using thresholds appropriate to the protocol.
2. **Normalization and dimensionality reduction.** Standard preprocessing runs: library-size normalization, log1p transformation, highly variable gene selection, PCA, and neighbor graph construction. UMAP is computed for visualization.
3. **Clustering and annotation.** Leiden clustering with resolution tuning, differential expression testing for each cluster, and a ranked marker gene table. Cell type annotation using known markers or automated tools is added where requested.

## Receipts

**Where it works well:**
- 10X Genomics PBMC and tissue datasets where the established QC thresholds and workflow parameters are well-calibrated — the standard pipeline produces sensible results with minimal tuning
- Generating publication-quality UMAP plots with cluster annotations — Scanpy's matplotlib integration handles this cleanly

**Where it backfires:**
- Multi-sample datasets with strong batch effects require integration (Harmony, scVI) before clustering — vanilla Scanpy will produce batch-driven clusters that look like biology
- Very large datasets (>500k cells) strain Scanpy's in-memory model; consider subsampling or `dask`-backed AnnData for the preprocessing steps
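Before trusting clusters from a multi-sample dataset, a quick check is whether any cluster draws almost all of its cells from one sample. The helper below is a hypothetical heuristic (the name, the `batch` column, and the 0.9 cutoff are assumptions) that uses only pandas on `adata.obs`. A flagged cluster is not proof of a batch artifact, since a genuinely sample-specific cell state looks the same, but it shows where to look before and after integration.

```python
import pandas as pd


def batch_dominated_clusters(obs: pd.DataFrame, cluster_key: str = "leiden",
                             batch_key: str = "batch",
                             max_fraction: float = 0.9) -> list:
    """Hypothetical check: clusters where one batch exceeds max_fraction of cells."""
    # Rows: clusters; columns: batches; values: share of each cluster's cells.
    frac = pd.crosstab(obs[cluster_key], obs[batch_key], normalize="index")
    dominated = (frac.max(axis=1) > max_fraction).to_numpy()
    return [str(c) for c in frac.index[dominated]]
```

With very unequal batch sizes, comparing each cluster's composition against the overall batch proportions is fairer than a fixed cutoff.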

**Pattern that works:** always inspect QC metric distributions before applying cutoffs. The right mitochondrial-fraction threshold varies substantially across tissue types and protocols, and a hardcoded cutoff such as 20% often removes too many or too few cells.
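One common data-driven alternative to a fixed cutoff is to flag cells lying several median absolute deviations (MADs) from the median of each QC metric. This is a swapped-in technique shown for illustration, not necessarily what the skill does; the helper names and the 5-MAD / 3-MAD defaults are assumptions. It reads the columns that `sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"])` writes to `adata.obs`.

```python
import numpy as np
import pandas as pd


def mad_outliers(values, n_mads: float = 5.0, upper_only: bool = False) -> np.ndarray:
    """True where a value lies more than n_mads MADs from the median."""
    x = np.asarray(values, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))  # unscaled MAD; if 0, any deviation is flagged
    high = x > med + n_mads * mad
    return high if upper_only else high | (x < med - n_mads * mad)


def qc_outliers(obs: pd.DataFrame, n_mads: float = 5.0,
                mt_n_mads: float = 3.0) -> pd.Series:
    """Hypothetical helper: flag low-quality cells from Scanpy QC columns."""
    # Print quantiles so the resulting cutoffs can be sanity-checked by eye.
    cols = ["total_counts", "n_genes_by_counts", "pct_counts_mt"]
    print(obs[cols].describe(percentiles=[0.01, 0.05, 0.5, 0.95, 0.99]))
    flagged = (
        mad_outliers(np.log1p(obs["total_counts"]), n_mads)
        | mad_outliers(np.log1p(obs["n_genes_by_counts"]), n_mads)
        # Only unusually high mitochondrial fractions suggest damaged cells.
        | mad_outliers(obs["pct_counts_mt"], mt_n_mads, upper_only=True)
    )
    return pd.Series(flagged, index=obs.index, name="qc_outlier")
```

Filtering then looks like `adata = adata[~qc_outliers(adata.obs).to_numpy()].copy()`; the printed quantiles make it easy to confirm the implied cutoffs are plausible for the tissue.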

## Source and attribution

Originally authored by [K-Dense Inc.](https://github.com/K-Dense-AI). The canonical SKILL.md lives in the [`scanpy` folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/scanpy) of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and any updates, defer to the source repo.