scanpy

Standard single-cell RNA-seq analysis pipeline — use for QC, normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression, and visualization. Best for exploratory scRNA-seq analysis with established workflows.

Run standard single-cell RNA-seq analysis pipelines

Source: K-Dense AI
License: MIT
First documented:

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

  • analyze scRNA-seq data
  • single-cell analysis
  • UMAP clustering
  • find marker genes
  • scanpy workflow

What it does

scanpy is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a Scanpy workflow expert for single-cell RNA-seq analysis, covering quality control (filtering cells by mitochondrial read fraction, doublet detection), normalization, highly variable gene selection, PCA, neighborhood graph construction, UMAP/t-SNE, Leiden/Louvain clustering, differential expression (Wilcoxon, t-test, logistic regression), and cell type annotation.

A session produces a complete scRNA-seq analysis pipeline: an AnnData object processed from raw counts to annotated cell clusters, with publication-quality UMAP plots and a marker gene table for each cluster.

When to use it

Reach for it when:

  • You have 10X Genomics, Smart-seq2, or other scRNA-seq data in MTX, H5, or H5AD format and need the standard analysis pipeline
  • You want to identify cell types from clustering results and need marker gene analysis
  • You’re running trajectory inference or pseudotime analysis on a differentiation dataset

When not to reach for it:

  • Deep probabilistic models for batch integration or latent space learning — use scvi-tools
  • Data format manipulation and H5AD file operations without analysis — use anndata

Install

Copy the SKILL.md from K-Dense AI’s scanpy folder into .claude/skills/scanpy/ in your project.

Once the file is in place, the skill activates on any of its trigger phrases: “analyze scRNA-seq data”, “single-cell analysis”, “UMAP clustering”, “find marker genes”, “scanpy workflow”.

What a session looks like

A typical session has three phases:

  1. Data loading and QC. Claude loads the data, computes QC metrics (number of genes, total counts, mitochondrial gene fraction), and filters cells and genes based on quality thresholds appropriate to the protocol.
  2. Normalization and dimensionality reduction. Standard preprocessing runs: library-size normalization, log1p transformation, highly variable gene selection, PCA, and neighbor graph construction. UMAP is computed for visualization.
  3. Clustering and annotation. Claude runs Leiden clustering with resolution tuning, tests each cluster for differential expression, and produces a ranked marker gene table. Cell type annotation, using known markers or automated tools, is added on request.

Receipts

Where it works well:

  • 10X Genomics PBMC and tissue datasets where the established QC thresholds and workflow parameters are well-calibrated — the standard pipeline produces sensible results with minimal tuning
  • Generating publication-quality UMAP plots with cluster annotations — Scanpy’s matplotlib integration handles this cleanly

Where it backfires:

  • Multi-sample datasets with strong batch effects require integration (Harmony, scVI) before clustering — vanilla Scanpy will produce batch-driven clusters that look like biology
  • Very large datasets (>500k cells) strain Scanpy’s in-memory model; consider subsampling or dask-backed AnnData for the preprocessing steps
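
One way to spot the first failure mode above is to tabulate how each cluster is composed across samples. The sketch below is an illustration, not part of the skill: it works on any pandas DataFrame shaped like `adata.obs`, and the column names `leiden` and `batch`, the helper names, and the 0.9 cutoff are all assumptions.

```python
import pandas as pd


def batch_composition(obs, cluster_key="leiden", batch_key="batch"):
    """Fraction of each cluster's cells that come from each batch (rows sum to 1)."""
    return pd.crosstab(obs[cluster_key], obs[batch_key], normalize="index")


def batch_dominated_clusters(obs, cluster_key="leiden", batch_key="batch",
                             max_fraction=0.9):
    """Clusters where a single batch contributes more than max_fraction of the cells."""
    composition = batch_composition(obs, cluster_key, batch_key)
    dominant = composition.max(axis=1)
    return dominant[dominant > max_fraction].index.tolist()


# Toy example: cluster "0" is entirely batch A, cluster "1" is evenly mixed.
obs = pd.DataFrame({
    "leiden": ["0"] * 10 + ["1"] * 10,
    "batch": ["A"] * 10 + ["A"] * 5 + ["B"] * 5,
})
flagged = batch_dominated_clusters(obs)  # ["0"]
```

A flagged cluster is a prompt to look closer, not proof of a batch effect: a cell type that is genuinely present in only one sample will be flagged too.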

Pattern that works: always inspect QC metric distributions before applying cutoffs. The right mitochondrial fraction threshold varies substantially across tissue types and protocols, and a hardcoded default (such as 20%) often removes too many or too few cells.
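
To make that advice concrete, the sketch below summarizes a QC metric and derives a data-driven upper cutoff from the median absolute deviation (MAD). The MAD rule is a common heuristic swapped in here for illustration, not something the skill prescribes; the helper names and the choice of 3 MADs are assumptions.

```python
import numpy as np


def summarize_metric(values, quantiles=(0.05, 0.25, 0.5, 0.75, 0.95, 0.99)):
    """Quantiles of a QC metric, for eyeballing the distribution before filtering."""
    values = np.asarray(values, dtype=float)
    return {q: float(np.quantile(values, q)) for q in quantiles}


def mad_upper_bound(values, n_mads=3.0):
    """Upper cutoff at median + n_mads * MAD (median absolute deviation)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return float(median + n_mads * mad)


# Toy mitochondrial percentages with one obvious outlier.
pct_mt = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
cutoff = mad_upper_bound(pct_mt)   # 8.0 for this toy array
keep = pct_mt <= cutoff            # only the 100.0 cell is dropped
```

With a real dataset, the same functions could be applied to `adata.obs["pct_counts_mt"]` after `sc.pp.calculate_qc_metrics`, alongside a histogram or violin plot of the metric.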

Source and attribution

Originally authored by K-Dense Inc. The canonical SKILL.md lives in the scanpy folder of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.