anndata
Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem.
Manage annotated single-cell data matrices
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
- AnnData
- h5ad file
- single-cell data structure
- scverse data format
What it does
anndata is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into an expert on AnnData — the central data container for single-cell genomics in the scverse ecosystem — covering creation, manipulation, I/O, and integration with scanpy, scvi-tools, and cellxgene-census.
The output of a session is Python code that correctly constructs or modifies AnnData objects: storing the expression matrix in .X, observation metadata in .obs, variable metadata in .var, and multi-dimensional embeddings in .obsm — then saving or loading via .h5ad, .zarr, or related formats.
AnnData is the data-format skill. For analysis workflows you reach for scanpy; for probabilistic models, scvi-tools; for population-scale queries, cellxgene-census. This skill handles the format itself.
When to use it
Reach for it when:
- You are creating an AnnData object from scratch (count matrices, metadata DataFrames) or reading an existing .h5ad file
- You need to concatenate, subset, filter, or transform an AnnData object correctly without corrupting obs/var alignment
- You are converting between sparse and dense storage, switching to backed mode for large datasets, or serializing to Zarr for cloud access
When not to reach for it:
- You want to run clustering, differential expression, or UMAP — that is scanpy’s job
- You need to train a variational autoencoder on the data — reach for scvi-tools
Install
Copy the SKILL.md from scientific-skills/anndata into .claude/skills/anndata/.
The skill activates on trigger phrases including “AnnData”, “h5ad file”, and “scverse data format”.
What a session looks like
A typical session has three phases:
- Object construction. Claude builds the AnnData from your inputs — a count matrix (dense or sparse), a cell metadata DataFrame, and a gene metadata DataFrame — validating that indices align before assembling.
- Manipulation. Claude writes subsetting, filtering, and concatenation code that preserves the .obs/.var alignment invariant and handles the sparse-to-dense conversion correctly for downstream tools.
- I/O. Claude generates read/write calls in the appropriate format (.h5ad for local, .zarr for cloud or out-of-core), including backed-mode setup for datasets that do not fit in RAM.
Receipts
Honest reporting on what anndata handles well and where it falls short:
Where it works well:
- Constructing AnnData objects from messy multi-file datasets where alignment errors are easy to introduce manually
- Setting up backed mode for datasets that would otherwise exhaust RAM during loading
- Concatenating multiple experimental batches while keeping batch labels in .obs
Where it backfires:
- Very large sparse matrices can hit memory limits even in backed mode during certain operations; the skill may not predict this upfront
- Some .h5ad files written by older scanpy versions have schema quirks that require manual patching
Pattern that works: always call adata.obs_names_make_unique() and adata.var_names_make_unique() after concatenation; duplicate indices cause silent downstream errors in scanpy.
Source and attribution
Originally authored by K-Dense, Inc. The canonical SKILL.md lives in the anndata folder of the scientific-agent-skills repository.
License: MIT. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and updates, defer to the source repo.