anndata

Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem.

Manage annotated single-cell data matrices

Source: K-Dense AI

License: MIT

First documented: 2026-04-28

Receipts: generic

Science: Bioinformatics

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

AnnData
h5ad file
single-cell data structure
scverse data format

What it does

anndata is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into an expert on AnnData — the central data container for single-cell genomics in the scverse ecosystem — covering creation, manipulation, I/O, and integration with scanpy, scvi-tools, and cellxgene-census.

The output of a session is Python code that correctly constructs or modifies AnnData objects: storing the expression matrix in .X, observation metadata in .obs, variable metadata in .var, and multi-dimensional embeddings in .obsm — then saving or loading via .h5ad, .zarr, or related formats.

anndata is the data-format skill. For analysis workflows, reach for scanpy; for probabilistic models, scvi-tools; for population-scale queries, cellxgene-census. This skill handles the format itself.

When to use it

Reach for it when:

You are creating an AnnData object from scratch (count matrices, metadata DataFrames) or reading an existing .h5ad file
You need to concatenate, subset, filter, or transform an AnnData object correctly without corrupting obs/var alignment
You are converting between sparse and dense storage, switching to backed mode for large datasets, or serializing to Zarr for cloud access

When not to reach for it:

You want to run clustering, differential expression, or UMAP — that is scanpy’s job
You need to train a variational autoencoder on the data — reach for scvi-tools

Install

Copy the SKILL.md from scientific-skills/anndata into .claude/skills/anndata/.

The skill activates on trigger phrases including “AnnData”, “h5ad file”, and “scverse data format”.

What a session looks like

A typical session has three phases:

Object construction. Claude builds the AnnData from your inputs — a count matrix (dense or sparse), a cell metadata DataFrame, and a gene metadata DataFrame — validating that indices align before assembling.
Manipulation. Claude writes subsetting, filtering, and concatenation code that preserves the .obs/.var alignment invariant and handles the sparse-to-dense conversion correctly for downstream tools.
I/O. Claude generates read/write calls in the appropriate format (.h5ad for local, .zarr for cloud or out-of-core), including backed-mode setup for datasets that do not fit in RAM.

Receipts

Honest reporting on what anndata handles well and where it falls short:

Where it works well:

Constructing AnnData objects from messy multi-file datasets where alignment errors are easy to introduce manually
Setting up backed mode for datasets that would otherwise exhaust RAM during loading
Concatenating multiple experimental batches while keeping batch labels in .obs

Where it backfires:

Very large sparse matrices can still hit memory limits in backed mode when an operation materializes the full matrix (for example, densifying .X or calling to_memory()); the skill may not predict this upfront
Some .h5ad files written by older scanpy versions have schema quirks that require manual patching

Pattern that works: always call adata.obs_names_make_unique() and adata.var_names_make_unique() after concatenation, because duplicate indices cause silent downstream errors in scanpy.

Source and attribution

Originally authored by K-Dense, Inc. The canonical SKILL.md lives in the anndata folder of the scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and updates, defer to the source repo.