anndata

Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem.

Manage annotated single-cell data matrices

Source K-Dense AI
License MIT
First documented

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

  • AnnData
  • h5ad file
  • single-cell data structure
  • scverse data format

What it does

anndata is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into an expert on AnnData — the central data container for single-cell genomics in the scverse ecosystem — covering creation, manipulation, I/O, and integration with scanpy, scvi-tools, and cellxgene-census.

The output of a session is Python code that correctly constructs or modifies AnnData objects: storing the expression matrix in .X, observation metadata in .obs, variable metadata in .var, and multi-dimensional embeddings in .obsm — then saving or loading via .h5ad, .zarr, or related formats.

AnnData is the data-format skill. For analysis workflows you reach for scanpy; for probabilistic models, scvi-tools; for population-scale queries, cellxgene-census. This skill handles the format itself.

When to use it

Reach for it when:

  • You are creating an AnnData object from scratch (count matrices, metadata DataFrames) or reading an existing .h5ad file
  • You need to concatenate, subset, filter, or transform an AnnData object correctly without corrupting obs/var alignment
  • You are converting between sparse and dense storage, switching to backed mode for large datasets, or serializing to Zarr for cloud access

When not to reach for it:

  • You want to run clustering, differential expression, or UMAP — that is scanpy’s job
  • You need to train a variational autoencoder on the data — reach for scvi-tools

Install

Copy the SKILL.md from scientific-skills/anndata into .claude/skills/anndata/.

The skill activates on trigger phrases including “AnnData”, “h5ad file”, and “scverse data format”.

What a session looks like

A typical session has three phases:

  1. Object construction. Claude builds the AnnData from your inputs — a count matrix (dense or sparse), a cell metadata DataFrame, and a gene metadata DataFrame — validating that indices align before assembling.
  2. Manipulation. Claude writes subsetting, filtering, and concatenation code that preserves the .obs/.var alignment invariant and handles the sparse-to-dense conversion correctly for downstream tools.
  3. I/O. Claude generates read/write calls in the appropriate format (.h5ad for local, .zarr for cloud or out-of-core), including backed-mode setup for datasets that do not fit in RAM.

Receipts

Honest reporting on what anndata handles well and where it falls short:

Where it works well:

  • Constructing AnnData objects from messy multi-file datasets where alignment errors are easy to introduce manually
  • Setting up backed mode for datasets that would otherwise exhaust RAM during loading
  • Concatenating multiple experimental batches while keeping batch labels in .obs

Where it backfires:

  • Very large sparse matrices can hit memory limits even in backed mode during operations that materialize the data in RAM (for example, densifying .X or calling .to_memory()); the skill may not predict this upfront
  • Some .h5ad files written by older scanpy versions have schema quirks that require manual patching

Pattern that works: always call adata.obs_names_make_unique() and adata.var_names_make_unique() after concatenation; duplicate indices cause silent downstream errors in scanpy.

Source and attribution

Originally authored by K-Dense, Inc. The canonical SKILL.md lives in the anndata folder of the scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and updates, defer to the source repo.