# anndata

> Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem.

**Use case**: Manage annotated single-cell data matrices

**Canonical URL**: https://agentcookbooks.com/skills/anndata/

**Topics**: claude-code, skills, science, bioinformatics

**Trigger phrases**: "AnnData", "h5ad file", "single-cell data structure", "scverse data format"

**Source**: [K-Dense AI](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/anndata)

**License**: MIT

---

## What it does

`anndata` is a Claude Code skill from K-Dense AI's [scientific-agent-skills repo](https://github.com/K-Dense-AI/scientific-agent-skills). It turns Claude into an expert on AnnData — the central data container for single-cell genomics in the scverse ecosystem — covering creation, manipulation, I/O, and integration with scanpy, scvi-tools, and cellxgene-census.

The output of a session is Python code that correctly constructs or modifies AnnData objects: storing the expression matrix in `.X`, observation metadata in `.obs`, variable metadata in `.var`, and multi-dimensional embeddings in `.obsm` — then saving or loading via `.h5ad`, `.zarr`, or related formats.

AnnData is the data-format skill. For analysis workflows you reach for scanpy; for probabilistic models, scvi-tools; for population-scale queries, cellxgene-census. This skill handles the format itself.

## When to use it

Reach for it when:

- You are creating an AnnData object from scratch (count matrices, metadata DataFrames) or reading an existing `.h5ad` file
- You need to concatenate, subset, filter, or transform an AnnData object correctly without corrupting obs/var alignment
- You are converting between sparse and dense storage, switching to backed mode for large datasets, or serializing to Zarr for cloud access

When *not* to reach for it:

- You want to run clustering, differential expression, or UMAP — that is scanpy's job
- You need to train a variational autoencoder on the data — reach for scvi-tools

## Install

Copy the SKILL.md from [scientific-skills/anndata](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/anndata) into `.claude/skills/anndata/`.

The skill activates on trigger phrases including "AnnData", "h5ad file", and "scverse data format".

## What a session looks like

A typical session has three phases:

1. **Object construction.** Claude builds the AnnData from your inputs — a count matrix (dense or sparse), a cell metadata DataFrame, and a gene metadata DataFrame — validating that indices align before assembling.
2. **Manipulation.** Claude writes subsetting, filtering, and concatenation code that preserves the `.obs`/`.var` alignment invariant and handles the sparse-to-dense conversion correctly for downstream tools.
3. **I/O.** Claude generates read/write calls in the appropriate format (`.h5ad` for local, `.zarr` for cloud or out-of-core), including backed-mode setup for datasets that do not fit in RAM.

## Receipts

Honest reporting on what `anndata` handles well and where it falls short:

**Where it works well:**
- Constructing AnnData objects from messy multi-file datasets where alignment errors are easy to introduce manually
- Setting up backed mode for datasets that would otherwise exhaust RAM during loading
- Concatenating multiple experimental batches while keeping `batch` labels in `.obs`

**Where it backfires:**
- Backed mode keeps `.X` on disk, but operations that pull it into memory (for example `to_memory()` or densifying a large slice) can still exhaust RAM on very large sparse matrices; the skill may not flag this upfront
- Some `.h5ad` files written by older scanpy or anndata versions have schema quirks that require manual patching

**Pattern that works:** always call `adata.obs_names_make_unique()` and `adata.var_names_make_unique()` after concatenation, because duplicate indices cause silent downstream errors in scanpy.

## Source and attribution

Originally authored by [K-Dense, Inc.](https://github.com/K-Dense-AI). The canonical SKILL.md lives in the [`anndata` folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/anndata) of the scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and updates, defer to the source repo.