umap-learn

UMAP dimensionality reduction: fast nonlinear manifold learning for 2D/3D visualization, clustering preprocessing with HDBSCAN, and supervised and parametric UMAP for high-dimensional data such as single-cell expression, image embeddings, and text representations.

Reduce high-dimensional data to 2D/3D with UMAP

Source K-Dense AI
License MIT

Trigger phrases

Phrases that activate this skill in a Claude Code session:

  • UMAP dimensionality reduction
  • visualize high-dimensional data
  • embed this dataset
  • UMAP clustering
  • reduce dimensions with UMAP

What it does

umap-learn is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a UMAP expert for dimensionality reduction and visualization — covering standard unsupervised UMAP for 2D/3D embedding, supervised UMAP with label information to improve class separation, parametric UMAP for out-of-sample projection, and UMAP as a preprocessing step for HDBSCAN density-based clustering.

A session produces Python code that takes a high-dimensional array (single-cell expression, image embeddings, text vectors, tabular features) and returns a 2D or 3D embedding suitable for visualization or downstream clustering.
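
A minimal sketch of that core call, assuming a NumPy feature matrix X; the arrays and label values below are placeholders, and the y= line shows the supervised variant:

```python
import numpy as np
import umap

X = np.random.rand(1000, 50)            # placeholder high-dimensional data
labels = np.random.randint(0, 5, 1000)  # placeholder class labels

# Unsupervised: reduce to 2D for plotting
embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(X)

# Supervised: passing labels pulls same-class points together
sup_embedding = umap.UMAP(random_state=42).fit_transform(X, y=labels)
```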

When to use it

Reach for it when:

  • You have high-dimensional data and want a 2D visualization that preserves neighborhood structure better than PCA
  • You’re preprocessing high-dimensional features for HDBSCAN or other density-based clustering
  • You need out-of-sample projection: embedding new data points into an existing UMAP space with parametric UMAP (see the sketch after this list)
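
A hedged sketch of the parametric route, using umap-learn's ParametricUMAP class (which requires TensorFlow); X_train and X_new are illustrative placeholders:

```python
import numpy as np
from umap.parametric_umap import ParametricUMAP  # requires TensorFlow

X_train = np.random.rand(500, 30)  # placeholder training data
X_new = np.random.rand(50, 30)     # unseen points arriving later

# A neural-network encoder is trained alongside the embedding
embedder = ParametricUMAP(n_components=2)
train_embedding = embedder.fit_transform(X_train)

# The learned encoder projects new points without refitting
new_embedding = embedder.transform(X_new)
```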

When not to reach for it:

  • Small datasets (<500 points) where PCA or t-SNE is sufficient and UMAP’s graph construction overhead isn’t justified
  • When linear interpretability of the axes matters — UMAP axes are not interpretable like PCA loadings

Install

Copy the SKILL.md from K-Dense AI’s umap-learn folder into .claude/skills/umap-learn/ in your project.


What a session looks like

A typical session has three phases:

  1. Data and goal specification. Describe the data type, dimensionality, and downstream goal (visualization, clustering, or projection). Claude selects appropriate n_neighbors and min_dist parameters — larger n_neighbors for global structure, smaller min_dist for tighter clusters.
  2. UMAP fitting. Claude generates the UMAP fit code with a fixed random seed for reproducibility, optional metric selection (cosine for text/image embeddings, euclidean for normalized data), and appropriate data preprocessing (scaling, PCA initialization for speed).
  3. Visualization and downstream use. The 2D embedding is plotted with matplotlib/seaborn, colored by available labels or cluster assignments. For clustering use cases, Claude adds HDBSCAN fitting on the UMAP embedding (all three phases are sketched after this list).
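
The three phases in one hedged sketch; the feature matrix and every parameter value below are illustrative rather than recommendations for any particular dataset, and the hdbscan package is assumed to be installed:

```python
import numpy as np
import umap
import hdbscan
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

X = np.random.rand(2000, 100)  # placeholder feature matrix

# Phases 1-2: scale, then fit with a fixed seed for reproducibility.
# Larger n_neighbors favours global structure; smaller min_dist gives
# tighter clusters (min_dist=0.0 is common when clustering downstream).
X_scaled = StandardScaler().fit_transform(X)
reducer = umap.UMAP(n_neighbors=30, min_dist=0.1,
                    metric='euclidean', random_state=42)
embedding = reducer.fit_transform(X_scaled)

# Phase 3: density-based clustering on the embedding, then a plot
cluster_labels = hdbscan.HDBSCAN(min_cluster_size=50).fit_predict(embedding)
plt.scatter(embedding[:, 0], embedding[:, 1],
            c=cluster_labels, s=3, cmap='Spectral')  # -1 = noise points
plt.title('UMAP embedding with HDBSCAN clusters')
plt.show()
```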

Receipts

Where it works well:

  • Single-cell RNA-seq visualization: UMAP on PCA-reduced expression data is the de facto standard in the field, and Scanpy wraps the same umap-learn library internally
  • Text embedding visualization: running UMAP on sentence-transformer or word2vec embeddings reveals semantic clusters that are invisible in the raw 768-dimensional space (see the sketch after this list)
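
A small sketch of that text-embedding case; the random array stands in for real sentence-transformer output:

```python
import numpy as np
import umap

emb = np.random.rand(5000, 768)  # stands in for sentence-transformer vectors

# Cosine distance matches how these embeddings are usually compared
text_2d = umap.UMAP(metric='cosine', random_state=42).fit_transform(emb)
```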

Where it backfires:

  • UMAP is non-deterministic across runs without a fixed seed, and different seeds can produce qualitatively different global layouts on the same data — always set random_state
  • Comparing UMAP embeddings across conditions or time points requires the parametric variant or aligned-embedding approaches; standard UMAP embeddings are not directly comparable (an AlignedUMAP sketch follows this list)
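
umap-learn ships an AlignedUMAP class for the aligned-embedding approach; a minimal sketch, assuming two time points whose rows correspond one-to-one (the relations mapping below encodes that assumption):

```python
import numpy as np
from umap import AlignedUMAP

t0 = np.random.rand(300, 40)  # placeholder data at the first time point
t1 = np.random.rand(300, 40)  # placeholder data at the second time point

# relations maps row indices in t0 to the matching rows in t1; here the
# same samples appear in the same order at both time points
relations = [{i: i for i in range(300)}]

mapper = AlignedUMAP().fit([t0, t1], relations=relations)
emb_t0, emb_t1 = mapper.embeddings_  # layouts that can be compared directly
```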

Pattern that works: always initialize UMAP with init='pca' for large datasets — it’s faster to converge and produces more reproducible global structure than the default spectral initialization.
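
In code this is just the init argument; a one-line sketch, assuming a umap-learn release recent enough to accept init='pca':

```python
import umap

# PCA initialization is deterministic given the data and typically
# converges faster than the default init='spectral' on large inputs
reducer = umap.UMAP(init='pca', random_state=42)
```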

Source and attribution

Originally authored by K-Dense Inc. The canonical SKILL.md lives in the umap-learn folder of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.