# database-lookup

> Search 78 public scientific, biomedical, materials science, and economic databases via REST APIs — covering physics, earth science, chemistry, biology/genomics, disease/clinical, regulatory, economics, and demographics databases.

**Use case**: Query 78 public scientific databases from a single skill

**Canonical URL**: https://agentcookbooks.com/skills/database-lookup/

**Topics**: claude-code, skills, science

**Trigger phrases**: "look up in PubChem", "query UniProt", "search ClinicalTrials", "Materials Project lookup", "database query"

**Source**: [K-Dense AI](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/database-lookup)

**License**: MIT

---

## What it does

`database-lookup` is a Claude Code skill from K-Dense AI's [scientific-agent-skills repo](https://github.com/K-Dense-AI/scientific-agent-skills). It turns Claude into a multi-database query agent covering 78 public scientific databases via REST APIs — including chemistry (PubChem, ChEMBL, DrugBank, KEGG, ZINC, BindingDB), biology/genomics (UniProt, STRING, Ensembl, NCBI Gene, GEO, PDB, AlphaFold, Human Protein Atlas), disease/clinical (ClinicalTrials.gov, OMIM, ClinVar, TCGA, DisGeNET), materials (Materials Project, COD), regulatory (FDA, USPTO), and economics (FRED, World Bank).

A session produces structured query results from the appropriate database (protein records, compound properties, clinical trial listings, variant annotations, or economic indicators), returned as a pandas DataFrame or a formatted dict with the relevant fields extracted rather than as raw JSON.

## When to use it

Reach for it when:

- You need data from a specific public database and don't want to write API integration code for it
- You're pulling data across multiple databases in a single research workflow (e.g., compound from PubChem → target from UniProt → trials from ClinicalTrials.gov)
- You need economic or regulatory data (FDA drug approvals, USPTO patents, World Bank indicators) alongside scientific data in the same pipeline

When *not* to reach for it:

- Deep literature search across academic papers — use `paper-lookup` or `literature-review`
- Comprehensive genomics workflows requiring sequence analysis — use `biopython` or `gget`

## Install

Copy the `SKILL.md` from K-Dense AI's [database-lookup folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/database-lookup) into `.claude/skills/database-lookup/` in your project.

Once installed, the skill activates on phrases such as "look up in PubChem", "query UniProt", "search ClinicalTrials", or "Materials Project lookup".

## What a session looks like

A typical session has three phases:

1. **Database and query specification.** Describe what you're looking for — a compound name/ID, gene symbol, disease, clinical trial criterion, or economic indicator. Claude identifies which of the 78 databases is most appropriate and confirms the query parameters.
2. **API retrieval.** Claude generates and executes the REST API call to the appropriate database, handling authentication (using API keys you supply, where a database requires them), pagination, and rate limits.
3. **Structured output.** Results are returned as a pandas DataFrame or formatted dict with the relevant fields extracted — not the raw JSON response. Claude explains which fields were returned and flags any unexpected empty results.
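The three phases can be sketched for a single database. This is a minimal illustration, not the skill's own code: it assumes PubChem's PUG REST property endpoint, and the function names (`build_pubchem_url`, `fetch_json`, `extract_rows`) and the field selection are hypothetical. The demonstration at the bottom runs offline against a hand-written payload in the PUG REST response shape.

```python
import json
import urllib.parse
import urllib.request

PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
FIELDS = ("MolecularFormula", "MolecularWeight", "InChIKey")


def build_pubchem_url(namespace: str, identifier: str, fields=FIELDS) -> str:
    """Phase 1: turn a confirmed query spec into a PUG REST property URL."""
    if namespace not in {"cid", "name", "inchikey"}:
        raise ValueError(f"unsupported namespace: {namespace}")
    quoted = urllib.parse.quote(str(identifier), safe="")
    return (
        f"{PUBCHEM_BASE}/compound/{namespace}/{quoted}"
        f"/property/{','.join(fields)}/JSON"
    )


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Phase 2: execute the call (needs network access; not run below)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def extract_rows(payload: dict, fields=FIELDS) -> list:
    """Phase 3: keep only the requested fields; missing ones become None."""
    records = payload.get("PropertyTable", {}).get("Properties", [])
    return [{"CID": r.get("CID"), **{f: r.get(f) for f in fields}} for r in records]


# Offline demonstration with a hand-written payload (assumed response shape).
sample = {
    "PropertyTable": {
        "Properties": [
            {"CID": 2244, "MolecularFormula": "C9H8O4", "MolecularWeight": "180.16"}
        ]
    }
}
rows = extract_rows(sample)
if not rows:
    print("warning: query returned no records")  # flag unexpected empty results
```

Each row is a plain dict, so `pandas.DataFrame(rows)` would give the DataFrame form if pandas is installed. Filling absent fields with `None` keeps the columns stable across records, which makes an unexpectedly empty field easy to spot.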

## Receipts

**Where it works well:**
- Compound lookups by name or InChI across PubChem and ChEMBL: Claude correctly identifies which database is most relevant and retrieves the right record with minimal disambiguation
- ClinicalTrials.gov queries by condition and intervention: structured results with trial phase, enrollment, status, and primary outcomes, one row per trial
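To make the ClinicalTrials.gov case concrete, here is a hedged sketch of a condition-plus-intervention query and the flattening step. It assumes the ClinicalTrials.gov API v2 `studies` endpoint with its `query.cond`, `query.intr`, and `pageSize` parameters and the `protocolSection` record layout; the helper names are hypothetical, and the sample record (including the `NCT00000000` ID) is made up for illustration.

```python
import urllib.parse

CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"


def build_trials_url(condition: str, intervention: str, page_size: int = 20) -> str:
    """Build a v2 search URL filtered by condition and intervention."""
    params = {
        "query.cond": condition,
        "query.intr": intervention,
        "pageSize": page_size,
    }
    return f"{CTGOV_STUDIES}?{urllib.parse.urlencode(params)}"


def flatten_study(study: dict) -> dict:
    """Pull phase, enrollment, status, and primary outcomes from one study."""
    protocol = study.get("protocolSection", {})
    ident = protocol.get("identificationModule", {})
    status = protocol.get("statusModule", {})
    design = protocol.get("designModule", {})
    outcomes = protocol.get("outcomesModule", {}).get("primaryOutcomes", [])
    return {
        "nct_id": ident.get("nctId"),
        "title": ident.get("briefTitle"),
        "status": status.get("overallStatus"),
        "phases": design.get("phases", []),
        "enrollment": design.get("enrollmentInfo", {}).get("count"),
        "primary_outcomes": [o.get("measure") for o in outcomes],
    }


# Made-up record in the assumed v2 layout, for an offline demonstration.
sample_study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000", "briefTitle": "Example trial"},
        "statusModule": {"overallStatus": "RECRUITING"},
        "designModule": {"phases": ["PHASE2"], "enrollmentInfo": {"count": 120}},
        "outcomesModule": {"primaryOutcomes": [{"measure": "Change in HbA1c"}]},
    }
}
flat = flatten_study(sample_study)
```

Every lookup uses `.get` with a default because optional modules may be absent from a record; a missing value then surfaces as `None` or an empty list instead of raising.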

**Where it backfires:**
- Some databases require API keys (DrugBank, BindingDB full access) that are not included in the skill — the skill queries what's publicly available but flags when full data requires credentials
- Cross-database joins (e.g., linking a PubChem CID to a UniProt target to TCGA expression data) require multiple query steps and intermediate identifier mapping that can introduce mismatches
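One way to keep cross-database mismatches visible is to report them at each join step instead of dropping them silently. The sketch below is an illustration under assumptions, not part of the skill: `join_on_identifier` is a hypothetical helper, and the rows use toy identifiers (`ACC_A`, `GENE1`, and so on) in place of real accessions.

```python
def join_on_identifier(left, right, key):
    """Left-join two lists of dicts on `key`.

    Returns (joined_rows, unmatched_keys, ambiguous_keys) so that missing
    and one-to-many mappings are reported rather than hidden.
    """
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined, unmatched, ambiguous = [], [], []
    for row in left:
        hits = index.get(row[key], [])
        if not hits:
            unmatched.append(row[key])   # no record on the right-hand side
        elif len(hits) > 1:
            ambiguous.append(row[key])   # one identifier maps to several records
        joined.extend({**row, **hit} for hit in hits)
    return joined, unmatched, ambiguous


# Toy data: compound-to-target links (left) and protein records (right).
targets = [
    {"cid": 1, "accession": "ACC_A"},
    {"cid": 1, "accession": "ACC_B"},
    {"cid": 2, "accession": "ACC_C"},
]
proteins = [
    {"accession": "ACC_A", "gene": "GENE1"},
    {"accession": "ACC_C", "gene": "GENE3"},
    {"accession": "ACC_C", "gene": "GENE3_ALT"},
]
joined, unmatched, ambiguous = join_on_identifier(targets, proteins, "accession")
print("unmatched:", unmatched, "ambiguous:", ambiguous)
```

Reviewing the `unmatched` and `ambiguous` lists after each hop shows where a mapping went wrong before the error propagates into the next database query.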

**Pattern that works:** specify the primary identifier type upfront (PubChem CID, UniProt accession, Ensembl gene ID) rather than a name where possible. Identifier-based queries are unambiguous and avoid the false matches that gene and compound name lookups can produce.
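A small pre-check can tell whether a query term already looks like an identifier. The sketch below is hypothetical (`classify_query` and `ID_PATTERNS` are illustrative names): it uses the UniProt accession pattern as published in UniProt's documentation, and it assumes human Ensembl gene IDs (`ENSG` plus 11 digits) and a compound context in which a bare integer is a PubChem CID. In another context the same integer could be an NCBI Gene ID, so treat the result as a hint only.

```python
import re

# Ordered (label, pattern) pairs; the first full match wins.
ID_PATTERNS = [
    ("pubchem_cid", re.compile(r"\d+")),
    ("ensembl_gene", re.compile(r"ENSG\d{11}")),
    (
        "uniprot_accession",
        re.compile(
            r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
            r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
        ),
    ),
]


def classify_query(term: str) -> str:
    """Return the identifier type a term looks like, or 'name' if none match."""
    term = term.strip()
    for kind, pattern in ID_PATTERNS:
        if pattern.fullmatch(term):
            return kind
    return "name"  # free-text names may need disambiguation before querying


for term in ("2244", "P04637", "ENSG00000141510", "TP53", "aspirin"):
    print(f"{term!r}: {classify_query(term)}")
```

Terms classified as `name` are the ones worth confirming with the user, or resolving to an identifier first, before the database call is issued.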

## Source and attribution

Originally authored by [K-Dense Inc.](https://github.com/K-Dense-AI). The canonical SKILL.md lives in the [`database-lookup` folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/database-lookup) of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and any updates, defer to the source repo.