# biopython

> Comprehensive molecular biology toolkit — use for sequence manipulation, file parsing (FASTA/GenBank/PDB), phylogenetics, and programmatic NCBI/PubMed access (Bio.Entrez) for batch processing, custom bioinformatics pipelines, and BLAST automation.

**Use case**: Parse molecular biology files and automate NCBI queries

**Canonical URL**: https://agentcookbooks.com/skills/biopython/

**Topics**: claude-code, skills, science, bioinformatics

**Trigger phrases**: "parse FASTA file", "BLAST search with Python", "read GenBank file", "NCBI Entrez query", "sequence analysis with Biopython"

**Source**: [K-Dense AI](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/biopython)

**License**: MIT

---

## What it does

`biopython` is a Claude Code skill from K-Dense AI's [scientific-agent-skills repo](https://github.com/K-Dense-AI/scientific-agent-skills). It turns Claude into a Biopython expert covering the full molecular biology toolkit — sequence objects (`SeqRecord`, `Seq`), file parsers (FASTA, GenBank, PDB, FASTQ, CLUSTAL), `Bio.Entrez` for programmatic NCBI and PubMed access, BLAST automation (remote and local), multiple sequence alignment, and phylogenetic tree construction.

A session produces Python code for a complete bioinformatics task: parsing a sequence file, running a BLAST search, retrieving records from NCBI, or building a phylogenetic tree — all within Python without manual database navigation.

## When to use it

Reach for it when:

- You need to batch-process sequence files (FASTA, GenBank) or convert between formats
- You're automating NCBI queries — fetching sequences, publication metadata, or taxonomy records via `Bio.Entrez` without manual downloads
- You're running BLAST remotely or locally via Python and want to parse the results programmatically

When *not* to reach for it:

- Quick single-gene lookups — use `gget` for interactive exploration
- Multi-service database integration from a single call — use `bioservices`

## Install

Copy the `SKILL.md` from K-Dense AI's [biopython folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/biopython) into `.claude/skills/biopython/` in your project.


## What a session looks like

A typical session has three phases:

1. **Task and format specification.** Describe the biological task — parsing sequences, database retrieval, alignment, or phylogenetics. Claude identifies the appropriate Biopython module and sets up the parser or query.
2. **Code generation.** Claude writes the Biopython code with proper resource handling (`Entrez.email` set before any query, file and network handles closed after iteration), error handling for network timeouts, and rate-limit compliance for NCBI queries.
3. **Output processing.** Parsed records are converted to the needed format — a pandas DataFrame, output FASTA, or BLAST hit table — with relevant fields extracted from the complex Biopython object hierarchy.
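The three phases can be sketched end to end on a tiny in-memory FASTA file. This is a minimal illustration rather than the skill's actual output: the sample sequences, the `fasta_to_rows` helper, and the chosen columns are all hypothetical.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical sample data; a real session would open a file on disk instead.
FASTA_TEXT = """>seq1 example record one
ATGCGTACGTTAG
>seq2 example record two
GGGCCCATAT
"""


def fasta_to_rows(handle):
    """Parse FASTA records and flatten each one into a plain dict."""
    rows = []
    for record in SeqIO.parse(handle, "fasta"):
        seq = record.seq
        # GC fraction computed by hand so the arithmetic is visible.
        gc = (seq.count("G") + seq.count("C")) / len(seq) if len(seq) else 0.0
        rows.append(
            {
                "id": record.id,
                "description": record.description,
                "length": len(seq),
                "gc_fraction": round(gc, 3),
            }
        )
    return rows


rows = fasta_to_rows(StringIO(FASTA_TEXT))
for row in rows:
    print(row)
```

If pandas is available, `pandas.DataFrame(rows)` would turn the same list into a table; plain dicts keep the sketch dependency-free beyond Biopython.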

## Receipts

**Where it works well:**
- Batch NCBI downloads using `Bio.Entrez.efetch` with a list of accessions: Biopython builds the request from the ID list (switching to HTTP POST for long lists) and parses the XML response, both of which are tedious to implement manually
- Converting between sequence formats at scale (GenBank to FASTA, FASTQ to FASTA) with a single `SeqIO.convert()` call
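As a rough sketch of the format-conversion case, `SeqIO.convert()` can be exercised on in-memory handles. The two FASTQ reads below are made-up data; with real files you would pass paths such as `"reads.fastq"` and `"reads.fasta"` (hypothetical names) instead of `StringIO` objects.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up reads in FASTQ format (sequence plus per-base quality line).
FASTQ_TEXT = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n"

fastq_in = StringIO(FASTQ_TEXT)
fasta_out = StringIO()

# convert() reads every record from the input and writes it in the target
# format, returning the number of records converted.
count = SeqIO.convert(fastq_in, "fastq", fasta_out, "fasta")

print(f"converted {count} records")
print(fasta_out.getvalue())
```

Converting in the other direction (FASTA to FASTQ) is not possible this way, since FASTA carries no quality scores for the writer to use.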

**Where it backfires:**
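- NCBI Entrez rate limits (3 requests/second without an API key, 10 with) can still cause failures in tight loops: `Bio.Entrez` spaces out requests within a single process, but parallel workers or several scripts sharing one IP address can exceed the limit unless you throttle explicitly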
- The Biopython object model for GenBank records is deeply nested; extracting features (CDS, exons, annotations) means walking several levels of objects, which Claude handles correctly but beginners often find opaque
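To make the nesting concrete, here is a small sketch that walks `record.features`, reads a qualifier, and extracts each CDS. The record is built in memory with invented coordinates and gene names (`demoA`, `demoB`); a real GenBank file would be loaded with `SeqIO.read(path, "genbank")` and exposes the same attributes.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical 21-bp record carrying two short CDS features.
record = SeqRecord(
    Seq("ATGGCCTAAGGGATGAAATGA"),
    id="DEMO0001",
    name="demo",
    description="hypothetical record for illustration",
)
record.features = [
    SeqFeature(FeatureLocation(0, 21, strand=1), type="source"),
    SeqFeature(FeatureLocation(0, 9, strand=1), type="CDS",
               qualifiers={"gene": ["demoA"]}),
    SeqFeature(FeatureLocation(12, 21, strand=1), type="CDS",
               qualifiers={"gene": ["demoB"]}),
]


def cds_table(rec):
    """Return (gene, start, end, protein) for every CDS feature."""
    rows = []
    for feature in rec.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists, even when there is a single entry.
        gene = feature.qualifiers.get("gene", ["unknown"])[0]
        # extract() slices the parent sequence using the feature location.
        nucleotides = feature.extract(rec.seq)
        protein = str(nucleotides.translate(to_stop=True))
        start = int(feature.location.start)
        end = int(feature.location.end)
        rows.append((gene, start, end, protein))
    return rows


cds_rows = cds_table(record)
for row in cds_rows:
    print(row)
```

The three levels involved are the record, its list of features, and each feature's `location` and `qualifiers`; most extraction code is some variation of this loop.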

**Pattern that works:** always set `Entrez.email` before any NCBI query. Biopython only emits a warning when it is missing, which is easy to overlook, and NCBI uses the address to contact you about excessive usage before blocking access to the E-utilities.
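One way to combine this with batched downloads is sketched below. The email address is a placeholder, and `chunked`, `efetch_fasta`, and `fetch_in_batches` are hypothetical helpers, not part of Biopython; the batch size and the explicit pause between batches are conservative assumptions rather than NCBI-mandated values.

```python
import time

from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder: use your real contact address
# Entrez.api_key = "..."          # optional: raises the limit to 10 requests/second


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_fasta(ids):
    """Fetch one batch of nucleotide accessions as FASTA text (network call)."""
    with Entrez.efetch(db="nucleotide", id=ids,
                       rettype="fasta", retmode="text") as handle:
        return handle.read()


def fetch_in_batches(accessions, batch_size=100, pause=0.4, fetch=efetch_fasta):
    """Call `fetch` once per batch, sleeping `pause` seconds between calls."""
    results = []
    for index, batch in enumerate(chunked(accessions, batch_size)):
        if index:
            time.sleep(pause)  # explicit safeguard against tight loops
        results.append(fetch(batch))
    return results


# Offline demonstration with a stand-in fetch function (no network access).
seen_batches = []


def fake_fetch(ids):
    seen_batches.append(list(ids))
    return f"{len(ids)} records"


demo = fetch_in_batches(["A1", "B2", "C3", "D4", "E5"],
                        batch_size=2, pause=0, fetch=fake_fetch)
print(demo)
```

Passing the fetch function as a parameter keeps the batching logic testable without touching NCBI; in real use you would call `fetch_in_batches(accessions)` and let the default `efetch_fasta` do the network work.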

## Source and attribution

Originally authored by [K-Dense Inc.](https://github.com/K-Dense-AI). The canonical SKILL.md lives in the [`biopython` folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/biopython) of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and any updates, defer to the source repo.