biopython
Comprehensive molecular biology toolkit — use for sequence manipulation, file parsing (FASTA/GenBank/PDB), phylogenetics, and programmatic NCBI/PubMed access (Bio.Entrez) for batch processing, custom bioinformatics pipelines, and BLAST automation.
Parse molecular biology files and automate NCBI queries
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
- parse FASTA file
- BLAST search with Python
- read GenBank file
- NCBI Entrez query
- sequence analysis with Biopython
What it does
biopython is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a Biopython expert covering the full molecular biology toolkit — sequence objects (SeqRecord, Seq), file parsers (FASTA, GenBank, PDB, FASTQ, CLUSTAL), Bio.Entrez for programmatic NCBI and PubMed access, BLAST automation (remote and local), multiple sequence alignment, and phylogenetic tree construction.
A session produces Python code for a complete bioinformatics task: parsing a sequence file, running a BLAST search, retrieving records from NCBI, or building a phylogenetic tree — all within Python without manual database navigation.
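As a rough illustration of the kind of code such a session might produce for the file-parsing case, here is a minimal sketch. The FASTA text, the record names, and the `summarize_fasta` helper are made up for the example; a real task would pass a file path or open file handle instead of an in-memory string.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA input; in practice this would be a file on disk.
FASTA_TEXT = """>seq1 demo record
ATGCGC
>seq2 another demo record
ATATAT
"""


def summarize_fasta(handle):
    """Return one dict per record with its id, length, and GC fraction."""
    rows = []
    for record in SeqIO.parse(handle, "fasta"):
        seq = str(record.seq).upper()
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        rows.append({"id": record.id, "length": len(seq), "gc": gc})
    return rows


rows = summarize_fasta(StringIO(FASTA_TEXT))
for row in rows:
    print(row)
```

A list of plain dicts like this can be handed to a table library if one is needed downstream.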
When to use it
Reach for it when:
- You need to batch-process sequence files (FASTA, GenBank) or convert between formats
- You’re automating NCBI queries (fetching sequences, publication metadata, or taxonomy records) via Bio.Entrez without manual downloads
- You’re running BLAST remotely or locally via Python and want to parse the results programmatically
When not to reach for it:
- Quick single-gene lookups: use gget for interactive exploration
- Multi-service database integration from a single call: use bioservices
Install
Copy the SKILL.md from K-Dense AI’s biopython folder into .claude/skills/biopython/ in your project.
Trigger phrases: “parse FASTA file”, “BLAST search with Python”, “read GenBank file”, “NCBI Entrez query”.
What a session looks like
A typical session has three phases:
- Task and format specification. Describe the biological task — parsing sequences, database retrieval, alignment, or phylogenetics. Claude identifies the appropriate Biopython module and sets up the parser or query.
- Code generation. Claude writes the Biopython code with proper resource handling (Entrez.email always set, file handles closed after iteration), error handling for network timeouts, and rate-limit compliance for NCBI queries.
- Output processing. Parsed records are converted to the needed format (a pandas DataFrame, output FASTA, or BLAST hit table) with the relevant fields extracted from the complex Biopython object hierarchy.
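The code-generation phase described above could look roughly like the following sketch. The `with_retries` and `fetch_genbank` helpers, the retry counts, and the placeholder email address are assumptions made for illustration, not part of the skill or of Biopython; only `Entrez.email`, `Entrez.efetch`, and `SeqIO.parse` are library names.

```python
import time
from urllib.error import HTTPError, URLError

from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"  # placeholder: replace with a real contact address


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a network error, wait and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except (HTTPError, URLError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)


def fetch_genbank(accessions):
    """Fetch GenBank records for a list of nucleotide accessions (needs network)."""

    def call():
        handle = Entrez.efetch(
            db="nucleotide", id=accessions, rettype="gb", retmode="text"
        )
        try:
            return list(SeqIO.parse(handle, "genbank"))
        finally:
            handle.close()  # always release the network handle

    return with_retries(call)
```

Nothing here contacts NCBI until `fetch_genbank` is called with a list of real accession strings. Passing `sleep` as a parameter is a design choice that lets the backoff be exercised without waiting.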
Receipts
Where it works well:
- Batch NCBI downloads using Bio.Entrez.efetch with a list of accessions: Biopython builds the request from the ID list and parses the returned XML, both of which are painful to implement manually
- Converting between sequence formats at scale (GenBank to FASTA, FASTQ to FASTA) with SeqIO.convert() in a single call
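A format conversion of the kind mentioned above can be sketched in a few lines. The single-read FASTQ text is invented for the example, and in-memory handles stand in for the input and output file paths a real run would use.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical single-read FASTQ input (Phred+33 quality string).
FASTQ_TEXT = "@read1\nACGT\n+\nIIII\n"

in_handle = StringIO(FASTQ_TEXT)
out_handle = StringIO()

# SeqIO.convert returns the number of records it converted.
count = SeqIO.convert(in_handle, "fastq", out_handle, "fasta")
fasta_text = out_handle.getvalue()

print(count)
print(fasta_text)
```

`SeqIO.convert` also accepts file names in place of the two handles, which is the more common form for batch jobs.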
Where it backfires:
- NCBI Entrez rate limits (3 requests/second without an API key, 10 with) still cause failures under load; Biopython spaces out requests within a single process, but parallel workers sharing an IP address or API key can exceed the limit unless they throttle explicitly
- The Biopython object model for GenBank records is deep and nested; extracting features (CDS, exons, annotations) means traversing several levels, which Claude handles correctly but beginners often find opaque
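To make the nesting concrete, here is a small sketch of pulling CDS features out of a record. The record is built in memory with invented names (`demo1`, `demoA`) so the example runs without a file; a record read from a GenBank file exposes the same `features`, `location`, and `qualifiers` attributes. The `cds_table` helper is hypothetical.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical record: a 9-base CDS (ATG GCC TAA) flanked by GGG on each side.
record = SeqRecord(Seq("GGGATGGCCTAAGGG"), id="demo1", name="demo1")
record.features.append(
    SeqFeature(
        FeatureLocation(3, 12, strand=1),
        type="CDS",
        qualifiers={"gene": ["demoA"], "product": ["demo protein"]},
    )
)


def cds_table(rec):
    """Flatten each CDS feature of a record into a plain dict."""
    rows = []
    for feature in rec.features:
        if feature.type != "CDS":
            continue
        rows.append(
            {
                # Qualifier values are lists, so take the first entry.
                "gene": feature.qualifiers.get("gene", ["?"])[0],
                "start": int(feature.location.start),
                "end": int(feature.location.end),
                "strand": feature.location.strand,
                # Slice the CDS out of the parent sequence and translate it.
                "protein": str(feature.extract(rec.seq).translate(to_stop=True)),
            }
        )
    return rows


rows = cds_table(record)
print(rows)
```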
Pattern that works: always set Entrez.email before any NCBI query. Biopython only emits a warning when it is missing, and NCBI may block heavy use that it cannot tie to a contact address; the resulting failure then looks like a network issue rather than a missing field.
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the biopython folder of their public scientific-agent-skills repository.
License: MIT. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.