biopython

Comprehensive molecular biology toolkit — use for sequence manipulation, file parsing (FASTA/GenBank/PDB), phylogenetics, and programmatic NCBI/PubMed access (Bio.Entrez) for batch processing, custom bioinformatics pipelines, and BLAST automation.

Parse molecular biology files and automate NCBI queries

Source: K-Dense AI
License: MIT
First documented

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

  • parse FASTA file
  • BLAST search with Python
  • read GenBank file
  • NCBI Entrez query
  • sequence analysis with Biopython

What it does

biopython is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a Biopython expert covering the full molecular biology toolkit — sequence objects (SeqRecord, Seq), file parsers (FASTA, GenBank, PDB, FASTQ, CLUSTAL), Bio.Entrez for programmatic NCBI and PubMed access, BLAST automation (remote and local), multiple sequence alignment, and phylogenetic tree construction.

A session produces Python code for a complete bioinformatics task: parsing a sequence file, running a BLAST search, retrieving records from NCBI, or building a phylogenetic tree — all within Python without manual database navigation.

When to use it

Reach for it when:

  • You need to batch-process sequence files (FASTA, GenBank) or convert between formats
  • You’re automating NCBI queries — fetching sequences, publication metadata, or taxonomy records via Bio.Entrez without manual downloads
  • You’re running BLAST remotely or locally via Python and want to parse the results programmatically

When not to reach for it:

  • Quick single-gene lookups — use gget for interactive exploration
  • Multi-service database integration from a single call — use bioservices

Install

Copy the SKILL.md from K-Dense AI’s biopython folder into .claude/skills/biopython/ in your project.


What a session looks like

A typical session has three phases:

  1. Task and format specification. Describe the biological task — parsing sequences, database retrieval, alignment, or phylogenetics. Claude identifies the appropriate Biopython module and sets up the parser or query.
  2. Code generation. Claude writes the Biopython code with proper resource handling (Entrez.email always set, file handles closed after iteration), error handling for network timeouts, and rate-limit compliance for NCBI queries.
  3. Output processing. Parsed records are converted to the needed format — a pandas DataFrame, output FASTA, or BLAST hit table — with relevant fields extracted from the complex Biopython object hierarchy.
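
The three phases can be sketched end to end on a small in-memory FASTA input. This is a minimal illustration rather than output captured from the skill: the sample sequences, the helper name `fasta_to_rows`, and the chosen columns are all assumptions made for the example.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical sample data standing in for a FASTA file on disk.
FASTA_TEXT = """>seq1 demo record one
ATGGCCATTGTAATGGGCCGC
>seq2 demo record two
GGGCCCAAATTT
"""


def fasta_to_rows(handle):
    """Parse FASTA records and keep a few flat fields per record."""
    rows = []
    for record in SeqIO.parse(handle, "fasta"):
        seq = str(record.seq).upper()
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        rows.append(
            {
                "id": record.id,
                "description": record.description,
                "length": len(seq),
                "gc_fraction": round(gc, 3),
            }
        )
    return rows


# The with-block closes the handle once iteration is finished.
with StringIO(FASTA_TEXT) as handle:
    rows = fasta_to_rows(handle)

for row in rows:
    print(row)
```

If pandas is available, a list of dicts like `rows` can be passed straight to `pandas.DataFrame(rows)` to get the tabular output described in phase 3.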

Receipts

Where it works well:

  • Batch NCBI downloads using Bio.Entrez.efetch with a list of accessions: Biopython joins the IDs into a single request and parses the returned XML, both of which are tedious to implement by hand. Very large accession lists are usually still split into batches.
  • Converting between sequence formats at scale (GenBank to FASTA, FASTQ to FASTA) with a single SeqIO.convert() call
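
As a rough illustration of the format-conversion case, the sketch below converts two FASTQ reads to FASTA in memory. The read names and sequences are made up for the example; with real data, `SeqIO.convert` also accepts file paths in place of the handles.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTQ input; each record has a header, sequence, "+", and qualities.
FASTQ_TEXT = """@read1
ACGTACGT
+
IIIIIIII
@read2
GGCCTTAA
+
IIIIIIII
"""

in_handle = StringIO(FASTQ_TEXT)
out_handle = StringIO()

# SeqIO.convert returns the number of records it wrote.
count = SeqIO.convert(in_handle, "fastq", out_handle, "fasta")
fasta_text = out_handle.getvalue()

print(count)
print(fasta_text)
```

The quality strings are dropped because FASTA has no place to store them, so this direction of conversion is lossy.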

Where it backfires:

  • NCBI Entrez rate limits (3 requests/second without an API key, 10 with) still cause failures when several scripts or parallel workers share one IP address; Bio.Entrez paces requests only within a single Python process, so concurrent jobs need their own explicit throttling
  • The Biopython object model for GenBank records is deeply nested; extracting features (CDS, exons, annotations) means walking record.features, each feature’s location, and its qualifiers dictionary. Claude navigates this correctly, but beginners often find the resulting code opaque
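
To make the nesting concrete, here is a small sketch that pulls CDS details out of a record. The record is built by hand so the example runs offline; the accession, gene name, and the helper `cds_summaries` are hypothetical. A record returned by `SeqIO.parse(..., "genbank")` exposes the same `features`, `location`, and `qualifiers` attributes.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical record built in memory instead of parsed from a GenBank file.
record = SeqRecord(Seq("ATGGCCAAATAAGGG"), id="DEMO0001", name="demo")
record.features.append(
    SeqFeature(
        FeatureLocation(0, 12, strand=1),
        type="CDS",
        qualifiers={"gene": ["demoA"], "product": ["hypothetical protein"]},
    )
)


def cds_summaries(rec):
    """Flatten each CDS feature into a plain dict."""
    summaries = []
    for feature in rec.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists of strings, hence the [0] after a default.
        gene = feature.qualifiers.get("gene", ["?"])[0]
        product = feature.qualifiers.get("product", ["?"])[0]
        # extract() slices the parent sequence using the feature location.
        cds_seq = feature.extract(rec.seq)
        summaries.append(
            {
                "gene": gene,
                "product": product,
                "start": int(feature.location.start),
                "end": int(feature.location.end),
                "strand": feature.location.strand,
                "protein": str(cds_seq.translate(to_stop=True)),
            }
        )
    return summaries


summaries = cds_summaries(record)
print(summaries)
```

Supplying list defaults to `qualifiers.get` keeps the loop from failing on features that lack a gene or product qualifier, which is common in real annotations.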

Pattern that works: always set Entrez.email before any NCBI query. Biopython only emits a warning when the address is missing, but NCBI uses it to contact you before blocking an IP for excessive use, so a missing email can surface later as what looks like a network failure rather than a missing field.
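
One way to combine this pattern with batched downloads is sketched below. The helper names (`chunked`, `fetch_in_batches`, `entrez_fetch_genbank`), the batch size of 200, the 0.4 second pause, and the choice of the nucleotide database are assumptions for illustration, not part of the skill. The fetch function is passed in as a parameter so the batching logic can be exercised without network access.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_in_batches(
    ids: Sequence[str],
    fetch: Callable[[List[str]], str],
    batch_size: int = 200,
    pause: float = 0.4,
) -> List[str]:
    """Call `fetch` once per batch, sleeping `pause` seconds between calls."""
    results = []
    for index, batch in enumerate(chunked(ids, batch_size)):
        if index:
            time.sleep(pause)  # conservative extra spacing between requests
        results.append(fetch(batch))
    return results


def entrez_fetch_genbank(batch: List[str], email: str) -> str:
    """Fetch one batch as GenBank text. Needs Biopython and network access."""
    # Imported here so the helpers above stay dependency-free.
    from Bio import Entrez

    Entrez.email = email  # set before any request is sent
    handle = Entrez.efetch(db="nucleotide", id=batch, rettype="gb", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

A call might look like `fetch_in_batches(accessions, lambda b: entrez_fetch_genbank(b, "you@example.org"))`, where the email address is a placeholder for your own.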

Source and attribution

Originally authored by K-Dense Inc. The canonical SKILL.md lives in the biopython folder of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.