transformers

Pre-trained transformer models for NLP, computer vision, audio, and multimodal tasks — use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning on custom datasets.

Load and fine-tune Hugging Face transformer models

Source K-Dense AI
License Apache-2.0
First documented

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

  • use a Hugging Face model
  • fine-tune this model
  • text classification with transformers
  • load a pretrained model
  • sentiment analysis

What it does

transformers is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a Hugging Face Transformers expert covering model loading via AutoModel/pipeline, tokenization, inference, and fine-tuning with the Trainer API — across NLP (generation, classification, QA, translation, summarization), vision (image classification, object detection, segmentation), audio (ASR, audio classification), and multimodal tasks.

A session produces complete Python code: model loading, tokenization, and an inference pipeline, or a full fine-tuning setup with the Trainer configured for your task and hardware.
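
The loading-tokenization-inference flow can be sketched as below: first the high-level pipeline() call, then the explicit tokenize-forward-decode path it wraps. The checkpoint name is the pipeline's own default sentiment model, shown only for illustration:

```python
# Inference sketch: pipeline() first, then the explicit Auto* path.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# High-level: one call handles tokenization, the forward pass, and decoding.
clf = pipeline("sentiment-analysis")
print(clf("This library is a pleasure to use."))

# Explicit equivalent with the Auto* classes.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tok("This library is a pleasure to use.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```

Both paths produce the same prediction; the explicit version is what you graduate to when you need custom batching or access to raw logits.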

When to use it

Reach for it when:

  • You need to run inference with a pretrained model from the Hugging Face Hub on text, images, or audio
  • You’re fine-tuning a pretrained model on a custom dataset for a specific classification or generation task
  • You need multimodal inference (vision-language models, audio-text models) from a single unified library

When not to reach for it:

  • Structured training loops with multi-GPU support, logging callbacks, and experiment tracking — use pytorch-lightning, which wraps Transformers cleanly
  • Graph neural networks on structured data — use torch-geometric

Install

Copy the SKILL.md from K-Dense AI’s transformers folder into .claude/skills/transformers/ in your project. A Hugging Face token (HF_TOKEN environment variable) is required for gated models.
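
A minimal token-handling sketch, assuming HF_TOKEN is exported in your shell (the gated repo name in the trailing comment is illustrative, not a requirement):

```python
# Authentication sketch for gated models. huggingface_hub reads HF_TOKEN
# from the environment on its own; the explicit forms below are for cases
# where you want to validate or pass the token yourself.
import os

token = os.environ.get("HF_TOKEN")  # None when not configured
if token:
    from huggingface_hub import login
    login(token=token)  # validates the token and caches it locally

# Explicit per-call alternative (repo name is illustrative):
# AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=token)
```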


What a session looks like

A typical session has three phases:

  1. Task and model selection. Describe the task and data domain. Claude selects an appropriate pretrained model from the Hub (specifying the model card), the correct tokenizer, and the appropriate AutoModel class for the task head.
  2. Inference or fine-tuning setup. For inference, Claude writes a pipeline() call or explicit tokenize-forward-decode loop. For fine-tuning, Claude configures a Trainer with the dataset, training arguments (learning rate, batch size, number of epochs), and evaluation metrics.
  3. Hardware and optimization. Claude adds device placement (.to("cuda")), half-precision inference (torch.float16), and batching setup appropriate to the available hardware.

Receipts

Where it works well:

  • Zero-shot classification and named entity recognition via pipeline() — the abstraction is clean and the pretrained models perform surprisingly well out of the box on common tasks
  • Fine-tuning BERT-family models on text classification — the Trainer API handles the training loop, evaluation, and checkpoint saving reliably

Where it backfires:

  • Large model inference without quantization exhausts GPU memory quickly; Claude doesn’t always proactively recommend bitsandbytes quantization for 7B+ models
  • Some gated models on the Hub require manual license acceptance through the web UI before the token grants download access — a workflow friction point that surprises first-time users

Pattern that works: start with the pipeline() API to verify the model works for your task before writing custom tokenization and model code — it’s much faster to prototype with and easier to debug.

Source and attribution

Originally authored by K-Dense Inc. The canonical SKILL.md lives in the transformers folder of their public scientific-agent-skills repository.

License: Apache-2.0. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.