# modal

> Cloud computing platform for running Python on GPUs and serverless infrastructure — use when deploying AI/ML models, running GPU-accelerated workloads, serving web endpoints, scheduling batch jobs, or scaling Python code to the cloud.

**Use case**: Deploy Python code and ML models to serverless cloud GPUs

**Canonical URL**: https://agentcookbooks.com/skills/modal/

**Topics**: claude-code, skills, science

**Trigger phrases**: "run this on Modal", "deploy to GPU cloud", "serverless inference", "Modal app", "run on H100"

**Source**: [K-Dense AI](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/modal)

**License**: Apache-2.0

---

## What it does

`modal` is a Claude Code skill from K-Dense AI's [scientific-agent-skills repo](https://github.com/K-Dense-AI/scientific-agent-skills). It turns Claude into a Modal platform expert for deploying Python workloads to serverless cloud infrastructure — running GPU-accelerated ML inference (H100, A100, T4), serving FastAPI endpoints that scale to zero, scheduling batch jobs, and scaling compute-intensive Python functions without managing cloud infrastructure.

A session produces a Modal app definition: a Python file with functions wrapped in Modal's `@app.function()` decorator, a GPU type specification, container image configuration, and the deployment command — ready to run with `modal run` or `modal deploy`.

## When to use it

Reach for it when:

- You have a Python ML model or compute-intensive function that needs GPU resources your local machine doesn't have
- You want to serve an ML inference API with automatic scaling and pay-per-use pricing without managing Kubernetes or EC2
- You're running batch processing jobs (embedding generation, image processing, data transformation) that need more compute than your workstation

When *not* to reach for it:

- Always-on services with constant traffic where serverless cold starts are a problem — Modal's pay-per-use model optimizes for bursty workloads
- Workflows requiring persistent local filesystem state between function calls — Modal functions run in ephemeral containers

## Install

Copy the `SKILL.md` from K-Dense AI's [modal folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/modal) into `.claude/skills/modal/` in your project. Install the Python package with `pip install modal` and authenticate with `modal setup`; a Modal account is required.

## What a session looks like

A typical session has three phases:

1. **Function and infrastructure specification.** Describe the Python function to deploy, the GPU type needed (H100 for large models, T4 for smaller workloads), and the container requirements (base image, pip packages, model files). Claude writes the Modal app definition.
2. **Container image setup.** Claude constructs the `modal.Image` definition with the correct base image, package installations, and any model weights to bake into the image for fast cold starts.
3. **Deployment code.** The complete Modal app is written, with the function decorated with GPU, memory, and timeout specifications, plus either a `modal run` entrypoint for batch use or a `@modal.web_endpoint()` decorator (stacked on `@app.function()`) for HTTP serving.

## Receipts

**Where it works well:**
- Running large ML model inference (Llama, Stable Diffusion, Whisper) on H100s without owning the hardware — Modal's cold start time is fast enough for batch workloads and acceptable for non-latency-critical APIs
- Batch embedding generation across large document collections — parallelizing with `map()` across a list of inputs distributes work across many GPU containers simultaneously

**Where it backfires:**
- Cold starts add latency (5–30 seconds for large container images with big model weights) — not suitable for real-time serving where first-request latency matters
- Complex multi-step pipelines with stateful intermediate results need careful design around Modal's ephemeral container model

**Pattern that works:** bake model weights into the container image — via `Image.run_commands()` or `Image.run_function()` to download them at image-build time rather than at function runtime — which eliminates the model download from every cold start.

## Source and attribution

Originally authored by [K-Dense Inc.](https://github.com/K-Dense-AI). The canonical SKILL.md lives in the [`modal` folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/modal) of their public scientific-agent-skills repository.

License: Apache-2.0. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and any updates, defer to the source repo.