modal
Cloud computing platform for running Python on GPUs and serverless infrastructure — use when deploying AI/ML models, running GPU-accelerated workloads, serving web endpoints, scheduling batch jobs, or scaling Python code to the cloud.
Deploy Python code and ML models to serverless cloud GPUs
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
"run this on Modal", "deploy to GPU cloud", "serverless inference", "Modal app", "run on H100"
What it does
modal is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a Modal platform expert for deploying Python workloads to serverless cloud infrastructure — running GPU-accelerated ML inference (H100, A100, T4), serving FastAPI endpoints that scale to zero, scheduling batch jobs, and scaling compute-intensive Python functions without managing cloud infrastructure.
A session produces a Modal app definition: a Python file whose functions carry Modal's @app.function() decorator, plus a GPU type specification, container image configuration, and the deployment command, ready to run with modal run or modal deploy.
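A minimal sketch of that output, assuming a recent Modal client; the app name, GPU choice, and package list here are illustrative, not part of the skill's spec:

```python
import modal

# Container image: slim Debian base plus the packages the function needs
# (package list is illustrative).
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch", "transformers"
)

app = modal.App("example-inference")  # older Modal releases use modal.Stub

@app.function(image=image, gpu="H100", timeout=600)
def generate(prompt: str) -> str:
    # Placeholder for real model loading and inference; runs inside the GPU container.
    return f"generated text for: {prompt}"

@app.local_entrypoint()
def main():
    # `modal run this_file.py` runs main() locally and generate() remotely on an H100.
    print(generate.remote("Hello from the cloud"))
```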
When to use it
Reach for it when:
- You have a Python ML model or compute-intensive function that needs GPU resources your local machine doesn’t have
- You want to serve an ML inference API with automatic scaling and pay-per-use pricing without managing Kubernetes or EC2
- You’re running batch processing jobs (embedding generation, image processing, data transformation) that need more compute than your workstation
When not to reach for it:
- Always-on services with constant traffic where serverless cold starts are a problem — Modal’s pay-per-use model optimizes for bursty workloads
- Workflows requiring persistent local filesystem state between function calls — Modal functions run in ephemeral containers
Install
Copy the SKILL.md from K-Dense AI’s modal folder into .claude/skills/modal/ in your project. Install via pip install modal and authenticate with modal setup. A Modal account is required.
What a session looks like
A typical session has three phases:
- Function and infrastructure specification. Describe the Python function to deploy, the GPU type needed (H100 for large models, T4 for smaller workloads), and the container requirements (base image, pip packages, model files). Claude writes the Modal app definition.
- Container image setup. Claude constructs the modal.Image definition with the correct base image, package installations, and any model weights to bake into the image for fast cold starts.
- Deployment code. The complete Modal app is written with the function decorated for GPU, memory, and timeout specifications, plus either a modal run script for batch use or a @app.web_endpoint() for HTTP serving (sketched below).
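A hedged sketch of phases two and three for HTTP serving. The memory and timeout values and the package list are illustrative, and depending on the Modal version the endpoint decorator is spelled @modal.web_endpoint() rather than @app.web_endpoint():

```python
import modal

image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch", "transformers"  # illustrative package list
)

app = modal.App("example-serving")

@app.function(image=image, gpu="A100", memory=32768, timeout=300)
@modal.web_endpoint(method="POST")  # some Modal versions spell this @app.web_endpoint()
def predict(item: dict) -> dict:
    # Placeholder inference; `modal deploy` gives this function a public HTTPS URL
    # that scales to zero when idle.
    return {"output": f"prediction for {item.get('prompt', '')}"}
```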
Receipts
Where it works well:
- Running large ML model inference (Llama, Stable Diffusion, Whisper) on H100s without owning the hardware — Modal’s cold start time is fast enough for batch workloads and acceptable for non-latency-critical APIs
- Batch embedding generation across large document collections — parallelizing with map() across a list of inputs distributes work across many GPU containers simultaneously (sketched below)
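A sketch of that fan-out pattern, assuming a recent Modal client; the embedding library, model name, and inputs are illustrative:

```python
import modal

image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "sentence-transformers"  # illustrative embedding library
)

app = modal.App("example-embeddings")

@app.function(image=image, gpu="T4")
def embed(text: str) -> list[float]:
    # Placeholder: load an embedding model and encode one document.
    # Production code would cache the model per container instead of reloading it.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(text).tolist()

@app.local_entrypoint()
def main():
    docs = ["first document", "second document", "third document"]
    # .map() fans the inputs out across many containers and streams results back.
    for vector in embed.map(docs):
        print(len(vector))
```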
Where it backfires:
- Cold starts add latency (5–30 seconds for large container images with big model weights) — not suitable for real-time serving where first-request latency matters
- Complex multi-step pipelines with stateful intermediate results need careful design around Modal’s ephemeral container model
Pattern that works: bake model weights into the container image using image.run_commands() or download them during image build rather than at function runtime — it eliminates model download time from every cold start.
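A sketch of that pattern, assuming a recent Modal client; the huggingface-cli command and model name are illustrative, and modal.Image.run_function() is an alternative to run_commands() for the same job:

```python
import modal

# Download the weights while the image is built, not when the function starts,
# so cold starts skip the download (model name is illustrative).
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("huggingface_hub[cli]", "transformers", "torch")
    .run_commands("huggingface-cli download openai/whisper-tiny")
)

app = modal.App("example-baked-weights")

@app.function(image=image, gpu="T4")
def transcribe(audio_url: str) -> str:
    # Placeholder: the weights already sit in the image's cache, so nothing is
    # downloaded at runtime.
    return f"transcript of {audio_url}"
```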
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the modal folder of their public scientific-agent-skills repository.
License: Apache-2.0. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.