modal
Cloud computing platform for running Python on GPUs and serverless infrastructure — use when deploying AI/ML models, running GPU-accelerated workloads, serving web endpoints, scheduling batch jobs, or scaling Python code to the cloud.
Deploy Python code and ML models to serverless cloud GPUs
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
"run this on Modal", "deploy to GPU cloud", "serverless inference", "Modal app", "run on H100"
What it does
modal is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a Modal platform expert for deploying Python workloads to serverless cloud infrastructure — running GPU-accelerated ML inference (H100, A100, T4), serving FastAPI endpoints that scale to zero, scheduling batch jobs, and scaling compute-intensive Python functions without managing cloud infrastructure.
A session produces a Modal app definition: a Python file whose functions carry Modal's @app.function() decorator, plus a GPU type specification, container image configuration, and the deployment command, ready to run with modal run or modal deploy.
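A minimal sketch of that output, assuming a recent Modal client; the app name, GPU choice, and package list here are illustrative, not part of the skill's spec:

```python
import modal

# Container image: slim Debian base plus the packages the function needs
# (package list is illustrative).
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch", "transformers"
)

app = modal.App("example-inference")  # older Modal releases use modal.Stub

@app.function(image=image, gpu="H100", timeout=600)
def generate(prompt: str) -> str:
    # Placeholder for real model loading and inference; runs inside the GPU container.
    return f"generated text for: {prompt}"

@app.local_entrypoint()
def main():
    # `modal run this_file.py` runs main() locally and generate() remotely on an H100.
    print(generate.remote("Hello from the cloud"))
```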
When to use it
Reach for it when:
- You have a Python ML model or compute-intensive function that needs GPU resources your local machine doesn’t have
- You want to serve an ML inference API with automatic scaling and pay-per-use pricing without managing Kubernetes or EC2
- You’re running batch processing jobs (embedding generation, image processing, data transformation) that need more compute than your workstation
When not to reach for it:
- Always-on services with constant traffic where serverless cold starts are a problem — Modal’s pay-per-use model optimizes for bursty workloads
- Workflows requiring persistent local filesystem state between function calls — Modal functions run in ephemeral containers
Install
Copy the SKILL.md from K-Dense AI’s modal folder into .claude/skills/modal/ in your project. Install via pip install modal and authenticate with modal setup. A Modal account is required.
What a session looks like
A typical session has three phases:
- Function and infrastructure specification. Describe the Python function to deploy, the GPU type needed (H100 for large models, T4 for smaller workloads), and the container requirements (base image, pip packages, model files). Claude writes the Modal app definition.
- Container image setup. Claude constructs the modal.Image definition with the correct base image, package installations, and any model weights to bake into the image for fast cold starts.
- Deployment code. The complete Modal app is written with the function decorated for GPU, memory, and timeout specifications, plus either a modal run script for batch use or a @app.web_endpoint() for HTTP serving (sketched below).
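A hedged sketch of phases two and three for HTTP serving. The memory and timeout values and the package list are illustrative, and depending on the Modal version the endpoint decorator is spelled @modal.web_endpoint() rather than @app.web_endpoint():

```python
import modal

image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch", "transformers"  # illustrative package list
)

app = modal.App("example-serving")

@app.function(image=image, gpu="A100", memory=32768, timeout=300)
@modal.web_endpoint(method="POST")  # some Modal versions spell this @app.web_endpoint()
def predict(item: dict) -> dict:
    # Placeholder inference; `modal deploy` gives this function a public HTTPS URL
    # that scales to zero when idle.
    return {"output": f"prediction for {item.get('prompt', '')}"}
```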
Receipts
Where it works well:
- Running large ML model inference (Llama, Stable Diffusion, Whisper) on H100s without owning the hardware — Modal’s cold start time is fast enough for batch workloads and acceptable for non-latency-critical APIs
- Batch embedding generation across large document collections — parallelizing with map() across a list of inputs distributes work across many GPU containers simultaneously (sketched below)
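A sketch of that fan-out pattern, assuming a recent Modal client; the embedding library, model name, and inputs are illustrative:

```python
import modal

image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "sentence-transformers"  # illustrative embedding library
)

app = modal.App("example-embeddings")

@app.function(image=image, gpu="T4")
def embed(text: str) -> list[float]:
    # Placeholder: load an embedding model and encode one document.
    # Production code would cache the model per container instead of reloading it.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(text).tolist()

@app.local_entrypoint()
def main():
    docs = ["first document", "second document", "third document"]
    # .map() fans the inputs out across many containers and streams results back.
    for vector in embed.map(docs):
        print(len(vector))
```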
Where it backfires:
- Cold starts add latency (5–30 seconds for large container images with big model weights) — not suitable for real-time serving where first-request latency matters
- Complex multi-step pipelines with stateful intermediate results need careful design around Modal’s ephemeral container model
Pattern that works: bake model weights into the container image using image.run_commands() or download them during image build rather than at function runtime — it eliminates model download time from every cold start.
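A sketch of that pattern, assuming a recent Modal client; the huggingface-cli command and model name are illustrative, and modal.Image.run_function() is an alternative to run_commands() for the same job:

```python
import modal

# Download the weights while the image is built, not when the function starts,
# so cold starts skip the download (model name is illustrative).
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("huggingface_hub[cli]", "transformers", "torch")
    .run_commands("huggingface-cli download openai/whisper-tiny")
)

app = modal.App("example-baked-weights")

@app.function(image=image, gpu="T4")
def transcribe(audio_url: str) -> str:
    # Placeholder: the weights already sit in the image's cache, so nothing is
    # downloaded at runtime.
    return f"transcript of {audio_url}"
```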
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the modal folder of their public scientific-agent-skills repository.
License: Apache-2.0. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.