foundation-models-on-device

A Claude Code skill from Affaan M's everything-claude-code repo for Apple's FoundationModels framework — on-device LLM in iOS 26+. Covers availability checks (deviceNotEligible / appleIntelligenceNotEnabled / modelNotReady), basic and multi-turn sessions, guided generation with @Generable / @Guide, custom tool calling, and snapshot streaming for real-time UI updates.

Build on-device LLM features (text gen, structured output, tool calling) with FoundationModels instead of cloud APIs

Source: Affaan M

License: MIT

First documented: 2026-05-12

Receipts: TODO

iOS

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

use Apple Intelligence on-device LLM
structured output with @Generable
custom tool calling in FoundationModels

What it does

foundation-models-on-device is the Apple Intelligence skill in Affaan M’s everything-claude-code — see skills/foundation-models-on-device. It covers Apple’s FoundationModels framework for on-device LLM in iOS 26+ — text generation, structured output with @Generable, custom tool calling, and snapshot streaming, all running on-device with no cloud dependency.

The availability gate is the first pattern: SystemLanguageModel.default.availability returns .available, .unavailable(.deviceNotEligible), .unavailable(.appleIntelligenceNotEnabled), or .unavailable(.modelNotReady). The skill is explicit that every entry point must check availability before creating a LanguageModelSession — silently failing in unavailable states is the most common bug.
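A minimal sketch of that gate, assuming an iOS 26+ deployment target. The helper names `unavailableMessage(for:)` and `makeSessionIfAvailable()` and the message strings are hypothetical, not taken from the skill; only the availability cases come from the framework.

```swift
import FoundationModels

/// Returns nil when the model is usable, otherwise a user-facing explanation.
/// The wording of each message is illustrative only.
func unavailableMessage(for availability: SystemLanguageModel.Availability) -> String? {
    switch availability {
    case .available:
        return nil
    case .unavailable(.deviceNotEligible):
        return "This device does not support Apple Intelligence."
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Turn on Apple Intelligence in Settings to use this feature."
    case .unavailable(.modelNotReady):
        return "The model is still downloading. Try again later."
    case .unavailable(_):
        // Catch-all for any reason not listed above.
        return "The on-device model is unavailable."
    }
}

/// Creates a session only after the availability gate passes.
func makeSessionIfAvailable() -> LanguageModelSession? {
    guard unavailableMessage(for: SystemLanguageModel.default.availability) == nil else {
        return nil
    }
    return LanguageModelSession()
}
```

Returning an optional message keeps the UI decision in one place: a view can show the message when it is non-nil and the feature otherwise.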

Sessions are either single-turn (create a new session per call) or multi-turn (reuse one session so it keeps conversation context). Instructions are passed at session init and cover role, task, style, and safety guards (“Respond with ‘I can’t help with that’ for dangerous requests”).

Guided generation is the structured-output pattern: @Generable on a struct, @Guide on each property with optional constraints (.range(0...20) for numeric values, .count(3) for arrays, description: for semantic guidance). try await session.respond(to: prompt, generating: CatProfile.self) returns a response whose .content is the typed struct.

Custom Tool types with @Generable Arguments let the model invoke domain-specific code, such as recipe search or calendar event creation, without leaving the device.
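The session and guided-generation patterns can be sketched roughly as follows. The `CatProfile` fields, the instruction text, and the function names are assumptions made for illustration; the skill's own example may differ. Both functions assume the availability check has already passed.

```swift
import FoundationModels

// Hypothetical schema: field names and descriptions are illustrative.
@Generable
struct CatProfile {
    @Guide(description: "The cat's name")
    var name: String

    @Guide(description: "The cat's age in years", .range(0...20))
    var age: Int

    @Guide(description: "Personality traits, one or two words each", .count(3))
    var traits: [String]
}

/// Single-turn: a fresh session per call, so no context carries over.
func generateCatProfile(from prompt: String) async throws -> CatProfile {
    let session = LanguageModelSession(
        instructions: """
        You write short profiles for cats in a rescue shelter.
        Keep a friendly tone.
        Respond with 'I can't help with that' for dangerous requests.
        """
    )
    // The response wraps the typed value; no JSON parsing is involved.
    let response = try await session.respond(to: prompt, generating: CatProfile.self)
    return response.content
}

/// Multi-turn: reuse one session so the second prompt can build on the first answer.
func suggestNames(using session: LanguageModelSession) async throws -> String {
    _ = try await session.respond(to: "Suggest a name for an orange kitten.")
    let followUp = try await session.respond(to: "Give me two more in the same style.")
    return followUp.content
}
```

A caller would typically wrap `generateCatProfile(from:)` in `do`/`catch`, since generation can throw rather than return a malformed value.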

When to use it

Building AI-powered features with Apple Intelligence on-device — privacy and offline support are the wedge
Generating or summarizing text without cloud dependency or per-token cost
Extracting structured data from natural language input via @Generable
Implementing custom tool calling for domain-specific AI actions where the tool runs on-device
Streaming structured responses for real-time UI updates (snapshot streaming)
Privacy-sensitive workflows where data leaving the device is a non-starter

When not to reach for it:

Pre-iOS-26 deployment targets — the framework is gated
Devices without Apple Intelligence eligibility — the availability check will refuse
Frontier-model reasoning tasks — on-device models are smaller than cloud frontier models, and the skill is honest about that limit
Cross-platform LLM integration — this is Apple-specific

Install

The skill lives in affaan-m/everything-claude-code at skills/foundation-models-on-device/. Copy that folder to ~/.claude/skills/foundation-models-on-device/. The skill itself is markdown plus Swift code patterns; building and running those patterns requires Xcode 26+, an iOS 26+ deployment target, and an Apple Intelligence-eligible device. The framework ships with iOS, so there is no SDK to install, but the availability check is mandatory because not every iOS 26 device qualifies.

What a session looks like

Add the availability check. Switch over SystemLanguageModel.default.availability — render the content for .available and a clear status message for each unavailable case (device not eligible / Apple Intelligence off in settings / model still downloading / other).
Create the session. Single-turn for one-shot calls (LanguageModelSession()), multi-turn with instructions: for conversational features. Instructions cover role + task + style + safety.
Define the @Generable type. Struct with @Guide on each property. Constraints where the schema benefits — .range(0...20) for ages, description: strings that guide the generation semantically.
Request structured output. try await session.respond(to: prompt, generating: CatProfile.self) returns a Response<CatProfile> with .content typed as the struct. No JSON parsing.
Add tool calling if needed. Define a struct conforming to Tool with a name, a description, @Generable Arguments, and a call(arguments:) method. The model decides when to invoke the tool; the on-device runtime handles the round-trip.
Stream if the UI needs it. Snapshot streaming for real-time updates — partial structured outputs surface as the model generates them.
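The tool-calling step above might look like the sketch below. `RecipeSearchTool`, `searchRecipes(matching:)`, and the sample data are hypothetical stand-ins for real domain code. The sketch assumes a current SDK in which `call(arguments:)` can return a `String`; early betas returned a `ToolOutput` value instead.

```swift
import Foundation
import FoundationModels

// Hypothetical in-memory data standing in for a real recipe store.
let sampleRecipes = ["Tomato Soup", "Lemon Tart", "Mushroom Risotto"]

/// Pure lookup kept outside the tool so it can be tested without the model.
func searchRecipes(matching term: String) -> [String] {
    sampleRecipes.filter { $0.localizedCaseInsensitiveContains(term) }
}

struct RecipeSearchTool: Tool {
    let name = "searchRecipes"
    let description = "Finds recipes whose title contains a search term."

    @Generable
    struct Arguments {
        @Guide(description: "A word or phrase to look for in recipe titles")
        var searchTerm: String
    }

    // Assumption: the return type is a String here; adjust for older SDKs.
    func call(arguments: Arguments) async throws -> String {
        let matches = searchRecipes(matching: arguments.searchTerm)
        return matches.isEmpty ? "No recipes found." : matches.joined(separator: ", ")
    }
}

/// The model decides whether to call the tool while answering the question.
func askAboutRecipes(_ question: String) async throws -> String {
    let session = LanguageModelSession(
        tools: [RecipeSearchTool()],
        instructions: "Help the user find recipes. Use the search tool for lookups."
    )
    return try await session.respond(to: question).content
}
```

Keeping the lookup in a plain function means the domain logic can be unit-tested on its own, while the tool type stays a thin adapter for the model.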

The discipline that makes it work: structured-output-first. Using @Generable for anything beyond pure text generation avoids the brittle JSON-parsing layer that plagues cloud-LLM apps — the framework guarantees the response is a valid CatProfile or it throws.
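The same typed approach carries over to snapshot streaming, the last step in the walkthrough. The sketch below is written under assumptions: `RecipeCard` and `streamRecipeCard(for:onUpdate:)` are hypothetical names, and it assumes a shipping SDK in which each streamed element is a snapshot whose `content` holds the partially generated value (early betas yielded the partial value directly).

```swift
import FoundationModels

// Hypothetical schema used only for this streaming sketch.
@Generable
struct RecipeCard {
    @Guide(description: "Short recipe title")
    var title: String

    @Guide(description: "Main ingredients", .count(3))
    var ingredients: [String]
}

/// Forwards each partial snapshot to `onUpdate` as the model fills in fields.
/// Marked @MainActor on the assumption that the callback updates UI state.
@MainActor
func streamRecipeCard(
    for prompt: String,
    onUpdate: (RecipeCard.PartiallyGenerated) -> Void
) async throws {
    let session = LanguageModelSession()
    let stream = session.streamResponse(to: prompt, generating: RecipeCard.self)
    for try await snapshot in stream {
        // Properties of the partial value are optional until generated.
        onUpdate(snapshot.content)
    }
}
```

Because every property on the partial type is optional, a view can render whichever fields have arrived and fill in the rest as later snapshots come in.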

Receipts

TODO — to be filled in from a real session. Once the framework has been used in a real iOS 26 app, this section will capture: which availability state actually fired most often on real devices (the upstream just shows the switch — receipts will show which devices land in .deviceNotEligible and which in .appleIntelligenceNotEnabled), how the on-device model handled a non-trivial @Generable schema with multiple @Guide constraints, whether the snapshot streaming UX kept pace with model generation speed, and the actual per-request latency on a real device for a structured-output call.

Source and attribution

From Affaan M’s everything-claude-code — an MIT-licensed skill collection covering harness construction, agent ops, video, payments, and platform-specific patterns.

License: MIT.

Quoting the availability-check rule verbatim: “Always check model availability before creating a session.” Every other pattern in the skill assumes the device can run the model; the availability check is what makes the rest safe to write.