# shap

> Model interpretability and explainability using SHAP (SHapley Additive exPlanations) — use for explaining ML model predictions, computing feature importance, generating SHAP plots, debugging models, analyzing bias, and implementing explainable AI.

**Use case**: Explain any ML model's predictions with SHAP values

**Canonical URL**: https://agentcookbooks.com/skills/shap/

**Topics**: claude-code, skills, science, ml-libraries

**Trigger phrases**: "explain this model", "feature importance with SHAP", "SHAP waterfall plot", "why did the model predict", "model interpretability"

**Source**: [K-Dense AI](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/shap)

**License**: MIT

---

## What it does

`shap` is a Claude Code skill from K-Dense AI's [scientific-agent-skills repo](https://github.com/K-Dense-AI/scientific-agent-skills). It turns Claude into a SHAP explainability expert that computes SHapley Additive exPlanations for any model — tree-based (XGBoost, LightGBM, random forest with TreeExplainer), deep learning (PyTorch, TensorFlow with DeepExplainer or GradientExplainer), linear models (LinearExplainer), and black-box models (KernelExplainer) — and generates the full suite of SHAP visualizations.
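To make the explainer-selection step concrete, here is a minimal sketch of the tree-model path; the bundled adult-income dataset and the small XGBoost classifier are stand-ins for your own model and data, not anything the skill prescribes.

```python
import shap
import xgboost as xgb

# Placeholder data and model: shap's bundled adult-income dataset
# and a small XGBoost classifier stand in for your own.
X, y = shap.datasets.adult()
model = xgb.XGBClassifier(n_estimators=100).fit(X, y)

# Tree ensembles get TreeExplainer: exact SHAP values, fast even on large forests.
explainer = shap.TreeExplainer(model)
shap_values = explainer(X)  # a shap.Explanation object (values, base values, data)
```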

A session produces SHAP values for your model's predictions and the relevant plots: waterfall plots for individual predictions, beeswarm plots for global feature importance, force plots for interactive explanation, scatter plots for feature interactions, and heatmaps for sample-level explanation patterns.
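Continuing that sketch, these are the corresponding calls in shap's plotting API; `shap_values` is the `shap.Explanation` object from above, and the "Age" column is specific to the placeholder dataset.

```python
shap.plots.waterfall(shap_values[0])        # one prediction, contribution by contribution
shap.plots.beeswarm(shap_values)            # global feature importance across all rows
shap.plots.force(shap_values[0])            # interactive force plot (JS, notebook-oriented)
shap.plots.scatter(shap_values[:, "Age"])   # dependence / interaction for a single feature
shap.plots.heatmap(shap_values[:500])       # per-sample explanation patterns
```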

## When to use it

Reach for it when:

- You need to explain an individual prediction to a stakeholder or in a regulatory context
- You want global feature importance that's more reliable than impurity-based feature importance from random forests
- You're debugging a model that's behaving unexpectedly and need to understand which features are driving specific predictions

When *not* to reach for it:

- You only need rough feature rankings and are using a tree model — built-in `feature_importances_` is faster
- You're in a real-time inference path where SHAP's computation latency is prohibitive

## Install

Copy the `SKILL.md` from K-Dense AI's [shap folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/shap) into `.claude/skills/shap/` in your project.

Once installed, the skill activates on the trigger phrases listed above: "explain this model", "feature importance with SHAP", "SHAP waterfall plot", "why did the model predict", and "model interpretability".

## What a session looks like

A typical session has three phases:

1. **Explainer selection.** Claude identifies the model type and selects the fastest appropriate explainer — `TreeExplainer` for tree-based models (exact SHAP values, fast), `DeepExplainer` for neural networks, `KernelExplainer` for black-box models (slow, use sampling).
2. **SHAP value computation.** The explainer computes SHAP values for the instances of interest against a background dataset. Claude handles background dataset selection (k-means summarization for large datasets) and masking for text/image inputs; a sketch of this step follows the list.
3. **Visualization.** The requested plots are generated and saved. Claude interprets the top features and flags any SHAP values that suggest the model is using spurious correlations.
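As a rough illustration of phases 1 and 2 for the black-box case, here is how the background summarization and sampling might look; the SVM, the 2,000-row slice, and the sampling sizes are arbitrary stand-ins, not values the skill mandates.

```python
import shap
from sklearn.svm import SVC

# Placeholder data and model: a small slice of the adult-income dataset
# and a kernel SVM stand in for an arbitrary black-box model.
X, y = shap.datasets.adult()
X, y = X.iloc[:2000], y[:2000]
model = SVC(probability=True).fit(X, y)

# Black-box model, so KernelExplainer, with a k-means summarized
# background to keep the computation tractable.
background = shap.kmeans(X, 50)
explainer = shap.KernelExplainer(model.predict_proba, background)

# Explain a handful of instances with aggressive sampling of feature coalitions.
shap_values = explainer.shap_values(X.iloc[:10], nsamples=200)
```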

## Receipts

**Where it works well:**
- Tree model explanations with TreeExplainer — exact SHAP values computed in seconds even for large forests, beeswarm plots give a reliable global feature ranking
- Individual prediction explanations for clinical or regulatory contexts — waterfall plots with feature contributions in natural units are interpretable to non-ML audiences

**Where it backfires:**
- KernelExplainer on complex black-box models is extremely slow without aggressive sampling; the approximation quality depends on the background dataset size
- SHAP values for correlated features split contributions across correlated predictors in ways that are mathematically correct but counter-intuitive for domain experts

**Pattern that works:** always use a background dataset that's representative of your training distribution (not a random subset of the test data); SHAP baseline values depend on the background, and an unrepresentative background produces misleading explanations.
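A minimal way to see why the background matters (the diabetes dataset and random forest below are placeholders): the explainer's `expected_value` is just the mean model output over the background, and every waterfall plot starts from that baseline.

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Placeholder regression setup.
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Draw the background from the training distribution, not the test set.
background = shap.sample(X_train, 100, random_state=0)
explainer = shap.KernelExplainer(model.predict, background)

# The SHAP baseline is the mean prediction over this background; a skewed
# background shifts it and, with it, every per-feature contribution.
print(explainer.expected_value)
print(model.predict(X_train).mean())  # close to the baseline if the background is representative
```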

## Source and attribution

Originally authored by [K-Dense Inc.](https://github.com/K-Dense-AI). The canonical SKILL.md lives in the [`shap` folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/shap) of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and any updates, defer to the source repo.