# statsmodels

> Statistical models library for Python — use when you need specific model classes (OLS, GLM, mixed models, ARIMA) with detailed diagnostics, residuals, and inference tables for econometrics, time series, and rigorous statistical inference.

**Use case**: Run statistical models with full diagnostics and inference tables

**Canonical URL**: https://agentcookbooks.com/skills/statsmodels/

**Topics**: claude-code, skills, science, data-science

**Trigger phrases**: "run OLS regression", "fit a GLM", "mixed effects model", "ARIMA forecast", "statsmodels regression"

**Source**: [K-Dense AI](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/statsmodels)

**License**: MIT

---

## What it does

`statsmodels` is a Claude Code skill from K-Dense AI's [scientific-agent-skills repo](https://github.com/K-Dense-AI/scientific-agent-skills). It turns Claude into a statsmodels expert covering the full range of model classes: OLS, WLS, GLS, GLM (logistic, Poisson, negative binomial), mixed linear models, ARIMA, VAR, GARCH, and nonparametric methods. GARCH models are not part of statsmodels itself and are usually fitted with the separate `arch` package. The emphasis is on the diagnostic output: coefficient tables, residual plots, heteroskedasticity tests, and information criteria for model comparison.

A session produces Python code that fits the model, prints the summary table, and generates diagnostic plots — the complete inference workflow from data to publication-ready coefficient estimates.

## When to use it

Reach for it when:

- You need a full coefficient table with standard errors, t-statistics, p-values, and confidence intervals — not just a prediction
- You're doing time series analysis with ARIMA, SARIMA, or VAR and need model diagnostics (ACF/PACF, Ljung-Box test)
- You need formal statistical inference with model diagnostics: Durbin-Watson, Breusch-Pagan, or VIF for multicollinearity

When *not* to reach for it:

- You need guided test selection and APA-formatted reporting — use `statistical-analysis`
- Bayesian models with MCMC — use `pymc`
- Predictive machine learning pipelines where inference tables aren't the goal — use `scikit-learn`

## Install

Copy the `SKILL.md` from K-Dense AI's [statsmodels folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/statsmodels) into `.claude/skills/statsmodels/` in your project.

Trigger phrases: "run OLS regression", "fit a GLM", "mixed effects model", "ARIMA forecast", "statsmodels regression".

## What a session looks like

A typical session has three phases:

1. **Model specification.** Describe the dependent variable, predictors, and model family. Claude identifies whether to use the formula API (`smf.ols("y ~ x1 + x2", data=df)`) or the array API, and whether robust standard errors are appropriate.
2. **Model fitting and diagnostics.** Claude writes code to fit the model, print the summary, and run assumption checks — normality of residuals, homoskedasticity, autocorrelation — with interpretation of each diagnostic.
3. **Reporting output.** The coefficient table is formatted for inclusion in a manuscript (with significance stars, standard errors in parentheses) or exported via `stargazer`/`tabulate` for LaTeX.
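
The three phases above might look roughly like the following sketch. It uses simulated data with hypothetical column names, and it builds the report table with statsmodels' own `summary_col` rather than `stargazer` or `tabulate`, purely to keep the example dependency-free.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 0.5 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(size=n)

# Phase 1: specification with the formula API.
model = smf.ols("y ~ x1 + x2", data=df)

# Phase 2: fit with classical and heteroskedasticity-robust (HC3) errors,
# then check residual normality with the Jarque-Bera test.
classic = model.fit()
robust = model.fit(cov_type="HC3")
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(classic.resid)
print(robust.summary())
print(f"Jarque-Bera p-value: {jb_pvalue:.3f}")

# Phase 3: side-by-side table with stars and standard errors in parentheses.
table = summary_col(
    [classic, robust],
    stars=True,
    float_format="%.3f",
    model_names=["OLS", "OLS (HC3)"],
)
print(table.as_text())
latex = table.as_latex()  # string that can be pasted into a manuscript
```

Robust standard errors change only the uncertainty estimates, so the two columns show identical coefficients with different standard errors.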

## Receipts

**Where it works well:**
- OLS with robust standard errors for cross-sectional econometric data — the diagnostic suite catches the common violations (heteroskedasticity, outliers, multicollinearity) that need reporting in papers
- ARIMA specification for series that are stationary, or become stationary after differencing, with ACF/PACF interpretation guidance and order selection by comparing AIC across candidate models

**Where it backfires:**
- Very large datasets where statsmodels' full in-memory computation becomes slow — for big-data regression, consider distributed alternatives
- Mixed effects models with complex random effects structures can be finicky to specify; convergence warnings require human interpretation
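
For reference, the simplest mixed model, a random intercept per group, can be sketched as follows, including the convergence flag that deserves a look before reading the table. The data and column names are simulated for this example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated grouped data (hypothetical): 30 groups, 20 observations each.
rng = np.random.default_rng(3)
n_groups, n_per = 30, 20
group = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(scale=1.0, size=n_groups)          # true random intercepts
x = rng.normal(size=n_groups * n_per)
y = 2.0 + 0.5 * x + u[group] + rng.normal(scale=0.5, size=n_groups * n_per)
df = pd.DataFrame({"y": y, "x": x, "group": group})

# Random-intercept model; more complex structures go in via re_formula.
model = smf.mixedlm("y ~ x", data=df, groups=df["group"])
res = model.fit(reml=True)

print(res.summary())
print("Converged:", res.converged)                  # check before interpreting
print("Fixed effects:\n", res.fe_params)
print("Random-effect covariance:\n", res.cov_re)
```

If `converged` is `False` or the estimated variance sits at the boundary (near zero), simplifying the random-effects structure or rescaling predictors is usually the first thing to try.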

**Pattern that works:** always look at the diagnostic plots before trusting the coefficient table. When model assumptions are violated, the standard errors and p-values behind a "significant" coefficient are unreliable, so the significance itself is not credible.

## Source and attribution

Originally authored by [K-Dense Inc.](https://github.com/K-Dense-AI). The canonical SKILL.md lives in the [`statsmodels` folder](https://github.com/K-Dense-AI/scientific-agent-skills/tree/main/scientific-skills/statsmodels) of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and any updates, defer to the source repo.