statsmodels
Statistical models library for Python — use when you need specific model classes (OLS, GLM, mixed models, ARIMA) with detailed diagnostics, residuals, and inference tables for econometrics, time series, and rigorous statistical inference.
Run statistical models with full diagnostics and inference tables
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
- run OLS regression
- fit a GLM
- mixed effects model
- ARIMA forecast
- statsmodels regression
What it does
statsmodels is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a statsmodels expert covering the main model classes: OLS, WLS, GLS, GLM (logistic, Poisson, negative binomial), mixed linear models, ARIMA, VAR, GARCH (provided by the separate arch package, not statsmodels itself), and nonparametric methods. The emphasis is on the diagnostic output: coefficient tables, residual plots, heteroskedasticity tests, and information criteria for model comparison.
A session produces Python code that fits the model, prints the summary table, and generates diagnostic plots — the complete inference workflow from data to publication-ready coefficient estimates.
When to use it
Reach for it when:
- You need a full coefficient table with standard errors, t-statistics, p-values, and confidence intervals — not just a prediction
- You’re doing time series analysis with ARIMA, SARIMA, or VAR and need model diagnostics (ACF/PACF, Ljung-Box test)
- You need formal statistical inference with model diagnostics: Durbin-Watson, Breusch-Pagan, or VIF for multicollinearity
When not to reach for it:
- You need guided test selection and APA-formatted reporting: use statistical-analysis
- Bayesian models with MCMC: use pymc
- Predictive machine learning pipelines where inference tables aren’t the goal: use scikit-learn
Install
Copy the SKILL.md from K-Dense AI’s statsmodels folder into .claude/skills/statsmodels/ in your project.
Trigger phrases: “run OLS regression”, “fit a GLM”, “mixed effects model”, “ARIMA forecast”.
What a session looks like
A typical session has three phases:
- Model specification. Describe the dependent variable, predictors, and model family. Claude identifies whether to use the formula API (smf.ols("y ~ x1 + x2", data=df)) or the array API, and whether robust standard errors are appropriate.
- Model fitting and diagnostics. Claude writes code to fit the model, print the summary, and run assumption checks (normality of residuals, homoskedasticity, autocorrelation) with interpretation of each diagnostic.
- Reporting output. The coefficient table is formatted for inclusion in a manuscript (with significance stars, standard errors in parentheses) or exported via stargazer/tabulate for LaTeX.
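The three phases might look roughly like the sketch below: a formula-API specification, a fit with both classical and heteroskedasticity-robust (HC3) standard errors, and a LaTeX export. The data are synthetic and the export uses statsmodels' own `summary_col` helper as a stand-in for stargazer or tabulate, which is a substitution made here for self-containment.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Synthetic data (assumption): noise variance grows with |x1|,
# so classical standard errors are not trustworthy.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
noise = rng.normal(size=n) * (1 + np.abs(df["x1"]))
df["y"] = 0.5 + 1.5 * df["x1"] - 0.8 * df["x2"] + noise

# Phase 1: specification through the formula API (intercept added automatically).
model = smf.ols("y ~ x1 + x2", data=df)

# Phase 2: same point estimates, two covariance estimators.
classic = model.fit()
robust = model.fit(cov_type="HC3")
comparison = pd.DataFrame(
    {"coef": classic.params, "se_classic": classic.bse, "se_HC3": robust.bse}
)
print(comparison)
print(robust.summary())

# Phase 3: side-by-side table with significance stars and standard errors
# in parentheses, rendered as LaTeX.
table = summary_col([classic, robust], stars=True, model_names=["classic", "HC3"])
latex = table.as_latex()
print(latex)
```

The coefficients are identical across the two fits; only the standard errors, and therefore the t-statistics and p-values, change with `cov_type`.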
Receipts
Where it works well:
- OLS with robust standard errors for cross-sectional econometric data — the diagnostic suite catches the common violations (heteroskedasticity, outliers, multicollinearity) that need reporting in papers
- ARIMA specification for stationary time series — ACF/PACF interpretation guidance and automatic order selection via AIC
Where it backfires:
- Very large datasets where statsmodels’ full in-memory computation becomes slow — for big-data regression, consider distributed alternatives
- Mixed effects models with complex random effects structures can be finicky to specify; convergence warnings require human interpretation
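One way to keep the mixed-model caveat visible is to capture warnings during the fit and check the result's `converged` flag before reading the table. A minimal random-intercept sketch on synthetic grouped data; the column names and simulation settings are hypothetical:

```python
import warnings

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic grouped data (assumption): 30 groups with a random intercept each.
rng = np.random.default_rng(3)
n_groups, n_per = 30, 20
group = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(scale=1.0, size=n_groups)  # group-level random intercepts
x = rng.normal(size=n_groups * n_per)
y = 2.0 + 0.7 * x + u[group] + rng.normal(scale=0.5, size=n_groups * n_per)
df = pd.DataFrame({"y": y, "x": x, "group": group})

# Record any warnings raised during fitting instead of letting them scroll by.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = smf.mixedlm("y ~ x", data=df, groups=df["group"]).fit()

print(result.summary())
print("converged:", result.converged)
for w in caught:
    print(f"{w.category.__name__}: {w.message}")

# Fixed-effect slope and estimated random-intercept variance.
slope = float(result.fe_params["x"])
re_var = float(result.cov_re.iloc[0, 0])
print(f"slope: {slope:.3f}, random-intercept variance: {re_var:.3f}")
```

A simple random intercept like this usually fits cleanly; the flag and the captured warnings matter more once random slopes or crossed effects are added.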
Pattern that works: always look at the diagnostic plots before trusting the coefficient table; statistically significant coefficients from a model with violated assumptions are not credible estimates.
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the statsmodels folder of their public scientific-agent-skills repository.
License: MIT. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.