statsmodels

Statistical models library for Python — use when you need specific model classes (OLS, GLM, mixed models, ARIMA) with detailed diagnostics, residuals, and inference tables for econometrics, time series, and rigorous statistical inference.

Run statistical models with full diagnostics and inference tables

Source K-Dense AI
License MIT
First documented

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

  • run OLS regression
  • fit a GLM
  • mixed effects model
  • ARIMA forecast
  • statsmodels regression

What it does

statsmodels is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a statsmodels expert covering the library’s main model classes (OLS, WLS, GLS, GLM families such as logistic, Poisson, and negative binomial, mixed linear models, ARIMA, VAR, and nonparametric methods), with emphasis on the diagnostic output: coefficient tables, residual plots, heteroskedasticity tests, and information criteria for model comparison. GARCH is also listed, though statsmodels itself does not ship GARCH models; those are usually fit with the separate arch package.

A session produces Python code that fits the model, prints the summary table, and generates diagnostic plots — the complete inference workflow from data to publication-ready coefficient estimates.

When to use it

Reach for it when:

  • You need a full coefficient table with standard errors, t-statistics, p-values, and confidence intervals — not just a prediction
  • You’re doing time series analysis with ARIMA, SARIMA, or VAR and need model diagnostics (ACF/PACF, Ljung-Box test)
  • You need formal statistical inference with model diagnostics: Durbin-Watson for residual autocorrelation, Breusch-Pagan for heteroskedasticity, or VIF for multicollinearity

When not to reach for it:

  • You need guided test selection and APA-formatted reporting — use statistical-analysis
  • Bayesian models with MCMC — use pymc
  • Predictive machine learning pipelines where inference tables aren’t the goal — use scikit-learn

Install

Copy the SKILL.md from K-Dense AI’s statsmodels folder into .claude/skills/statsmodels/ in your project.

What a session looks like

A typical session has three phases:

  1. Model specification. Describe the dependent variable, predictors, and model family. Claude identifies whether to use the formula API (smf.ols("y ~ x1 + x2", data=df)) or the array API, and whether robust standard errors are appropriate.
  2. Model fitting and diagnostics. Claude writes code to fit the model, print the summary, and run assumption checks — normality of residuals, homoskedasticity, autocorrelation — with interpretation of each diagnostic.
  3. Reporting output. The coefficient table is formatted for inclusion in a manuscript (with significance stars, standard errors in parentheses) or exported via stargazer/tabulate for LaTeX.
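The three phases above might look roughly like the following for a simple OLS model. This is a sketch under assumptions: the data are synthetic, HC3 is picked as one common robust covariance option, and the table is built with statsmodels' own summary_col helper rather than stargazer or tabulate.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Synthetic data (illustrative only).
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 2.0 * df["x1"] - 0.5 * df["x2"] + rng.normal(size=n)

# Phase 1: specification with the formula API.
model = smf.ols("y ~ x1 + x2", data=df)

# Phase 2: fit with classical and heteroskedasticity-robust (HC3) standard errors.
classic = model.fit()
robust = model.fit(cov_type="HC3")
print(robust.summary())

# Phase 3: side-by-side table with significance stars and
# standard errors in parentheses, exported as LaTeX.
table = summary_col(
    [classic, robust],
    stars=True,
    model_names=["OLS", "OLS (HC3)"],
)
latex = table.as_latex()
print(table)
```

Robust covariance changes only the standard errors and the statistics derived from them; the point estimates in the two columns are identical.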

Receipts

Where it works well:

  • OLS with robust standard errors for cross-sectional econometric data — the diagnostic suite catches the common violations (heteroskedasticity, outliers, multicollinearity) that need reporting in papers
  • ARIMA specification for univariate time series that are stationary after differencing: ACF/PACF interpretation guidance and order selection by comparing AIC across candidate orders

Where it backfires:

  • Very large datasets where statsmodels’ full in-memory computation becomes slow — for big-data regression, consider distributed alternatives
  • Mixed effects models with complex random effects structures can be finicky to specify; convergence warnings require human interpretation
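For the mixed-model case, a minimal random-intercept fit with an explicit convergence check could look like this. The grouping structure, sample sizes, and variable names are assumptions for the example; more complex random-effects structures are where the specification and convergence issues mentioned above tend to appear.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic grouped data (illustrative only): 30 groups, 20 observations each,
# with a group-level random intercept.
rng = np.random.default_rng(4)
n_groups, n_per = 30, 20
group = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(scale=1.0, size=n_groups)          # random intercepts
x = rng.normal(size=n_groups * n_per)
y = 2.0 + 1.5 * x + u[group] + rng.normal(scale=0.5, size=n_groups * n_per)
df = pd.DataFrame({"y": y, "x": x, "group": group})

# Random-intercept model fit by REML.
model = smf.mixedlm("y ~ x", data=df, groups=df["group"])
res = model.fit(reml=True)
print(res.summary())

# Check the optimizer status explicitly instead of relying on warnings alone.
print("Converged:", res.converged)
print("Fixed-effect slope:", res.fe_params["x"])
print("Random-intercept variance:", res.cov_re.iloc[0, 0])
```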

Pattern that works: always look at the diagnostic plots before trusting the coefficient table; when a model’s assumptions are violated, its standard errors and p-values (and sometimes the coefficients themselves) are not reliable, however significant they look.

Source and attribution

Originally authored by K-Dense Inc. The canonical SKILL.md lives in the statsmodels folder of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.