scikit-learn

Machine learning in Python — use for supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, and building ML pipelines with comprehensive algorithm reference.

Build and evaluate machine learning pipelines in Python

Source K-Dense AI
License MIT
First documented

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

  • train a classifier
  • scikit-learn pipeline
  • cross-validation
  • hyperparameter tuning
  • fit this ML model

What it does

scikit-learn is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a scikit-learn expert covering the full supervised and unsupervised ML toolkit — classification (random forest, SVM, gradient boosting, logistic regression), regression, clustering (k-means, DBSCAN, hierarchical), dimensionality reduction (PCA, t-SNE, UMAP), preprocessing pipelines, cross-validation, and hyperparameter search (GridSearchCV, RandomizedSearchCV, Optuna integration).

A session produces complete, runnable ML code: a Pipeline object that chains preprocessing and the model, cross-validation evaluation with the appropriate metrics, and either the best model artifact or a hyperparameter search setup.
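A minimal sketch of that deliverable, using synthetic data as a stand-in for the user's table (the estimator and metric here are illustrative choices, not what the skill will always pick):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the user's tabular data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Chain preprocessing and model so both are re-fit inside each CV fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(f"AUC-ROC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Swapping the final step for `GridSearchCV` or `RandomizedSearchCV` over the same pipeline turns this into the hyperparameter-search variant.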

When to use it

Reach for it when:

  • You need a standard ML model (not deep learning) trained on tabular data with reliable performance baselines
  • You’re building a preprocessing + model pipeline that needs to be reproducible and easy to serialize
  • You want cross-validated performance metrics and feature importances, not just a single train/test split

When not to reach for it:

  • Deep learning on images, text, or sequences — use transformers or pytorch-lightning
  • Graph-structured data — use torch-geometric
  • Model explainability after fitting — combine with shap

Install

Copy the SKILL.md from K-Dense AI’s scikit-learn folder into .claude/skills/scikit-learn/ in your project.



What a session looks like

A typical session has three phases:

  1. Task and data description. Specify the ML task (binary classification, multi-class, regression), describe the feature types (numerical, categorical, text, mixed), and indicate any class imbalance or missing data concerns.
  2. Pipeline construction. Claude writes an sklearn Pipeline with appropriate preprocessing steps (imputation, scaling, encoding) followed by the model, and sets up a cross-validation loop with the right metric (AUC-ROC, F1, RMSE).
  3. Evaluation and interpretation. Cross-validated metrics are computed, a confusion matrix or regression residuals are plotted, and feature importances are extracted where the model supports them. Claude flags if the results suggest overfitting or class imbalance problems.
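The phase-2 pipeline for mixed feature types might look like the following sketch. The frame, columns, and estimator are hypothetical; the point is that imputation, scaling, and encoding all live inside the pipeline so they are learned per fold:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed-type frame with missing numeric values.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 200),
    "income": rng.normal(50_000, 15_000, 200),
    "city": rng.choice(["NYC", "LA", "SF"], 200),
})
df.loc[::17, "age"] = np.nan
# Random target, just to exercise the pipeline mechanics.
y = pd.Series(rng.integers(0, 2, 200))

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

pipe = Pipeline([("prep", preprocess),
                 ("clf", LogisticRegression(max_iter=1000))])

scores = cross_val_score(pipe, df, y, cv=5, scoring="f1")
```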

Receipts

Where it works well:

  • Tabular classification problems where gradient boosting (HistGradientBoosting, XGBoost via the sklearn wrapper) consistently produces strong baselines — Claude’s pipeline code handles missing values and mixed types cleanly
  • Preprocessing pipelines that need to generalize from training to test data without leakage — the Pipeline object enforces this correctly
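The leakage point in the second bullet comes down to where the scaler's statistics are computed. A small sketch (synthetic data, arbitrary estimator):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())])
pipe.fit(X_train, y_train)        # scaler statistics come from X_train only
acc = pipe.score(X_test, y_test)  # X_test is transformed with train statistics
```

Scaling the full dataset before splitting would let test-set statistics leak into training; the Pipeline makes that mistake structurally impossible.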

Where it backfires:

  • Very large datasets where scikit-learn’s in-memory computation is slow — Dask-ML provides distributed wrappers for some estimators
  • Custom loss functions or model architectures, which most scikit-learn estimators don't support without significant workarounds

Pattern that works: fit a dummy classifier (predicting majority class) as your baseline before any real model — it takes one line and immediately tells you if the problem has a trivial baseline you need to beat.
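That baseline really is one line. A sketch on synthetic imbalanced data (the 90/10 class split is an illustrative assumption):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Imbalanced synthetic problem: ~90% of samples in the majority class.
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

# Majority-class baseline: any real model must beat this accuracy.
baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5)
print(f"baseline accuracy: {baseline.mean():.3f}")
```

On a 90/10 split the dummy scores roughly 0.9 accuracy, which is exactly the trap this pattern exposes: a real model reporting 0.9 accuracy here has learned nothing.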

Source and attribution

Originally authored by K-Dense Inc. The canonical SKILL.md lives in the scikit-learn folder of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.