polars

Fast in-memory DataFrame library for datasets that fit in RAM. Use it when pandas is too slow but the data still fits in memory: it offers lazy evaluation, parallel execution, and an Apache Arrow backend, and suits 1–100 GB datasets and ETL pipelines.

Fast pandas replacement with lazy evaluation for large DataFrames

Source K-Dense AI

License MIT

First documented 2026-04-28

Receipts generic

Science Data Science

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

use polars instead of pandas
polars DataFrame
lazy evaluation pipeline
fast CSV loading
polars groupby

What it does

polars is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a Polars expert covering the full DataFrame API — lazy vs. eager evaluation, expression syntax, groupby and aggregation, joins, string operations, time series, and integration with Apache Arrow and Parquet — for workflows where pandas becomes the bottleneck.

A session produces Polars code that is idiomatic to Polars’ expression syntax rather than a naive pandas-to-polars translation, making full use of lazy evaluation and parallel execution.

When to use it

Reach for it when:

Your pandas operations are slow and your data fits in RAM (roughly 1–100 GB)
You’re building ETL pipelines where lazy evaluation lets you compose transformations before executing them
You need to process large CSV or Parquet files faster than pandas can manage

When not to reach for it:

Data that doesn’t fit in RAM — use dask for distributed or out-of-core processing
Workflows tightly coupled to pandas-only libraries — some ML libraries don’t accept Polars DataFrames directly

Install

Copy the SKILL.md from K-Dense AI’s polars folder into .claude/skills/polars/ in your project.

Trigger phrases: “use polars instead of pandas”, “polars DataFrame”, “lazy evaluation pipeline”, “fast CSV loading”, “polars groupby”.

What a session looks like

A typical session has three phases:

Context and operation description. Describe the data shape and the transformation you need — filter, groupby, join, or pipeline. Claude identifies whether lazy (LazyFrame) or eager (DataFrame) evaluation is more appropriate.
Idiomatic Polars code. Claude writes code using Polars’ expression API (pl.col(), pl.lit(), method chaining) rather than row-iteration patterns, making full use of parallelism.
Performance notes. Claude points out where the code exploits Polars’ query optimizer and flags any anti-patterns (e.g., using .to_pandas() unnecessarily) that would undo the performance gains.

Receipts

Where it works well:

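Reading and filtering large CSV or Parquet files: a lazy scan with predicate pushdown applies the filter while reading, so Polars avoids materializing rows it will discard. This is typically much faster than loading the whole file with pandas read_csv and filtering afterwards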
Groupby aggregations on high-cardinality columns — parallel execution makes Polars substantially faster than pandas for groupby on columns with millions of unique values

Where it backfires:

Libraries that only accept pandas DataFrames as input require a .to_pandas() call that converts back, negating some of the performance gains for those specific operations
Polars’ expression syntax has a learning curve; Claude’s first-pass code is idiomatic, but debugging novel expressions requires understanding the expression context model

Pattern that works: start with a LazyFrame scan (pl.scan_csv(), pl.scan_parquet()) and build the full transformation chain before calling .collect(). Letting the query optimizer run over the complete plan produces better performance than collecting at intermediate steps.

Source and attribution

Originally authored by K-Dense Inc. The canonical SKILL.md lives in the polars folder of their public scientific-agent-skills repository.

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.