polars
Fast in-memory DataFrame library with lazy evaluation, parallel execution, and an Apache Arrow backend. Use it for ETL pipelines and for 1–100 GB datasets that still fit in RAM when pandas is too slow.
Fast pandas replacement with lazy evaluation for large DataFrames
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
- use polars instead of pandas
- polars DataFrame
- lazy evaluation pipeline
- fast CSV loading
- polars groupby
What it does
polars is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into a Polars expert covering the full DataFrame API — lazy vs. eager evaluation, expression syntax, groupby and aggregation, joins, string operations, time series, and integration with Apache Arrow and Parquet — for workflows where pandas becomes the bottleneck.
A session produces idiomatic Polars code built on the expression syntax rather than a naive pandas-to-Polars translation, so it makes full use of lazy evaluation and parallel execution.
When to use it
Reach for it when:
- Your pandas operations are slow and your data fits in RAM (roughly 1–100 GB)
- You’re building ETL pipelines where lazy evaluation lets you compose transformations before executing them
- You need to process large CSV or Parquet files faster than pandas can manage
When not to reach for it:
- Data that doesn’t fit in RAM: use dask for distributed or out-of-core processing
- Workflows tightly coupled to pandas-only libraries: some ML libraries don’t accept Polars DataFrames directly
Install
Copy the SKILL.md from K-Dense AI’s polars folder into .claude/skills/polars/ in your project.
Trigger phrases: “use polars instead of pandas”, “polars DataFrame”, “lazy evaluation pipeline”, “fast CSV loading”.
What a session looks like
A typical session has three phases:
- Context and operation description. Describe the data shape and the transformation you need — filter, groupby, join, or pipeline. Claude identifies whether lazy (LazyFrame) or eager (DataFrame) evaluation is more appropriate.
- Idiomatic Polars code. Claude writes code using Polars’ expression API (pl.col(), pl.lit(), method chaining) rather than row-iteration patterns, making full use of parallelism.
- Performance notes. Claude points out where the code exploits Polars’ query optimizer and flags anti-patterns (e.g., calling .to_pandas() unnecessarily) that would undo the performance gains.
Receipts
Where it works well:
- Reading and filtering large CSV or Parquet files: Polars’ lazy scan with predicate pushdown applies filter conditions while the file is being read instead of after a full load, which is typically much faster than pandas read_csv followed by a filter
- Groupby aggregations on high-cardinality columns: parallel execution makes Polars substantially faster than pandas for groupby on columns with millions of unique values
Where it backfires:
- Libraries that only accept pandas DataFrames as input require a .to_pandas() call that converts back, negating some of the performance gains for those specific operations
- Polars’ expression syntax has a learning curve; Claude’s first-pass code is idiomatic, but debugging novel expressions requires understanding how expressions behave in each context (select, with_columns, filter, group_by().agg())
Pattern that works: start with a LazyFrame scan (.scan_csv(), .scan_parquet()) and build the full transformation chain before calling .collect() — letting the query optimizer run over the complete plan produces better performance than collecting at intermediate steps.
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the polars folder of their public scientific-agent-skills repository.
License: MIT. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.