Claude Code skills tagged Evals
Skills for evaluating LLM outputs: benchmark design, regression testing across model versions, eval-harness construction, and AI-specific quality gates.
3 skills
- agent-eval
  Replace 'which coding agent feels better' with pass-rate + cost + time + consistency on your own codebase
- ai-regression-testing
  Catch AI-introduced regressions mechanically before the same model reviews its own work
- benchmark
  Measure performance baselines and detect regressions before / after a PR with concrete numbers