scientific-critical-thinking
Evaluate scientific claims by assessing experimental design validity, identifying biases and confounders, and applying evidence-grading frameworks (GRADE, Cochrane Risk of Bias) to rate evidence quality and surface flaws.
Grade evidence quality and identify experimental design flaws
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
"evaluate this study design", "what are the confounders here", "GRADE this evidence", "assess bias risk", "critique this experiment"
What it does
scientific-critical-thinking is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into an evidence evaluator that applies structured frameworks — GRADE, Cochrane Risk of Bias, and study design hierarchies — to assess the quality of scientific claims and identify threats to validity: confounders, selection bias, measurement error, missing controls, and inappropriate statistical approaches.
A session produces a structured critique: evidence quality rating, identified flaws with explanations, confounders and limitations not discussed in the paper, and a conclusion about what the evidence does and does not support.
When to use it
Reach for it when:
- You’re deciding whether to act on a study’s findings and need a systematic quality assessment
- You’re teaching critical appraisal and want a worked example of applying a bias framework to a specific paper
- You’re writing the limitations section of your own manuscript and want to make sure you’ve named the real threats to validity
When not to reach for it:
- Writing the formal text of a peer review — use peer-review
- Scoring manuscripts with a quantitative rubric — use scholar-evaluation
Install
Copy the SKILL.md from K-Dense AI’s scientific-critical-thinking folder into .claude/skills/scientific-critical-thinking/ in your project.
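The copy step can be sketched as a short script (the clone path below is illustrative, not a location the repo prescribes — adjust it to wherever you cloned scientific-agent-skills):

```python
# Sketch of a manual install; the source path is an assumption --
# point it at your local clone of the scientific-agent-skills repo.
import shutil
from pathlib import Path

src = Path("../scientific-agent-skills/scientific-critical-thinking/SKILL.md")
dest = Path(".claude/skills/scientific-critical-thinking")
dest.mkdir(parents=True, exist_ok=True)  # create the skill folder if missing

if src.exists():
    shutil.copy(src, dest / "SKILL.md")
else:
    print(f"clone the repo first: {src} not found")
```

Once `SKILL.md` is in place, Claude Code picks the skill up automatically on the trigger phrases below.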
Trigger phrases: “evaluate this study design”, “what are the confounders here”, “GRADE this evidence”, “assess bias risk”, “critique this experiment”.
What a session looks like
A typical session has three phases:
- Study intake. Paste the paper or describe the study design: population, intervention, comparison, outcome, and study type. Claude identifies which bias framework applies — Cochrane RoB for RCTs, ROBINS-I for observational studies, QUADAS-2 for diagnostic tests.
- Structured appraisal. Claude works through each domain of the applicable framework, citing specific passages from the paper to support each rating. Confounders and design limitations not mentioned by the authors are surfaced explicitly.
- Evidence summary. The GRADE level (high/moderate/low/very low) and a plain-language summary of what the evidence does and does not support are produced — suitable for a clinical decision summary or a methods section.
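The intake step's framework choice amounts to a lookup from study type to bias framework. A minimal sketch (`pick_framework` is a hypothetical helper for illustration, not part of the skill itself):

```python
# Hypothetical helper mirroring the study-intake logic: choose which
# risk-of-bias framework the structured appraisal should walk through.
BIAS_FRAMEWORKS = {
    "rct": "Cochrane RoB",
    "observational": "ROBINS-I",
    "diagnostic_accuracy": "QUADAS-2",
}

# The four GRADE certainty levels the evidence summary reports.
GRADE_LEVELS = ["high", "moderate", "low", "very low"]

def pick_framework(study_type: str) -> str:
    """Return the applicable framework, or a fallback note for other designs."""
    return BIAS_FRAMEWORKS.get(
        study_type.lower(), "no standard framework; appraise ad hoc"
    )

print(pick_framework("RCT"))  # -> Cochrane RoB
```

In practice the skill does this selection conversationally from the population/intervention/comparison/outcome description rather than from a keyword, but the mapping is the same.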
Receipts
Where it works well:
- RCT appraisal using Cochrane RoB — domain-by-domain evaluation is thorough and catches omissions in randomization and blinding reporting that casual reading misses
- Teaching contexts: the worked framework application is more instructive than a summary critique
Where it backfires:
- Highly technical statistical flaws (e.g., model misspecification, exchangeability assumptions in causal models) require domain expertise to flag; Claude catches obvious statistical issues but not all subtle ones
- The skill evaluates the study as reported — it cannot assess unreported data or publication bias within a single paper
Pattern that works: use this skill on the papers you’re citing in your own work, not just papers you’re evaluating for others; it surfaces limitations you should proactively address in your manuscript.
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the scientific-critical-thinking folder of their public scientific-agent-skills repository.
License: MIT. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.