scientific-critical-thinking
Evaluate scientific claims by assessing experimental design validity, identifying biases and confounders, and applying evidence-grading frameworks (GRADE, Cochrane Risk of Bias) to rate evidence quality and surface flaws.
Grade evidence quality and identify experimental design flaws
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
"evaluate this study design", "what are the confounders here", "GRADE this evidence", "assess bias risk", "critique this experiment"
What it does
scientific-critical-thinking is a Claude Code skill from K-Dense AI’s scientific-agent-skills repo. It turns Claude into an evidence evaluator that applies structured frameworks — GRADE, Cochrane Risk of Bias, and study design hierarchies — to assess the quality of scientific claims and identify threats to validity: confounders, selection bias, measurement error, missing controls, and inappropriate statistical approaches.
A session produces a structured critique: evidence quality rating, identified flaws with explanations, confounders and limitations not discussed in the paper, and a conclusion about what the evidence does and does not support.
When to use it
Reach for it when:
- You’re deciding whether to act on a study’s findings and need a systematic quality assessment
- You’re teaching critical appraisal and want a worked example of applying a bias framework to a specific paper
- You’re writing the limitations section of your own manuscript and want to make sure you’ve named the real threats to validity
When not to reach for it:
- Writing the formal text of a peer review — use peer-review
- Scoring manuscripts with a quantitative rubric — use scholar-evaluation
Install
Copy the SKILL.md from K-Dense AI’s scientific-critical-thinking folder into .claude/skills/scientific-critical-thinking/ in your project.
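The copy step can be sketched as a short script (the clone path below is illustrative, not a location the repo prescribes — adjust it to wherever you cloned scientific-agent-skills):

```python
# Sketch of a manual install; the source path is an assumption --
# point it at your local clone of the scientific-agent-skills repo.
import shutil
from pathlib import Path

src = Path("../scientific-agent-skills/scientific-critical-thinking/SKILL.md")
dest = Path(".claude/skills/scientific-critical-thinking")
dest.mkdir(parents=True, exist_ok=True)  # create the skill folder if missing

if src.exists():
    shutil.copy(src, dest / "SKILL.md")
else:
    print(f"clone the repo first: {src} not found")
```

Once `SKILL.md` is in place, Claude Code picks the skill up automatically on the trigger phrases below.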
Trigger phrases: “evaluate this study design”, “what are the confounders here”, “GRADE this evidence”, “assess bias risk”, “critique this experiment”.
What a session looks like
A typical session has three phases:
- Study intake. Paste the paper or describe the study design: population, intervention, comparison, outcome, and study type. Claude identifies which bias framework applies — Cochrane RoB for RCTs, ROBINS-I for observational studies, QUADAS-2 for diagnostic tests.
- Structured appraisal. Claude works through each domain of the applicable framework, citing specific passages from the paper to support each rating. Confounders and design limitations not mentioned by the authors are surfaced explicitly.
- Evidence summary. The GRADE level (high/moderate/low/very low) and a plain-language summary of what the evidence does and does not support are produced — suitable for a clinical decision summary or a methods section.
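The intake step's framework choice amounts to a lookup from study type to bias framework. A minimal sketch (`pick_framework` is a hypothetical helper for illustration, not part of the skill itself):

```python
# Hypothetical helper mirroring the study-intake logic: choose which
# risk-of-bias framework the structured appraisal should walk through.
BIAS_FRAMEWORKS = {
    "rct": "Cochrane RoB",
    "observational": "ROBINS-I",
    "diagnostic_accuracy": "QUADAS-2",
}

# The four GRADE certainty levels the evidence summary reports.
GRADE_LEVELS = ["high", "moderate", "low", "very low"]

def pick_framework(study_type: str) -> str:
    """Return the applicable framework, or a fallback note for other designs."""
    return BIAS_FRAMEWORKS.get(
        study_type.lower(), "no standard framework; appraise ad hoc"
    )

print(pick_framework("RCT"))  # -> Cochrane RoB
```

In practice the skill does this selection conversationally from the population/intervention/comparison/outcome description rather than from a keyword, but the mapping is the same.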
Receipts
Where it works well:
- RCT appraisal using Cochrane RoB — domain-by-domain evaluation is thorough and catches omissions in randomization and blinding reporting that casual reading misses
- Teaching contexts: the worked framework application is more instructive than a summary critique
Where it backfires:
- Highly technical statistical flaws (e.g., model misspecification, exchangeability assumptions in causal models) require domain expertise to flag; Claude catches obvious statistical issues but not all subtle ones
- The skill evaluates the study as reported — it cannot assess unreported data or publication bias within a single paper
Pattern that works: use this skill on the papers you’re citing in your own work, not just papers you’re evaluating for others; it surfaces limitations you should proactively address in your manuscript.
Source and attribution
Originally authored by K-Dense Inc. The canonical SKILL.md lives in the scientific-critical-thinking folder of their public scientific-agent-skills repository.
License: MIT. Install, adapt, and redistribute with attribution preserved.
This page documents the skill from a practitioner’s perspective. For the formal spec and any updates, defer to the source repo.