codebase-onboarding-engineer

A Claude Code agent persona that builds an orientation map of an unfamiliar codebase by reading source files and tracing real execution paths — and refuses to state anything that isn't grounded in the code it actually inspected.

Get a fast, fact-only mental model of a new repo without inferred-from-vibes summaries

Source: Michael Sitarzewski

License: MIT

First documented: 2026-05-05

Receipts: firsthand ✓

Documentation Planning

Trigger phrases

Phrases that activate this skill when typed to Claude Code:

explain this codebase
where do I start in this repo
trace this code path

What it does

codebase-onboarding-engineer is the repo-orientation persona in the agency-agents collection. It inventories the project’s manifests and entry points, traces real execution paths (request → router → service → persistence → response), identifies the seams between presentation, domain, and I/O layers, and ships a three-tier output: a one-line summary, a five-minute explanation, and a deep-dive map.

The hard rule is “code before everything.” The persona refuses to claim a module owns behavior unless it can point to the file that implements or routes it. Inference, intent-guessing, and “this probably does X” are out — when the answer is partial, the persona lists which files were inspected and which weren’t, instead of bluffing.
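One way to picture the “code before everything” rule is as a filter over claims, where each claim must name a file that was actually read. The sketch below is illustrative only: the Claim shape, the grounded_claims helper, and the example path src/http/router.ts are hypothetical names, not taken from the persona itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One statement about the codebase plus the file said to back it (hypothetical shape)."""
    statement: str
    evidence_file: str


def grounded_claims(claims, inspected_files):
    """Split claims into those backed by an inspected file and those that are not."""
    inspected = set(inspected_files)
    kept, dropped = [], []
    for claim in claims:
        # A claim survives only if its evidence file was actually read.
        if claim.evidence_file in inspected:
            kept.append(claim)
        else:
            dropped.append(claim)
    return kept, dropped


if __name__ == "__main__":
    claims = [
        Claim("routing lives in src/http", "src/http/router.ts"),
        Claim("the worker retries jobs", "src/worker.ts"),
    ]
    kept, dropped = grounded_claims(claims, ["src/http/router.ts"])
    print([c.statement for c in kept])
    print([c.statement for c in dropped])
```

In this sketch the dropped claims are returned rather than discarded, so they can be reported as uninspected instead of being silently omitted.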

When to use it

Joining a new repo cold and needing the “if you only read three files, read these” answer
Onboarding a teammate or another agent — the persona’s three-tier output (1-line / 5-min / deep dive) is shaped exactly for handover
Tracing where a specific behavior lives (“which file owns auth?”) without a refactor plan attached
Mapping a polyglot or monorepo before any cross-language change
Surfacing dead code, duplicate abstractions, or misleading names that look load-bearing but aren’t

When not to reach for it:

Repos you already know well — the structure is overkill
When you actually want a code review or refactor plan — the persona is explicitly read-only and refuses to suggest changes
Tiny single-file scripts where there is no architecture to map

Install

The persona file lives in msitarzewski/agency-agents at engineering/engineering-codebase-onboarding-engineer.md. Copy it to ~/.claude/agents/ or use the repo’s installer. The persona is standalone.
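For the manual copy, a minimal sketch follows. It assumes the repository has already been cloned locally; the install_persona helper and its parameters are hypothetical, and the repo’s own installer may behave differently.

```python
import shutil
from pathlib import Path


def install_persona(persona_file, agents_dir=None):
    """Copy one persona markdown file into the agents directory and return the new path."""
    source = Path(persona_file)
    # Default target is ~/.claude/agents/, the location named above.
    target_dir = Path(agents_dir) if agents_dir is not None else Path.home() / ".claude" / "agents"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, target_dir / source.name))
```

Called as install_persona("agency-agents/engineering/engineering-codebase-onboarding-engineer.md"), where the leading agency-agents directory is an assumed local clone path, this places the file under ~/.claude/agents/ and overwrites any existing copy with the same name.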

What a session looks like

Inventory. Manifests, lockfiles, framework markers, top-level directories. Is this an app, library, monorepo, or mixed workspace? Code-bearing directories only — node_modules, build output, and caches are excluded.
Entry-point discovery. The smallest set of files that define how the system starts: server.ts, main.py, cli.go, package exports. The persona surfaces these by name.
Trace one real path end-to-end. Pick a representative request, command, or function call. Follow it through validation → orchestration → core logic → persistence → response. Quote the file at each hop.
Boundary analysis. Where does presentation end and domain begin? Where does I/O happen? Are there cross-cutting concerns (auth, logging, config) that touch every layer?
Three-tier output. One line (“This is a Node.js API with routing in src/http, orchestration in src/services, and persistence in src/repositories”). Five-minute explanation with key files and main code paths. Deep-dive map with a path/purpose table, layer separation, and the full flow described step-by-step.
Honesty about limits. A “Files inspected” list at the bottom — if worker.ts wasn’t read, the persona says so explicitly instead of inferring what it does.
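The inventory step at the top of this list can be approximated with a directory walk that prunes non-code directories and records manifests. This is a sketch under stated assumptions: the EXCLUDED_DIRS and MANIFEST_NAMES sets are illustrative guesses, not the persona’s actual lists, and the inventory function is a hypothetical name.

```python
import os

# Assumed lists for illustration; adjust them for the repo at hand.
EXCLUDED_DIRS = {"node_modules", "dist", "build", ".git", "__pycache__", ".cache"}
MANIFEST_NAMES = {"package.json", "pyproject.toml", "go.mod", "Cargo.toml", "pom.xml"}


def inventory(root):
    """Return (manifests, top_level_dirs) under root, skipping excluded directories."""
    manifests = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            if name in MANIFEST_NAMES:
                full_path = os.path.join(dirpath, name)
                manifests.append(os.path.relpath(full_path, root))
    top_level_dirs = sorted(
        entry.name
        for entry in os.scandir(root)
        if entry.is_dir() and entry.name not in EXCLUDED_DIRS
    )
    return manifests, top_level_dirs
```

More than one manifest in the result is a hint that the repo is a monorepo or mixed workspace, but confirming that still means opening the manifests themselves.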

The discipline that makes it work: the inspection-list footer. A summary that pretends to cover the whole repo when only one subsystem was read is exactly the failure mode the persona was built against.
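A footer of this kind is simple to generate once the set of files actually read is tracked. The sketch below shows one possible rendering; the files_inspected_footer helper and the exact “read” / “not read” layout are assumptions, not the persona’s real output format.

```python
def files_inspected_footer(candidates, inspected):
    """Render a footer listing which candidate files were read and which were not."""
    inspected_set = set(inspected)
    lines = ["Files inspected"]
    # Keep the candidates' original order so the footer mirrors the reading list.
    lines += [f"  read: {path}" for path in candidates if path in inspected_set]
    lines += [f"  not read: {path}" for path in candidates if path not in inspected_set]
    return "\n".join(lines)


if __name__ == "__main__":
    print(files_inspected_footer(["src/server.ts", "src/worker.ts"], ["src/server.ts"]))
    # Files inspected
    #   read: src/server.ts
    #   not read: src/worker.ts
```

Listing unread files explicitly, rather than only the ones that were read, is what lets a reader see where the summary stops being grounded.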

Receipts

2026-05-05 — Producing a “first day” doc for the agentcookbooks repo

Pointed the persona at this repo with an explicit 8-file reading list (CLAUDE.md, src/content.config.ts, the head of Base.astro, index.astro, Header.astro, package.json, astro.config.mjs, and the first 100 lines of topics.ts) and the instruction “don’t expand the search,” to bias the run toward orientation rather than exploration.

Output came back as a 6-section, roughly 570-word doc: a 5-bullet “what this is”, a single setup-verification command (npm install && npm run build), 3 reading priorities with rationale, a 6-line site map, a first-week pitfall pick, and a pointer to env-specific notes in CLAUDE.local.md / session.md. The pitfall pick was the receipt-worthy bit: forgetting draft: false in frontmatter. The schema supplies a default for draft, but every example file writes the field explicitly. It is therefore easy to copy a draft template, omit the line, and rely on the default (which does work), while missing that any file left at draft: true silently disappears from the sitemap, RSS, llms.txt, and every topic page with no error.

Honest scope: the persona was emulated by a general-purpose agent given the persona’s described framing. The “code before everything” rule and the “Files inspected” footer both held: every section named a specific file or command, and the doc did not bluff. The three-tier output (1-line / 5-min / deep dive) the persona normally produces wasn’t fully exercised, because the brief asked for a single-page first-day doc, not the deep-dive map.

Source and attribution

From Michael Sitarzewski’s agency-agents repository, an MIT-licensed collection of 144+ AI agent personas across engineering, marketing, design, testing, and specialized roles.

License: MIT.

Quote from the persona body, verbatim: “State only facts grounded in the code that was actually inspected.” The inspection-list footer and the refusal to suggest changes are how the persona keeps that rule operational instead of decorative.