codebase-onboarding-engineer
A Claude Code agent persona that builds an orientation map of an unfamiliar codebase by reading source files and tracing real execution paths — and refuses to state anything that isn't grounded in the code it actually inspected.
Get a fast, fact-only mental model of a new repo without inferred-from-vibes summaries
Trigger phrases
Phrases that activate this skill when typed to Claude Code:
- explain this codebase
- where do I start in this repo
- trace this code path
What it does
codebase-onboarding-engineer is the repo-orientation persona in the agency-agents collection. It inventories the project’s manifests and entry points, traces real execution paths (request → router → service → persistence → response), identifies the seams between presentation, domain, and I/O layers, and ships a three-tier output: a one-line summary, a five-minute explanation, and a deep-dive map.
The hard rule is “code before everything.” The persona refuses to claim a module owns behavior unless it can point to the file that implements or routes it. Inference, intent-guessing, and “this probably does X” are out — when the answer is partial, the persona lists which files were inspected and which weren’t, instead of bluffing.
When to use it
- Joining a new repo cold and needing the “if you only read three files, read these” answer
- Onboarding a teammate or another agent — the persona’s three-tier output (1-line / 5-min / deep dive) is shaped exactly for handover
- Tracing where a specific behavior lives (“which file owns auth?”) without a refactor plan attached
- Mapping a polyglot or monorepo before any cross-language change
- Surfacing dead code, duplicate abstractions, or misleading names that look load-bearing but aren’t
When not to reach for it:
- Repos you already know well — the structure is overkill
- When you actually want a code review or refactor plan — the persona is explicitly read-only and refuses to suggest changes
- Tiny single-file scripts where there is no architecture to map
Install
The persona lives in msitarzewski/agency-agents at engineering/engineering-codebase-onboarding-engineer.md. Copy that file to ~/.claude/agents/ or use the repo’s installer. It is standalone.
What a session looks like
- Inventory. Manifests, lockfiles, framework markers, top-level directories. Is this an app, library, monorepo, or mixed workspace? Code-bearing directories only: node_modules, build output, and caches are excluded.
- Entry-point discovery. The smallest set of files that define how the system starts: server.ts, main.py, cli.go, package exports. The persona surfaces these by name.
- Trace one real path end-to-end. Pick a representative request, command, or function call. Follow it through validation → orchestration → core logic → persistence → response. Quote the file at each hop.
- Boundary analysis. Where does presentation end and domain begin? Where does I/O happen? Are there cross-cutting concerns (auth, logging, config) that touch every layer?
- Three-tier output. One line (“This is a Node.js API with routing in src/http, orchestration in src/services, and persistence in src/repositories”). Five-minute explanation with key files and main code paths. Deep-dive map with a path/purpose table, layer separation, and the full flow described step-by-step.
- Honesty about limits. A “Files inspected” list at the bottom: if worker.ts wasn’t read, the persona says so explicitly instead of inferring what it does.
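The inventory and entry-point steps can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not code taken from the persona: the excluded-directory set, the manifest names, and the function name inventory are all hypothetical choices made for the example. Only server.ts, main.py, cli.go, and node_modules come from the description above.

```python
import os
from pathlib import Path

# Assumed lists for illustration; the persona does not publish these sets.
EXCLUDED_DIRS = {"node_modules", "dist", "build", ".git", "__pycache__", ".cache"}
MANIFEST_NAMES = {"package.json", "pyproject.toml", "go.mod", "Cargo.toml"}
ENTRY_POINT_NAMES = {"server.ts", "main.py", "cli.go"}


def inventory(root):
    """Walk root, skip excluded directories, collect manifests and entry-point candidates."""
    manifests, entry_points = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            rel = Path(dirpath, name).relative_to(root).as_posix()
            if name in MANIFEST_NAMES:
                manifests.append(rel)
            if name in ENTRY_POINT_NAMES:
                entry_points.append(rel)
    return {"manifests": manifests, "entry_points": entry_points}
```

Calling inventory(".") from a repo root returns relative paths only, which matches the idea of surfacing files by name. Matching on filename alone is a deliberate simplification: it will miss entry points declared in a manifest (for example package exports), which still need to be read from the manifest itself.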
The discipline that makes it work: the inspection-list footer. A summary that pretends to cover the whole repo when only one subsystem was read is exactly the failure mode the persona was built against.
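One way to picture the inspection-list footer is as a small function that takes the files actually read and the files known to exist, and names the gap explicitly. The sketch below is an assumption about shape only: the function name files_inspected_footer, the "Not inspected" heading, and the exact wording are hypothetical, since the persona's real footer format is not quoted here.

```python
def files_inspected_footer(inspected, known_files):
    """Render a 'Files inspected' footer that also names the files that were not read."""
    inspected_set = set(inspected)
    # Anything known to exist but never opened is listed, not summarized.
    not_inspected = [f for f in known_files if f not in inspected_set]
    lines = ["Files inspected:"]
    lines += [f"- {f}" for f in inspected]
    if not_inspected:
        lines.append("Not inspected (no claims made about these):")
        lines += [f"- {f}" for f in not_inspected]
    return "\n".join(lines)


if __name__ == "__main__":
    print(files_inspected_footer(["src/server.ts"], ["src/server.ts", "worker.ts"]))
```

The point of the design is that the unread set is computed from a list of known files rather than left implicit, so a summary cannot quietly cover files nobody opened.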
Receipts
2026-05-05 — Producing a “first day” doc for the agentcookbooks repo
Pointed the persona at this repo with an explicit 8-file reading list (CLAUDE.md, src/content.config.ts, Base.astro head, index.astro, Header.astro, package.json, astro.config.mjs, topics.ts first 100 lines) and “don’t expand the search” — bias toward orientation, not exploration.
Output came back as a 6-section ~570-word doc: 5-bullet “what this is”, a single setup-verification command (npm install && npm run build), 3 reading priorities with rationale, a 6-line site map, a first-week pitfall pick, and a pointer to env-specific notes in CLAUDE.local.md / session.md. The pitfall pick was the receipt-worthy bit: forgetting draft: false in frontmatter. The schema defaults it, but every example file writes it explicitly — so it’s easy to copy a draft template, omit the line, assume defaults handle it (they do), and miss that draft: true files silently disappear from sitemap, RSS, llms.txt, and every topic page with no error.
Honest scope: persona was emulated (general-purpose agent + persona’s described framing). The “code before everything” rule and the “Files inspected” footer both held — every section named a specific file or command, and the doc did not bluff. The three-tier output (1-line / 5-min / deep dive) the persona normally produces wasn’t fully exercised because the brief asked for a single-page first-day doc, not the deep-dive map.
Source and attribution
From Michael Sitarzewski’s agency-agents repository, an MIT-licensed collection of 144+ AI agent personas across engineering, marketing, design, testing, and specialized roles.
License: MIT.
Quote from the persona body, verbatim: “State only facts grounded in the code that was actually inspected.” The inspection-list footer and the refusal to suggest changes are how the persona keeps that rule operational instead of decorative.