# codebase-onboarding-engineer

> A Claude Code agent persona that builds an orientation map of an unfamiliar codebase by reading source files and tracing real execution paths — and refuses to state anything that isn't grounded in the code it actually inspected.

**Use case**: Get a fast, fact-only mental model of a new repo without inferred-from-vibes summaries

**Canonical URL**: https://agentcookbooks.com/skills/codebase-onboarding-engineer/

**Topics**: claude-code, skills, documentation, planning

**Trigger phrases**: "explain this codebase", "where do I start in this repo", "trace this code path"

**Source**: [Michael Sitarzewski](https://github.com/msitarzewski/agency-agents/blob/main/engineering/engineering-codebase-onboarding-engineer.md)

**License**: MIT

---

## What it does

`codebase-onboarding-engineer` is the repo-orientation persona in the agency-agents collection. It inventories the project's manifests and entry points, traces real execution paths (request → router → service → persistence → response), identifies the seams between presentation, domain, and I/O layers, and ships a three-tier output: a one-line summary, a five-minute explanation, and a deep-dive map.

The hard rule is "code before everything." The persona refuses to claim a module owns behavior unless it can point to the file that implements or routes it. Inference, intent-guessing, and "this *probably* does X" are out — when the answer is partial, the persona lists which files were inspected and which weren't, instead of bluffing.
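
A minimal sketch of that rule, assuming a claim is modelled as a statement plus the files cited as evidence. The `Claim` class, the `split_grounded` helper, and the file names are hypothetical illustrations, not part of the persona's definition:

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One statement about the codebase plus the files cited as evidence."""
    statement: str
    evidence: list[str] = field(default_factory=list)


def split_grounded(claims, inspected):
    """Separate claims backed by inspected files from those that are not.

    A claim counts as grounded only if it cites at least one file and
    every cited file was actually read.
    """
    grounded, ungrounded = [], []
    for claim in claims:
        ok = bool(claim.evidence) and all(p in inspected for p in claim.evidence)
        (grounded if ok else ungrounded).append(claim)
    return grounded, ungrounded


# Hypothetical session state: two files read, three candidate claims.
inspected = {"src/http/router.ts", "src/services/auth.ts"}
claims = [
    Claim("Auth checks run in the service layer", ["src/services/auth.ts"]),
    Claim("Background jobs retry failed emails", ["src/worker.ts"]),  # cited, never read
    Claim("The cache is probably Redis"),  # no evidence at all
]
grounded, ungrounded = split_grounded(claims, inspected)
```

Only the first claim survives. The other two would be dropped or reported as unverified rather than stated as fact.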

## When to use it

- Joining a new repo cold and needing the "if you only read three files, read these" answer
- Onboarding a teammate or another agent — the persona's three-tier output (1-line / 5-min / deep dive) is shaped exactly for handover
- Tracing where a specific behavior lives ("which file owns auth?") without a refactor plan attached
- Mapping a polyglot or monorepo before any cross-language change
- Surfacing dead code, duplicate abstractions, or misleading names that look load-bearing but aren't

When *not* to reach for it:

- Repos you already know well, where the full inventory-and-trace process is overkill
- When you actually want a code review or refactor plan — the persona is explicitly read-only and refuses to suggest changes
- Tiny single-file scripts where there is no architecture to map

## Install

The persona lives in [msitarzewski/agency-agents](https://github.com/msitarzewski/agency-agents) at `engineering/engineering-codebase-onboarding-engineer.md`. Copy that file to `~/.claude/agents/` or use the repo's installer. It is standalone and does not depend on other personas in the collection.

## What a session looks like

1. **Inventory.** Manifests, lockfiles, framework markers, top-level directories. Is this an app, library, monorepo, or mixed workspace? Code-bearing directories only — `node_modules`, build output, and caches are excluded.
2. **Entry-point discovery.** The smallest set of files that define how the system starts: `server.ts`, `main.py`, `cli.go`, package exports. The persona surfaces these by name.
3. **Trace one real path end-to-end.** Pick a representative request, command, or function call. Follow it through validation → orchestration → core logic → persistence → response. Quote the file at each hop.
4. **Boundary analysis.** Where does presentation end and domain begin? Where does I/O happen? Are there cross-cutting concerns (auth, logging, config) that touch every layer?
5. **Three-tier output.** One line ("This is a Node.js API with routing in `src/http`, orchestration in `src/services`, and persistence in `src/repositories`"). Five-minute explanation with key files and main code paths. Deep-dive map with a path/purpose table, layer separation, and the full flow described step-by-step.
6. **Honesty about limits.** A "Files inspected" list at the bottom — if `worker.ts` wasn't read, the persona says so explicitly instead of inferring what it does.
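
The inventory step (step 1) could be approximated as below. This is a sketch under stated assumptions: the `MANIFESTS` and `EXCLUDED_DIRS` sets are illustrative guesses, since the persona's actual heuristics are not specified here, and the directory layout in the demo is made up.

```python
import os
import tempfile

# Assumed lists for illustration only; extend them for the stack at hand.
MANIFESTS = {"package.json", "pyproject.toml", "go.mod", "Cargo.toml"}
EXCLUDED_DIRS = {"node_modules", "dist", "build", ".git", "__pycache__"}


def inventory(root):
    """Return sorted repo-relative manifest paths, skipping excluded directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            if name in MANIFESTS:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)


# Demo on a throwaway tree: two real packages and one vendored dependency.
with tempfile.TemporaryDirectory() as root:
    for rel in [
        "package.json",
        "packages/api/package.json",
        "node_modules/dep/package.json",
    ]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    manifests = inventory(root)

print(manifests)
```

More than one manifest outside the excluded directories is a hint that the repo is a monorepo or mixed workspace, which is the question step 1 sets out to answer.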

The discipline that makes it work: the inspection-list footer. A summary that pretends to cover the whole repo when only one subsystem was read is exactly the failure mode the persona was built against.
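
One way to picture the footer, as a sketch: given the files the session considered relevant and the files it actually read, list both groups explicitly. The `render_footer` function, the output wording, and the file paths are hypothetical and not the persona's actual format.

```python
def render_footer(candidates, inspected):
    """Build a 'Files inspected' footer that also names what was not read."""
    read = sorted(set(candidates) & set(inspected))
    unread = sorted(set(candidates) - set(inspected))
    lines = ["Files inspected:"]
    lines += [f"- {path}" for path in read]
    if unread:
        lines.append("Not inspected (no claims made about these):")
        lines += [f"- {path}" for path in unread]
    return "\n".join(lines)


footer = render_footer(
    candidates=["src/server.ts", "src/worker.ts", "src/http/router.ts"],
    inspected=["src/server.ts", "src/http/router.ts"],
)
print(footer)
```

Because `src/worker.ts` was never read, it lands in the second group instead of being summarized from its name.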

## Receipts

### 2026-05-05 — Producing a "first day" doc for the agentcookbooks repo

Pointed the persona at this repo with an explicit 8-file reading list (`CLAUDE.md`, `src/content.config.ts`, the head of `Base.astro`, `index.astro`, `Header.astro`, `package.json`, `astro.config.mjs`, and the first 100 lines of `topics.ts`) and the instruction "don't expand the search", to bias it toward orientation rather than exploration.

Output came back as a 6-section ~570-word doc: 5-bullet "what this is", a single setup-verification command (`npm install && npm run build`), 3 reading priorities with rationale, a 6-line site map, a first-week pitfall pick, and a pointer to env-specific notes in `CLAUDE.local.md` / `session.md`. The pitfall pick was the receipt-worthy bit: forgetting `draft: false` in frontmatter. The schema defaults it, but every example file writes it explicitly — so it's easy to copy a draft template, omit the line, assume defaults handle it (they do), and miss that `draft: true` files silently disappear from sitemap, RSS, llms.txt, and every topic page with no error.

Honest scope: the persona was emulated (a general-purpose agent given the persona's described framing). The "code before everything" rule and the "Files inspected" footer both held: every section named a specific file or command, and the doc did not bluff. The three-tier output (1-line / 5-min / deep dive) the persona normally produces wasn't fully exercised because the brief asked for a single-page first-day doc, not the deep-dive map.

## Source and attribution

From [Michael Sitarzewski's agency-agents repository](https://github.com/msitarzewski/agency-agents/blob/main/engineering/engineering-codebase-onboarding-engineer.md), an MIT-licensed collection of 144+ AI agent personas across engineering, marketing, design, testing, and specialized roles.

License: MIT.

Quote from the persona body, verbatim: *"State only facts grounded in the code that was actually inspected."* The inspection-list footer and the refusal to suggest changes are how the persona keeps that rule operational instead of decorative.