# seo-sitemap

> Analyze existing XML sitemaps for format issues, non-200 URLs, and quality signals, or generate new sitemaps from industry templates with quality gates that prevent thin-content scale problems.

**Use case**: Validate or generate XML sitemaps with scale safeguards

**Canonical URL**: https://agentcookbooks.com/skills/seo-sitemap/

**Topics**: claude-code, skills, marketing, seo

**Trigger phrases**: "sitemap", "generate sitemap", "sitemap issues", "XML sitemap", "sitemap validation"

**Source**: [AgriciDaniel](https://github.com/AgriciDaniel/claude-seo/tree/main/skills/seo-sitemap)

**License**: MIT

---

## What it does

`seo-sitemap` is a Claude Code skill from AgriciDaniel's [claude-seo repo](https://github.com/AgriciDaniel/claude-seo). It operates in two modes: analyzing an existing sitemap for validation issues, or generating a new one from industry templates.

Validation checks cover the commonly missed issues:

- **URL limit.** The protocol allows at most 50,000 URLs per file. A single-file sitemap over this threshold is a Critical finding and requires a sitemap index.
- **Non-200 URLs.** URLs with a non-200 status should be removed.
- **Noindexed URLs.** A noindexed URL in the sitemap sends a contradictory signal.
- **Redirected URLs.** These should be replaced with the final destination.
- **Ignored tags.** Google has confirmed that it ignores `<priority>` and `<changefreq>`, so both can be removed.

For generation, the skill loads industry templates from the `seo-plan` assets directory and customizes them. Two quality gates apply: a warning at 30 or more location pages, which enforces at least 60% unique content, and a hard stop at 50 or more location pages, which requires user justification.
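The two location-page gates can be expressed as a small check that runs before any entries are generated. This is a minimal Python sketch, not the skill's actual logic. The function and class names are hypothetical. The `unique_ratio` input is assumed to be measured elsewhere. Treating a sub-60% ratio as blocking, rather than only warning, is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Thresholds as described for the skill; all names here are hypothetical.
WARN_AT = 30            # warn (and enforce unique content) at this many location pages
HARD_STOP_AT = 50       # stop unless the user justifies the page count
MIN_UNIQUE_RATIO = 0.60


@dataclass
class GateResult:
    allowed: bool
    messages: list[str] = field(default_factory=list)


def check_location_gate(page_count: int, unique_ratio: float,
                        justification: Optional[str] = None) -> GateResult:
    """Decide whether location-page sitemap entries may be generated.

    unique_ratio: share of unique content per page (0.0 to 1.0), assumed
    to be computed elsewhere. justification: user reason for exceeding
    the hard stop.
    """
    result = GateResult(allowed=True)
    if page_count >= HARD_STOP_AT and not justification:
        result.allowed = False
        result.messages.append(f"HARD STOP: {page_count} location pages; user justification required")
        return result
    if page_count >= WARN_AT:
        result.messages.append(f"WARNING: {page_count} location pages; require at least 60% unique content")
        # Assumption: falling below the uniqueness floor blocks generation.
        if unique_ratio < MIN_UNIQUE_RATIO:
            result.allowed = False
            result.messages.append(f"Unique content is {unique_ratio:.0%}, below the 60% minimum")
    return result
```

A justified request above 50 pages still passes through the 60% check, so justification lifts the hard stop without bypassing the uniqueness requirement.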

## When to use it

Reach for it when:

- You want to verify a sitemap is structurally correct and not including URLs that should not be indexed
- A site migration just completed and you need to generate a fresh sitemap reflecting the new URL structure
- You are building a multi-location site and want the quality gates applied before generating 40+ location page entries

When *not* to reach for it:

- You want Search Console sitemap submission status (actual crawl/index coverage from Google) — that requires `seo-google sitemaps <property>` with GSC credentials
- The site's framework produces the sitemap (Astro, Next.js, etc.); the skill generates static XML, so framework-based sitemap generation is out of scope

## Install

Copy the [`seo-sitemap` SKILL.md](https://github.com/AgriciDaniel/claude-seo/tree/main/skills/seo-sitemap) into `.claude/skills/seo-sitemap/` along with the `assets/` directory from `seo-plan`.

Trigger phrases: "sitemap", "generate sitemap", "sitemap issues", "XML sitemap", "sitemap validation".

Invoke with `/seo sitemap <url>` to analyze, or `/seo sitemap generate` to create a new sitemap. The skill checks `/sitemap.xml`, `/sitemap_index.xml`, and the robots.txt sitemap reference before reporting "not found".
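Building the list of locations to try can be done before any network request is made. The following is a minimal Python sketch. It assumes you have already fetched the site's robots.txt as text. The helper name is hypothetical, and the order in which candidates are tried is an assumption, since the skill's own order is not documented here.

```python
from __future__ import annotations

from urllib.parse import urljoin


def sitemap_candidates(site_url: str, robots_txt: str) -> list[str]:
    """Return sitemap URLs to try, without fetching anything (hypothetical helper).

    Checks the two conventional paths first, then any Sitemap: lines
    declared in robots.txt, with duplicates removed.
    """
    candidates = [
        urljoin(site_url, "/sitemap.xml"),
        urljoin(site_url, "/sitemap_index.xml"),
    ]
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0]  # drop robots.txt comments
        # The field name is case-insensitive; partition keeps the "://" in the URL.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in candidates:
                candidates.append(url)
    return candidates
```

Each candidate can then be fetched in turn, and "not found" is reported only if every one fails.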

## What a session looks like

A typical session has three phases:

1. **Discovery and format check.** The sitemap is located (checking common paths and robots.txt references), fetched, and parsed. XML syntax errors are reported with line numbers. URL count is checked against the 50,000 URL limit. `<priority>` and `<changefreq>` tags are flagged as "can be removed" (informational, not a ranking signal).
2. **URL quality checks.** Each URL's HTTP status is checked. Non-200 URLs are flagged High priority for removal. Noindexed URLs in the sitemap are flagged High (contradictory signal). Redirected URLs are flagged Medium (update to final destination). All-identical `<lastmod>` dates are flagged Low (use actual modification timestamps).
3. **Output.** For analysis: `VALIDATION-REPORT.md` with severity-ordered issues and recommendations. For generation: `sitemap.xml` (or split files with a sitemap index if over 50k URLs) and `STRUCTURE.md` documenting the site architecture and URL organization decisions.
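Phases 1 and 2 can be approximated with the standard library. This Python sketch parses a `<urlset>` sitemap and applies the severities listed above. It assumes the HTTP status of each URL (the first response, with redirects not followed) and its noindex state were collected separately. The function name and the `UrlStatus` type are hypothetical, and this is not the skill's implementation.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
URL_LIMIT = 50_000
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Info": 4}


@dataclass
class UrlStatus:
    status: int            # first HTTP status returned (redirects not followed)
    noindex: bool = False  # noindex seen in robots meta or X-Robots-Tag


def validate_sitemap(xml_text: str, statuses: dict[str, UrlStatus]) -> list[tuple[str, str]]:
    """Return (severity, message) findings, most severe first (hypothetical helper)."""
    # Malformed XML raises ET.ParseError; its .position holds (line, column).
    root = ET.fromstring(xml_text)
    findings: list[tuple[str, str]] = []

    # Phase 1: format checks.
    urls = root.findall("sm:url", NS)
    if len(urls) > URL_LIMIT:
        findings.append(("Critical", f"{len(urls)} URLs exceeds {URL_LIMIT}; split with a sitemap index"))
    if root.find("sm:url/sm:priority", NS) is not None or root.find("sm:url/sm:changefreq", NS) is not None:
        findings.append(("Info", "<priority>/<changefreq> are ignored by Google and can be removed"))
    lastmods = [el.text for el in root.findall("sm:url/sm:lastmod", NS)]
    if len(lastmods) > 1 and len(set(lastmods)) == 1:
        findings.append(("Low", "All <lastmod> values are identical; use real modification dates"))

    # Phase 2: per-URL quality checks, using statuses gathered elsewhere.
    for url in urls:
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        info = statuses.get(loc)
        if info is None:
            continue  # not checked (e.g. outside a sample)
        if 300 <= info.status < 400:
            findings.append(("Medium", f"{loc} redirects ({info.status}); list the final destination"))
        elif info.status != 200:
            findings.append(("High", f"{loc} returns {info.status}; remove it"))
        if info.noindex:
            findings.append(("High", f"{loc} is noindexed; contradictory signal"))

    # Stable sort keeps document order within each severity.
    findings.sort(key=lambda f: SEVERITY_ORDER[f[0]])
    return findings
```

Sorting by severity mirrors the "severity-ordered" layout of `VALIDATION-REPORT.md`. Rendering the findings as Markdown is left out.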

## Receipts

**Works well:** The `<priority>` and `<changefreq>` informational note is a useful cleanup — many sitemaps set these from CMS defaults that have never been updated. Removing ignored tags makes the sitemap cleaner and easier to maintain, and the skill makes it clear these are safe to remove.
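If you want to apply that cleanup yourself, the following Python sketch removes both tags with `xml.etree.ElementTree`. The function name is hypothetical. Note that ElementTree drops XML comments and the XML declaration on output. Elements from extension namespaces, such as image sitemaps, keep their namespace URIs but may be written with generated prefixes like `ns1`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IGNORED_TAGS = ("priority", "changefreq")


def strip_ignored_tags(xml_text: str) -> str:
    """Remove <priority> and <changefreq> from every <url> entry (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        for tag in IGNORED_TAGS:
            for child in url.findall(f"{{{SITEMAP_NS}}}{tag}"):
                url.remove(child)
    # default_namespace keeps xmlns="..." on the root instead of ns0: prefixes.
    return ET.tostring(root, encoding="unicode", default_namespace=SITEMAP_NS)
```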

**Backfires:** URL status checking on large sitemaps can be slow. The skill fetches each URL's HTTP status, so checking a 5,000-URL sitemap takes a long time. For very large sitemaps, it is more practical to check a random sample and flag the sitemap for a full verification pass later.
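The sampling workaround is simple to set up. The following Python sketch picks a reproducible random subset to status-check. The 5% rate and the 100-URL floor are arbitrary illustrative defaults, not values from the skill, and the function name is hypothetical.

```python
from __future__ import annotations

import random
from typing import Optional


def sample_urls(urls: list[str], percent: int = 5, minimum: int = 100,
                seed: Optional[int] = None) -> list[str]:
    """Pick URLs to status-check from a large sitemap (hypothetical helper)."""
    n = len(urls)
    # Integer ceiling of n * percent / 100, never below `minimum`, never above n.
    k = min(n, max(minimum, (n * percent + 99) // 100))
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(urls, k)
```

With the defaults, a 5,000-URL sitemap yields 250 URLs to check. The report should then mark its status findings as sample-based until a full pass is run.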

**Pattern that works:** Cross-check `seo-sitemap` findings against `seo-google sitemaps <property>` results. GSC shows how many submitted URLs were indexed vs. not — combining that with the validation report tells you both what is wrong with the sitemap format and whether Google is actually using it.

## Source and attribution

Originally written by [AgriciDaniel](https://github.com/AgriciDaniel). The canonical SKILL.md and supporting files live in the [`seo-sitemap` folder](https://github.com/AgriciDaniel/claude-seo/tree/main/skills/seo-sitemap) of the [claude-seo repository](https://github.com/AgriciDaniel/claude-seo).

License: MIT. Install, adapt, and redistribute with attribution preserved.

This page documents the skill from a practitioner's perspective. For the formal spec and updates, defer to the source repo.