# Week one on Cloudflare Pages: 449 visitors, 5.99% cached

> First seven days post-domain-attach: 449 visitors, 8.7 requests per visit, 5.99% cache rate. GPTBot spiked +3,763%. Three hypotheses, one already confirmed.

**Canonical URL**: https://agentcookbooks.com/blog/cloudflare-pages-html-cache-week-one-449-visitors/

**Published**: 2026-05-09

**Tags**: claude-code, deployment, analytics

---

The custom domain attached on April 30. Seven days later, the Cloudflare Pages dashboard showed 449 unique visitors, 3.89k requests, 5.99% cache rate, 24 MB egress. Three of those numbers tell a clean story; the fourth — 5.99% cache rate — looked broken for a static Astro site that should be cache-friendly out of the box. The visitor count was real-user dominant but the request count was not: a single `GPTBot` full-site sweep landed inside the window (425 hits in 24 hours, +3,763% vs the prior period) and re-shaped the country distribution toward bot-datacenter geographies. This is the week-one read, with three ranked hypotheses on the cache puzzle, one already confirmed at the four-day mark, and a note on which numbers in a static-site CF dashboard are signal and which are noise. The cache-rate puzzle resolves later in a follow-up — the short version: Cloudflare Pages ignores `Cache-Control` for HTML at the edge unless you also create an explicit Cache Rule in the zone dashboard.

## What I ran

No additional tooling beyond what Cloudflare Pages emits by default. Two screenshots from the dashboard's seven-day rolling window (April 25 – May 2): top-line metrics, country breakdown, hourly curve. The site itself: Astro 6 static output, Pagefind 1.5.x for search, hosted on Cloudflare Pages with a `public/_headers` file shipping HSTS / CSP / `X-Frame-Options: DENY` / `Permissions-Policy` — the CSP rationale and the two undocumented Pagefind directives are written up in [A static-site CSP that doesn't break Pagefind](/blog/cloudflare-pages-csp-pagefind/). No analytics SDK, no third-party JS. Whatever shows up in the CF dashboard is the only telemetry.

## What happened

Top-line for the week:

| Metric | Value |
|---|---|
| Unique visitors | 449 |
| Total requests | 3.89k |
| Percent cached | 5.99% |
| Total data served | 24 MB |
| Requests per visitor | 8.7 |
| Average response size | ~6.2 KB |
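
The last two rows are derived rather than read off the dashboard. A minimal sketch of that arithmetic, assuming the rounded dashboard figures (3.89k requests, 24 MB) and decimal megabytes:

```python
# Dashboard figures as reported in the table; both totals are already rounded.
visitors = 449
requests = 3_890
data_served_kb = 24 * 1000  # assumption: 1 MB = 1000 KB

requests_per_visitor = requests / visitors
avg_response_kb = data_served_kb / requests

print(f"{requests_per_visitor:.1f} requests per visitor")
print(f"~{avg_response_kb:.1f} KB per response")
```

With binary megabytes the average comes out slightly higher, so treat the second figure as approximate.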

The curve shape is the cleanest signal in the data. Essentially flat at zero for the four days before the custom domain attached, ramps on April 30, peaks May 1 at ~190 visitors / ~1.5k requests / ~10 MB, slight decay May 2 (~210 visitors, still elevated). Three operational events line up precisely with the three inflection points:

- **Apr 25–28 flatline:** still serving from `agentcookbooks.pages.dev` only. The Cloudflare Pages middleware was emitting `X-Robots-Tag: noindex, nofollow` on `*.pages.dev` hosts; Search Console had no sitemap. Traffic was effectively zero by design.
- **Apr 30 ramp:** custom domain attached, sitemap submitted to GSC (Success status, crawl pending), `pages.dev` middleware swapped from noindex to 301-to-canonical.
- **May 1 peak:** first day with both indexing pickup *and* the [Cloudflare AI Audit / robots.txt fix](/blog/cloudflare-ai-audit-robots-txt-trap/) landed. Live `/robots.txt` byte-for-byte matched the repo file.

8.7 requests per unique visitor is healthy session depth — visitors clicking through the wiki via topic chips and Pagefind search, not bouncing on first page. ~6.2 KB per response says Astro static + Pagefind is genuinely lean. No surprise spikes.

The geographic mix needs one discount factor to read accurately: the operator's own dev/preview hits show up in the country column. After subtracting them, the ~204 estimated external visitors per week is the real-reader baseline. Headline number is roughly twice the actual baseline; useful to know going in.

## Where it drifted

5.99% cache rate is the puzzle. Static sites on a CDN should be cache-friendly out of the box. Three hypotheses on day one, ranked by likelihood:

1. **Cold-start dominant.** ~165 unique URLs (153 skill pages + 6 blog posts + indexes), most being requested for the first time. Self-resolves as the corpus gets indexed and crawler traffic fills the cache.
2. **Bot-heavy traffic.** Googlebot + the AI crawlers explicitly allowed via `public/robots.txt` (ChatGPT-User, PerplexityBot, ClaudeBot, etc.) request unique URLs in sequence, so nearly every hit is the first request for that URL and misses the cache.
3. **Header config gap.** Cloudflare Pages doesn't set aggressive `Cache-Control` on HTML by default. If response headers say something like `Cache-Control: public, max-age=0, must-revalidate`, every request is a revalidate cycle that may not count as "cached" in the dashboard's accounting.
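
To make hypothesis 3 concrete, here is a rough sketch of how a generic shared cache would read the two `Cache-Control` values mentioned in this post. The function name `shared_cache_ttl` is hypothetical, and the logic is a simplified reading of HTTP caching semantics, not a model of Cloudflare's dashboard accounting:

```python
def shared_cache_ttl(cache_control: str) -> int:
    """Seconds a shared cache may reuse a response without revalidating.

    Simplified: s-maxage takes precedence over max-age for shared caches,
    and no-store / private / no-cache rule out plain reuse.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')

    if directives.keys() & {"no-store", "private", "no-cache"}:
        return 0
    for name in ("s-maxage", "max-age"):
        value = directives.get(name, "")
        if value.isdigit():
            return int(value)
    return 0  # no explicit freshness; heuristic caching is ignored in this sketch


default_html = "public, max-age=0, must-revalidate"
proposed_html = "public, max-age=300, s-maxage=3600"

print(shared_cache_ttl(default_html))   # 0
print(shared_cache_ttl(proposed_html))  # 3600
```

A zero here means every request has to go back for revalidation. Per the note at the top of this post, a non-zero value did not turn out to be sufficient on Cloudflare Pages by itself, so read this as a check on the header, not on edge behavior.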

Four days later, **option 3 was the answer.** Cloudflare's cache panel showed ~451 `Dynamic` responses against ~51 `Hit + Revalidated` (~10% effective cache, marginally up from 5.99%). For Cloudflare Pages, HTML is treated as `Dynamic` by default — the CDN won't cache it without an explicit rule. Two paths to fix:

- Add a Cache Rule in the CF dashboard: `agentcookbooks.com/*` (the apex host the canonical URLs use; `*.agentcookbooks.com/*` would match subdomains only) → `Cache Everything`, edge TTL 1h, browser TTL 5m.
- Or set `Cache-Control: public, max-age=300, s-maxage=3600` on HTML in `public/_headers`.

Both are free CDN performance for a static site. Open work, not yet shipped.
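
The ~10% effective cache figure is just the ratio of the two cache-panel buckets. A small sketch, assuming those two buckets account for all responses in the panel (any other statuses are ignored here):

```python
# Approximate counts read off Cloudflare's cache panel at the four-day mark.
dynamic = 451
hit_or_revalidated = 51

total = dynamic + hit_or_revalidated
effective_cache_rate = hit_or_revalidated / total

print(f"{effective_cache_rate:.1%} of {total} responses served from cache")
```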

The second drift was reading bot traffic as human. Around the same four-day mark, Cloudflare's bot panel showed **GPTBot making 425 requests in 24 hours, +3,763% over the prior period**: a full-site sweep of 229 built pages plus repeat sitemap fetches. The pattern, traced backward through the dashboard:

- Deploy push → Googlebot spike (+406% earlier in the week).
- Googlebot reindexes `/sitemap.xml`.
- GPTBot follows the freshly-indexed sitemap a few hours later.
- ClaudeBot stays at 5–10/day (Anthropic's normal slow-and-polite pace).
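
A percentage that large says more about a tiny baseline than about a huge crawl. A quick sketch that backs out the implied prior-period count, assuming the dashboard computes change as (current - prior) / prior:

```python
gptbot_requests = 425
reported_change_pct = 3763  # "+3,763%" from the bot panel

# current = prior * (1 + change), so prior = current / (1 + change)
implied_prior = gptbot_requests / (1 + reported_change_pct / 100)

print(f"implied prior-period requests: ~{implied_prior:.0f}")
```

Under that assumption the prior period was on the order of a dozen requests, which is in line with a crawler that had barely discovered the site.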

Real-user traffic was holding ~150–200/day at that point. Bot and scanner noise was meaningfully shaping the headline counts. Worth knowing on a dashboard with no bot/human split surfaced by default — the `4xx` column on a static site is almost entirely WordPress scanners probing `/wp-login.php` and `/xmlrpc.php`, the country mix tilts toward bot datacenters, the cache rate gets dragged down by crawlers requesting unique URLs.
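
If request-level rows were available (they are not in the setup described here, so this is hypothetical), a first-pass bot/human split could look like the sketch below. The `classify` helper and the sample rows are made up for illustration; only the crawler names and scanner paths come from this post:

```python
from collections import Counter

# Crawler names mentioned in this post, matched as user-agent substrings.
CRAWLER_TOKENS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Googlebot")
# Paths a static site never serves; the post attributes these to WordPress scanners.
SCANNER_PATHS = ("/wp-login.php", "/xmlrpc.php")


def classify(user_agent: str, path: str) -> str:
    """Bucket one request as 'scanner', 'crawler', or 'other'."""
    if path.startswith(SCANNER_PATHS):
        return "scanner"
    ua = user_agent.lower()
    if any(token.lower() in ua for token in CRAWLER_TOKENS):
        return "crawler"
    return "other"  # no known bot marker; not proof of a human


# Hypothetical sample rows (user agent, path); the strings are simplified.
sample = [
    ("ExampleAgent (compatible; GPTBot/1.0)", "/sitemap.xml"),
    ("ExampleAgent (compatible; Googlebot/2.1)", "/blog/"),
    ("ExampleBrowser/1.0", "/wp-login.php"),
    ("ExampleBrowser/1.0", "/blog/"),
]

counts = Counter(classify(ua, path) for ua, path in sample)
print(dict(counts))
```

Substring matching on user agents is easy to spoof, so this only separates well-behaved crawlers and obvious scanners from everything else.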

## What I'd change

Three things, in order of leverage.

**Add the HTML cache rule.** This is the highest-ROI move on the list and the only one that's definitely worth doing. Five-line addition to `public/_headers` or a single dashboard rule. Free CDN performance.

**Re-check AI search citations at the four-week mark, not the one-week mark.** AI search citations land 1–4 weeks post-crawl. Querying ChatGPT or Perplexity for site mentions a week after launch is too early to read signal. Re-check around 2026-05-29, four weeks after the May 1 indexing peak.
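
For pinning the re-check date, a minimal sketch of the 1–4 week window counted from the May 1 indexing peak:

```python
from datetime import date, timedelta

indexing_peak = date(2026, 5, 1)

earliest = indexing_peak + timedelta(weeks=1)  # start of the 1-4 week window
latest = indexing_peak + timedelta(weeks=4)    # four-week mark

print(earliest.isoformat(), latest.isoformat())
```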

**Don't read the headline number at face value.** ~204 estimated external visitors after self-traffic discount, against a crawler population that nearly tripled the request count on its peak day, against a cache rate gated by a default that needs a one-line override. Each correction halves or doubles a number. Read four numbers together — visitors, requests, cache rate, bot share — and the picture is consistent. Read any single number and it's misleading.

The full 30-day post lands once those three cycles play out. This is week one's receipt: the curve fits the operations, the cache puzzle had a one-line explanation hiding in a default, and the bot share is doing more of the work than the dashboard's top metrics suggest.