Cloudflare Pages won't edge-cache HTML without a Cache Rule
The week-one Cloudflare Pages analytics for this site showed a 5.99% cache rate and 81 MB egress, with only 3 MB served from cache, despite a public/_headers file shipping Cache-Control: public, max-age=3600, s-maxage=86400 on every HTML response since the domain attached. Ten days later the number was effectively unchanged at 4.16%. The fix wasn’t a header tweak — it was a Cache Rule in the zone dashboard, because Cloudflare Pages treats HTML as Dynamic by default and ignores Cache-Control at the edge until you opt in. This is the rule, the curl verification, the UI gotcha that blocks the deploy button, and why the previously-shipped CSP/headers file is necessary but not sufficient.
The puzzle this resolves
The companion post Week one on Cloudflare Pages: 449 visitors, 5.99% cache rate, and the puzzle tees up the question: a static Astro site with a perfectly reasonable cache header should not be reading the cache 5.99% of the time, and a Pages dashboard that shows requests dominated by full-site bot sweeps should not be the only explanation for the rest. The short version: Cache-Control: s-maxage=86400 instructs the CDN to serve cached HTML for a day. Pages reads the header, then ignores it for HTML responses, because the platform classifies HTML as dynamic content. The header is honored on /_astro/ hashed asset paths (CSS, JS, font files) — that’s where the 3 MB of cache reads were coming from. HTML requests fall through to a fresh build pull every time.
What the dashboard said before the fix
Thirty-day window snapshot at the moment of the change:
| Metric | Value |
|---|---|
| Requests | 15.06k |
| Bandwidth served | 81 MB |
| Bandwidth served from cache | 3 MB |
| Cached request share | 4.16% |
| Edge cache classification on HTML | Dynamic |
The Dynamic classification is the giveaway. When you click into the cache breakdown, every text/html row is grouped under Dynamic rather than Cache HIT or Cache MISS revalidated. That’s not a header parsing failure — it’s a platform-default routing decision that says “this content type is not eligible for edge cache unless an explicit rule says otherwise.”
The Cache Rule
Cloudflare dashboard → the zone for the apex domain → Caching → Cache Rules → Create rule.
Match expression — keep it scoped to the canonical hostname so the *.pages.dev preview surface is unaffected:
(http.host eq "agentcookbooks.com")
Then under Then the response from origin will…, set:
- Cache eligibility: Eligible for cache
- Edge TTL: Use cache-control header if present, bypass cache if not
- Browser TTL: Respect origin TTL
Save. Deploy. The rule is global to the zone; there’s no per-route gradient needed because the _headers file already differentiates HTML (max-age=3600, s-maxage=86400) from immutable hashed assets (max-age=31536000, immutable) from Pagefind chunks. The rule’s job is to authorize the edge to honor those headers, not to set them.
The UI gotcha that blocks the deploy button
The rule-creation form starts with one Match row and an inline + Or affordance underneath it. If you click + Or out of curiosity — or by mistake — an empty Or-row appears with two Select... dropdowns: field and operator. Leaving that empty row on the page does not produce a visible warning. The Deploy button accepts the click, then the form drops back to top of page with a generic validation banner: “Configure incomplete.” Nothing on the page highlights the empty row.
The fix is to remove the row with the small × icon at the row’s right edge, not to fill it in. Once the empty row is gone, Deploy goes through. The Or-row is a real feature (combining hostname conditions, path conditions, request-header conditions) — the bug is just that an unfilled second clause silently invalidates the whole rule.
Verifying the rule landed
Two curl -sI requests, back to back, looking at cf-cache-status and Age:
$ curl -sI https://agentcookbooks.com/ | grep -i 'cf-cache-status\|age:'
cf-cache-status: MISS
$ curl -sI https://agentcookbooks.com/ | grep -i 'cf-cache-status\|age:'
cf-cache-status: HIT
age: 12
That’s the receipt. First request after deploy: MISS (no cached copy yet). Second request twelve seconds later: HIT, Age: 12. The edge served the second request from cache without going to the Pages origin. Both requests honored the s-maxage=86400 ceiling from _headers, so the edge will keep serving cached HTML for 24 hours until either the TTL expires or a deploy invalidates the cache.
What the verification does not do is wait for analytics to catch up. Cloudflare’s dashboard updates the cache-rate percentage on a delay (the rolling 30-day window includes everything before the rule landed, which dilutes the post-fix improvement for weeks). The curl test is real-time; the dashboard climb is the slow trailing signal.
What the _headers file does and does not do
Looking at the _headers file that’s been live since the CSP and Pagefind post:
/*
Cache-Control: public, max-age=3600, s-maxage=86400
/_astro/*
Cache-Control: public, max-age=31536000, immutable
/pagefind/*
Cache-Control: public, max-age=86400, s-maxage=2592000
The /_astro/* and /pagefind/* rules worked from day one — they cover hashed/versioned asset paths that Cloudflare Pages treats as eligible by default. The catch-all /* Cache-Control rule was the broken half: it set the right header, but the platform’s default treatment of HTML content type stripped its meaning at the edge. The Cache Rule is what tells the platform to read the header at all for HTML routes.
In other words: the _headers file is the source of truth for what to cache and for how long, and the Cache Rule is the authorization to apply that policy to HTML at all. Each does one job; both are required.
The mental model that would have caught this earlier
Two things would have surfaced the issue weeks before the analytics did:
-
Trust
curl -sIover the response header. ACache-Controlheader that arrives at the browser does not mean the edge respected it. The browser will dutifully honormax-age=3600, butcf-cache-status: DYNAMICon the response tells you the edge never even tried to cache it. The two systems can disagree, and the disagreement is invisible from app-side telemetry. -
Look at the Pages cache breakdown, not just the headline cache rate. A 5.99% cache rate on a static site is suspicious, but a 5.99% rate with the
Dynamiccell dominating HTML traffic andHITconfined to/_astro/is diagnostic. The Pages dashboard surfaces the routing decision directly; the headline number obscures it.
The general pattern: when a platform abstraction promises behavior that requires an explicit toggle elsewhere, the absence of the toggle is invisible from inside the abstraction. Cloudflare Pages reads _headers perfectly; Cloudflare’s edge cache reads its own rules. The two systems share a UI, not a configuration model.
The post-domain-attach checklist this belongs on
Three Cloudflare-side toggles on this site that are dashboard-only and not in any committed file:
| Toggle | Default | Required value | What breaks if you skip |
|---|---|---|---|
| Block AI training bots | On | Off (or per-bot configured) | /robots.txt gets prepended with platform-managed Disallow lines that contradict your repo file — see Cloudflare AI Audit silently rewrites your robots.txt |
| Manage robots.txt | On | Off | Same root cause as above; separate toggle |
| Cache Rules → Cache HTML | Absent | Match http.host eq "<your-domain>", eligibility on, Edge TTL respect-origin | HTML never edge-cached; _headers cache directive ignored at the edge |
All three are zero-effort once you know to look. None of the three are visible from the Pages deploy logs, the build output, the repo, or any health-check endpoint a CI run might hit. They live in the zone dashboard, set once after domain attach, easy to miss.
What I have not verified yet
The post-fix cache-rate climb in the rolling 30-day window. The curl test confirms the rule works request-to-request; the dashboard percentage will take weeks to stabilize because the rolling window still averages in the pre-fix days. A follow-up reading at the 30-day mark — when the rolling window is entirely post-fix — is the right time to publish the actual delta. Until then: the receipt is the curl MISS-then-HIT, not the percentage.