Skip to main content

Cloudflare Pages won't edge-cache HTML without a Cache Rule

Illustrated receipt card summarizing: Cloudflare Pages won't edge-cache HTML without a Cache Rule

The week-one Cloudflare Pages analytics for this site showed a 5.99% cache rate and 81 MB egress, with only 3 MB served from cache, despite a public/_headers file shipping Cache-Control: public, max-age=3600, s-maxage=86400 on every HTML response since the domain attached. Ten days later the number was effectively unchanged at 4.16%. The fix wasn’t a header tweak — it was a Cache Rule in the zone dashboard, because Cloudflare Pages treats HTML as Dynamic by default and ignores Cache-Control at the edge until you opt in. This is the rule, the curl verification, the UI gotcha that blocks the deploy button, and why the previously-shipped CSP/headers file is necessary but not sufficient.

The puzzle this resolves

The companion post Week one on Cloudflare Pages: 449 visitors, 5.99% cache rate, and the puzzle tees up the question: a static Astro site with a perfectly reasonable cache header should not be reading the cache 5.99% of the time, and a Pages dashboard that shows requests dominated by full-site bot sweeps should not be the only explanation for the rest. The short version: Cache-Control: s-maxage=86400 instructs the CDN to serve cached HTML for a day. Pages reads the header, then ignores it for HTML responses, because the platform classifies HTML as dynamic content. The header is honored on /_astro/ hashed asset paths (CSS, JS, font files) — that’s where the 3 MB of cache reads were coming from. HTML requests fall through to a fresh build pull every time.

What the dashboard said before the fix

Thirty-day window snapshot at the moment of the change:

MetricValue
Requests15.06k
Bandwidth served81 MB
Bandwidth served from cache3 MB
Cached request share4.16%
Edge cache classification on HTMLDynamic

The Dynamic classification is the giveaway. When you click into the cache breakdown, every text/html row is grouped under Dynamic rather than Cache HIT or Cache MISS revalidated. That’s not a header parsing failure — it’s a platform-default routing decision that says “this content type is not eligible for edge cache unless an explicit rule says otherwise.”

The Cache Rule

Cloudflare dashboard → the zone for the apex domain → CachingCache RulesCreate rule.

Match expression — keep it scoped to the canonical hostname so the *.pages.dev preview surface is unaffected:

(http.host eq "agentcookbooks.com")

Then under Then the response from origin will…, set:

  • Cache eligibility: Eligible for cache
  • Edge TTL: Use cache-control header if present, bypass cache if not
  • Browser TTL: Respect origin TTL

Save. Deploy. The rule is global to the zone; there’s no per-route gradient needed because the _headers file already differentiates HTML (max-age=3600, s-maxage=86400) from immutable hashed assets (max-age=31536000, immutable) from Pagefind chunks. The rule’s job is to authorize the edge to honor those headers, not to set them.

The UI gotcha that blocks the deploy button

The rule-creation form starts with one Match row and an inline + Or affordance underneath it. If you click + Or out of curiosity — or by mistake — an empty Or-row appears with two Select... dropdowns: field and operator. Leaving that empty row on the page does not produce a visible warning. The Deploy button accepts the click, then the form drops back to top of page with a generic validation banner: “Configure incomplete.” Nothing on the page highlights the empty row.

The fix is to remove the row with the small × icon at the row’s right edge, not to fill it in. Once the empty row is gone, Deploy goes through. The Or-row is a real feature (combining hostname conditions, path conditions, request-header conditions) — the bug is just that an unfilled second clause silently invalidates the whole rule.

Verifying the rule landed

Two curl -sI requests, back to back, looking at cf-cache-status and Age:

$ curl -sI https://agentcookbooks.com/ | grep -i 'cf-cache-status\|age:'
cf-cache-status: MISS

$ curl -sI https://agentcookbooks.com/ | grep -i 'cf-cache-status\|age:'
cf-cache-status: HIT
age: 12

That’s the receipt. First request after deploy: MISS (no cached copy yet). Second request twelve seconds later: HIT, Age: 12. The edge served the second request from cache without going to the Pages origin. Both requests honored the s-maxage=86400 ceiling from _headers, so the edge will keep serving cached HTML for 24 hours until either the TTL expires or a deploy invalidates the cache.

What the verification does not do is wait for analytics to catch up. Cloudflare’s dashboard updates the cache-rate percentage on a delay (the rolling 30-day window includes everything before the rule landed, which dilutes the post-fix improvement for weeks). The curl test is real-time; the dashboard climb is the slow trailing signal.

What the _headers file does and does not do

Looking at the _headers file that’s been live since the CSP and Pagefind post:

/*
  Cache-Control: public, max-age=3600, s-maxage=86400

/_astro/*
  Cache-Control: public, max-age=31536000, immutable

/pagefind/*
  Cache-Control: public, max-age=86400, s-maxage=2592000

The /_astro/* and /pagefind/* rules worked from day one — they cover hashed/versioned asset paths that Cloudflare Pages treats as eligible by default. The catch-all /* Cache-Control rule was the broken half: it set the right header, but the platform’s default treatment of HTML content type stripped its meaning at the edge. The Cache Rule is what tells the platform to read the header at all for HTML routes.

In other words: the _headers file is the source of truth for what to cache and for how long, and the Cache Rule is the authorization to apply that policy to HTML at all. Each does one job; both are required.

The mental model that would have caught this earlier

Two things would have surfaced the issue weeks before the analytics did:

  1. Trust curl -sI over the response header. A Cache-Control header that arrives at the browser does not mean the edge respected it. The browser will dutifully honor max-age=3600, but cf-cache-status: DYNAMIC on the response tells you the edge never even tried to cache it. The two systems can disagree, and the disagreement is invisible from app-side telemetry.

  2. Look at the Pages cache breakdown, not just the headline cache rate. A 5.99% cache rate on a static site is suspicious, but a 5.99% rate with the Dynamic cell dominating HTML traffic and HIT confined to /_astro/ is diagnostic. The Pages dashboard surfaces the routing decision directly; the headline number obscures it.

The general pattern: when a platform abstraction promises behavior that requires an explicit toggle elsewhere, the absence of the toggle is invisible from inside the abstraction. Cloudflare Pages reads _headers perfectly; Cloudflare’s edge cache reads its own rules. The two systems share a UI, not a configuration model.

The post-domain-attach checklist this belongs on

Three Cloudflare-side toggles on this site that are dashboard-only and not in any committed file:

ToggleDefaultRequired valueWhat breaks if you skip
Block AI training botsOnOff (or per-bot configured)/robots.txt gets prepended with platform-managed Disallow lines that contradict your repo file — see Cloudflare AI Audit silently rewrites your robots.txt
Manage robots.txtOnOffSame root cause as above; separate toggle
Cache Rules → Cache HTMLAbsentMatch http.host eq "<your-domain>", eligibility on, Edge TTL respect-originHTML never edge-cached; _headers cache directive ignored at the edge

All three are zero-effort once you know to look. None of the three are visible from the Pages deploy logs, the build output, the repo, or any health-check endpoint a CI run might hit. They live in the zone dashboard, set once after domain attach, easy to miss.

What I have not verified yet

The post-fix cache-rate climb in the rolling 30-day window. The curl test confirms the rule works request-to-request; the dashboard percentage will take weeks to stabilize because the rolling window still averages in the pre-fix days. A follow-up reading at the 30-day mark — when the rolling window is entirely post-fix — is the right time to publish the actual delta. Until then: the receipt is the curl MISS-then-HIT, not the percentage.