# Cloudflare Pages won't edge-cache HTML without a Cache Rule

> The _headers Cache-Control directive worked for /_astro/ assets but not HTML. The Cache Rule that fixed it, the UI gotcha, curl MISS-then-HIT.

**Canonical URL**: https://agentcookbooks.com/blog/cloudflare-pages-html-cache-the-missing-rule/

**Published**: 2026-05-26

**Tags**: claude-code, deployment

---

The week-one Cloudflare Pages analytics for this site showed a 5.99% cache rate and 81 MB egress, with only 3 MB served from cache, despite a `public/_headers` file shipping `Cache-Control: public, max-age=3600, s-maxage=86400` on every HTML response since the domain attached. Ten days later the number was effectively unchanged at 4.16%. The fix wasn't a header tweak — it was a Cache Rule in the zone dashboard, because Cloudflare Pages treats HTML as `Dynamic` by default and ignores `Cache-Control` at the edge until you opt in. This is the rule, the curl verification, the UI gotcha that blocks the deploy button, and why the previously-shipped CSP/headers file is necessary but not sufficient.

## The puzzle this resolves

The companion post [Week one on Cloudflare Pages: 449 visitors, 5.99% cache rate, and the puzzle](/blog/cloudflare-pages-html-cache-week-one-449-visitors/) tees up the question: a static Astro site with a perfectly reasonable cache header should not be reading the cache 5.99% of the time, and a Pages dashboard that shows requests dominated by full-site bot sweeps should not be the only explanation for the rest. The short version: `Cache-Control: s-maxage=86400` instructs the CDN to serve cached HTML for a day. Pages reads the header, then ignores it for HTML responses, because the platform classifies HTML as dynamic content. The header is honored on `/_astro/` hashed asset paths (CSS, JS, font files) — that's where the 3 MB of cache reads were coming from. HTML requests fall through to a fresh build pull every time.

## What the dashboard said before the fix

Thirty-day window snapshot at the moment of the change:

| Metric | Value |
|---|---|
| Requests | 15.06k |
| Bandwidth served | 81 MB |
| Bandwidth served from cache | 3 MB |
| Cached request share | 4.16% |
| Edge cache classification on HTML | `Dynamic` |

The `Dynamic` classification is the giveaway. When you click into the cache breakdown, every `text/html` row is grouped under `Dynamic` rather than `Cache HIT` or `Cache MISS revalidated`. That's not a header parsing failure — it's a platform-default routing decision that says "this content type is not eligible for edge cache unless an explicit rule says otherwise."

## The Cache Rule

Cloudflare dashboard → the zone for the apex domain → **Caching** → **Cache Rules** → **Create rule**.

Match expression — keep it scoped to the canonical hostname so the `*.pages.dev` preview surface is unaffected:

```
(http.host eq "agentcookbooks.com")
```

Then under **Then the response from origin will...**, set:

- **Cache eligibility:** *Eligible for cache*
- **Edge TTL:** *Use cache-control header if present, bypass cache if not*
- **Browser TTL:** *Respect origin TTL*

Save. Deploy. The rule is global to the zone; there's no per-route gradient needed because the `_headers` file already differentiates HTML (`max-age=3600, s-maxage=86400`) from immutable hashed assets (`max-age=31536000, immutable`) from Pagefind chunks. The rule's job is to authorize the edge to honor those headers, not to set them.

## The UI gotcha that blocks the deploy button

The rule-creation form starts with one **Match** row and an inline **+ Or** affordance underneath it. If you click **+ Or** out of curiosity — or by mistake — an empty Or-row appears with two `Select...` dropdowns: field and operator. Leaving that empty row on the page does not produce a visible warning. The **Deploy** button accepts the click, then the form drops back to top of page with a generic validation banner: "Configure incomplete." Nothing on the page highlights the empty row.

The fix is to remove the row with the small `×` icon at the row's right edge, not to fill it in. Once the empty row is gone, Deploy goes through. The Or-row is a real feature (combining hostname conditions, path conditions, request-header conditions) — the bug is just that an unfilled second clause silently invalidates the whole rule.

## Verifying the rule landed

Two `curl -sI` requests, back to back, looking at `cf-cache-status` and `Age`:

```
$ curl -sI https://agentcookbooks.com/ | grep -i 'cf-cache-status\|age:'
cf-cache-status: MISS

$ curl -sI https://agentcookbooks.com/ | grep -i 'cf-cache-status\|age:'
cf-cache-status: HIT
age: 12
```

That's the receipt. First request after deploy: `MISS` (no cached copy yet). Second request twelve seconds later: `HIT`, `Age: 12`. The edge served the second request from cache without going to the Pages origin. Both requests honored the `s-maxage=86400` ceiling from `_headers`, so the edge will keep serving cached HTML for 24 hours until either the TTL expires or a deploy invalidates the cache.

What the verification does not do is wait for analytics to catch up. Cloudflare's dashboard updates the cache-rate percentage on a delay (the rolling 30-day window includes everything before the rule landed, which dilutes the post-fix improvement for weeks). The `curl` test is real-time; the dashboard climb is the slow trailing signal.

## What the `_headers` file does and does not do

Looking at the `_headers` file that's been live since the [CSP and Pagefind post](/blog/cloudflare-pages-csp-pagefind/):

```
/*
  Cache-Control: public, max-age=3600, s-maxage=86400

/_astro/*
  Cache-Control: public, max-age=31536000, immutable

/pagefind/*
  Cache-Control: public, max-age=86400, s-maxage=2592000
```

The `/_astro/*` and `/pagefind/*` rules worked from day one — they cover hashed/versioned asset paths that Cloudflare Pages treats as eligible by default. The catch-all `/*` Cache-Control rule was the broken half: it set the right header, but the platform's default treatment of HTML content type stripped its meaning at the edge. The Cache Rule is what tells the platform to read the header at all for HTML routes.

In other words: the `_headers` file is the source of truth for *what to cache and for how long*, and the Cache Rule is the *authorization to apply that policy to HTML at all*. Each does one job; both are required.

## The mental model that would have caught this earlier

Two things would have surfaced the issue weeks before the analytics did:

1. **Trust `curl -sI` over the response header.** A `Cache-Control` header that arrives at the browser does not mean the edge respected it. The browser will dutifully honor `max-age=3600`, but `cf-cache-status: DYNAMIC` on the response tells you the edge never even tried to cache it. The two systems can disagree, and the disagreement is invisible from app-side telemetry.

2. **Look at the Pages cache breakdown, not just the headline cache rate.** A 5.99% cache rate on a static site is suspicious, but a 5.99% rate with the `Dynamic` cell dominating HTML traffic and `HIT` confined to `/_astro/` is diagnostic. The Pages dashboard surfaces the routing decision directly; the headline number obscures it.

The general pattern: when a platform abstraction promises behavior that requires an explicit toggle elsewhere, the absence of the toggle is invisible from inside the abstraction. Cloudflare Pages reads `_headers` perfectly; Cloudflare's edge cache reads its own rules. The two systems share a UI, not a configuration model.

## The post-domain-attach checklist this belongs on

Three Cloudflare-side toggles on this site that are dashboard-only and not in any committed file:

| Toggle | Default | Required value | What breaks if you skip |
|---|---|---|---|
| Block AI training bots | On | Off (or per-bot configured) | `/robots.txt` gets prepended with platform-managed `Disallow` lines that contradict your repo file — see [Cloudflare AI Audit silently rewrites your robots.txt](/blog/cloudflare-ai-audit-robots-txt-trap/) |
| Manage robots.txt | On | Off | Same root cause as above; separate toggle |
| Cache Rules → Cache HTML | Absent | Match `http.host eq "<your-domain>"`, eligibility on, Edge TTL respect-origin | HTML never edge-cached; `_headers` cache directive ignored at the edge |

All three are zero-effort once you know to look. None of the three are visible from the Pages deploy logs, the build output, the repo, or any health-check endpoint a CI run might hit. They live in the zone dashboard, set once after domain attach, easy to miss.

## What I have not verified yet

The post-fix cache-rate climb in the rolling 30-day window. The `curl` test confirms the rule works request-to-request; the dashboard percentage will take weeks to stabilize because the rolling window still averages in the pre-fix days. A follow-up reading at the 30-day mark — when the rolling window is entirely post-fix — is the right time to publish the actual delta. Until then: the receipt is the curl MISS-then-HIT, not the percentage.