142 over-long meta descriptions, one 6-line fix
A pre-launch SEO pass flagged one wiki page with a 188-character meta description and called it isolated. Four days later, a wider sample found 142 of 153 skill pages over the 160-char SERP cap, range 161–297, median ~225. The defect wasn’t on the pages — it was in the layout. One 6-line metaDesc() helper in src/layouts/Base.astro closed all 142 in a single commit. Here’s the receipt and the diagnostic mistakes I made finding it. (The earlier audit pass that originally undercounted is written up in Three SEO skills on pre-launch Astro: what each caught.)
What I ran
Two seo-audit passes against the live site, four days apart. The first sampled 8 skill pages and read one over-cap as a per-page editorial issue. The second deliberately widened the sample after a single-page audit on a different post showed the description rendering past 160 too — same template, different content. That re-cast the question: was this a content problem or a template problem?
The sample-widening rule in the seo-audit framework is implicit. It says “check meta descriptions” but doesn’t say “if you find one defect on a templated page type, sample at least five same-template pages.” That’s the lesson the second audit forced into the receipts.
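The rule I'd want written down can be sketched in a few lines. This is only an illustration: the `AuditedPage` shape, the `template` field, and `widenSample` are hypothetical names, not part of the seo-audit framework.

```typescript
// Hypothetical page record; `template` stands for whatever layout or page
// component rendered the page.
interface AuditedPage {
  url: string;
  template: string;
}

// When one page shows a defect, return up to `n` other pages that share its
// template, so the defect can be re-tested as a possible layout-level issue.
function widenSample(
  defect: AuditedPage,
  all: AuditedPage[],
  n = 5,
): AuditedPage[] {
  return all
    .filter((p) => p.template === defect.template && p.url !== defect.url)
    .slice(0, n);
}
```

If most of the widened sample shows the same defect, the template is the suspect rather than the content.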
What happened
Full corpus sweep:
153 skill pages total
142 over 160 chars
Range: 161–297
Median: ~225
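For reference, numbers like these can be produced by something along the following lines. It is a sketch, not the script I ran: `extractMetaDescription` and `summarizeOverCap` are illustrative names, reading the built files from disk is left out, and it assumes the range and median describe the over-cap descriptions only.

```typescript
interface LengthSummary {
  total: number;  // all descriptions checked
  over: number;   // how many exceed the cap
  min: number;    // shortest over-cap length (0 if none)
  max: number;    // longest over-cap length (0 if none)
  median: number; // median of the over-cap lengths (0 if none)
}

// Pull the meta description out of one built HTML page.
// Note: HTML entities such as &amp; are not decoded here.
function extractMetaDescription(html: string): string | null {
  const m = html.match(/<meta name="description" content="([^"]*)"/);
  return m ? m[1] : null;
}

// Summarize which descriptions exceed the cap, measured by JS string length.
function summarizeOverCap(descriptions: string[], cap = 160): LengthSummary {
  const over = descriptions
    .map((d) => d.length)
    .filter((n) => n > cap)
    .sort((a, b) => a - b);
  const mid = Math.floor(over.length / 2);
  let median = 0;
  if (over.length > 0) {
    median = over.length % 2 === 1 ? over[mid] : (over[mid - 1] + over[mid]) / 2;
  }
  return {
    total: descriptions.length,
    over: over.length,
    min: over.length > 0 ? over[0] : 0,
    max: over.length > 0 ? over[over.length - 1] : 0,
    median,
  };
}

// Made-up lengths, just to show the shape of the result.
const demo = ["a".repeat(150), "b".repeat(161), "c".repeat(225), "d".repeat(297)];
console.log(summarizeOverCap(demo));
// { total: 4, over: 3, min: 161, max: 297, median: 225 }
```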
The descriptions came from each skill’s frontmatter description field — a deliberately LLM-quotable definition of the skill, written long because that’s what makes it useful as a citation in AI search. Trimming each one in frontmatter would have been a multi-hour batch and would have damaged the quotability for AI engines that don’t care about Google’s 160-char SERP cap.
The leverage move was a layout-level smart truncate that cuts the meta tags only — JSON-LD description fields and the visible page lede keep the full text. Six lines, frontmatter of src/layouts/Base.astro:
function metaDesc(text: string, max = 160): string {
  if (text.length <= max) return text; // already fits
  const cut = text.slice(0, max - 1); // leave room for the ellipsis
  const lastSentence = cut.lastIndexOf('. ');
  if (lastSentence > 100) return cut.slice(0, lastSentence + 1); // prefer a sentence end
  const lastSpace = cut.lastIndexOf(' ');
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + '…'; // else a word boundary
}
const metaDescription = metaDesc(description);
Applied to three tags only:
<meta name="description" content={metaDescription} />
<meta property="og:description" content={metaDescription} />
<meta name="twitter:description" content={metaDescription} />
The JSON-LD blocks and the rendered page lede on SkillPage.astro use the original description prop, untouched. AI crawlers reading the JSON-LD or the rendered HTML still get the LLM-quotable form; Google’s SERP scraper gets the 160-char cut.
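To make the helper's three branches concrete, here is a small worked example. The function body is copied from the layout so the snippet runs on its own; the three input strings are made up for illustration.

```typescript
// Copied from the layout helper so the sketch is self-contained.
function metaDesc(text: string, max = 160): string {
  if (text.length <= max) return text;
  const cut = text.slice(0, max - 1);
  const lastSentence = cut.lastIndexOf(". ");
  if (lastSentence > 100) return cut.slice(0, lastSentence + 1);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "\u2026";
}

// Made-up inputs, one per branch.
const shortText = "Fits already.";
const sentenceEnd = "A".repeat(120) + ". " + "B".repeat(100); // period at index 120
const wordsOnly = "word ".repeat(50);                         // no sentence boundary

const results = [shortText, sentenceEnd, wordsOnly].map((t) => metaDesc(t));

console.log(results[0] === shortText); // true: returned unchanged
console.log(results[1].length);        // 121: cut right after the period
console.log(results[2].length);        // 155: backed up to a space, plus the ellipsis
```

In every branch the result stays at or under `max`, because the cut is taken from the first `max - 1` characters and at most one character is appended.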
Verification was a Node one-liner against the built HTML — spot-checked the 10 longest descriptions post-build, all came in ≤160:
seo 156
seo-audit 156
mcp-server-builder 150
grill-me 158
humanizer 159
schema-markup 156
consciousness-council 157
ppt-master 159
transformers 148
scikit-learn 155
Build time: 3.5s. One file edited. 142 pages now serve compliant meta descriptions. Leverage ratio: 142×.
Where it drifted
Two diagnostic mistakes, both worth flagging because they probably bite anyone running a SERP-cap audit on a UTF-8 site.
wc -c and wc -m both lied on Git Bash for UTF-8 multi-byte chars. Em dashes (—) are 3 bytes in UTF-8. Ellipsis (…) is 3 bytes. Each still counts as one character against the SERP cap, which is also what JS string length reports for them. On Git Bash, wc -c and wc -m both reported byte counts (wc -m counts characters according to the shell’s locale, so this was probably a non-UTF-8 locale), inflating reported meta-description lengths by 2–6 chars per description, 2 extra for each em dash or ellipsis. I’d flagged five descriptions as 165–172 chars that turned out to be 161–168: still over the cap, but borderline cases got mis-classified as critical.
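The mismatch is easy to reproduce. A minimal sketch, with a made-up description containing one em dash and one ellipsis (written as escapes so the code points are unambiguous); `measure` is an illustrative name:

```typescript
// Compare three ways of "counting" the same string.
function measure(s: string): { utf8Bytes: number; utf16Units: number; codePoints: number } {
  return {
    utf8Bytes: new TextEncoder().encode(s).length, // what a byte count sees
    utf16Units: s.length,                          // JS string length
    codePoints: Array.from(s).length,              // Unicode code points
  };
}

// Hypothetical description: U+2014 is the em dash, U+2026 the ellipsis.
const sampleDesc = "Audit skill \u2014 checks titles, metas, and schema\u2026";
const counts = measure(sampleDesc);

// Each of the two characters is 3 bytes in UTF-8 but 1 unit in a JS string,
// so the byte count comes out 4 higher than the string length.
console.log(counts, counts.utf8Bytes - counts.utf16Units);
```

One caveat on `.length`: it counts UTF-16 code units, so characters outside the Basic Multilingual Plane (most emoji, for example) count as 2. For em dashes and ellipses it agrees with the code-point count.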
The fix is a Node one-liner. Read the built HTML, regex out the <meta name="description" content="..."> value, and use .length:
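// Usage: node <this-script>.js path/to/built/page.html
const html = require('fs').readFileSync(process.argv[2], 'utf8');
const m = html.match(/<meta name="description" content="([^"]*)"/);
if (!m) throw new Error('no <meta name="description"> found');
console.log(m[1].length, '→', m[1]);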
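JS string length counts each em dash or ellipsis once, per character rather than per byte, which is the count that matters for the 160-char cap. Use it for SERP-cap diagnostics. Don’t trust wc on UTF-8.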
The second mistake was sed-based answer-block extraction missing nested markdown. While diagnosing a separate “lede word count” issue on a blog post, I’d run a sed pipeline to extract the first paragraph and counted 45 words. That number got into a session note. The actual lede was 72 words — the sed regex skipped the paragraph because it had a nested markdown link with brackets and a backtick code span that broke the boundary match. Caught and corrected before the wrong number propagated, but only because I cross-checked against DOM extraction.
Both are the same shape of bug: shell tools assume ASCII or assume flat text. Anything richer than ASCII or more nested than flat text breaks them silently. For SEO/SERP work, one-line Node scripts beat shell pipelines on UTF-8 content with markdown.
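For the lede count, a rough equivalent that doesn't depend on matching across brackets or backticks might look like this. It is a simplification, not a markdown parser, and not the DOM extraction I cross-checked with: it assumes paragraphs are separated by blank lines and that the lede is the first block that isn't a heading. `ledeWordCount` is an illustrative name.

```typescript
// Count the words in the first non-heading paragraph of a markdown body.
function ledeWordCount(markdown: string): number {
  // Paragraph boundaries are blank lines, so brackets and backticks inside
  // a paragraph cannot end it early.
  const blocks = markdown.split(/\n\s*\n/).map((b) => b.trim());
  const lede = blocks.find((b) => b.length > 0 && !b.startsWith("#")) ?? "";
  const plain = lede
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1") // [text](url "title") -> text
    .replace(/`([^`]*)`/g, "$1");              // `code` -> code
  return plain.split(/\s+/).filter(Boolean).length;
}

// Made-up post body with a code span and a link in the lede.
const post =
  "# Title\n\n" +
  "Run `npm run build` and read [the docs](https://example.com \"Docs home\") first.\n\n" +
  "Second paragraph here.";

console.log(ledeWordCount(post)); // 9
```

Stripping the link target matters mainly when it carries a quoted title, which would otherwise be counted as extra words.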
What I’d change
The smart truncate handles the common case but isn’t editorial-grade everywhere. Spot-check of the 10 cuts: 6 land on word boundaries that read clean (...sub-skills and 18…, ...action plan tied to…); 4 end on commas or articles (...question answering,…, ...rendered through an…). Acceptable as SERP snippets, not great as editorial copy.
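One possible refinement for those four cases, which the current helper does not do: strip trailing commas and dangling articles from the word-boundary cut before appending the ellipsis. `tidyCut` and the article list are my own illustration, and the sketch only targets the "commas or articles" endings noted above.

```typescript
// Hypothetical post-processing step; not part of the layout's metaDesc().
const DANGLING_ARTICLES = new Set(["a", "an", "the"]);
const TRAILING_PUNCT = /[\s,;:]+$/;

// Clean up the end of an already-truncated string, then append the ellipsis.
function tidyCut(cut: string): string {
  let out = cut.replace(TRAILING_PUNCT, "");
  let lastSpace = out.lastIndexOf(" ");
  // Drop trailing articles one at a time, re-trimming punctuation after each.
  while (lastSpace > 0 && DANGLING_ARTICLES.has(out.slice(lastSpace + 1).toLowerCase())) {
    out = out.slice(0, lastSpace).replace(TRAILING_PUNCT, "");
    lastSpace = out.lastIndexOf(" ");
  }
  return out + "\u2026";
}

console.log(tidyCut("for question answering,")); // "for question answering…"
console.log(tidyCut("rendered through an"));     // "rendered through…"
```

Because it only ever removes characters before adding the one-character ellipsis, using it in place of the final `+ '…'` in the helper would keep results within the same cap.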
Two follow-ups worth doing, in order:
- Hand-trim the top 10–15 highest-traffic skills’ frontmatter for SERP-grade copy. Smart truncate is the floor; editorial cuts are the ceiling. The helper makes the floor non-negotiable; the ceiling needs human eyes.
- Add a “sample five same-template pages” prompt to the seo-audit skill receipts so the next n=8 sample doesn’t underread a layout-level pattern as a per-page issue.
The skill framework’s checklist is the first pass. The pattern that actually surfaces layout-level defects comes from running the framework against prior receipts and widening the sample when something repeats. Today’s biggest finding was invisible at n=8.