🔍 Pagefind Search Integration
Built: 2026-04-12 (Session 2)
Status: ✅ Live
Replaces: Custom search.js (substring matching on 261 entries)
What It Is
Pagefind is a Rust-based static search engine that indexes the full HTML output of the Hugo site at build time. It replaces our old custom search that only matched substrings in titles and descriptions.
Before vs After
| Aspect | Old Search | Pagefind |
|---|---|---|
| Indexed content | Titles + descriptions only (261 entries) | Full page text (895 pages, 7,117 words) |
| Matching | Substring only | Fuzzy, stemming, multi-word |
| Index size | 112KB loaded upfront | ~5KB JS + fragments loaded on-demand |
| Body text searchable | ❌ | ✅ |
| Offline | ✅ | ✅ |
| Maintenance | Manual index template | Auto-indexes HTML output |
How It Works
- Hugo builds the site (hugo --minify) → outputs HTML to public/
- Pagefind runs (npx pagefind@latest --site public) → indexes all <main data-pagefind-body> content
- Creates compressed index fragments in public/pagefind/
- Browser loads pagefind-ui.js (117KB) + fetches index fragments on-demand when user searches
Key Implementation Details
data-pagefind-body on <main>
⚠️ Critical: When ANY element has data-pagefind-body, Pagefind ONLY indexes elements with that attribute. Adding it to <main> ensures all page content is indexed site-wide.
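Because one tagged element switches the whole site to opt-in indexing, a template that bypasses baseof.html would silently drop out of search. A minimal sketch of a check that could be run over built HTML follows. The helper names are hypothetical (they are not part of Pagefind or this repo), and the regex is a heuristic rather than a real HTML parser.

```javascript
// Hypothetical helpers for illustration; not part of Pagefind or this repo.

// Returns true when the HTML contains at least one element that carries
// the data-pagefind-body attribute (regex heuristic, not a full parser).
function hasPagefindBody(html) {
  return /<[a-zA-Z][^>]*\sdata-pagefind-body(?=[\s=>/])/.test(html);
}

// Given a map of { path: htmlString }, list the pages that lack the
// attribute and would therefore be skipped once any page opts in.
function findUnindexedPages(pages) {
  return Object.entries(pages)
    .filter(([, html]) => !hasPagefindBody(html))
    .map(([path]) => path);
}

// Sample input with made-up paths: one page opts in, one does not.
const sample = {
  "index.html": "<main data-pagefind-body><h1>Home</h1></main>",
  "tools/legacy.html": "<main><h1>Legacy tool</h1></main>",
};
console.log(findUnindexedPages(sample));
```

In a real build the map would be filled by reading the files under public/; that part is left out here to keep the sketch self-contained.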
data-pagefind-ignore on nav/footer
Prevents nav menu items and footer links from polluting search results.
Hidden searchable content for JS-rendered pages
7 tools render their main content via JavaScript (empty HTML shells at build time). Pagefind can't index JS-rendered content, so we add hidden searchable blocks:
<!-- Example: AI News template -->
<div data-pagefind-body style="display:none" aria-hidden="true">
<h1>AI News — Real-Time AI Industry Updates</h1>
<p>Track the latest AI news from Microsoft, OpenAI, Google, Anthropic...</p>
<p>Features: daily, weekly, monthly views. Category filtering...</p>
</div>
Pages with hidden blocks: AI News, M365 Roadmap, Service Health, Deprecation Timeline, Cert Tracker (list), Copilot Matrix, Feedback, Site Analytics.
Files
| File | Purpose |
|---|---|
| layouts/_default/baseof.html | Search modal with Pagefind UI container, data-pagefind-body on <main> |
| static/js/search-init.js | Pagefind UI initialisation, Ctrl+K shortcut, modal open/close |
| static/css/search-overrides.css | Dark glassmorphism theme for Pagefind UI |
| static/js/search.js | DELETED: old custom search, replaced by Pagefind |
| .github/workflows/deploy.yml | Added npx pagefind@latest --site public step after Hugo build |
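To make the role of search-init.js concrete, here is a rough sketch of what such a file might look like. PagefindUI, element and showSubResults come from Pagefind UI itself. The #search and #search-modal selectors, the use of the hidden attribute, and the helper names are assumptions for illustration and may not match the real file.

```javascript
// Sketch only: selectors and helper names are assumptions, not the real file.

// True for Ctrl+K (or Cmd+K on macOS), the shortcut that opens the modal.
function isSearchShortcut(event) {
  return (event.ctrlKey || event.metaKey) && String(event.key).toLowerCase() === "k";
}

// Wire up the Pagefind UI and the modal. Expects a browser where
// pagefind-ui.js has already defined the global PagefindUI class.
function initSearch(doc) {
  const modal = doc.querySelector("#search-modal");
  new PagefindUI({ element: "#search", showSubResults: true });

  doc.addEventListener("keydown", (event) => {
    if (isSearchShortcut(event)) {
      event.preventDefault(); // stop the browser's own Ctrl+K binding
      modal.hidden = false;
      const input = modal.querySelector("input");
      if (input) input.focus();
    } else if (event.key === "Escape") {
      modal.hidden = true;
    }
  });
}

// Guarded so the file can be loaded outside a browser without side effects.
if (typeof document !== "undefined" && typeof PagefindUI !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => initSearch(document));
}
```

Keeping the shortcut test in its own function means it can be checked without a DOM.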
Build Pipeline
# deploy.yml
- name: Build site
  run: hugo --minify
- name: Build search index
  run: npx pagefind@latest --site public
- name: Deploy to Azure Static Web Apps
  uses: Azure/static-web-apps-deploy@v1
Pagefind runs AFTER Hugo and BEFORE deploy — it adds the pagefind/ directory to public/.
Dark Theme Overrides
search-overrides.css customises the Pagefind UI to match the site's dark glassmorphism:
- Input: Dark background, cyan border on focus, blur backdrop
- Results: Glass cards with subtle border, cyan title links
- Highlights: Cyan mark elements for matching terms
- Sub-results: Indented with cyan left border
- Modal: Full-screen overlay with blur backdrop, 640px max-width
Synonyms (Migrated from old search.js)
The old search had a synonym map (~30 terms like m365 → microsoft 365). Pagefind doesn't support custom synonyms natively, but its fuzzy matching and stemming handle most cases:
- "m365" finds "Microsoft 365" via fuzzy match
- "copilot" finds all Copilot-related pages via full-text
- "licence" finds "license" via stemming
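If specific synonyms are needed in future, Pagefind's indexing can be customised, for example by adding the alternate terms to the indexed HTML in the same way as the hidden searchable blocks used for JS-rendered pages.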
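Another possible route, should the synonym map ever need to come back, is to rewrite the query before Pagefind sees it. Pagefind UI accepts a processTerm callback for this. The sketch below assumes a #search container, and the map holds only the one pair mentioned above; expandSynonyms is a hypothetical helper, not something Pagefind provides.

```javascript
// Hypothetical synonym map; only the m365 pair is taken from the old search.js notes.
const SYNONYMS = new Map([
  ["m365", "microsoft 365"],
]);

// Replace each whitespace-separated word that has a synonym and leave
// every other word untouched. Lookup is case-insensitive.
function expandSynonyms(term, synonyms = SYNONYMS) {
  return term
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => synonyms.get(word.toLowerCase()) ?? word)
    .join(" ");
}

// processTerm is called with the raw query before the search runs.
// The #search selector is an assumption.
if (typeof PagefindUI !== "undefined") {
  new PagefindUI({
    element: "#search",
    processTerm: (term) => expandSynonyms(term),
  });
}
```

A Map is used rather than a plain object so that a query word such as "constructor" cannot accidentally match an inherited property.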
Semantic Search (Deferred, Phase 2)
A rubber-duck critique recommended deferring semantic search (Orama + build-time embeddings) because:
- Needs build-time scripting beyond normal Hugo
- Needs API key handling + re-embedding on content changes
- Needs chunking strategy (section-level, not page-level)
- Hard to debug when results feel "wrong"
- Pagefind's fuzzy + full-text solves 80-90% of intent queries
Revisit criteria: If, after 2-4 weeks of Pagefind being live, Clarity session recordings show users failing to find things, evaluate Orama or Cloudflare Workers AI for semantic query rewriting.
Testing
Manual test queries that should return relevant results:
| Query | Expected Results |
|---|---|
| "what licence for copilot" | Licensing Simplifier, Copilot Matrix, blog posts |
| "deprecation exchange" | Deprecation Timeline |
| "how to write prompts" | Prompt Guide, Prompt Polisher, Prompt Library |
| "service health teams" | Service Health Tracker |
| "az-900 study guide" | Cert Tracker AZ-900 page |
| "migration planner" | Migration Planner tool |
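The manual queries above could also be turned into a small smoke test. The sketch below is one way to do that, under stated assumptions: the URL fragments in the sample cases are guesses at this site's paths, checkQueries and pagefindSearch are hypothetical helpers, and the search function is injected so the check can run against a stub as well as against Pagefind's browser API (pagefind.search() and result.data()).

```javascript
// Hypothetical smoke test. `search` takes a query string and resolves to
// an array of result URLs, best match first.
async function checkQueries(cases, search, topN = 5) {
  const failures = [];
  for (const { query, expectAny } of cases) {
    const urls = (await search(query)).slice(0, topN);
    // Pass when any top result URL contains any expected fragment.
    const hit = urls.some((url) => expectAny.some((part) => url.includes(part)));
    if (!hit) failures.push(query);
  }
  return failures; // queries whose top results contained no expected page
}

// Browser adapter: Pagefind's search() returns lazy results whose data()
// resolves to an object that includes the page URL.
async function pagefindSearch(query) {
  const pagefind = await import("/pagefind/pagefind.js");
  const { results } = await pagefind.search(query);
  const loaded = await Promise.all(results.slice(0, 5).map((r) => r.data()));
  return loaded.map((page) => page.url);
}

// Sample cases; the URL fragments are assumptions about this site's paths.
const cases = [
  { query: "deprecation exchange", expectAny: ["deprecation"] },
  { query: "migration planner", expectAny: ["migration"] },
];
```

In a browser console on the live site, `checkQueries(cases, pagefindSearch)` would resolve to the list of failing queries, which should be empty when search is healthy.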
Cost
$0 — Pagefind is open source (MIT), runs at build time, and the index is served as static files.