🔍 Pagefind Search Integration

Built: 2026-04-12 (Session 2)
Status: ✅ Live
Replaces: Custom search.js (substring matching on 261 entries)


What It Is

Pagefind is a Rust-based static search engine that indexes the full HTML output of the Hugo site at build time. It replaces our old custom search that only matched substrings in titles and descriptions.

Before vs After

Aspect | Old Search | Pagefind
Indexed content | Titles + descriptions only (261 entries) | Full page text (895 pages, 7,117 words)
Matching | Substring only | Fuzzy, stemming, multi-word
Index size | 112KB loaded upfront | ~5KB JS + fragments loaded on-demand
Body text searchable | No | Yes
Offline | |
Maintenance | Manual index template | Auto-indexes HTML output

How It Works

  1. Hugo builds the site (hugo --minify) → outputs HTML to public/
  2. Pagefind runs (npx pagefind@latest --site public) → indexes all <main data-pagefind-body> content
  3. Creates compressed index fragments in public/pagefind/
  4. Browser loads pagefind-ui.js (117KB) + fetches index fragments on-demand when user searches
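
Step 4 can also be driven without the prebuilt UI, which makes the on-demand loading easier to see. The sketch below assumes the index is served from /pagefind/pagefind.js (where the pagefind/ directory from step 3 ends up under public/); the helper name topResults and the default limit of 5 are illustrative and not part of the site's code.

```javascript
// Hypothetical helper: run a query against a loaded Pagefind module and
// fetch the data for the first few results. Each result's fragment is only
// loaded when its data() function is called.
async function topResults(pagefind, query, limit = 5) {
  const search = await pagefind.search(query);
  const results = search ? search.results : [];
  const loaded = await Promise.all(
    results.slice(0, limit).map((result) => result.data())
  );
  // Keep only the fields a simple result list would need.
  return loaded.map((page) => ({
    url: page.url,
    title: page.meta && page.meta.title,
    excerpt: page.excerpt,
  }));
}

// Browser usage (path is an assumption based on the default output location):
// const pagefind = await import("/pagefind/pagefind.js");
// console.log(await topResults(pagefind, "copilot licence"));
```

Passing the module in as a parameter keeps the helper easy to exercise with a stub outside the browser.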

Key Implementation Details

data-pagefind-body on <main>

<!-- baseof.html -->
<main id="main-content" data-pagefind-body>
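⚠️ Critical: Once any page on the site uses data-pagefind-body, Pagefind indexes only the content inside elements carrying that attribute, and pages without it drop out of the index entirely. Putting it on <main> in the base template ensures every page's content is indexed.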
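
Because a page that lacks the attribute disappears from search without any error, a build-time check can catch templates that bypass baseof.html. This is a rough sketch: the function name, the { path: html } input shape, and the regex-based detection are all assumptions for illustration (a real check might use an HTML parser instead).

```javascript
// Hypothetical check: given a map of { path: html } for built pages, return
// the paths whose HTML has no element carrying data-pagefind-body.
function pagesMissingPagefindBody(pages) {
  // Matches the attribute inside any opening tag, with or without a value.
  const attribute = /<[a-z][^>]*\sdata-pagefind-body(?=[\s=>\/])/i;
  return Object.entries(pages)
    .filter(([, html]) => !attribute.test(html))
    .map(([path]) => path);
}

// Example:
// pagesMissingPagefindBody({ "index.html": "<main data-pagefind-body></main>" })
```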

data-pagefind-ignore on nav/footer

<nav class="site-nav" data-pagefind-ignore>
<footer class="site-footer" data-pagefind-ignore>
Prevents nav menu items and footer links from polluting search results.

Hidden searchable content for JS-rendered pages

Eight pages render their main content via JavaScript and are empty HTML shells at build time. Pagefind can't index JS-rendered content, so we add hidden searchable blocks:

<!-- Example: AI News template -->
<div data-pagefind-body style="display:none" aria-hidden="true">
  <h1>AI News — Real-Time AI Industry Updates</h1>
  <p>Track the latest AI news from Microsoft, OpenAI, Google, Anthropic...</p>
  <p>Features: daily, weekly, monthly views. Category filtering...</p>
</div>

Pages with hidden blocks: AI News, M365 Roadmap, Service Health, Deprecation Timeline, Cert Tracker (list), Copilot Matrix, Feedback, Site Analytics.

Files

File | Purpose
layouts/_default/baseof.html | Search modal with Pagefind UI container, data-pagefind-body on <main>
static/js/search-init.js | Pagefind UI initialisation, Ctrl+K shortcut, modal open/close
static/css/search-overrides.css | Dark glassmorphism theme for Pagefind UI
static/js/search.js | DELETED (old custom search replaced by Pagefind)
.github/workflows/deploy.yml | Added npx pagefind@latest --site public step after Hugo build
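
For orientation, the responsibilities listed for search-init.js might look roughly like the sketch below. The element IDs (#search, #search-modal), the showSubResults setting, and the use of the hidden attribute to toggle the modal are assumptions, not the site's actual markup; PagefindUI is the global exposed by pagefind-ui.js.

```javascript
// True for Ctrl+K (Windows/Linux) or Cmd+K (macOS).
function isSearchShortcut(event) {
  const key = typeof event.key === "string" ? event.key.toLowerCase() : "";
  return Boolean(event.ctrlKey || event.metaKey) && key === "k";
}

// Browser-only wiring; skipped when no DOM is available.
if (typeof document !== "undefined") {
  window.addEventListener("DOMContentLoaded", () => {
    // Assumed container ID inside the search modal.
    new PagefindUI({ element: "#search", showSubResults: true });

    const modal = document.querySelector("#search-modal"); // assumed ID
    document.addEventListener("keydown", (event) => {
      if (isSearchShortcut(event)) {
        event.preventDefault(); // stop the browser's own Ctrl+K handling
        modal.hidden = false;
        const input = modal.querySelector("input");
        if (input) input.focus();
      } else if (event.key === "Escape") {
        modal.hidden = true;
      }
    });
  });
}
```

Keeping the shortcut test in its own function means it can be checked without a browser.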

Build Pipeline

# deploy.yml
- name: Build site
  run: hugo --minify

- name: Build search index
  run: npx pagefind@latest --site public

- name: Deploy to Azure Static Web Apps
  uses: Azure/static-web-apps-deploy@v1

Pagefind runs AFTER Hugo and BEFORE deploy — it adds the pagefind/ directory to public/.

Dark Theme Overrides

search-overrides.css customises the Pagefind UI to match the site's dark glassmorphism:

  • Input: Dark background, cyan border on focus, blur backdrop
  • Results: Glass cards with subtle border, cyan title links
  • Highlights: Cyan mark elements for matching terms
  • Sub-results: Indented with cyan left border
  • Modal: Full-screen overlay with blur backdrop, 640px max-width

Synonyms (Migrated from old search.js)

The old search had a synonym map (~30 terms like m365 → microsoft 365). Pagefind doesn't support custom synonyms natively, but its fuzzy matching and stemming handle most cases:

  • "m365" finds "Microsoft 365" via fuzzy match
  • "copilot" finds all Copilot-related pages via full-text
  • "licence" finds "license" via stemming

If specific synonyms are needed in future, Pagefind supports custom indexing configuration.
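
If that day comes, one option that needs no index changes is to rewrite the query on the client before it reaches Pagefind. A minimal sketch, assuming a small hand-maintained map: only the m365 entry comes from the old search.js notes, the other entry is an invented example, and the function name is hypothetical. Pagefind UI's processTerm option looks like a suitable hook for a function like this, but treat that wiring as an assumption.

```javascript
// Illustrative synonym map; only "m365" comes from the old search.js notes.
const SYNONYMS = new Map([
  ["m365", "microsoft 365"],
  ["o365", "office 365"], // invented example
]);

// Replace each whole word that has a synonym, leaving other words untouched.
function expandSynonyms(query) {
  return query
    .trim()
    .split(/\s+/)
    .map((word) => SYNONYMS.get(word.toLowerCase()) || word)
    .join(" ");
}

// Possible wiring (assumption):
// new PagefindUI({ element: "#search", processTerm: expandSynonyms });
```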

Semantic Search (Deferred — Phase 2)

A rubber-duck critique recommended deferring semantic search (Orama + build-time embeddings) because:

  1. Needs build-time scripting beyond normal Hugo
  2. Needs API key handling + re-embedding on content changes
  3. Needs chunking strategy (section-level, not page-level)
  4. Hard to debug when results feel "wrong"
  5. Pagefind's fuzzy + full-text solves 80-90% of intent queries

Revisit criteria: If after 2-4 weeks of Pagefind being live, Clarity session recordings show users failing to find things, then evaluate Orama or Cloudflare Workers AI for semantic query rewriting.

Testing

Manual test queries that should return relevant results:

Query | Expected Results
"what licence for copilot" | Licensing Simplifier, Copilot Matrix, blog posts
"deprecation exchange" | Deprecation Timeline
"how to write prompts" | Prompt Guide, Prompt Polisher, Prompt Library
"service health teams" | Service Health Tracker
"az-900 study guide" | Cert Tracker AZ-900 page
"migration planner" | Migration Planner tool

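These manual checks could eventually be automated. The sketch below assumes a search function that resolves to a list of result titles (for example, a thin wrapper around Pagefind's JS API); the function name, the case format, and the substring comparison are hypothetical choices.

```javascript
// Run each query and collect the ones where none of the expected titles
// appears among the returned result titles.
async function findFailingQueries(searchTitles, cases) {
  const failures = [];
  for (const { query, expectAny } of cases) {
    const titles = await searchTitles(query);
    const hit = titles.some((title) =>
      expectAny.some((expected) => title.includes(expected))
    );
    if (!hit) failures.push(query);
  }
  return failures;
}

// Two cases taken from the test queries above.
const smokeCases = [
  { query: "deprecation exchange", expectAny: ["Deprecation Timeline"] },
  { query: "migration planner", expectAny: ["Migration Planner"] },
];
```

An empty array from findFailingQueries means every query surfaced at least one expected page.
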
Cost

$0 — Pagefind is open source (MIT), runs at build time, and the index is served as static files.