Brain Bar v1 — Lessons Learned¶
Captured 4 May 2026 at the end of the build session that took Brain Bar from idea (sparked by cmd.ms) to deploy-ready in a single working day. 35 entries, 3 distinct deployable surfaces (Hugo site, Cloudflare Worker MCP server, Pages Functions), 16 of 29 todos done in-session.
These are the lessons that should shape every future planet build (Shift, Plain AI, etc.).
What worked, and would do the same way again¶
1. Start with philosophy, not code¶
The single best decision of the session was pausing to write down the Cosmos Philosophy before any new visual layer touched the keyboard. Once "each planet is its own world" was a stated rule, every downstream choice — terminal aesthetic, separate Pages project, paid-content firewall, separate Cloudflare Worker for MCP — became obvious. No wasted code that had to be undone.
Apply to: any new product that crosses a domain or audience boundary.
2. Rubber-duck before implementation, not after¶
The pre-implementation rubber-duck pass on the v1 plan caught 3 blockers + 7 substantive non-blockers before any file was written:
- The original "same Pages project + host-rewrite Function" plan would have shared a failure domain with practice exams (a paid product). Rubber-duck pushed me to a separate Cloudflare Pages project. SLA is now structurally protected.
- The duplicate-content SEO problem (
cmd.aguidetocloud.com/mdevsaguidetocloud.com/cmd/mde) was caught before any URL was canonical anywhere. - The CORS issue with the Astro Guided embed was caught before a single fetch was written.
Apply to: every plan with cross-cutting decisions about hosting, routing, or data flow.
3. Hard-code firewalls, not trust¶
The MCP server has ONE allowed upstream URL, enforced both by a constant (INDEX_URL) and an allowlist check (ALLOWED_UPSTREAM_HOSTS) at fetch time. Even if I trust myself today, the next contributor (or me in six months) has a stop sign in the source code that says "do not add another fetch URL without verifying it points to free planet data only."
Apply to: any code that bridges paid and free surfaces.
4. Test the protocol surface, not just the happy path¶
The MCP server has 15 offline tests covering: HTTP routing, JSON-RPC error codes for unknown methods/tools, all 6 search ranking tiers, abbreviation case-insensitivity, missing slug paths, kind filtering, batched request handling. The wildcard middleware has 13 offline tests covering: apex pass-through, www stripping, exact slug, abbreviation, alias, old-name, unknown fallback, nested-label rejection.
This took ~30 minutes to write but caught a real data-quality bug (azure-ai-studio was duplicated across both aliases and old_names for Foundry — tier 3 was firing instead of the expected tier 4) that no amount of manual testing would have surfaced.
Apply to: any protocol implementation. Write tests against the contract, not the implementation.
5. Use the Playwright UX audit as a critic, not a status check¶
First Playwright audit caught: launcher input not auto-focused on page load (the single highest-impact UX fix of the session — turned a click-then-type interaction into a type-immediately one). Second pass: zero findings, after fixes.
The audit also caught: lede text in proper case bleeding the all-lowercase aesthetic, CSS class collision on the wishlist page (the bb-aliases class was reused for two different purposes and inheriting the wrong ::before content).
Apply to: every public-facing planet before deploy. Playwright + screenshot review > human eyeballing.
6. The "honest take" rule beats "always be confident"¶
Sush's voice rule — "neutral by default, spicy only when there is genuine conviction; silence > forced fit" — translated directly into a sush_take_status field with 3 values (spicy, neutral, omit). Of 35 entries, only 5 ship a spicy take. The rest are neutral or silent. Each spicy take means something because it isn't routine.
Apply to: every content layer. Pre-decide what voices the product will have, and what it won't.
7. Vendor data, don't fetch it cross-origin¶
The Astro Guided organ will vendor cmd-index.json at build time (when it ships) rather than fetch from cmd.aguidetocloud.com at runtime. Same-origin = no CORS, no third-party fetch, no runtime dependency on another planet being up. The freshness lag (one Astro deploy) is acceptable for an embedded launcher.
Apply to: any cross-planet UI integration. Build-time data > runtime fetch.
What I'd do differently¶
1. Start the visual identity conversation earlier¶
I burned a few minutes building the Brain Bar scaffold with Earth-derived Zen tokens (indigo accent, soft typography) before Sush corrected with the Cosmos Philosophy. The visual layer had to be rewritten. Cheap to redo, but avoidable.
Lesson: before scaffolding any new product's visual layer, explicitly ask: "is this Earth, or a new planet?" If new planet, ask Sush which atmosphere it should have first.
2. The 30°-hue rule has decayed past usefulness¶
With 56+ tools, the Earth tool registry has no remaining ≥30° hue gap. Brain Bar's phosphor (#34D399, hue 160) collides with Cert Study Guides' emerald (#10B981, also hue 160). I documented the duplicate as a "cosmic exception" because Brain Bar is on its own subdomain with its own CSS namespace — but the rule is no longer enforceable. Future colour additions will be increasingly arbitrary.
Lesson: when a registry rule fails to apply across the system, retire it explicitly. Don't keep adding "exceptions" — that's how rules become noise.
3. Hugo content adapters: use them, but the layout key isn't always honored¶
I tried to set params.layout = "disambiguation" in the content adapter and have Hugo route disambiguation entries to a custom disambiguation.html template. Hugo's lookup chain didn't pick it up — disambiguation entries kept rendering through single.html. After ~10 minutes of investigation, the simplest fix was to branch inside single.html on .Params.kind == "disambiguation". That worked instantly.
Lesson: if a Hugo template lookup is fighting you, branch in a single template rather than chase the lookup-order spec. It's almost always faster and the resulting code is easier to read.
4. Pages Functions vs Hugo dev server: known mismatch¶
The Hugo dev server (hugo server) doesn't run Cloudflare Pages Functions. So /api/log-miss and /api/popular-misses 404 in local dev. The Playwright audit picked these up as "console issues" (correctly). For a thorough local test, wrangler pages dev is the right tool — but it's slower to iterate than Hugo dev.
Lesson: document the known dev-server mismatches in the README. Don't try to make Hugo dev simulate Functions — accept the gap and run wrangler pages dev only when explicitly testing Function paths.
5. ESM TypeScript on Cloudflare Workers: skip the wrangler install for tests¶
First attempt to install wrangler for the MCP project failed because esbuild's post-install hook timed out on Windows ARM64. Wasted ~5 minutes. The fix was simpler: tests don't need wrangler at all (Node 22+ strips TypeScript via --experimental-strip-types). Only wrangler dev and wrangler deploy need wrangler — and those are deploy-time concerns, not test-time.
Lesson: decouple test dependencies from deploy dependencies. If tests can run with the runtime alone, do so. Save the heavyweight install for the deploy step.
6. Browser address-bar theme cookie scoping¶
The dark/light toggle cookie is scoped to .aguidetocloud.com so theme persists across www.*, cmd.*, and any future planet that respects it. This works in production but doesn't work on localhost or 127.0.0.1 (cookies can't be domain-scoped to non-public-suffix hosts). For local cross-port testing, the toggle has to be re-applied per port.
Lesson: a few rules become "production only" by definition. Document them explicitly so the local dev experience doesn't surprise the next engineer.
7. Deferring 'browse all' was the right call, but not by far¶
I almost shipped Brain Bar without an /all/ catalogue page, betting that the launcher alone would carry discovery. The Playwright audit + a moment of empathy for first-time visitors (who might not realize the input is interactive) revealed that some users want to see the full breadth before they search. Adding /all/ took 20 minutes and noticeably improves the "is this a serious product?" first impression.
Lesson: for any catalogue-style product, build a browse surface alongside the search surface. Even if search is the dominant interaction, browsability is part of trust-building.
Patterns to reuse for the next planets¶
| Pattern | Reusable for |
|---|---|
| Two-file data model (entity layer + voice overlay) | Shift's role pages (entity = role data; overlay = Sush takes) |
| Hugo content adapter generating pages from TOML at canonical URL | Any data-driven product |
| 6-tier deterministic search with no auto-open on fuzzy | Any launcher-style UX |
| Build-time validator with N rules and ANSI-coloured output | Any growing dataset |
| Cloudflare Worker + JSON-RPC MCP protocol implementation (zero npm deps) | Any planet that wants AI-agent reach |
| Wildcard subdomain pattern via Pages Function (host-header rewrite) | Any short-URL muscle-memory product |
| Playwright UX audit script with structured findings JSON | Pre-launch QA on any planet |
| Privacy-safe missed-term tracking with D1 + graceful fallback | Any "what aren't we covering yet?" insight |
| Paid-content firewall (allowlist + load-time check) | Any code that bridges paid and free surfaces |
| Mode chips above one prompt + smart-suggest fallback | Any launcher that needs to handle two-or-more content kinds without breaking the sole-prompt brand |
| Result-card severity glyph instead of mode chrome | Any product where alerting must be visible without recolouring the whole atmosphere |
v2e — errno mode (5-6 May 2026)¶
The decoder ride-along. Brain Bar gained a sibling content kind — Microsoft error codes — without breaking its sole-prompt terminal brand. Three lessons came out of this round.
12. The right time for "loud discoverability" is when copy alone fails¶
Round 1 of errno mode shipped /decode/ as a standalone subsection with one buried // decode error link in the home boot block. Sush bounced almost immediately: "i dont see docoder at all but this link above is working?"
Audit confirmed five home-page touchpoints (hero prompt, hero title, hero tagline, search placeholder, boot-block "type" instruction) all said "jargon" or "Microsoft term" only — zero mention of error codes. The single buried link couldn't carry it. The original recommendation ("polish the copy") was an under-shoot.
The fix: mode chips above one prompt — [ $ jargon ] [ ! errno ] — both modes visible at first glance with a tiny caption explaining what each covers. Click swaps placeholder + index + result tint + boot block in place. URL stays /. Sole-prompt terminal brand intact.
Lesson: when a feature has zero discoverability through copy alone, "polish the copy" is a real option but a weak one. Add a visible affordance that doesn't require reading. Mode chips are the loud-but-on-brand version of "two search boxes" — the visual signal is in the chip colours, not in cloning the input.
Apply to: any planet that earns a second content kind. Don't bury it in nav. Don't trust placeholder text. Make it visible.
13. node -c is not enough — TDZ is a runtime error¶
The Round-2 commit shipped cmd.js with a temporal-dead-zone error: const STATE = { mode: MODE_JARGON, ... } was declared BEFORE const MODE_JARGON = 'jargon'. The whole module threw at init, breaking chips, search, smart-suggest, and recents — everything on the home page that depended on cmd.js.
The pre-deploy node -c check passed. TDZ is not a parse-time error; it's runtime. A real page load was needed to catch it.
The fix: re-ordered the const declarations + committed qa/decode-chip-diagnostic.mjs as a permanent regression test. The diagnostic launches Playwright, captures pageerror events, and asserts that clicking the errno chip actually swaps mode + placeholder + boot blocks.
Lesson: for any non-trivial JS change, run a real page load — not just a syntax check. Hugo dev server + Playwright takes 60 seconds; it would have caught this before deploy.
Apply to: every cmd.js / decode.js / interactive-launcher change. Make the diagnostic part of the deploy checklist.
14. Severity = glyph, not chrome¶
Sush's first instinct for the decoder was a separate red command bar (or split window, or theme toggle). Each of those breaks the Cosmos rule "differentiate by feature, not chrome" — they all pour colour over the input and the page.
The shipped design uses red as a stderr-style severity glyph: top accent strip on the result card, ERROR pill with namespace, red code in result-row slug column. The page chrome stays Brain Bar green; the prompt itself is unchanged in both modes.
Result: the /decode/ page is unmistakably a different content kind, but it still feels like Brain Bar. A real terminal does the same thing — ls output and stderr lines coexist in the same shell.
Lesson: when a product gains a content kind that needs alerting (errors, warnings, deprecations), use colour as a glyph on the content, not as a theme on the chrome. The brand stays whole; the alert is still loud.
Apply to: any product with mixed-severity content. Earth's blog with deprecation alerts, Shift's role pages with risk markers, Plain AI's "advisory" content — same pattern.
The non-obvious win¶
The most underrated outcome of this session is the Cosmos Philosophy. Without it, every future product would gravitationally pull toward Earth's design — and the result would be a sterile system of sub-Earths, none with their own identity. With it, each planet earns its own gravitational signature.
Brain Bar's "Terminal" atmosphere isn't just a stylistic choice — it's a proof that the philosophy works. The next planet (Shift, Plain AI) will pick a different atmosphere and the product universe gets richer, not noisier.
The errno-mode round (v2e) extended this idea within a single planet: the same atmosphere can host multiple content kinds, with severity glyphs handling the differentiation that chrome would over-do.
This file is a living document. Update it after each planet build with what worked and what didn't.
🌱 cmd v2 Launch — 7 May 2026¶
Brand renamed: Brain Bar → cmd. The acronym tool is now
cmd(live atcmd.aguidetocloud.com). Filename of this doc kept asbrain-bar-lessons.mdfor grep continuity — the historical lessons above still apply, the playbook below covers the v2 reimagining.
What changed in one paragraph¶
Brain Bar shipped as a search-box-with-terminal-styling: type a Microsoft
acronym, get a card, click a card, land on an SSR page. cmd v2 reimagines
the homepage as a real terminal session inside a chunky CRT monitor frame.
Type a command, results stream into a scrollback buffer. Pipes work
(mde | history). Click any output to re-run. Permalinks via ?q=.
Cream desk + dark CRT screen contrast. Boot sequence (skippable, cookie
gated, prefers-reduced-motion aware) with ASCII logo + [BIOS]/[OK]/[READY]
streaming. All entry/decode/about/watch SSR pages preserved (still indexable,
still served by the existing cmd.js + 9-tier ranker — completely untouched).
Naming decision (locked)¶
- Brand:
cmd(notBrain Bar, notbrainbar) - URL:
cmd.aguidetocloud.com— brand collapses into URL = single identity - MCP tools:
cmd_search,cmd_get,cmd_list_kinds(wasbrainbar_*) - Why
cmd: terminal-native, sysadmin-coded, three letters, ungoogleable but the URL + tagline + entry-page slugs do the SEO heavy lifting - Folder paths NOT renamed:
brainbar/,brainbar-mcp/stay on disk (internal-only, doesn't affect users — defer to natural maintenance moment)
Architecture — what powers what¶
| Surface | File(s) | Rendering |
|---|---|---|
Homepage / |
layouts/index.html + static/js/cmd-terminal.js |
Terminal mode (CRT, boot, commands, pipes) |
Entry pages /<slug>/ |
layouts/_default/single.html + static/js/cmd.js |
SSR + 9-tier search ranker (untouched) |
Decode pages /decode/<code>/ |
layouts/decode/single.html + static/js/decode.js |
SSR (untouched) |
/about/, /watch/, /changelog/, /all/ |
layouts/_default/{about,watch,changelog}.html |
SSR (brand text rebranded) |
| Search index | layouts/index.cmd_index.json → /cmd-index.json |
Hugo data adapter — extended additively (added portal_url, learn_url, plans, official, watch fields) |
| MCP server | brainbar-mcp/ (Cloudflare Worker) |
Standalone deploy via wrangler deploy |
Key separation: cmd-terminal.js runs ONLY when .cmd-v2-home exists in
the DOM. cmd.js (the mature ranker) bails cleanly when its targets don't
exist. Both files coexist; only one runs per page.
CSS scoping: All v2 styles scoped under .cmd-v2-home so they cannot
leak into entry/decode/about/watch SSR pages.
Terminal commands shipped¶
search <term> layered match (slug · abbreviation · alias · old_name · synonym · prefix · substring)
man <slug> full entry as a man-page (renders portal_url + learn_url + plans + watch)
ls [kind] list jargon · errno
decode <code> resolve error code (AADSTS, 0x HRESULT, KB)
compare a b side-by-side diff (curated for e3-vs-e5; auto-fallback shows both man pages)
history this session's commands
whoami your search trail (localStorage cmd:terminal:trail)
clear clear buffer
cowsay <text> one easter egg
help · ? help-as-cards inside the terminal
about about cmd + MCP server + feedback link
watch upcoming Microsoft rebrand watchlist
changelog what cmd shipped recently
all alias for `ls jargon`
Pipes:
- <cmd> | history — uses entry.watch field as the rebrand-watch note
- <cmd> | grep <term> — substring filter on output
Smart suggest:
- Error pattern → suggests decode <code> (regex: 0x[hex]+, AADSTS\d+, KB\d+)
- Two known slugs → suggests compare a b
- Tab completes from union of verbs + slugs + error codes
Visual design — the cream + dark CRT decision¶
The homepage uses a warm cream desk (#f4ede0) with subtle warm brown
dot grid (rgba(140,90,50,0.18) 1px @ 36px spacing) + faint full-page
scanlines (CRT-room atmosphere). The CRT terminal sits on the cream as
a piece of hardware — chunky charcoal bezel (linear gradient
#3d4250 → #1c1f27), screen recess (inset shadow + black ring), pulsing
green power LED bottom-left, model label "cmd 4000 · retro · © 1996"
bottom-right.
Inside the screen: pure black #06080c, scanlines @ 2.2% opacity, radial
vignette dimming corners, phosphor glow (text-shadow) on green prompt + amber
headings + LED. CRT flicker animation every 47s (very subtle, ~85% opacity for
1 frame).
Why cream not dark: the dark-on-cream contrast makes the terminal POP. It reads as "real hardware on a desk" not "a dark UI on a dark page". Plus the brown text on cream is more readable for long copy. Theme toggle hidden on homepage — we own the look.
Mobile (<720px): CRT decorations hide (bezel/LED/label become flat panel), ASCII logo hides, terminal becomes 360px tall with normal border, express search auto-opens.
Audit-driven fixes (Playwright + WCAG)¶
Ran a comprehensive audit using a custom Playwright script. Found 4 WCAG AA
contrast failures (browns on cream below 4.5 ratio), missing <nav>
landmark, no skip link, AGTC logo had empty alt, title was just
"cmd" (ungoogleable for SEO), ASCII logo broken on mobile, cosmos rail
visible on home despite CSS.
Fixed in three batches across one session:
- v2.2 (P0): Contrast → #5a3a18, skip-link, <nav class=bb-topnav>, AGTC alt, SEO title rewrite to cmd — Microsoft jargon & error code decoder · stop googling microsoft, ASCII hidden mobile, cosmos rail force-hidden
- v2.3 (P1): Font preload (Regular + Bold), aria-live="polite" on buffer, aria-hidden="true" on boot decoration, mobile auto-open <details>, 55 OG PNGs regenerated for cmd brand
- v2.4 (P2): Featured-entries strip below terminal (8 server-rendered cards: mde, pim, m365-e3, m365-e5, intune, entra-p1, m365-copilot, conditional-access) — SEO crawlable, no-JS reachable
Deployment history (what shipped, when)¶
| Commit | Tag | What |
|---|---|---|
db8bf4a |
v2.0 merge | Initial port — terminal homepage, CRT casing, basic commands |
685f5f2 |
v2.0 hotfix | Wider terminal (max-shell 1080→1400), about/help nav, fix nested <main> |
35315e0 |
v2.1 | Cream homepage, brand audit (19 files), MCP rename in repo, curated compare e3 e5 |
3352597 |
v2.1.1 | Brand audit completes — clean data files (changelog/decode/voice/entries TOMLs) |
a7a2737 |
v2.2 | Audit P0: a11y + SEO + contrast |
bf8302e |
v2.3 | Audit P1: font preload + a11y boot + mobile expand + OG regen |
abcab07 |
v2.4 | Audit P2: featured-entries strip |
cache_version: 2026050602 → 2026050708 (8 bumps in one session)
MCP server (mcp.aguidetocloud.com)¶
Cloudflare Worker, separate deploy from Pages. Source: brainbar-mcp/.
Tools (renamed):
- cmd_search(query, limit?) — search by slug/abbrev/alias/old-name/prefix/substring (9-tier ranker)
- cmd_get(slug) — fetch one entry's full record. Resolves aliases.
- cmd_list_kinds(kind?) — group by product/license/feature/portal/cert/tool/disambiguation
Deploy:
Tests: npm test (mocha). All assertions reference cmd_* names.
Files that matter (next-session quick reference)¶
brainbar/
├── hugo.toml # title="cmd", cache_version, baseURL
├── data/
│ ├── cmd_entries.toml # 55+ jargon entries (single source of truth)
│ ├── cmd_decode.toml # 47+ error codes
│ └── cmd_voice.toml # Sush takes (per-slug)
├── layouts/
│ ├── index.html # ⭐ NEW homepage (CRT + featured strip)
│ ├── index.cmd_index.json # JSON adapter (extended fields)
│ ├── _default/
│ │ ├── baseof.html # Skip link, <nav>, theme toggle gated
│ │ ├── single.html # Entry page (SSR — untouched)
│ │ └── about.html, watch.html, ... # SSR pages (brand-rebranded)
│ ├── decode/single.html, list.html # Decode SSR
│ └── partials/planet-icon.html # AGTC alt fixed
├── static/
│ ├── css/cmd.css # 2375 lines now — v2 styles scoped under .cmd-v2-home
│ ├── js/cmd-terminal.js # ⭐ NEW (~36 KB) — terminal-mode JS
│ ├── js/cmd.js # Mature 9-tier ranker (untouched, powers entry pages)
│ ├── js/decode.js # Decode page JS
│ └── og/*.png # 55 entry OGs regenerated for cmd brand
├── scripts/
│ └── generate-og-images.mjs # Brand updated to "cmd" on line 179
└── functions/ # Cloudflare middleware (alias redirects)
brainbar-mcp/ # Cloudflare Worker (separate deploy)
├── src/index.ts # Tool names cmd_*
├── test/index.test.mjs # Tests assert cmd_* names
└── README.md # API ref updated
What's next (capability roadmap — for future sessions)¶
The visual + a11y + SEO + brand work is DONE. Next sessions = capability expansion:
-
More curated
comparepairs — Entra P1 vs P2, M365 F3 vs Business Premium, E3 + Copilot vs E5, etc. Each curated pair has aCURATED_COMPARESentry incmd-terminal.jswith cols + rows + pricing note. -
Dynamic
compareschema-builder — when no curated pair exists, auto-build a diff table by walking key fields (kind, domain, plans, portal_url, learn_url, old_names, watch). Already partly there as fallback (shows both man pages). -
Better synonym coverage — every entry's
terms.synonymsis curated; could auto-extract fromaliasesand Sush takes. -
watchdata feed — currently each entry has awatchfield for rebrand status. The/watch/page lists entries withwatch != "". Could become a structured rebrand-watch system with dates + sources. -
changelogautomation — auto-generated from git commits with brand-tagged commit messages? Or stay curated. -
Permalink expansion —
?q=mde&fs=1could auto-fullscreen.?q=could support deep-linking to specific output (not just last command). -
Analytics-light —
/api/log-missalready logs missed searches. Could add/api/log-popfor popular queries (helps prioritize new entries). -
MCP v0.2 tools —
cmd_compare,cmd_recent_changes,cmd_watch, possiblycmd_by_domain. Each is a 1-tool addition tobrainbar-mcp/src/index.ts. -
Voice integration —
cmd_voice.tomlhas Sush takes per slug. Currentlyhas_takeboolean is exposed in JSON; the take itself isn't surfaced in terminal mode. Could add as a// sush take:line at bottom ofman <slug>when present.
Universal cosmos laws (still apply)¶
- 🚦 Don't collide with paid practice exam (own CF Pages project — already isolated)
- 📡 Publish data feed (
/cmd-index.jsonconsumed by Astro Guided organ at build time) - 🔗 Brand attribution (footer link to aguidetocloud.com universe)
- 🍪 Theme cookie parity (
.aguidetocloud.comdomain) — but cmd home now ignores it (we own the look)
Lessons from this session¶
- Rubber-duck before merging two paradigms. I almost wholesale-replaced
cmd.js(mature 9-tier ranker) before realising it should stay for entry pages. The duck caught it. Took 2 minutes of investigation to pivot to a parallel-file approach (cmd-terminal.js + cmd.js coexist). - Audit BEFORE you think you're done. Playwright + WCAG check exposed 4 contrast failures, missing landmarks, mobile breakage that visual review missed. Three batches of fixes after "shipping" produced the actual ship.
- Cream + black contrast > all-dark. The CRT-on-desk metaphor lands harder than dark-on-dark. People see real hardware, not just a UI.
- Server-render some content above the fold for SEO. The featured strip (8 cards) gives crawlers + no-JS users actual content. The terminal alone is invisible to bots.
- Cosmos doc + reference docs need updating in same session as the rename. Otherwise future-Sush (or future-AI) loses the breadcrumbs.
- No-users phase = freedom to break. Sush's "we are just building our planet" framing unlocked clean-rename + multiple deploys-per-day. This becomes much harder once people depend on the URLs/MCP names.
SME + code + UX overhaul session (8 May 2026, overnight)¶
Captured at end of an overnight session where Sush asked for a comprehensive SME content review + code-quality review + UX visual test in light AND dark mode at three viewports, then fix everything found. Result: 27 fixes shipped, 0 user-visible regressions, 136 tests still green, plus a reusable Playwright UX probe (
brainbar/ux-probe.mjs) and a documented playbook (cmd-content-review-playbook.md).
What was found and how¶
Process:
1. Background rubber-duck agent (gpt-5.5, 8 min) ran an SME+code review across cmd_entries.toml, cmd_decode.toml, cmd_skus.toml, cmd_voice.toml, cmd-terminal.js, cmd-pure.mjs, brainbar-mcp/src/index.ts. Returned 27 findings tagged by severity (blocker / substantive / advisory) and category (content-accuracy / currency / code-quality / typo).
2. Self-built Playwright UX probe (ux-probe.mjs) ran 3 viewports × 2 themes × 7 pages + 22 verb outputs ≈ 140 screenshots with WCAG contrast + horizontal-overflow + console-error checks. Returned 5 substantive UX issues plus a long tail of false positives.
3. Logged everything in SQL review_findings table, categorised, attacked by severity. Triaged false positives (alpha-blind contrast checker, hover-state false reads, parent inheritance reads).
The blocker that nearly hid:
- loadDecode() was a fire-and-forget .catch(() => {}) with STATE.decodeReady left at false on failure. Callers (cmdDecode, cmdLs('errno')) auto-rerun on decodeReady === false — so a 404 on /decode/decode-index.json causes infinite self-rerunning. Fix: in-flight promise dedupe + sticky decodeError state. Ship one error message instead of an infinite loop.
The pattern lesson — content audit needs SME, not just structure:
- Validators (validate-cmd-tree.mjs) check structural integrity (every slug exists, every edge is bidirectional). They CAN'T check whether m365-e3.includes correctly claims MDO is bundled — because that's a Microsoft-licensing-trivia question. Only an SME (or rubber-duck simulating one) catches this. Validators must coexist with SME passes; one is not a substitute for the other.
- 13 of 27 findings (almost half) were content accuracy: wrong includes graphs (m365-e3 claimed MDO; m365-f3 claimed MDE — both false), wrong fix paths (0x80180014 was generic "data invalid"; correct cause is MDM enrolment-restriction policy), wrong cross-references (KB5034441 surfaces as 0x80070643, NOT 0x800F0922), wrong SKU GUIDs, blank price placeholders, wrong price math in sush_take. Each of these would have been "Microsoft people will spot this immediately" embarrassments.
The pattern lesson — prompt examples teach the wrong content too:
- The MCP cmd_ask system prompt had a few-shot example claiming MDE Plan 1 is bundled in m365-f3. Even after fixing the data file, the few-shot example would have re-taught the LLM the same wrong answer. Lesson: prompt examples ARE content. Treat them as data and audit them on every content pass.
The pattern lesson — alpha-blind contrast checkers lie:
- My probe's parseRgb() regex matched rgba?\((\d+)\D+(\d+)\D+(\d+)/ and ignored the alpha channel. So rgba(245, 166, 35, 0.10) showed as solid #F5A623 — completely wrong perceived background. This produced ~10 false positives on cards/pills with translucent backgrounds.
- Either (a) build a proper alpha-aware compositing checker, or (b) treat your probe's findings as a first pass that needs visual confirmation. Don't fix what the eye can read just because a robot says it's broken.
The pattern lesson — list-of-strings vs string in renderers:
- cmd_decode.toml authored fix as a YAML/TOML array of strings (the right schema for a list of steps). The terminal's renderBlock(errno) did esc(e.fix || e.fix_path || '') which collapses an array via Array.toString() (CSV) and renders as a single comma-separated line. Subtle data-shape leak.
- AND: the entry schema-definer (list.decode_index.json) was only exposing the summary fields (code, slug, namespace, short, plain_english, seen_in) — it stripped fix and likely_cause entirely. Terminal decode showed fix: — for every code. Fix needed at TWO points — both the schema slice and the renderer.
- Lesson: any time a renderer shows "—", check both the renderer AND the data shape upstream. The renderer might be honest about empty data; the upstream might be silently dropping it.
The pattern lesson — generic-before-specific regex always loses:
- GRAPH_SCOPE_MAP had /me, /users, /security/ as broad catch-alls. Specific routes like /me/messages (needs Mail.Read) lost to /me (User.Read) every time, because the broad regex matched first. Re-ordered: most specific first (workload-scoped routes), broadest last (catch-all identity). Rule: regex dispatch tables must be authored most-specific first or the result is "first match wins on the wrong scope".
The pattern lesson — "the count says 0 because the column is empty in JSON" cascades:
- I removed mde from m365-f3.includes. The validator (which only checks structural integrity) said "✓ clean — 62 edges, 0 issues". Tests passed. But tree m365-f3 would have rendered with one fewer line. Cosmetic but the kind of regression an end-user notices ("hey didn't this used to show MDE?"). Lesson: data-removal changes need a manual visual after-test even when validators pass. The Playwright UX probe caught this incidentally because we re-screenshotted everything.
Reusable artefacts¶
-
brainbar/ux-probe.mjs— Playwright UX probe with WCAG contrast checks, full-page screenshots in 3 viewports × 2 themes, plus terminal-verb output capture. Run withnode ux-probe.mjsagainstlocalhost:1315. Outputsux-screenshots/folder +_findings.json. Reuse pattern: copy this script to other planets (Shift, Plain AI) — changeBASE,PAGES, andVERBSarrays. -
learning-docs/docs/reference/cmd-content-review-playbook.md— the step-by-step runbook for repeating this review cycle next quarter (or applying it to other planets). Includes the SQL schema forreview_findings, the rubber-duck prompt template, the Playwright probe pattern, the false-positive triage rules, and the "content fix → cache bump → test → deploy" deploy ladder. -
brainbar-mcp/scripts/deploy-via-api.mjs(env-token version) — deploy via REST API, readsCLOUDFLARE_API_TOKEN+CLOUDFLARE_ACCOUNT_IDfrom env (argv only as fallback). Survives the brokenwranglerinstall on Win ARM64. Reusable for any CF Worker deploy.
What I'd carry into the next planet¶
- Run rubber-duck SME passes BEFORE the visual review, not after. Content accuracy issues cause re-deploys; visual issues just cause re-deploys of CSS. Order: SME → code → visual.
- Audit the audit tool itself before you trust it. My contrast probe's alpha bug produced 10+ false positives that would have wasted hours if I'd taken them at face value.
- Build the SQL
review_findingstable early — it's the difference between "I think I fixed everything" and "I have a queryable list of N items, M done, N-M deferred, with severity/category breakdown". - Update the journal AND the reference docs in the same commit. Future-me reads the journal first; future-AI reads the reference docs first; they have to agree.
Test discipline that paid off¶
- Every wave (content/code/UX) ran the full 51 + 25 + 60 = 136 tests before moving on.
- Hugo build clean check after every batch (catches data-shape errors that JS tests miss).
validate-cmd-tree.mjsre-run after every content edit (catches danglingincludesslugs).- Live
/askcurl after MCP deploy (the deploy-success boolean is necessary but not sufficient — confirm the actual answer changed). - CI conclusion check on the resulting GitHub commit before considering "done".
The discipline is the deliverable. A clean record of "all green, all the way through" is what lets the next session move quickly without paranoid re-verification.