
cmd Content Review Playbook

What this is: The step-by-step runbook for a comprehensive SME content review + code-quality review + UX visual test of cmd (and any "cosmos planet" with the same shape — Shift, Plain AI, etc.). Born from the 8 May 2026 overnight overhaul that found 27 issues spanning content accuracy, code quality, UX, and security.

When to run it:

  • Quarterly, on every cosmos planet that holds factual content (cmd, Shift, Plain AI).
  • Always before a marketing push or external talk, so the SME review catches errors before the audience does.
  • Always after a major content batch (10+ new entries / decode codes / etc.), because bulk content is where errors compound silently.

Cost: ~3 hours wall clock on a quiet machine. Most of that is rubber-duck thinking time + screenshot rendering. Hands-on edit time is ~45 min.


Phase 0 — Stage the review

0.1 Spin up dev server with cache busting on

cd C:\ssClawy\aguidetocloud-revamp\brainbar
hugo server --port 1315 --disableFastRender
Why --disableFastRender: the UX probe hits localhost:1315 directly. Fast-render serves stale HTML for ~150ms after a data file change → false UX findings. Disable it.

0.2 Create the SQL findings table

CREATE TABLE IF NOT EXISTS review_findings (
  id TEXT PRIMARY KEY,
  source TEXT,                  -- 'duck' | 'probe' | 'manual'
  severity TEXT,                -- 'blocker' | 'substantive' | 'advisory'
  category TEXT,                -- 'content-accuracy' | 'currency' | 'code-quality' | 'typo' | 'contrast' | 'overflow' | 'console-error'
  finding TEXT,                 -- short description
  status TEXT DEFAULT 'pending' -- 'pending' | 'done' | 'deferred' | 'false-positive'
);
Why: without a queryable list, "I think I fixed everything" is the best answer you can give. With it: SELECT severity, count(*) FROM review_findings GROUP BY severity is your dashboard.


Phase 1 — SME + code review (rubber-duck agent)

1.1 Launch in background

Use the rubber-duck agent in mode: background with model: gpt-5.5 (Claude Opus also fine; gpt-5.5 is faster and tends to find more SME slips).

1.2 The prompt template that worked

You are an SME reviewer for the cmd Microsoft jargon decoder
(https://cmd.aguidetocloud.com).

Audit ALL these files for:
  • content-accuracy issues (data is wrong)
  • currency issues (data was right, is now stale)
  • code-quality issues (bugs, blockers, race conditions, security)
  • typos / voice mismatches

Files to audit:
  - brainbar/data/cmd_entries.toml
  - brainbar/data/cmd_decode.toml
  - brainbar/data/cmd_skus.toml
  - brainbar/data/cmd_voice.toml
  - brainbar/static/js/cmd-terminal.js
  - brainbar/static/js/cmd-pure.mjs
  - brainbar-mcp/src/index.ts

For each finding, return:
  • Numbered ID
  • Severity: blocker | substantive | advisory
  • Category: content-accuracy | currency | code-quality | typo
  • What: 1-2 sentence description
  • Where: full file path + line number range
  • Why it matters: 1 sentence
  • Recommended fix: actual replacement text or pseudocode

Be ruthless about Microsoft licensing trivia (which SKU includes which
service plan, GUID accuracy, plan-tier alignment). IT pros will spot
those instantly.

Do not invent issues. If you find none in a section, say "no issues
found in X" so we know what was checked.

1.3 Categorise findings into the SQL table

The agent returns ~25-30 findings in markdown. Insert each as one row. Use stable IDs (rd-01, rd-02, ...) so you can reference them in commit messages and future docs.
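As a minimal sketch of that step, assuming the findings have already been parsed into plain objects (the object shape and the toInsertStatements name are hypothetical; only the column names come from the review_findings table above), the rows could be generated like this:

```javascript
// Hypothetical helper: turns parsed findings into INSERT statements.
// IDs are derived from list position, so they stay stable only while the
// agent's original ordering is preserved.
function toInsertStatements(findings, prefix = 'rd') {
  // Double any single quotes so free-text findings cannot break the SQL literal.
  const quote = (value) => `'${String(value).replace(/'/g, "''")}'`;
  return findings.map((f, i) => {
    const id = `${prefix}-${String(i + 1).padStart(2, '0')}`;
    const values = [id, 'duck', f.severity, f.category, f.finding].map(quote).join(', ');
    return `INSERT INTO review_findings (id, source, severity, category, finding) VALUES (${values});`;
  });
}

const statements = toInsertStatements([
  { severity: 'blocker', category: 'code-quality', finding: "Retry loop doesn't abort on hang" },
]);
console.log(statements[0]);
```

The status column is left out on purpose so the table's DEFAULT 'pending' applies.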


Phase 2 — UX visual probe (Playwright + WCAG)

2.1 The probe script (canonical: brainbar/ux-probe.mjs)

Already in the repo. Key parameters:

  • Pages: home (no theme toggle; locked Compaq desk), entry pages, decode list, decode entries, /about/, /all/
  • Viewports: 360 mobile, 768 tablet, 1440 desktop
  • Themes: light + dark for theme-toggle pages; default-only for atmospherically-locked pages (home)
  • Verb captures: 22 representative terminal commands (samples, tree, why, freshness, decode handlers, pipes, etc.)

2.2 Run it

Remove-Item -Path "$env:USERPROFILE\.copilot\session-state\<sid>\files\ux-screenshots\*.png" -Force
cd C:\ssClawy\aguidetocloud-revamp\brainbar
node ux-probe.mjs
~140 screenshots end up in the session folder. _findings.json summarises WCAG + overflow + console-error findings.

2.3 Triage — the false-positive checklist

Every contrast finding needs visual confirmation before you fix it. The probe's parseRgb regex strips alpha, so translucent backgrounds are read as solid. Specifically suspect:

  • rgba(*, *, *, 0.x) cards/pills → likely false positive
  • :hover states (probe sometimes captures them mid-transition)
  • Parent <a> element with default link color where children have explicit colors
  • The .bb-keyword style (green-on-green-soft): visually fine, probe complains

Rule: if the screenshot looks readable, set the row's status to 'false-positive' in SQL and move on. Don't waste time fixing perceived contrast that the eye says is fine.
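To see why alpha-stripping produces false positives, here is a rough sketch of an alpha-aware check. It is not the probe's actual code: parseColor, composite, and contrastRatio are hypothetical names, and the parent background is assumed to be fully opaque. The luminance and ratio formulas follow the WCAG 2.x definitions.

```javascript
// Parse "rgb(r, g, b)" or "rgba(r, g, b, a)"; alpha defaults to 1.
function parseColor(css) {
  const m = css.match(/rgba?\(\s*([\d.]+)[,\s]+([\d.]+)[,\s]+([\d.]+)(?:[,\s/]+([\d.]+))?\s*\)/);
  if (!m) throw new Error(`Unsupported colour: ${css}`);
  return { r: +m[1], g: +m[2], b: +m[3], a: m[4] === undefined ? 1 : +m[4] };
}

// Blend a translucent foreground over an opaque background.
function composite(fg, bg) {
  const mix = (f, b) => f * fg.a + b * (1 - fg.a);
  return { r: mix(fg.r, bg.r), g: mix(fg.g, bg.g), b: mix(fg.b, bg.b), a: 1 };
}

// WCAG 2.x relative luminance of an opaque sRGB colour.
function luminance({ r, g, b }) {
  const lin = (c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

// WCAG contrast ratio, always >= 1 regardless of argument order.
function contrastRatio(a, b) {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// Illustrative colours: dark text on a 10%-black pill over a white page.
const text = parseColor('rgb(30, 30, 30)');
const pill = parseColor('rgba(0, 0, 0, 0.1)');
const page = parseColor('rgb(255, 255, 255)');

const alphaBlind = contrastRatio(text, { ...pill, a: 1 });  // pill read as solid black
const composited = contrastRatio(text, composite(pill, page));
console.log({ alphaBlind, composited });
```

With these sample colours the alpha-blind ratio falls below the 4.5:1 AA threshold while the composited one clears it comfortably, which is the false-positive pattern described above.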

2.4 Real findings to act on

  • Same color codes flagged on multiple pages → it's a token, fix once in tokens.css
  • Single-page contrast finding → likely a hardcoded color in cmd.css (search for hex codes near the location)
  • Horizontal overflow → almost always a long pasted input (resource ID, GUID, URL) breaking out of a flex container without min-width: 0 or text-truncation
  • Console errors → reproduce manually, then fix; do NOT ship a deploy with console errors

Phase 3 — Fix waves (in this order)

Wave 1: Content accuracy first

Why first: content fixes need a Hugo rebuild; that's slow. Batch them. AND content fixes don't depend on code changes — but code fixes might depend on data shape.

Common content fixes for cmd:

  • Wrong includes[] claim in a license entry → SME-grade Microsoft licensing trivia; cross-check with learn.microsoft.com/entra/identity/users/licensing-service-plan-reference
  • Wrong fix array in a decode entry → cross-check with the official Learn troubleshooting page (link is in learn_url)
  • Wrong SKU GUID → cross-check Microsoft's licensing reference table (linked from cmd_skus.toml header comment)
  • Blank price placeholder (USD $/user/month with nothing between $ and /user) → fill or remove
  • Wrong price math in cmd_voice.toml Sush-takes → redo the multiplication yourself; don't trust the stated total
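For the price-math item, a tiny sketch of the recheck, assuming each claim can be reduced to a per-user monthly price, a seat count, a month count, and a stated total. The field names and checkPriceMath are hypothetical, and the numbers are made up for illustration, not real prices.

```javascript
// Work in integer cents so floating-point drift cannot cause a false mismatch.
function checkPriceMath({ perUserMonthlyCents, users, months, statedTotalCents }) {
  const expected = perUserMonthlyCents * users * months;
  return { ok: expected === statedTotalCents, expected };
}

// Illustrative claim: 12.50 per user per month, 40 users, 12 months, stated as 600.00.
const claim = { perUserMonthlyCents: 1250, users: 40, months: 12, statedTotalCents: 60000 };
console.log(checkPriceMath(claim));
```

Here the stated total is off by a factor of ten, which is the kind of slip that reads fine in prose but fails as soon as the multiplication is actually done.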

Hidden gotcha: the MCP cmd_ask system prompt has few-shot examples that teach the model what an answer looks like. If the example references the wrong includes, the model will repeat it. Treat the prompt as content; audit it on every content pass.

Wave 2: Code quality

  • Blockers first — anything that can spiral (infinite loops, unbounded fan-out, no abort on hang)
  • Then security — CORS, rate limits, secrets in argv
  • Then bug-fixes — array vs string rendering, regex order, missing error states
  • Then small leaks — interval cleanup, stale-promise rejection paths

The pattern: every code fix needs a mirror update if the helper exists in both cmd-pure.mjs and inline in cmd-terminal.js. Drift between the two is a real problem (51 unit tests can pass while production browser behaviour is wrong). Either generate the inline block from the canonical module, or audit the mirror manually.
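A manual mirror audit could be partly automated along these lines. This is a sketch under assumptions, not the repo's tooling: extractFunction and findDrift are hypothetical names, the helpers are assumed to be plain `function name(...) { ... }` declarations, and the brace matching is naive (braces inside strings, comments, or destructured parameters will confuse it).

```javascript
// Return the source text of `function name(...) { ... }` found in `src`, or null.
function extractFunction(src, name) {
  const start = src.search(new RegExp(`function\\s+${name}\\s*\\(`));
  if (start === -1) return null;
  let depth = 0;
  for (let i = src.indexOf('{', start); i !== -1 && i < src.length; i++) {
    if (src[i] === '{') depth++;
    else if (src[i] === '}' && --depth === 0) return src.slice(start, i + 1);
  }
  return null;
}

// Collapse whitespace so formatting differences are not reported as drift.
const normalise = (s) => s.replace(/\s+/g, ' ').trim();

// Names whose bodies differ, or that are missing from either side.
function findDrift(canonicalSrc, inlineSrc, names) {
  return names.filter((name) => {
    const a = extractFunction(canonicalSrc, name);
    const b = extractFunction(inlineSrc, name);
    return a === null || b === null || normalise(a) !== normalise(b);
  });
}

// Illustrative sources standing in for cmd-pure.mjs and the inline copy.
const canonicalSrc = `
export function slugify(s) {
  return s.trim().toLowerCase();
}
export function pad(n) {
  return String(n).padStart(2, '0');
}`;
const inlineSrc = `
function slugify(s) { return s.toLowerCase(); }
function pad(n) { return String(n).padStart(2, '0'); }`;

console.log(findDrift(canonicalSrc, inlineSrc, ['slugify', 'pad']));
```

In real use the two strings would come from reading the two files, and the name list would be whatever helpers are known to be mirrored.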

Wave 3: UX visual

  • Token bumps in tokens.css (--text-dim, --text-muted) — apply to both light + dark blocks
  • Hardcoded colours in cmd.css (the home-page CRT atmosphere uses Dracula colours, not tokens)
  • Component-specific bumps (status-bar .lbl, decode-list meta, all-list tag chips)
  • Layout fixes (CRT title bar truncation for long pasted inputs)

Cache-bust discipline: every CSS / JS / data change requires bumping cache_version in brainbar/hugo.toml. The site has cache-busted asset URLs everywhere — but only if you bump the version.


Phase 4 — Verify

4.1 The 5-test gate (all must pass before commit)

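# Run from the repo root (C:\ssClawy\aguidetocloud-revamp). Push-Location/Pop-Location
# return to the root after each step, so the block can be pasted as a whole.

# 1. JS syntax
node -c brainbar/static/js/cmd-terminal.js

# 2. cmd-pure unit tests (51 tests, ~250ms)
Push-Location brainbar; node --test test/cmd-pure.test.mjs; Pop-Location

# 3. Validator (data integrity, 0 issues required)
Push-Location brainbar; node scripts/validate-cmd-tree.mjs; Pop-Location

# 4. MCP tests (25 tests, ~370ms)
Push-Location brainbar-mcp; npm test; Pop-Location

# 5. Playwright tier-1 diagnostic (60 tests, ~30s; needs dev server)
Push-Location brainbar; node qa/cmd-tier1-diagnostic.mjs; Pop-Location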
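If the gate gets scripted, a stop-at-first-failure runner is one possible shape. The sketch below is an assumption, not an existing script in the repo: GATE and runGate are hypothetical names, and the process runner is injected so the ordering logic can be exercised without spawning anything.

```javascript
// Steps mirror the five commands above; cwd is relative to the repo root.
const GATE = [
  { name: 'JS syntax', cwd: '.', cmd: 'node', args: ['-c', 'brainbar/static/js/cmd-terminal.js'] },
  { name: 'cmd-pure unit tests', cwd: 'brainbar', cmd: 'node', args: ['--test', 'test/cmd-pure.test.mjs'] },
  { name: 'validator', cwd: 'brainbar', cmd: 'node', args: ['scripts/validate-cmd-tree.mjs'] },
  { name: 'MCP tests', cwd: 'brainbar-mcp', cmd: 'npm', args: ['test'] },
  { name: 'Playwright tier-1', cwd: 'brainbar', cmd: 'node', args: ['qa/cmd-tier1-diagnostic.mjs'] },
];

// `run(step)` must return the step's exit code. Any non-zero (or null) code
// stops the gate immediately so later steps never mask an earlier failure.
function runGate(steps, run) {
  const passed = [];
  for (const step of steps) {
    const code = run(step);
    if (code !== 0) return { ok: false, failedAt: step.name, passed };
    passed.push(step.name);
  }
  return { ok: true, failedAt: null, passed };
}

// Dry run with a fake runner in which the validator fails.
console.log(runGate(GATE, (step) => (step.name === 'validator' ? 1 : 0)));
```

A real runner could wrap spawnSync from node:child_process, for example `(s) => spawnSync(s.cmd, s.args, { cwd: s.cwd, stdio: 'inherit', shell: true }).status`; `shell: true` is there because npm is a script shim on Windows.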

4.2 Hugo build clean

cd brainbar; hugo --quiet
echo $LASTEXITCODE  # must be 0

4.3 Re-run the UX probe

After all CSS bumps, re-run node ux-probe.mjs and confirm the findings count dropped. Findings remaining should ALL be false positives (alpha-blind, hover-state, inheritance) — visually confirm each.

4.4 Live MCP smoke test (if MCP changed)

$body = '{"question":"the question that previously returned wrong content"}'
$headers = @{"Content-Type"="application/json";"Origin"="https://cmd.aguidetocloud.com"}
Invoke-RestMethod -Method Post -Uri https://mcp.aguidetocloud.com/ask -Headers $headers -Body $body
Confirm the answer matches what the corrected data + corrected prompt should produce.


Phase 5 — Deploy

5.1 Parallel-safe git commit (explicit paths)

cd C:\ssClawy\aguidetocloud-revamp
git add brainbar/data/cmd_entries.toml brainbar/data/cmd_decode.toml ...   # explicit
git commit -m "cmd: SME content + code + UX overhaul" -m "<details>" -m "Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>"
git pull --rebase
git push
Never use git add ., git add -A, or git commit -a — pipeline bots may have unrelated _status_/data changes pending.

5.2 Deploy MCP (if changed)

Wrangler is broken on Windows ARM64. Use the env-token REST API path:

cd brainbar-mcp
npx esbuild src/index.ts --bundle --format=esm --target=es2022 --outfile=dist/index.js --platform=neutral
$env:CLOUDFLARE_API_TOKEN = (Get-Content "$env:USERPROFILE\.copilot\secrets\cloudflare-api-token" -Raw).Trim()
$env:CLOUDFLARE_ACCOUNT_ID = "d42846fe2c29daf890ec57877fda5e04"
node scripts/deploy-via-api.mjs
The deploy script reads from env first, argv only as fallback. The "Assertion failed" line that follows the success: true is a Node bug on Win ARM64 — harmless after a successful deploy.

5.3 CI green check

gh run list --workflow cmd-tier1-check.yml --limit 1
Or via the GitHub MCP: actions_list with cmd-tier1-check.yml. Conclusion must be success.


Phase 6 — Document

6.1 Update the session journal

In ~/.copilot/session-journal.md, add a top-of-file entry with:

  • Commit hash
  • Bullet list of fixes by category
  • The 5-test results table (proves it shipped clean)
  • "Pick up here" pointer for tomorrow-Sush

6.2 Update the reference docs

  • brain-bar-lessons.md — append a new dated section with the patterns that emerged (content-as-code, alpha-blind probe pitfall, regex-order rule, etc.)
  • This file (cmd-content-review-playbook.md) — patch with anything you'd do differently next round

6.3 Update the SQL findings table

UPDATE review_findings SET status = 'done' WHERE id IN (...);
UPDATE review_findings SET status = 'deferred' WHERE id IN (...);
SELECT status, count(*) FROM review_findings GROUP BY status;
Deferred items are the playbook for the next review pass. Document them in the journal so they're not lost.


Patterns that fall out of this playbook

These are the recurring lessons across review passes. They generalise beyond cmd.

P1. Validators check structure, SMEs check truth

A validator can prove every includes[] slug exists in cmd_entries.toml. It cannot prove that m365-e3.includes is a true claim about Microsoft's product. Both passes are required; neither substitutes for the other.
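The structural half can be sketched in a few lines. This is not validate-cmd-tree.mjs itself: findDanglingIncludes is a hypothetical name, the data is assumed to be an object keyed by slug with an optional includes array, and the slugs are deliberately fake so the example makes no licensing claim.

```javascript
// Report every includes[] target that has no entry of its own.
function findDanglingIncludes(entries) {
  const problems = [];
  for (const [slug, entry] of Object.entries(entries)) {
    for (const target of entry.includes ?? []) {
      if (!Object.hasOwn(entries, target)) problems.push({ from: slug, missing: target });
    }
  }
  return problems;
}

// Illustrative data with made-up slugs.
const sampleEntries = {
  'plan-a': { includes: ['service-x', 'service-y'] },
  'plan-b': { includes: ['service-x', 'service-z'] },
  'service-x': {},
  'service-y': {},
};

console.log(findDanglingIncludes(sampleEntries));
```

Note what the check cannot see: 'plan-a' including 'service-y' passes because both slugs exist, whether or not the claim is true. That part stays with the SME.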

P2. Prompt examples ARE content

Few-shot examples in an LLM system prompt teach the model what answers look like. If the example reproduces the wrong claim, the model will repeat it even after the underlying data is fixed. Audit prompt examples on every content pass.

P3. Data-removal needs a visual after-test

When you remove an includes[] edge, the validator says "62 → 61 edges, still clean". But tree m365-f3 now renders differently. A user comparing today's output to yesterday's screenshot will spot it. Re-run the UX probe after data deletions, not just data additions.
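One way to decide which tree outputs to re-capture is to diff the edge sets before and after the change. The sketch below assumes the same slug-keyed shape with an includes array; edgeSet and parentsToRecapture are hypothetical names, and only direct parents are reported, so ancestors that render the subtree transitively still need a manual look.

```javascript
// Flatten { slug: { includes: [...] } } into "parent->child" edge strings.
function edgeSet(entries) {
  const edges = new Set();
  for (const [slug, entry] of Object.entries(entries)) {
    for (const child of entry.includes ?? []) edges.add(`${slug}->${child}`);
  }
  return edges;
}

// Parents that lost at least one edge between the two snapshots.
function parentsToRecapture(before, after) {
  const now = edgeSet(after);
  const parents = new Set();
  for (const edge of edgeSet(before)) {
    if (!now.has(edge)) parents.add(edge.split('->')[0]);
  }
  return [...parents].sort();
}

// Illustrative snapshots with made-up slugs.
const beforeData = { 'plan-a': { includes: ['svc-1', 'svc-2'] }, 'plan-b': { includes: ['svc-1'] } };
const afterData = { 'plan-a': { includes: ['svc-1'] }, 'plan-b': { includes: ['svc-1'] } };

console.log(parentsToRecapture(beforeData, afterData));
```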

P4. Alpha-blind contrast checkers lie 30% of the time

If your contrast probe doesn't composite alpha against parent backgrounds, ~30% of its findings will be false positives on cards/pills/badges with translucent backgrounds. Treat the probe output as a "needs visual review" pile, not a "needs fixing" pile.

P5. Renderers and schema slicers are two places to break the same thing

The fix field was an array in the source data, was rendered as a comma-joined string by the renderer, AND was stripped from the JSON schema slice. Two bugs masquerading as one symptom. When a field renders wrongly, check both the renderer AND the data shape upstream.
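The renderer half of that bug is easy to reproduce: interpolating an array into a template string calls its toString(), which joins with commas. A defensive renderer normalises the shape first. This is a minimal sketch; toSteps and renderSteps are hypothetical names, not the actual cmd-terminal.js helpers.

```javascript
// Accept an array, a single string, or nothing, and always return an array.
function toSteps(fix) {
  if (Array.isArray(fix)) return fix.map(String);
  if (typeof fix === 'string' && fix.trim() !== '') return [fix];
  return [];
}

// Render one numbered line per step instead of relying on implicit joining.
function renderSteps(fix) {
  return toSteps(fix).map((step, i) => `${i + 1}. ${step}`).join('\n');
}

const sampleFix = ['Check the sign-in logs', 'Retry with a fresh token'];
console.log(`${sampleFix}`);          // implicit toString(): comma-joined
console.log(renderSteps(sampleFix));  // explicit numbered steps
```

An empty result from renderSteps is also a useful signal that the field was dropped upstream, for example by a schema slice, rather than mis-rendered.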

P6. Generic-before-specific regex always wins on the wrong route

Any regex dispatch table with broad catch-alls (/me, /users, /security/) MUST be authored most-specific first. Otherwise the generic case hides the specific case forever. Add a comment at the top of the table explaining the ordering rule.
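A small sketch of the ordering rule, using a first-match-wins table. The route patterns and handler names are illustrative, not the real dispatch table.

```javascript
// Ordering rule: most specific pattern first; broad catch-alls go last.
const ROUTES = [
  { pattern: /^\/users\/[^/]+\/memberOf/, handler: 'user-groups' },
  { pattern: /^\/users/, handler: 'users' },
  { pattern: /^\/me/, handler: 'me' },
];

// First match wins, so table order decides which handler runs.
function dispatch(path, routes = ROUTES) {
  const hit = routes.find((route) => route.pattern.test(path));
  return hit ? hit.handler : null;
}

const path = '/users/42/memberOf';
console.log(dispatch(path));                         // specific route is reached
console.log(dispatch(path, [...ROUTES].reverse()));  // generic /users now shadows it
```

With the table reversed, the specific route can never fire, and nothing errors to tell you so; that silence is why the ordering comment at the top of the table matters.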

P7. The deploy ladder beats the deploy moment

Deploy success isn't the goal — green CI 10 minutes later is. The 5-test gate (syntax / unit / validator / MCP / Playwright) is what makes the next deploy fast and confident. Run the whole ladder every wave; don't skip waves.


Companion files

  • brain-bar-lessons.md — the long-form lessons doc (philosophy, decisions, next steps)
  • brainbar/ux-probe.mjs — the Playwright UX probe script (canonical)
  • brainbar/qa/cmd-tier1-diagnostic.mjs — the 60-assertion Playwright test suite
  • brainbar-mcp/scripts/deploy-via-api.mjs — REST-API deploy (wrangler-free)
  • brainbar/scripts/validate-cmd-tree.mjs — the structural validator

Captured 8 May 2026 after the overnight cmd overhaul. 27 fixes shipped, 0 regressions, 136 tests still green. If you're reading this in a future session, follow the phases in order, trust the SQL table, distrust the probe's contrast findings until you eyeball them, and keep writing here.