Claw v0b — Follow-ups Prompt for Next Session
Paste-ready kickoff for the session that picks up the Claw multi-vendor expansion after Batch 0 shipped (14 May 2026). All decisions are locked; this doc is the resume pointer.
What was shipped on 14 May 2026 (Batch 0 + Batch 0b)
Batch 0 (commit 7aa01f6)
✅ Live in production at https://claw.aguidetocloud.com since 14 May 2026 ~16:30 NZST.
Batch 0b — URL migration (commits 82853fd + ba65183)
✅ Live in production since 14 May 2026 ~16:55 NZST. All 42 OpenClaw content pages moved to /openclaw/* namespace with 301 redirects from the old root paths. Live scripts/smoke-check.mjs passes all 45 checks (30 expect200 + 15 expect301).
What moved (old → new):
- /overview/... → /openclaw/overview/... (4 entry pages + index)
- /setup/... → /openclaw/setup/... (6 entries + index + nested /setup/compare/)
- /connections/... → /openclaw/connections/... (5 entries + index)
- /plugins/... → /openclaw/plugins/... (1 entry + index)
- /use-cases/... → /openclaw/use-cases/... (5 entries + index)
- /security/... → /openclaw/security/... (4 entries + index)
- /resources/ → /openclaw/resources/
- /faq/ → /openclaw/faq/
What stayed at root (vendor-agnostic per IA): /compare/, /updates/, the 5 vendor hubs /openclaw/ /anthropic/ /openai/ /google/ /microsoft/, foundation pages /about/ /methodology/ /colophon/ /404/, the home /.
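The move reduces to a prefix rule: paths under the eight migrated sections gain an /openclaw prefix, and everything else stays at root. A minimal TypeScript sketch of that rule, assuming hypothetical helper names (toOpenclawPath, redirectLines) and an illustrative `_redirects` line shape; the shipped file may be written differently.

```typescript
// Sections that moved under /openclaw/ in Batch 0b (from the list above).
const MOVED_SECTIONS = [
  "overview", "setup", "connections", "plugins",
  "use-cases", "security", "resources", "faq",
];

// Hypothetical helper: map an old root path to its new location, or
// return null when the path stays at root (e.g. /compare/, /updates/).
function toOpenclawPath(oldPath: string): string | null {
  const firstSegment = oldPath.split("/")[1] ?? "";
  return MOVED_SECTIONS.includes(firstSegment) ? `/openclaw${oldPath}` : null;
}

// Hypothetical generator for `_redirects`-style lines
// (source pattern, destination, status code), one per moved section.
function redirectLines(): string[] {
  return MOVED_SECTIONS.map((s) => `/${s}/* /openclaw/${s}/:splat 301`);
}
```

Because only the first path segment is checked, the nested /setup/compare/ moves with /setup/ while the root /compare/ is left alone.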
Claw planet reframed from "OpenClaw study reference" to "a practitioner's field notebook for AI tools you can run, wire, script, compare, or operationally reason about." Same domain, same [*] logo, same #FF2626 red. Now spans 5 vendors: OpenClaw · Anthropic · OpenAI · Google · Microsoft (+ Compare + Updates tabs).
Commits:
- claw-planet 7aa01f6 — 69 files (1,276 insertions · 284 deletions)
- cosmos-atlas 0f64e0e — atlas.json Claw description
- learning-docs cc6faef — playbook v0b admonition section
What's now live:
- New vendor tabs sub-nav below the header (OpenClaw · Anthropic · OpenAI · Google · Microsoft · Compare · Updates)
- 5 vendor hub pages at /openclaw/ · /anthropic/ · /openai/ · /google/ · /microsoft/. Each lists products with status (live / coming-in-batch / phase-2)
- Reframed home + about + methodology pages
- Upgraded verification contract: 5 states (planned · sourced · tried · verified · disputed) + optional structured verificationContext object
- Extended voice-lint: OUT-list patterns + Microsoft public-source-citation gate + <!-- voice-lint:ignore --> annotation pattern
- All 26 existing OpenClaw .mdx content files tagged with vendor: "openclaw" + product: "openclaw-runtime"
- All 22 consumer code files updated to use the new state vocab
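For orientation, the verification contract and the new frontmatter tags could be typed roughly as below. This is a sketch under assumptions: the five state names and the vendor/product values come from these notes, but the `verification` field name and every field inside VerificationContext are hypothetical, not the real schema.

```typescript
// The five verification states named in the contract.
const VERIFICATION_STATES = ["planned", "sourced", "tried", "verified", "disputed"] as const;
type VerificationState = (typeof VERIFICATION_STATES)[number];

// Hypothetical shape for the optional structured context object.
// These field names are illustrative assumptions only.
interface VerificationContext {
  checkedOn?: string;   // ISO date of the check
  environment?: string; // where the claim was tried
  sourceUrl?: string;   // public source backing the claim
}

interface EntryFrontmatter {
  vendor: string;                  // e.g. "openclaw"
  product: string;                 // e.g. "openclaw-runtime"
  verification: VerificationState; // field name is an assumption
  verificationContext?: VerificationContext;
}

// Type guard for checking raw frontmatter values at build time.
function isVerificationState(value: unknown): value is VerificationState {
  return typeof value === "string" &&
    (VERIFICATION_STATES as readonly string[]).includes(value);
}
```

Deriving the type from the constant array keeps the runtime check and the compile-time union from drifting apart when a state is added.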
What's queued for the next session
Option A — Batch 0b: URL migration ~~(recommended first)~~ ✅ DONE 14 May 2026
Shipped in commits 82853fd + ba65183. See playbook v0b admonition for the deploy-bug post-mortem (the _redirects file was being uploaded as a static asset instead of as Pages deployment metadata; scripts/deploy.mjs was patched to extract Pages control files and attach them as form fields, matching what Wrangler does automatically).
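The shape of that fix can be sketched as a partition step that runs before upload. This is not the code in scripts/deploy.mjs: the control-file list, the type names, and the idea of keying form fields by filename are all assumptions for illustration.

```typescript
// Files treated as deployment configuration rather than static content.
// This list is an assumption for the sketch.
const PAGES_CONTROL_FILES = new Set(["_redirects", "_headers", "_routes.json"]);

interface BuildFile {
  path: string;     // path relative to the build output root
  contents: string;
}

interface DeployPayload {
  assets: BuildFile[];                   // uploaded as static files
  controlFields: Record<string, string>; // attached as form fields
}

// Split the build output so control files never travel as plain assets.
function splitControlFiles(files: BuildFile[]): DeployPayload {
  const payload: DeployPayload = { assets: [], controlFields: {} };
  for (const file of files) {
    // Only top-level control files count; nested copies stay as assets.
    if (PAGES_CONTROL_FILES.has(file.path)) {
      payload.controlFields[file.path] = file.contents;
    } else {
      payload.assets.push(file);
    }
  }
  return payload;
}
```

A smoke check that expects 301s from the old paths is what exposes a regression here, since a `_redirects` file served as a static asset does not fail the build.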
Option B — Batch A: Anthropic content (~22 pages) ~~← NEXT~~ ✅ DONE 14 May 2026 PM4
Shipped in commit a2029df (39 files · +5,066/-23). 21 sourced entries + refined hub across 5 products (Claude Code · Claude API · MCP catalogue · Claude.ai · Computer Use). Sidebar + MobileDrawer now vendor-aware. Defensive vendor filter applied across all OpenClaw routes. audit-claims.mjs patched to count by frontmatter vendor field. VendorHub component upgraded to support inline HTML in blurb. All four CI gates green; 30/30 base smoke checks + 12/12 sample Anthropic URLs 200 live.
Lessons captured:
- Astro 5 evaluates getStaticPaths in a separate scope — can't reach module-level consts. Inline keys into the function.
- Sidebar + MobileDrawer must stay in sync (both render entries; both must filter by vendor). MobileDrawer was the source of 113 broken links until fixed.
- Filename convention <vendor>-<product>-<slug>.mdx + route strips prefix → clean short URLs without schema changes.
- sectionNumber convention CC.1 / API.1 / MCP.1 / WEB.1 / CU.1 for Anthropic products. Will extend to OAI / GG / MS prefixes for Batches B/C/D.
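The first and third lessons combine naturally in one route file. The sketch below is plain TypeScript rather than an .astro file, and loadEntries is a stand-in for an imported content loader (imports stay reachable inside getStaticPaths). The product keys and entry ids are illustrative assumptions.

```typescript
interface Entry {
  id: string; // mirrors the filename without ".mdx"
}

// Stand-in for an imported content loader. In a real Astro page this
// would arrive via an import, which getStaticPaths can still reach.
function loadEntries(): Entry[] {
  return [
    { id: "anthropic-claude-code-overview" },
    { id: "anthropic-claude-api-models" },
    { id: "openai-codex-cli-overview" },
  ];
}

function getStaticPaths() {
  // Declared INSIDE the function, per the lesson above: module-level
  // consts are not reachable when Astro evaluates this function.
  const vendor = "anthropic";
  const productKeys = ["claude-code", "claude-api"];

  const paths: { params: { product: string; slug: string } }[] = [];
  for (const product of productKeys) {
    const prefix = `${vendor}-${product}-`;
    for (const entry of loadEntries()) {
      // Strip "<vendor>-<product>-" so the URL keeps only the short slug.
      if (entry.id.startsWith(prefix)) {
        paths.push({ params: { product, slug: entry.id.slice(prefix.length) } });
      }
    }
  }
  return paths;
}
```

With these sample ids, anthropic-claude-code-overview becomes the params pair product "claude-code" and slug "overview", and the OpenAI entry is ignored.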
Option C — Batch B: OpenAI content (~20 pages) ~~← NEXT~~ ✅ DONE 15 May 2026 AM
Shipped in commit 7562dae (24 files · +3,707/-29). 19 sourced entries + refined hub + routes across 5 products (Codex CLI · OpenAI Agents SDK · OpenAI Apps SDK · ChatGPT Atlas · Custom GPTs). The vendor-aware Sidebar + MobileDrawer helper was refactored from loadAnthropicProductChildren to a generic loadVendorProductChildren(vendor, productKey, productHref) — directly reusable for Batches C/D.
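A rough sketch of what a vendor-generic helper of this kind might look like follows. The real function takes (vendor, productKey, productHref); this version also takes the entries as a parameter so it stays self-contained, and the entry shape and href format are assumptions.

```typescript
interface ContentEntry {
  id: string; // "<vendor>-<product>-<slug>"
  data: { vendor: string; product: string; title: string };
}

interface NavChild {
  label: string;
  href: string;
}

// Hypothetical version of the helper; the `entries` parameter is added
// for this sketch only and is not part of the real signature.
function loadVendorProductChildren(
  entries: ContentEntry[],
  vendor: string,
  productKey: string,
  productHref: string,
): NavChild[] {
  const prefix = `${vendor}-${productKey}-`;
  return entries
    // Filter on frontmatter fields so other vendors never leak in.
    .filter((e) => e.data.vendor === vendor && e.data.product === productKey)
    .map((e) => ({
      label: e.data.title,
      href: `${productHref}${e.id.startsWith(prefix) ? e.id.slice(prefix.length) : e.id}/`,
    }));
}
```

Having Sidebar and MobileDrawer call the same function is what keeps the two in sync, which was the failure mode in Batch A.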
Mandatory 5-parallel-agents SME + voice-duck pass surfaced 8 must-fix factual errors (including 2 runtime-ImportError bugs in the Agents SDK code examples, the wrong default model for /review, wrong Atlas memory naming, and a free-tier access error on Custom GPTs). All corrections applied in the same commit. 30/30 live smoke + 25/25 new OpenAI URLs HEAD-200 + 7/7 SME-fix spot-checks rendering correctly on production.
Lessons captured (apply to Batches C/D):
- The 5-agent SME + duck pattern remains the right cost. Batch B's findings were less catastrophic than Batch A's (~50 errors), but still included 2 runtime ImportErrors that would have broken every reader's first attempt at the Sessions code.
- Upfront source-grounding via web_fetch on canonical docs (done in Batch B for Apps SDK, Atlas, Agents SDK) reduced the post-SME finding count significantly vs Batch A. Worth doing both — pre-write grounding + post-write SME.
- The same Codex CLI background research agent fed the entire 8-page Codex section. Cleaner than fetching piecemeal.
Option D — Batch C: Google content (~21 pages) ~~← NEXT~~ ✅ DONE 15 May 2026 PM
Shipped in commits af86c47 (drop) + ea327ac (post-deploy second-pass SME). 21 sourced entries + refined hub + routes across 6 products (Gemini CLI · Gemini API · NotebookLM · Vertex AI Agents · AI Studio · Gems). The vendor-aware Sidebar + MobileDrawer just reused the generic loadVendorProductChildren helper from Batch B with no refactor.
Mandatory two-pass SME found:
- Pass 1 (5 parallel agents, same commit af86c47): ~25 P0+P1+P2 fixes including 2 P0s — HARM_CATEGORY_DANGEROUS → HARM_CATEGORY_DANGEROUS_CONTENT (Python AttributeError); NLM's "80+ visual styles" was wrong (that count is languages — only ~10 actual styles). Plus all three Gemini 2.5 prices were stale, several unverifiable Codex CLI competitor claims, generic MCP needs McpToolset not ToolboxToolset.
- Pass 2 (same 5-agent pattern, separate commit ea327ac): ~30 more fixes including 3 P0s pass 1 missed: VAI.3's vertexai.agent_engines.create(agent=app, ...) would TypeError (old API uses agent_engine=, new API uses agent= — easy to confuse); NLM Cinematic was framed as "Ultra-only" but Google's own tier limits page gives Pro 2/day; "agentic capabilities" forbidden phrase still survived as a verbatim Google docs quote. Plus Trusted Folders is enabled by default (we said disabled), sandbox is 6 backends not 5 (Windows Native missing), Gemini 2.5 search grounding is $35/1K not $14/1K (only Gemini 3 is $14).
All four CI gates green; 30/30 base smoke + 18/18 new Google URLs + 9/9 critical fix spot-checks all 200 on production.
Lessons captured (apply to Batches D/E):
- Pre-grounding (web_fetch on canonical docs into agent context BEFORE drafting) continues to pay off.
- Pass-2 source-code-vs-docs reading catches things pass 1 (which mostly reads docs) misses — Trusted Folders default state was discoverable only by reading the ?? true null-coalescing in the source.
- Two-pass SME discipline remains non-optional. Pass-1 is fast and high-recall on facts; pass 2 is slower and catches runtime/contradictions/cross-page issues.
- Architecture is now reusable across 4 vendors with zero infrastructure refactor needed for Batch C.
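The Trusted Folders finding turns on how `??` behaves, so a tiny illustration helps when briefing pass-2 agents. The settings shape below is hypothetical and does not reproduce the Gemini CLI source; only the `?? true` idiom is taken from the lesson.

```typescript
// Hypothetical settings shape for illustration only.
interface FolderTrustSettings {
  enabled?: boolean;
}

function isFolderTrustEnabled(settings: FolderTrustSettings): boolean {
  // `??` falls back only on null/undefined, so an UNSET value means
  // enabled, while an explicit `false` is respected.
  return settings.enabled ?? true;
}
```

Docs that only describe the setting when it is switched on give no hint of this; the default lives in the fallback operand.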
Option E — Batch D: Microsoft content (~8 pages) ~~← NEXT~~ ✅ DONE 15 May 2026 PM
Shipped in commits cc56cf6 (drop) + 81b3f2a (post-deploy second-pass SME). 8 sourced entries + refined hub + routes across 6 Microsoft products (M365 Agents Toolkit · Copilot Studio · Azure AI Foundry / Microsoft Foundry · GitHub Copilot · Declarative Agents · MCP for Microsoft). Architecture reused the generic loadVendorProductChildren helper from Batch B with no refactor — microsoft elif added to Sidebar + MobileDrawer; google/[product]/{index,[slug]}.astro sed-replaced to microsoft/; voice-lint MS_PUBLIC_DOMAINS extended to include docs.github.com/copilot, github.com/features/copilot, github.blog (GitHub is Microsoft).
Mandatory two-pass SME found:
- Pass 1 (5 parallel agents, same commit cc56cf6): ~20 P0+P1+P2 fixes including P0 zero-rating-covers-ALL-meters in CPS.1 (pass-1 had said just 3), ATK license MIT not Apache-2.0, Teams plan "Generative AI authoring: Limited" → "Not available", Foundry DataZoneBatch SKU missing from sub-types table, AGENTS.md nearest-precedence claim removed (pass 2 later reverted this — see below), foundry-agent-to-m365 template 6.6.0 not 6.4.0, MCP zero-rating reframed.
- Pass 2 (same 5-agent pattern, separate commit 81b3f2a): ~15 more fixes including 3 critical pass-1-missed items: a REGRESSION pass 1 itself introduced on ATK.1 licensing (pass-1 SME's prereqs-table source was wrong; DA manifest v1.7 reference Note is canonical and says "Web Search is the only capability available without Copilot license") — live-fetched both pages to confirm; EmbeddedKnowledge actually shipped GA in ATK 6.6.0 March 2026 (page said "not yet available"); AGENTS.md / CLAUDE.md / GEMINI.md is NOT "all three combined" (docs say "alternatively" — substitutes, not merged — AND nearest-AGENTS.md-precedence exists). Plus Foundry brand flipped to "Microsoft Foundry" (current Learn primary brand); Foundry tool catalogue restructured into built-in vs custom (MCP + OpenAPI + A2A + Toolbox are custom tools, not built-in preview); azure-ai-agents is NOT preview (has GA 1.1.0); Custom Code Interpreter (preview) added; azure-ai-projects 2.1.0 (April 2026) noted; "Work IQ" restored as canonical public Copilot Studio UI label (pass 1 wrongly removed it; public Microsoft brand names allowed); Cortana added to Azure Bot Service channels list.
All four CI gates green; 30/30 base smoke + 8/8 new Microsoft URLs + 12/12 pass-1 + 16/16 pass-2 critical-fix spot-checks all 200 on production.
Lessons captured (apply to Batch E):
- Pass-1 SME findings need their own SECOND-source check before acceptance. The ATK.1 regression came from trusting pass-1 SME on the prereqs page without cross-checking the v1.7 manifest reference. Pass 2 caught it; pass 2 lives partly to catch pass-1-induced regressions, not just pass-1 oversights.
- Source-code-vs-docs reading caught the EmbeddedKnowledge GA — ATK 6.6.0 CHANGELOG had it; the Learn docs had moved on; pass 1 had read an older Microsoft Learn page. Read the repo CHANGELOG, not just Learn.
- Brand rebrand windows are a perpetual pass-2 target — "Azure AI Foundry" vs "Microsoft Foundry" needed adjudication via the current Learn page. Same lesson Batch C had with NotebookLM tier gating.
- Tool catalogues are EASILY misread structurally. Foundry's tool-catalog page has explicit "Built-in" vs "Custom" subsections; pass 1 collapsed them and put MCP + OpenAPI in the wrong bucket. Look at section headings, not just bullet lists.
- Voice-duck pass 1 misread the "Work IQ" instruction as a guardrail breach when the public Copilot Studio feature uses that exact UI label. Public Microsoft brand names should be retained; only internal-Microsoft context (WorkIQ as Sush's internal tool, customer engagement detail) is forbidden. Restored the name in pass 2.
- Architecture reuse continues to scale — Sidebar/MobileDrawer/route shape worked for the fifth vendor (Microsoft) with zero infrastructure refactor.
Option F — Batch E: Compare + Updates seed ~~← NEXT~~ ✅ DONE 15 May 2026 PM2
Shipped in commits ebe34cd (drop + pass-1 SME) + d43297e (pass-2 SME). Phase 1 of v0b is now CLOSED.
What landed:
- 3 new cross-vendor /compare/ pages: cli-coding-agents (4-way: Claude Code · Codex CLI · Gemini CLI · GitHub Copilot CLI), direct-model-apis (3-way: Claude API · Gemini API · Microsoft Foundry — OpenAI direct excluded with honest framing), m365-extensibility-paths (3-way: Copilot Studio · Declarative Agents · Custom Engine Agents — page explicitly acknowledges Microsoft's canonical 2-way taxonomy).
- 5 new /updates/ entries balanced across vendors per duck rule (1 each Anthropic/OpenAI/Google + 1 Microsoft + 1 Claw-internal).
- Refreshed /compare/ index to atrium-style cards with decision-angle labels (WHICH AGENT SHAPE? · WHICH CLI? · WHICH M365 PATH? · WHICH API / PLATFORM?).
- New data-driven architecture: src/data/toolRegistry.ts (canonical labels + hrefs) + src/data/comparisons.ts (TypeScript not TOML — went with TS for type-safety + structured cells + sourceRefs; starter prompt explicitly permitted both). Build-time validateComparisons() enforces 7 invariants. Two new components: ComparisonMatrix.astro + CompareCards.astro.
- Schema: added 'cross-vendor' to vendor enum. Sidebar + MobileDrawer compare loaders accept optional null vendor (compares are universal). getRelated URL builder patched to be vendor-aware. seeAlso template patched to route §7.x refs to specific compare slugs (was collapsing to #sec-7).
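To make the data-first architecture concrete, here is a reduced sketch of a registry plus a build-time validator. It checks three illustrative invariants, not the real seven. The type shapes, registry hrefs, and error wording are assumptions; only the names toolRegistry, validateComparisons and sourceRefs come from these notes.

```typescript
interface Tool { label: string; href: string }
interface Cell { value: string; sourceRefs: string[] }
interface Comparison {
  slug: string;
  tools: string[]; // keys into the tool registry
  rows: { label: string; cells: Record<string, Cell> }[];
}

// Illustrative registry entries; hrefs are assumptions.
const toolRegistry: Record<string, Tool> = {
  "claude-code": { label: "Claude Code", href: "/anthropic/claude-code/" },
  "codex-cli": { label: "Codex CLI", href: "/openai/codex-cli/" },
};

// Returns a list of problems; an empty list means the data passed.
function validateComparisons(comparisons: Comparison[]): string[] {
  const errors: string[] = [];
  for (const c of comparisons) {
    for (const key of c.tools) {
      // Invariant 1: every column refers to a registered tool.
      if (!Object.prototype.hasOwnProperty.call(toolRegistry, key)) {
        errors.push(`${c.slug}: unknown tool "${key}"`);
      }
    }
    for (const row of c.rows) {
      for (const key of c.tools) {
        const cell = row.cells[key];
        if (!cell || cell.value.trim() === "") {
          // Invariant 2: no silent blank cells.
          errors.push(`${c.slug}/${row.label}: blank cell for "${key}"`);
        } else if (cell.sourceRefs.length === 0) {
          // Invariant 3: every filled cell cites at least one source.
          errors.push(`${c.slug}/${row.label}: no sourceRefs for "${key}"`);
        }
      }
    }
  }
  return errors;
}
```

Failing the build on a non-empty result turns a blank matrix cell from a rendering surprise into a blocked deploy.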
Two-pass SME (6 parallel agents each pass — duck-revised composition):
- Pass 1: 4 SME (sme-compare-cli, sme-compare-apis, sme-compare-m365, sme-updates) + matrix-data-auditor + cross-page-voice-duck. ~20 P0+P1 fixes incl. 2 P0s (Claude Code 2.1.142 features fabricated from prior-session memory; Sora-2 wrongly classified as video INPUT modality when it's video generation/output) + 1 P0 cross-page fix (Foundry "wraps Azure OpenAI" → "supersedes/absorbs"). GHC sign-up pause entry removed (canonical blog post URL unverifiable). Applied in commit ebe34cd.
- Pass 2: same 6-agent pattern. Caught 1 P0 REGRESSION pass 1 introduced ("Video in (GPT-5 family)" — GPT-5 series is text+image only per Microsoft's models page; pass 1 correctly removed Sora-2 but wrong-filled the slot) + 6 P1s (Gemini default model wrong "Gemini 3 family" → gemini-2.5-pro stable; Claude Code Opus 4.6/4.7 stale post-2.1.142; Codex models missing gpt-5.4/4-mini and codex-spark should be gpt-5.3-codex-spark; seeAlso template bug all refs → #sec-7; Gemini Flash-Lite cache $0.025 → $0.01; AppSource CEA Copilot Studio caveat) + 4 P2s. Applied in commit d43297e. Also REVERTED 2 Pass-1 fixes that were wrong: DA license "instruction-only" exception (Pass 1 narrowed too far based on manifest doc; cost-considerations doc says BOTH web-search-only AND instruction-only exempt); Gemini CLI free tier "AI Studio key = 250/day Flash-only" (README says both paths = 1,000/day).
All 4 CI gates green both passes; 30/30 base smoke + 5/5 new compare URLs + 11/11 Pass-2 critical-fix spot-checks all 200 on production.
Lessons captured (apply to Phase 1.1+):
- Two-pass SME discipline confirmed for the 5th batch in a row. Every batch's pass 2 catches either a regression pass 1 introduced or a P0 pass 1 missed. The matrix-data-auditor + cross-page-voice-duck agents (added per pre-implementation duck rec) materially upgraded coverage — they caught the seeAlso template bug, the broken getRelated URL builder, the Gemini default model contradiction, and the Foundry/AppSource cross-page issues.
- "Verify SME findings against second source" rule must extend to Pass-1-applied fixes too — not just original drafts. Pass 2 caught two Pass-1 corrections that were themselves wrong (DA license, Gemini free tier). The rule prevents pass-1-induced regressions but Pass 2 still has to apply the same scepticism to Pass 1's own corrections.
- Data-file-first was the right architecture call. Type-safe comparisons.ts with build-time validator caught 4 issues during initial fill that would have shipped as silent blank cells. Phase 1.1 maintenance edits cells with type-safe round-trips; no MDX prose re-edits needed.
- Source-code-vs-docs reading caught more than docs alone (e.g. Gemini CLI stable-channel default in packages/core/src/config/models.ts contradicted the marketing). Worth adding "fetch the actual source code" as a standard SME prompt for any version-default claims.
- The starter-prompt allowance to deviate from TOML to TS for the data file was correct. TS gave us type safety + zero parse cost + consistency with existing updates.ts. The "data-first" rule's spirit (data in a dedicated file, not crammed into MDX) was preserved.
Phase 1.1 — Ongoing rolling content + Phase 1 cleanup
Session 1 (Track A) — ✅ DONE 15 May 2026 PM3
Shipped in commits cada02d (drop + pass-1 SME — vendor-page consistency cleanup, 5 starter items + 3 spot-check P1 additions) + b1710d4 (pass-2 SME — 4 P1 + 1 P2 cross-page mismatches caught post-deploy).
Session 2 (Tracks A + B + C) — ✅ DONE 15 May 2026 PM5
Shipped in commits e5fa064 (drop + pass-1 SME — adjacent Claude context-window + lastUpdated sweep) + d687515 (pass-2 SME — 1 P1 leftover instance on sibling product page).
Edits across 9 files (4 .mdx Claude context fixes + 2 gemini-cli/overview table cells + 1 .ts data cell + 1 .mdx use-cases prose + 2 Astro components):
- Track A canonical: pitfalls/api-models/memory/computer-use updated to per-model context (Sonnet 4.6 + Opus 4.7 = 1M, Haiku 4.5 = 200K); memory.mdx GPT-5 reference now 400K context with 272K input cap per Microsoft Foundry.
- Track A SME-caught adjacent cells: gemini-cli/overview table Codex CLI cell "not publicly stated" → gpt-5.5 + ~200K → ~1M (gpt-5.5; 922K input); Copilot CLI cell 128K → 1M (model-limited).
- Track A Pass-2-caught leftover: use-cases/google-gemini-cli-use-cases.mdx:25 sibling-page "Copilot CLI at 128K" relic dropped entirely (clause framing was internally inconsistent even pre-Pass-1).
- Voice-duck-caught: comparisons.ts:687 alt flagship→alt high-end (voice-lint blind spot for .ts data files).
- Track B (Copilot CLI README verify): NO EDIT NEEDED — README still literally says "By default, copilot utilizes Claude Sonnet 4.5". Site's 4 "(per README)" places correctly aligned. Documented.
- Track C: pages/openclaw/setup/compare/index.astro derives lastUpdated={setups.reduce(max lastReviewedAt)} (live: 2026-05-07); components/Header.astro fallback dropped '2026-05-08'→'—' (Sush's explicit pick).
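The Track C derivation is a max-by-date reduce. A minimal sketch, assuming lastReviewedAt is stored as an ISO YYYY-MM-DD string; the Setup shape and the latestReview name are hypothetical.

```typescript
interface Setup {
  name: string;
  lastReviewedAt: string; // ISO "YYYY-MM-DD" (assumed format)
}

// ISO dates in YYYY-MM-DD form sort correctly as plain strings, so the
// latest review date is simply the lexicographic maximum.
function latestReview(setups: Setup[]): string | null {
  const max = setups.reduce(
    (acc, s) => (s.lastReviewedAt > acc ? s.lastReviewedAt : acc),
    "",
  );
  // An empty list yields null so the caller can choose its own fallback.
  return max === "" ? null : max;
}
```

Deriving the value from the entries removes the hard-coded date that went stale in the header fallback.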
Lessons:
- 7-batch two-pass-SME streak intact. Pass-2 caught a P1 leftover instance in a SIBLING file (use-cases vs overview, same product, same defect class). Adjacent-PAGE coupling is now a high-signal pattern alongside Session 1's adjacent-row pattern.
- Voice-lint blind spot for .ts data files — Pass-1 voice-duck caught "flagship" in comparisons.ts. Voice-lint should be extended to scan src/data/*.ts. Session 3 candidate.
- Sibling-file grep rule: when fixing a model-spec on file X (e.g. <vendor>/<product>/overview.mdx), also grep the SIBLINGS (use-cases, pitfalls, getting-started for the SAME product) — defects propagate across product-sibling pages, not just across vendors.
Session 3 (Tracks A + B + C) — ✅ DONE 15 May 2026 PM7
Shipped in commits 8adac63 (drop + pass-1 SME — model-lineup freshness across connections/models.mdx + frontmatter note on anthropic-claude-api-models.mdx + sibling fix on google-gemini-cli-overview.mdx) + dea069b (pass-2 SME — Sonnet 4.5 1M-context framing tightened with 4 independent sources + 3 sibling-page claude-3-5-sonnet format-example leftovers + Pass-1-induced "docs default" misattribution + Gemini 2.5 Flash shutdown date sync).
Edits across 6 files:
- Track A page-wide rewrite of connections/models.mdx — legacy claude-3-5-sonnet / gpt-4o* / gemini-2-flash* replaced with current claude-sonnet-4-6, claude-opus-4-7, claude-haiku-4-5, gpt-5.5, gpt-5-mini, gemini-3.1-pro-preview, gemini-3.1-flash-lite, gemini-2.5-pro, gemini-2.5-flash. Pricing table updated with verified Anthropic + Google numbers (Opus 4.7 $5/$25, Sonnet 4.6 $3/$15, Haiku 4.5 $1/$5, Gemini 2.5 Pro $1.25/$10, Gemini 3.1 Pro Preview $2/$12, Gemini 3 Flash Preview $0.50/$3, Gemini 3.1 Flash-Lite $0.25/$1.50, Gemini 2.5 Flash $0.30/$2.50). OpenAI prices honestly deferred to vendor pricing page (JS-rendered). Added Sush "boring answer first" framing. Failover JSON example + stress-test guidance updated.
- Track B verify-only on microsoft-github-copilot-overview.mdx lines 72 + 104 — docs.github.com/en/copilot/concepts/billing/copilot-requests confirms verbatim "GPT-5 mini, GPT-4.1 and GPT-4o are the included models, and do not consume any premium requests if you are on a paid plan." NO EDIT.
- Track C anthropic-claude-api-models.mdx: Sonnet 4.5 row updated from "1M" → "200K (1M beta: Bedrock/Vertex only)"; explanatory prose specifies the cloud-platform-only scope; verificationNote replaced with enriched version citing 4 independent sources (Anthropic context-windows doc, Foundry doc, Vertex AI model card, Bedrock marketing).
- Pass-2 sibling-page fixes: google-gemini-cli-overview.mdx:66 (Copilot CLI / Sonnet 4.5 context cell), explainers/concepts.mdx:191 (Model ref glossary format example), explainers/architecture.mdx:52 (architecture walkthrough format example), use-cases/rag-personal-docs.mdx:149 (config code-block comment) — all updated from claude-3-5-sonnet → claude-sonnet-4-6.
Pass-1 SME findings (3 parallel research agents + voice-duck):
- Track-A SME caught 2 P0s (Gemini 3.1 Pro Preview pricing was 4× wrong; Gemini 3 Flash tier mismatched with Flash-Lite pricing) + 4 P1s + 3 P2s — all applied in same commit.
- Track-C SME caught my Pass-1 verificationNote misreading evidence — the Anthropic Bedrock "endpoint types" note is about routing, not context-window size. The GitHub Copilot CLI README mentions Sonnet 4.5 as default but says NOTHING about 1M context. Pass-1 verificationNote was fabricated attribution. Replaced.
- Track-B SME confirmed no-edit decision (6/6 specific claims verbatim-verified against canonical GitHub docs) + flagged 2 watch items for next session (GPT-4o soft-deprecation; June 1 2026 billing model transition).
- Voice-duck caught the adjacent-page Gemini naming + pricing mismatch (my models.mdx had gemini-3-pro-preview + Flash pricing on the Pro row; canonical google-gemini-api-models.mdx has gemini-3.1-pro-preview + $2/$12). Same Session 2 lesson again (adjacent-page coupling) — fixed in Pass 1.
Pass-2 SME findings (3 parallel research agents + voice-duck):
- Pass-2 Track A SME independently verified all 3 Anthropic prices via Anthropic models all-models page + all 5 Gemini prices via Vertex AI pricing (a separate domain/product surface from the Pass-1 cited ai.google.dev/pricing). All matched. Also caught 3 sibling-page claude-3-5-sonnet format-example leftovers in explainers/concepts.mdx, explainers/architecture.mdx, and use-cases/rag-personal-docs.mdx — these were format examples, not recommendations, but readers copy them. Classic Pass-1 miss + sibling-page coupling pattern. Also caught gemini-2.5-flash missing shutdown date in models.mdx:90 even though its sibling gemini-2.5-pro two lines above has the date.
- Pass-2 Track C SME independently verified Sonnet 4.5 context state via Vertex AI model card (explicit "1M (Preview) / 200,000 (GA)" tier labelling) + Anthropic's own context-windows doc (200K only on the direct API, no 1M mention) + Foundry doc (200K only) + Bedrock-legacy technical docs (200K). The 1M is cloud-platform-specific — NOT available on direct API or Foundry. Pass-1 "preview" framing implied universal availability; Pass-2 tightened to "1M beta: Bedrock/Vertex only" in 3 places plus sibling fix.
- Pass-2 voice-duck caught 2 Pass-1-induced inaccuracies: (a) line 84's "docs default for new Gemini 3 code" was wrongly attributed to Pro Preview — the actual Gemini 3 docs default for code examples is gemini-3-flash-preview per the Google explainer page; dropped the misleading clause; (b) line 77 "set a cheap fallback" tension with the GPT-5.5 / Gemini Pro examples used below — changed to "explicit fallback".
All 5 CI gates green both passes; live spot-checks 6/6 Pass-1 + 10/10 Pass-2 all PASS on production.
Lessons captured (apply to Session 4+):
- 8-batch two-pass-SME streak intact. Pass-2 caught: a Pass-1 misreading of evidence (the Sonnet 4.5 verificationNote based on a Bedrock-endpoint-types note that's actually about routing); 3 sibling-page format-example leftovers in core explainer pages I didn't initially scope; a missing shutdown date that made one Gemini cell inconsistent with its sibling.
- Sibling-page coupling has at least three flavours now: adjacent-row (Session 1), adjacent-page same-product (Session 2 — use-cases vs overview), and adjacent-FILE same-concept (Session 3 — format examples scattered across concepts.mdx, architecture.mdx, use-cases code blocks). The grep rule: when fixing a model-name on file X, also grep for the OLD name across the whole content tree, not just the same product family.
- Pass-1's own corrections need Pass-2 second-source verification — the Pass-1 verificationNote I wrote on Track C was itself wrong (transitive reading of an unrelated Bedrock note). The pattern: SME agent's research can be confused by parallel similar-looking documentation; Pass-2's job includes verifying Pass-1's source attribution, not just Pass-1's content corrections.
- OpenAI direct-API pricing is structurally unverifiable for Claw — the platform.openai.com pricing page is JS-rendered + 403s on most fetch tools. Honest framing ("not quoted here — check vendor page") is the right move; don't invent numbers; flag the structural limitation.
- Cloud-platform-specific feature availability matters more as the vendor landscape consolidates. Anthropic models behave differently on the direct API vs Bedrock vs Vertex AI vs Foundry — Claw pages that name a feature should say which platforms it's on, not just "preview" or "beta" without scope.
Session 4 (Tracks D.2 + D.1 + P2 polish) — ✅ DONE 15 May 2026 PM9
Shipped in single commit 16e3444 (combined Pass-1 drop + 6 Pass-1 SME fixes + 3 Pass-2 SME corrections; deployed Cloudflare Pages 16:23 NZST). 7 files · 467 insertions · 13 deletions · 141 pages built · 0 broken links.
Edits across 7 files:
- Track D.2: scripts/voice-lint.mjs — refactored SCAN_DIRS to {dir, exts} shape and added {src/data, [.ts]} so the 3 canonical data files (comparisons.ts, updates.ts, toolRegistry.ts) now get the FORBIDDEN-word + OUT-of-scope-phrase guardrails. Smoke test: injected // flagship test into comparisons.ts and confirmed strict-mode caught it at line:column with exit 1.
- Track D.1: new scripts/audit-blurbs.mjs (~290 LOC) — vendor-hub + product-hub + canonical-data freshness gate. Dual-list classifier (LEGACY_MODELS denylist + KNOWN_CURRENT_MODELS allowlist + SKIP_PRODUCT_SLUGS skip set), greedy hyphenated-tail regex (catches gpt-4o-audio-preview / gpt-5.3-codex-spark as single tokens), suffix-stripping fallback in classify() (unknown suffixed variants fall back to base family). Scope: 5 vendor hubs + 4 product hub [product]/index.astro ledes + 3 src/data/*.ts files = 12 files. Modes --warn (default) / --strict (exit 1 on LEGACY hit; INFO never blocks). audit-blurbs:ignore line-annotation escape hatch. lastVerified: 2026-05-15 constant with 90-day staleness self-warning. npm scripts audit:blurbs + audit:blurbs:strict. NOT yet wired into .github/workflows/integrity.yml per duck finding 8 (advisory first; promote to CI gate after 1-2 clean sessions).
- P2.1 setups/raspberry-pi.mdx:151: "Claude Sonnet or GPT-4o" → "hosted cloud models (Claude Sonnet, GPT-5.5, Gemini 2.5 Pro)".
- P2.2 connections/models.mdx:83: "OpenAI's heavy-reasoning tier" → "one of OpenAI's premium reasoning tiers (alongside GPT-5.2 and GPT-5.4)".
- P2.3 deferred — duck independently fetched Anthropic context-windows doc; the context-1m-* beta-header name is NOT surfaced on public Anthropic docs. Current "1M beta: Bedrock/Vertex only" wording is as tight as public sources allow. Reassigned to watch item.
- Pass-1 SME content corrections (5 P1 + 1 P2): reverted comparisons.ts:692 Gemini CLI preview-channel cell to gemini-3-pro-preview (the CLI's actual default — my 3.1 fix was wrong-direction); fixed openai/[product]/index.astro:27 Codex CLI lede from (gpt-5-codex by default) to (gpt-5.5 by default; gpt-5.4 fallback) per developers.openai.com/codex/models; refactored audit-blurbs regex + classify suffix-stripping; moved gpt-4o/gpt-4o-mini out of LEGACY (still in current OpenAI SDK); removed gpt-5/gpt-4 from SKIP_PRODUCT_SLUGS (they're real model IDs per SDK).
- Pass-2 SME corrections (3 corrections): moved gpt-4 from CURRENT to LEGACY (scheduled shutdown 2026-10-23 per OpenAI deprecations); raspberry-pi.mdx GPT-5.2 → GPT-5.5 anchor (5.2 is "previous frontier" per Codex docs); removed now-unneeded audit-blurbs:ignore comment from comparisons.ts:264 (suffix-stripping makes gpt-4o-audio-preview correctly silent without inline annotation).
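The Track D.1 classifier is easier to reason about with a reduced model in front of you. The sketch below is not the code in scripts/audit-blurbs.mjs: the lists are cut down to a few entries, the regex covers only three families, and the verdict names beyond LEGACY and INFO are assumptions.

```typescript
type Verdict = "LEGACY" | "CURRENT" | "INFO";

// Tiny illustrative lists; the real script's lists are longer.
const LEGACY_MODELS = new Set(["gpt-4", "claude-3-5-sonnet"]);
const KNOWN_CURRENT_MODELS = new Set(["gpt-4o", "gpt-5.5", "claude-sonnet-4-6"]);

// Greedy hyphenated tail: the family prefix plus every following
// "-segment" or ".segment", so "gpt-4o-audio-preview" is ONE token.
const MODEL_TOKEN = /\b(?:gpt|claude|gemini)(?:[-.][a-z0-9]+)+/g;

function findModelTokens(text: string): string[] {
  return text.match(MODEL_TOKEN) ?? [];
}

function classify(token: string): Verdict {
  let candidate = token;
  while (candidate.length > 0) {
    if (LEGACY_MODELS.has(candidate)) return "LEGACY";
    if (KNOWN_CURRENT_MODELS.has(candidate)) return "CURRENT";
    // Suffix-stripping fallback: drop the last "-segment" and retry,
    // so unknown suffixed variants fall back to their base family.
    const cut = candidate.lastIndexOf("-");
    if (cut === -1) break;
    candidate = candidate.slice(0, cut);
  }
  return "INFO"; // unknown: surfaced for review, never blocks
}
```

In this reduced model gpt-4o-audio-preview strips back to gpt-4o and classifies as CURRENT, which is why the inline audit-blurbs:ignore annotation on that token became unnecessary.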
Rubber-duck dispatched BEFORE implementing (8 findings):
- Adopted: dual matcher regex shape (API-ID + human-prose), LEGACY as primary signal with CURRENT for noise suppression only, expanded audit-blurbs scope to product-hub [product]/index.astro files, P2.3 deferred (not edited), pre-Pass-1 explicit grep step, audit-blurbs NOT promoted to strict CI on first commit, CI gate order corrected (build BEFORE integrity-check; audit-claims/integrity-check depend on dist/).
- Adopted with modification: voice-lint .ts scan kept simple (file-buffer scan + voice-lint:ignore escape hatch); AST-based string-position scanning deferred until noise materialises.
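The simple file-buffer scan adopted for voice-lint can be pictured as below. This is a sketch, not the contents of scripts/voice-lint.mjs: the src/content entry, the single-word FORBIDDEN list, and the same-line behaviour of the escape hatch are assumptions; the {dir, exts} shape and the src/data + .ts pairing come from the Track D.2 note.

```typescript
// Each scan target pairs a directory with the extensions to scan.
const SCAN_DIRS = [
  { dir: "src/content", exts: [".md", ".mdx"] }, // assumed entry
  { dir: "src/data", exts: [".ts"] },
];

const FORBIDDEN = ["flagship"]; // one word from these notes; real list is longer

interface Hit {
  line: number;   // 1-based
  column: number; // 1-based
  word: string;
}

function shouldScan(path: string): boolean {
  return SCAN_DIRS.some(
    ({ dir, exts }) => path.startsWith(dir + "/") && exts.some((e) => path.endsWith(e)),
  );
}

// Plain buffer scan: no AST, so comments and string literals are
// treated alike. Lines carrying the annotation are skipped entirely.
function scanBuffer(source: string): Hit[] {
  const hits: Hit[] = [];
  source.split("\n").forEach((text, i) => {
    if (text.includes("voice-lint:ignore")) return; // escape hatch
    for (const word of FORBIDDEN) {
      const col = text.toLowerCase().indexOf(word);
      if (col !== -1) hits.push({ line: i + 1, column: col + 1, word });
    }
  });
  return hits;
}
```

The trade-off is the one recorded above: a buffer scan will also flag identifiers and comments, which is acceptable until that noise actually shows up.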
Lessons captured (apply to Session 5+):
- 9-batch two-pass-SME streak intact. Pass-2 caught 3 real issues Pass-1 didn't (gpt-4 deprecation 2026-10-23, GPT-5.2 vs 5.5 as current anchor, redundant audit-blurbs:ignore comment). Worth noting: Pass-2 ALSO surfaced a watch item — gpt-4o-audio-preview may be shut down 2026-05-07 with replacement gpt-audio-1.5, but the OpenAI SDK at HEAD still lists the gpt-4o audio family. Logged as a watch item rather than acted on (can't independently verify gpt-audio-1.5 from public sources; deprecations page 403'd).
- Tooling-changes need their own SME pass. Track D.1 (a new audit script) had its own classification logic that needed independent verification — Pass-1 caught 5 P1 design errors in the model lists / regex / classify behaviour. Tooling design isn't validated by "the script ran without crashing"; it needs an independent fact-check on its embedded assumptions about the model landscape.
- Pre-edit grep + rubber-duck plan was the right cost. The duck finding "CI gate order is wrong" (build needs to precede integrity-check) would have caused a deceptive false-pass if I'd run the gates in my originally-planned order. Free fix, ~1 minute of duck time.
- Audit-blurbs is already paying off — it caught a real gemini-3-pro-preview mis-fix in comparisons.ts:692 during the first run. The pattern: when a stale name has a successor, my Pass-1 instinct was to "fix" to the successor; but the canonical CLI docs literally still use the pre-rename name (because of the API redirect). Audit-blurbs's INFO list surfaces these subtle naming-vs-routing questions for human review.
Session 5 (Tracks D.3 + D.4 + D.5) — ✅ DONE 16 May 2026 AM10
Shipped in single commit 24a0326 (3,407 insertions including SME templates · 13 deletions). 5 files changed: 1 modified (integrity.yml rewrite) + 4 new (docs/sme-templates/{README,claim-packet,sme-pass-1,sme-pass-2}.md). 0 content files touched. 141 pages built; 10,554 internal links scanned; 0 broken. 5 CI gates green locally; GHA run on Ubuntu Node 20 green on first push — confirms the new strict gates work portably.
Edits across 5 files:
- Track D.3 (SME prompt templates as Markdown, NOT a Node script): 4 new files in docs/sme-templates/. README.md (index + when-to-use + design-boundary section noting tooling-SME is a separate pass). claim-packet.md (pre-Pass-1 planning template — files-touched, sibling-page-grep with portable rg -g "*.{md,mdx,ts}" form, claims, Sources A and B locked in advance, voice-duck targets, done conditions, rollback plan). sme-pass-1.md (Pass-1 agent prompt — canonical-source verify + 3 sweeps: sibling-page-coupling grep, adjacent-cell scan on tabular data, cloud-platform-specific feature scope check; anti-patterns including wrong-direction fix from Session 4). sme-pass-2.md (Pass-2 agent prompt — second-source verify of Pass-1's corrections + 3 sweeps: adjacent-page same-product, adjacent-FILE same-concept, Pass-1-induced regression check with 5 named patterns including SDK-present-not-equals-current).
- Track D.4 (CI gate promotion): .github/workflows/integrity.yml rewritten. New gate order: voice-lint:strict → audit:blurbs:strict → audit:verification → build:no-search → check:links. Cheap fail-fast invariant preserved (cheap gates ~10s combined; expensive build is 100s). Build-before-link-audit invariant preserved (audit-claims.mjs reads dist/ and silently passes if dist/ is absent, which would be a deceptive false-pass without the ordering guard). Workflow uses npm run X commands so local + CI commands match. Workflow comment block documents the inline escape hatches (voice-lint:ignore / audit-blurbs:ignore).
- Track D.5 (documentation-only): No code or content changes. Probe over 98 src/content/*.mdx returned 4 LEGACY (all legitimate historical references; 1 false-positive "Marie Curie" matched curie legacy model) + 14 INFO (9 Ollama no-hyphen tags like llama3.2:3b; 3 bare-family-name prose Sonnet 4 family; 2 misc). Zero actionable content errors. Three specific script improvements queued for Session 6 (see Session 6 candidates below).
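The two ordering invariants from Track D.4 can be written down as a small check. This is a minimal sketch, not code from the repo: `checkGateOrder` and the `cheap` list are hypothetical, and treating the first three gates as the "cheap" group is an assumption read off the gate order above.

```javascript
// Hypothetical helper; gate names mirror the npm scripts named in the text.
const GATE_ORDER = [
  'voice-lint:strict',
  'audit:blurbs:strict',
  'audit:verification',
  'build:no-search',
  'check:links',
];

function checkGateOrder(order) {
  const problems = [];
  const build = order.indexOf('build:no-search');
  const links = order.indexOf('check:links');
  // Invariant 1: the link audit reads dist/, so the build must come first.
  if (build === -1 || links === -1 || build > links) {
    problems.push('build:no-search must run before check:links');
  }
  // Invariant 2 (assumed grouping): cheap gates fail fast, ahead of the build.
  const cheap = ['voice-lint:strict', 'audit:blurbs:strict', 'audit:verification'];
  for (const gate of cheap) {
    const i = order.indexOf(gate);
    if (i === -1 || (build !== -1 && i > build)) {
      problems.push(`${gate} must run before build:no-search`);
    }
  }
  return problems;
}
```

An empty array means both invariants hold; swapping the last two gates yields exactly one problem, the build-before-link-audit one.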
Rubber-duck dispatched BEFORE implementing (10 findings; all adopted or adopted-with-modification):
- Block 1 (adopted): Drop the Node script for Track D.3 → Markdown templates instead. A Node script can't actually invoke the task tool; it would just emit prompts. Markdown templates that I read at session start serve the same function with zero infrastructure cost. Graduate to a script only when 3+ more sessions confirm the placeholders are stable.
- Block 2 (adopted): No YAML manifest. Node has no built-in YAML parser; JSON-or-prompt-table only.
- Block 3 (adopted): Don't add inline audit-blurbs:ignore annotations to MDX yet. The current suppressor only checks the same rendered source line; testing MDX syntax matters; non-rendered suppression sidecar is the right Session 6 design.
- Issue 4 (adopted): Track D.5 scope reduced to documentation-only. Don't add --scope=content flag this session. Don't touch content files.
- Issue 5 (adopted): Use npm run commands in CI workflow, not raw node scripts/… — keeps local + CI matched.
- Issue 6 (adopted): Cheap-gates-first CI step order. Fail fast on guardrail breaks instead of after a 100s build.
- Issue 7 (adopted): Portability risk is low but verified anyway. Both scripts use only Node 20-safe APIs (fs/promises, replaceAll, String.matchAll). Path separators handled defensively via .replaceAll('\\', '/'). Confirmed by GHA run going green on first push.
- Issue 8 (adopted): Minimum-viable SME pattern. Session 5 has no content edits and no classifier changes — Pass-1 sufficient. The 9-batch two-pass-SME streak is a content-session metric; Session 5 is pure infra so single-Pass is the correct cost (don't break the streak by inflating its definition).
- Issue 9 (adopted): Don't add --scope=content flag without suppression mechanism designed first.
- Issue 10 (adopted): Templates address the right pain — sibling-page misses, source misreadings, Pass-1-correction regressions — not just "prompt boilerplate".
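The defensive path-separator handling mentioned in Issue 7 amounts to a one-line normalisation. A minimal sketch; `toPosixPath` is a hypothetical name, and the scripts may inline the call rather than wrap it.

```javascript
// Hypothetical wrapper around the .replaceAll('\\', '/') call named in Issue 7.
// Windows-style separators become forward slashes; POSIX paths pass through.
function toPosixPath(p) {
  return p.replaceAll('\\', '/');
}
```

Normalising before any comparison means a path reported on Windows and the same path reported on the Ubuntu runner compare equal.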
Pass-1 SME on the deliverables (1 P1 + 3 P2 corrections; no Pass-2 — infra-only):
- P1: rg --type mdx is NOT a built-in ripgrep type and produces zero hits silently on a fresh install (incl. Ubuntu CI). Defeats the very sibling-page grep the template was created to enforce. Fix: switched to rg -g "*.{md,mdx,ts}" in claim-packet.md + sme-pass-1.md.
- P2: Session 4 "SDK-present ≠ current" lesson (gpt-4 still in SDK but deprecated 2026-10-23) appeared only as narrative example in sme-pass-2.md intro. Fix: added as named 5th Pass-2 regression pattern alongside wrong-direction-fix / misattribution / tense-break / cloud-platform-scope-drift.
- P2: "Tooling-design needs its own SME pass" lesson (Session 4 / Track D.1) was captured in sme-pass-2.md anti-patterns but not in README.md. Fix: added to README's "What these templates do NOT do" section so readers don't mistakenly use these templates for audit-blurbs.mjs / voice-lint.mjs classifier changes.
- P2: Workflow comment grouped OUT-of-scope phrases with blocking gates. They're actually advisory (voice-lint.mjs:216-222 doesn't set hasBlockingFinding). Fix: split the comment to mark OUT-of-scope phrases as "logged but advisory only".
Lessons captured (apply to Session 6+):
- Pass-1-only SME is the right cost for infra-only sessions. The 9-batch two-pass streak applies to content-session work where vendor facts get verified. Tooling sessions either need a single Pass-1 (correctness check) or a dedicated tooling-design SME (which is a different prompt shape — see new README design-boundary note). Don't conflate the metrics.
- The probe-before-decide pattern paid off. Running an exploratory audit-blurbs over src/content/*.mdx BEFORE writing the rubber-duck question gave the duck concrete data (4 LEGACY all-historical, 14 INFO mostly Ollama-shape) which let it make a calibrated "don't expand scope yet" recommendation. Without the probe, the duck would have had to guess at noise profile. Worth doing for every Track-X "should we extend Y" question.
- Rubber-duck pre-implementation saved a wasted hour. My original plan was to ship scripts/sme-validation.mjs (~150-250 LOC). The duck's "this is a Markdown template, not a Node script" observation took ~5 minutes and saved ~60 minutes of writing code that would have been bin'd at the next critique. Cost-benefit on dedicated rubber-duck pre-write step continues to dominate.
- CI promotion has a deceptive false-pass surface. audit-claims.mjs reads dist/ and silently reports 0 broken links if dist/ doesn't exist. The only guard is workflow step ordering. Pass-1 SME caught this nuance in the integrity-check chain (P2 finding); worth a permanent note for any future audit script that depends on built artifacts.
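For any future audit script that depends on built artifacts, one possible in-script guard is to report "nothing to scan" as its own failure state rather than as zero broken links. This is a sketch under stated assumptions: `summariseLinkAudit` and its return shape are hypothetical and do not describe how audit-claims.mjs behaves today.

```javascript
// Hypothetical sketch. `pages` is the list of built files found under dist/,
// or null when dist/ does not exist. `brokenLinks` is whatever the scan found.
function summariseLinkAudit(pages, brokenLinks) {
  if (pages === null || pages.length === 0) {
    // No artifacts: say so explicitly instead of reporting "0 broken".
    return { ok: false, reason: 'no-build-artifacts' };
  }
  if (brokenLinks.length > 0) {
    return { ok: false, reason: 'broken-links', count: brokenLinks.length };
  }
  return { ok: true, reason: 'clean', pagesScanned: pages.length };
}
```

With a guard of this shape, workflow step ordering stops being the only protection against the false-pass.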
Session 6 (audit-blurbs classifier hardening + content-scope flag) — ✅ DONE 16 May 2026 AM11¶
Shipped in single commit ccb123d (4 files · 198 insertions · 28 deletions). 5 CI gates green locally. GHA Integrity check on Ubuntu Node 20 green on first push (confirms portability of the new sidecar loader + regex branches). Live at https://claw.aguidetocloud.com; smoke 45/45 + 4/4 spot-checks (openai, google, microsoft, openclaw vendor hubs) + GHA run green.
Edits across 4 files:
- Fix 1 (Ollama no-hyphen tags) — scripts/audit-blurbs.mjs KNOWN_CURRENT_MODELS gained llama3.2, llama3.1, qwen2.5 (Ollama's canonical tag form per ollama.com/library — ollama run llama3.2, not llama-3.2). Rejected Option A (extending variants()) per duck #2 — narrower blast radius, easier to audit. NOT adding sonnet-4 as a family marker — would globally suppress future stale Sonnet 4 family refs when 5.x ships; the 3 prose lines get sidecar entries instead.
- Fix 2 (Drop bare short legacy + add canonical text-IDs + Pass-2 P1 bare-suffixed branch) — API_ID_RE lost bare davinci|curie|babbage|bison from the no-prefix branch (eliminates "Marie Curie" / "Charles Babbage" / "Leonardo da Vinci" false-positive class). Gained two new branches: (?:text|code|chat|codechat)-(?:davinci|curie|babbage|ada|bison|cushman)-\d{3} (canonical OpenAI/PaLM-2 IDs) and (Pass-2 P1 fix) (?:davinci|curie|babbage|ada|cushman)-\d{3} (bare-suffixed forms that OpenAI Completions API still exposes — davinci-002, babbage-002 per openai-python/completion_create_params.py at HEAD). The -\d{3} suffix avoids the bare-word false-positive class. 15 explicit LEGACY entries added: text-davinci-001/002/003, text-curie-001, text-babbage-001, text-ada-001, code-davinci-001/002, code-cushman-001, text-bison-001, chat-bison-001, code-bison-001, codechat-bison-001, davinci-002, babbage-002.
- Fix 3 (Sidecar suppression mechanism) — new audit-blurbs.ignore.json at repo root (committed, non-hidden). Schema: {file, line, token, context, reason, added, reviewAfter?}. Context substring is the STRICT guard (line is approximate; robust to line-shifts from unrelated edits — duck #3). Default reviewAfter = added + 365 days (historical citations don't expire — duck #4). Hard-fails on malformed JSON or invalid entries — every field validated (duck #5). Missing file = empty suppressions, no error. 5 initial entries: 3 Sonnet 4/5 family prose lines + 2 historical Gemini-1.5 citations (Google AI Studio tunable-model + NotebookLM June 2024 launch). Inline audit-blurbs:ignore annotations remain for .astro + .ts comment contexts where they don't leak into rendered output.
- --scope=content flag — advisory-only mode that adds src/content/**/*.mdx (98 files) to the scan. Findings populate separate contentLegacyHits + contentInfoHits arrays. Content-scope LEGACY hits NEVER trigger exit-1 in strict mode (verified in Pass-2 sweep 2 — contentLegacyHits never appears in any exit condition). The 12-file default strict gate is unchanged. New npm script audit:blurbs:content. NOT added to CI workflow (developer-facing tool, not a blocking surface).
- Pass-2 P2-TOOLING-01 (workflow comment fix) — .github/workflows/integrity.yml gate-2 comment now documents both escape hatches (inline annotation for .astro/.ts; sidecar for .mdx prose).
- Pass-2 P2-TOOLING-03 (dead-code removal) — canonicalise() had .replace(/\.(\d)/g, '.$1') which was a mathematical identity (no transform — replacing dot+digit with dot+digit). Removed; replaced with clarifying comment.
- package.json — new audit:blurbs:content npm script.
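The effect of Fix 2 can be seen by running the two legacy branches against prose and real IDs. A minimal sketch: the two alternation bodies are copied from the text above, while the surrounding `\b` anchors, the `g` flag, and the names `LEGACY_ID_RE` / `findLegacyIds` are assumptions for illustration, not the script's actual API_ID_RE.

```javascript
// Prefixed branch | bare-suffixed branch, both requiring a -\d{3} suffix.
// Assumption: word-boundary anchors on both sides.
const LEGACY_ID_RE =
  /\b(?:(?:text|code|chat|codechat)-(?:davinci|curie|babbage|ada|bison|cushman)-\d{3}|(?:davinci|curie|babbage|ada|cushman)-\d{3})\b/g;

// Returns every legacy-looking ID in the text, in order of appearance.
function findLegacyIds(text) {
  return [...text.matchAll(LEGACY_ID_RE)].map((m) => m[0]);
}
```

Because every branch requires the three-digit suffix, names such as "Marie Curie" produce no match, while text-davinci-003 and davinci-002 are both extracted.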
Rubber-duck dispatched BEFORE implementing (10 findings; 1 BLOCKING adopted + 4 non-blocking adopted + 4 suggestions adopted + 1 minor noted):
- BLOCKING #1 (adopted): removing bare curie/davinci/babbage/bison without explicit text-curie-001-style handling permanently loses detection of those forms. Fixed by adding both the prefixed and (in Pass-2) the bare-suffixed regex branches with explicit LEGACY entries.
- #2 (adopted): drop sonnet-4 family marker — would globally suppress future stale Sonnet 4 family refs. Sidecar the 3 specific lines instead.
- #3 (adopted): sidecar file:line:token triple is brittle (line numbers shift); add context substring as the strict guard. Line stays for finding the entry but is informational only.
- #4 (adopted): 90-day staleness is too aggressive for historical citations; switch to reviewAfter per entry, default 1 year.
- #5 (adopted): hard-fail on malformed JSON; missing file = empty, not error.
- #6 (adopted): sidecar at repo root, non-hidden, committed.
- #7 (adopted): manual editing only; no helper CLI.
- #8 (adopted): one-off tooling-SME prompts this session (not 4th template). Formalise only after 2+ tooling sessions reuse the same prompt structure.
- #9 (adopted): document --scope=content semantics explicitly (default scope unchanged 12 files; content-scope advisory-only; sidecar applies to both scopes; strict only blocks default-scope LEGACY).
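Duck findings #3 to #5 describe a sidecar contract that can be sketched as two small functions. The field names come from the schema in Fix 3; everything else (`normaliseEntry`, `isSuppressed`, `finding.lineText`, the error messages) is hypothetical, and the checks shown are only a subset of the validation rules the real loader applies.

```javascript
const REQUIRED_STRINGS = ['file', 'token', 'context', 'reason', 'added'];
const DAY_MS = 24 * 60 * 60 * 1000;

// Hard-fails on an invalid entry; fills in the default reviewAfter otherwise.
function normaliseEntry(entry) {
  for (const key of REQUIRED_STRINGS) {
    if (typeof entry[key] !== 'string' || entry[key] === '') {
      throw new Error(`sidecar entry: "${key}" must be a non-empty string`);
    }
  }
  if (!Number.isInteger(entry.line) || entry.line < 1) {
    throw new Error('sidecar entry: "line" must be a positive integer');
  }
  const added = Date.parse(entry.added);
  if (Number.isNaN(added)) {
    throw new Error('sidecar entry: "added" must be a parseable date');
  }
  // Default review date: added + 365 days, as a YYYY-MM-DD string.
  const reviewAfter =
    entry.reviewAfter ?? new Date(added + 365 * DAY_MS).toISOString().slice(0, 10);
  return { ...entry, reviewAfter };
}

// `line` is informational only; file + token + context substring decide.
function isSuppressed(finding, entries) {
  return entries.some(
    (e) =>
      e.file === finding.file &&
      e.token === finding.token &&
      finding.lineText.includes(e.context)
  );
}
```

Matching on the context substring rather than the line number is what keeps an entry valid after unrelated edits shift the file.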
Pass-1 tooling-SME (24/27 verified — full ollama.com/library + openai-python HEAD + Google Vertex AI versioning + AI Studio deprecations cross-check; 0 P0/P1, 3 P2):
- ✓ All 13 LEGACY text-ID entries traced through the regex with full match coverage.
- ✓ Ollama tags (llama3.2, llama3.1, qwen2.5) canonical per ollama.com/library.
- ✓ OpenAI legacy completions models shutdown dates correct (text-davinci family: Jan 4 2024; Codex family: Mar 23 2023).
- ✓ Sidecar schema validation traced — all 11 validation rules hard-fail correctly.
- ✓ Backward-compatible (canonicalise/variants/classify unchanged; HUMAN_RE unchanged; inline annotation still works).
- P2 C1.4 (DEFERRED to Session 7): gemma-\d+ regex has mandatory hyphen unlike sibling llama-?\d+ / qwen-?\d+ — gemma3 and gemma4 Ollama tags are undetectable. Plus future-proofing additions for qwen3, qwen3.5, llama3.3 not in KNOWN_CURRENT.
- P2 C2.5 (DEFERRED): Vertex AI bare text-bison/chat-bison (no -001 suffix) undetectable. Low priority since Claw scope is AI Studio not Vertex AI.
- P2 C2.7 (ESCALATED IN PASS-2): davinci-002/babbage-002 undetectable. Pass-2 elevated to P1 (script comment misrepresented as covered) and FIXED IN THIS COMMIT — added bare-suffixed regex branch + explicit LEGACY entries.
Pass-2 tooling-SME (sibling-script coupling + strict-mode invariant + deploy/CI impact + backward-compat — 1 P1 + 4 P2; P1 + 2 P2 adopted, 2 P2 deferred):
- Sweep 1 ✓ 0 sibling-script coupling concerns (voice-lint/audit-claims/audit-verification/integrity-check all clean — no embedded model lists, no API_ID_RE-style regex, no --scope= flags).
- Sweep 2 ✓ Strict-mode invariant verified — contentLegacyHits is exclusively in printing blocks, never in any exit condition. --strict --scope=content with 100 LEGACY hits in content STILL exits 0.
- Sweep 3 ✓ .gitignore doesn't catch the sidecar (committed correctly). deploy.mjs walk(DIST) excludes repo-root JSON (sidecar correctly NOT in Cloudflare manifest — it's a build-time tool config, not a production asset).
- Sweep 4 found:
- P1 ADOPTED: davinci-002/babbage-002 structural undetectability. Comment claimed all real legacy citations use the prefixed form — incorrect. Fixed via new regex branch + LEGACY entries (above).
- P2-TOOLING-01 ADOPTED: integrity.yml gate-2 comment now mentions both escape hatches.
- P2-TOOLING-03 ADOPTED: removed no-op .replace(/\.(\d)/g, '.$1') from canonicalise().
- P2-TOOLING-02 DEFERRED: Ollama tag second-source verification against GitHub repo (Pass-1 used ollama.com/library; Source B locked for Session 7 — Ollama GitHub manifest or CLI reference).
- P2-TOOLING-04 DEFERRED: walkContent() called twice when --scope=content active (once for scan, once for unused-entry filter). Minor inefficiency; cache for Session 7.
- Sweep 5 ✓ All 5 backward-compat checks preserved.
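The strict-mode invariant verified in Sweep 2 reduces to an exit decision that never reads the content-scope array. A minimal sketch: `legacyHits` and `contentLegacyHits` are named in the text, but `exitCodeFor` and the options object are hypothetical.

```javascript
// Hypothetical sketch of the exit decision described in Sweep 2.
function exitCodeFor({ mode, legacyHits, contentLegacyHits }) {
  // Content-scope hits are printed elsewhere; they are deliberately
  // never consulted when deciding the exit code.
  void contentLegacyHits;
  return mode === 'strict' && legacyHits.length > 0 ? 1 : 0;
}
```

A strict run with 100 content-scope LEGACY hits and no default-scope hits still returns 0, which is the behaviour the sweep confirmed.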
Lessons captured (apply to Session 7+):
- Pass-2 tooling-SME catches what Pass-1 doesn't — Pass-1 found the davinci-002/babbage-002 coverage gap as a P2 "acceptable trade-off"; Pass-2 escalated to P1 by reading the script's own comment block and noticing the comment INCORRECTLY claimed all legacy citations use the prefixed form. Pass-1 verified vendor sources; Pass-2 verified internal consistency between code and its own documentation. Both passes have distinct value for tooling-design SME just as they do for content.
- Trade-offs need to be DOCUMENTED in the code, not just in the SME report. The "we accept this gap because false-positives are worse than missed-positives" decision can be totally correct AND still ship as P1 if it's not written down in the code. Future maintainers can't read SME reports — they read the code.
- One-off tooling-SME prompts work well at session-state scope. Two adaptations from sme-pass-1/2.md (changed "anti-patterns: do not propose tooling-level changes" → "this IS a tooling-level review"; added regex-coverage tracing + sibling-script-coupling sweep). Total prompt-authoring cost ~3 minutes. Lower process debt than extending the templates would have been.
- Defer cleanly when SME flags P2. Pass-1 surfaced gemma-regex inconsistency and Vertex AI bison forms; Pass-2 surfaced walkContent caching and Ollama second-source verification. All deferred to Session 7. The discipline: don't expand session scope mid-implementation just because the SME found something. Adopt only what BLOCKS the current session's correctness.
~~Phase 1.1 Session 7 candidates (queued by Session 6)~~ — ✅ DONE 16 May 2026 PM3 (see Session 7 entry below)¶
Session 7 (audit-blurbs polish cluster) — ✅ DONE 16 May 2026 PM3¶
Shipped in single commit a0e0893 (1 file · 64 insertions · 7 deletions). 5 CI gates green locally + GHA Auto-deploy + Integrity check both green on first push (confirms portability on Ubuntu Node 20). Live at https://claw.aguidetocloud.com; live smoke 45/45 + 5 vendor-hub spot-checks (openclaw/anthropic/openai/google/microsoft) all HTTP 200. 9-batch + 3-tooling-session two-pass-SME streak intact (Session 5 = pure infra + single pass; Sessions 6 + 7 = tooling + two pass).
Edits across 1 file (scripts/audit-blurbs.mjs):
- Fix C1.4: gemma-?\d+(?:\.\d+)? regex consistency. Was gemma-\d+(?:-[a-z0-9]+)* (mandatory hyphen). Now gemma-?\d+(?:\.\d+)?(?:-[a-z0-9]+)*, which matches the sibling llama-?\d+(?:\.\d+)? / qwen-?\d+(?:\.\d+)? shape. Catches Ollama no-hyphen tag form (gemma3, gemma4) and forward-proofs for gemma3.5.
- Fix C2.5: Bison-only suffix-optional in prefixed-legacy branch. Was (?:text|code|chat|codechat)-(?:davinci|curie|babbage|ada|bison|cushman)-\d{3} (all roots mandatory -\d{3}). Now (?:text|code|chat|codechat)-(?:(?:davinci|curie|babbage|ada|cushman)-\d{3}|bison(?:-\d{3})?). Bison gets optional suffix because Vertex AI canonical retired IDs ARE bare (text-bison, chat-bison). OpenAI/Codex roots keep mandatory -\d{3} to avoid INFO noise on incomplete partials like text-davinci (per rubber-duck #2: initial proposal made suffix optional for ALL roots; narrowed after duck flagged the noise risk). Bare branch (?:davinci|curie|babbage|ada|cushman)-\d{3} unchanged.
- LEGACY_MODELS additions. text-bison, chat-bison, code-bison, codechat-bison. Verified via Google Cloud model-versioning page (text-bison + chat-bison retired April 21 2025) and googleapis/python-aiplatform SDK at HEAD (code-bison + codechat-bison via CodeGenerationModel + CodeChatModel from_pretrained() calls). classify()'s suffix-strip fallback does NOT cover bare bison forms (peels to text then chat, neither in any set), so explicit entries are required.
- KNOWN_CURRENT_MODELS additions. gemma3, gemma4 (Ollama no-hyphen tag forms, paralleling the existing llama3.2 / qwen2.5 pairs). Verified against TWO sources: ollama.com/library/ + ollama/ollama GitHub README at HEAD (both show ollama run gemma3 and model='gemma3' in canonical Python/JS examples). gemma4 confirmed RELEASED: ollama.com/library/gemma4 is live with multiple variants (gemma4:e2b, gemma4:e4b, gemma4:26b, gemma4:31b, gemma4:31b-cloud). Forward-looking qwen3 / qwen3.5 / llama3.3 deliberately NOT pre-listed per rubber-duck #1: KNOWN_CURRENT means verified-current, not plausible-future. Also added llama4 (Pass-2 P2-TOOLING-01: was missing despite llama-4 being present; closes two-forms convention gap).
- walkContent() result caching (P2-TOOLING-04 from Session 6). Was called twice when --scope=content active. Now const contentPaths = INCLUDE_CONTENT ? await walkContent() : []; computed once at module scope (after walkContent declaration + sidecar load); reused at scan loop + sidecar in-scope filter. Empty-array fallback ensures behavioral equivalence when flag absent.
- Comment block updates. Header revision note (Session 7 changes listed). LEGACY_MODELS Bison carve-out rationale + Vertex AI @001 vs OpenAI -001 versioning note (Pass-1 P2 #5: Vertex AI canonical uses @001 at-sign; caught via bare LEGACY entry because @ is a \b boundary). KNOWN_CURRENT_MODELS Ollama two-source verification note (cites both sources). API_ID_RE comment block now documents the Bison/OpenAI split rationale.
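The Bison-only carve-out in Fix C2.5 can be exercised directly. A minimal sketch: the alternation is the one quoted in the text, while the `\b` anchors, the `g` flag, and the names `PREFIXED_LEGACY_RE` / `matchPrefixedLegacy` are assumptions for illustration.

```javascript
// Bison may appear bare or with -\d{3}; OpenAI/Codex roots keep the
// mandatory -\d{3} suffix. Assumption: word-boundary anchors on both sides.
const PREFIXED_LEGACY_RE =
  /\b(?:text|code|chat|codechat)-(?:(?:davinci|curie|babbage|ada|cushman)-\d{3}|bison(?:-\d{3})?)\b/g;

function matchPrefixedLegacy(text) {
  return [...text.matchAll(PREFIXED_LEGACY_RE)].map((m) => m[0]);
}
```

Under these assumptions text-bison and text-bison-001 both match, the incomplete partial text-davinci stays silent, and code-bison@001 is caught as code-bison because the at-sign is a word boundary.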
Rubber-duck dispatched BEFORE implementing (11 findings; all adopted):
- #1 (adopted): don't pre-list unreleased qwen3/qwen3.5/llama3.3 — KNOWN_CURRENT means verified-current, not plausible-future.
- #2 (adopted, critical): narrow Bison regex change to Bison only. Initial proposal made suffix optional for ALL prefixed roots — would have created new INFO noise for partials like text-davinci/code-cushman/text-curie. Special-case Bison only.
- #3 (note adopted): ship gemma regex; add inline smoke tests.
- #4 (adopted): only verified gemma3/gemma4 to KNOWN_CURRENT.
- #5 (note adopted): explicit bare Bison LEGACY entries required (suffix-strip won't cover).
- #6 (adopted): const contentPaths = INCLUDE_CONTENT ? await walkContent() : []; AFTER walkContent declaration + sidecar load, not top-of-file let.
- #7 (adopted): keep --scope=content advisory; do not promote to CI.
- #8 (adopted): defer SME-template formalisation; only 2nd tooling session.
- #9 (adopted): focused smoke-tests before/after edits.
- #10 (adopted): LAST_VERIFIED stays 2026-05-16; header revision text updated.
- #11 (adopted): phrase Ollama "second source" verification as "confirmed against both ollama.com/library and ollama/ollama GitHub README at HEAD".
Inline smoke-test (smoke-audit-blurbs.mjs in session-state/files/, duck #9):
17/17 PASS — appended a 25-line fixture to src/pages/openclaw/index.astro (under restore-on-finally pattern), ran audit-blurbs in --warn mode, parsed findings against expected classifications:
- LEGACY: text-bison, chat-bison, code-bison, codechat-bison, davinci-002, babbage-002, text-davinci-003, PaLM, PaLM 2 — all caught ✓
- NOT MATCHED: text-davinci (incomplete), code-cushman (incomplete), Marie Curie, Charles Babbage, Leonardo da Vinci (prose) — all silent ✓
- CURRENT (silent, no findings): gemma-3, gemma3, gemma4, llama3.2, qwen2.5 ✓
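The expected-classification table above lends itself to a table-driven harness. The session's real harness appended a fixture to an .astro page and parsed audit-blurbs output; the sketch below is a pure-function analogue. `runSmokeCases` and `toyClassify` are hypothetical, and the toy classifier is a stand-in that omits the real classify() logic such as suffix-strip and variants.

```javascript
// Runs each [input, expected] pair and collects mismatches.
function runSmokeCases(classify, cases) {
  const failures = [];
  for (const [input, expected] of cases) {
    const actual = classify(input);
    if (actual !== expected) failures.push({ input, expected, actual });
  }
  return failures;
}

// Toy stand-in for the real classifier (illustration only).
const TOY_LEGACY = new Set(['text-bison', 'davinci-002']);
const TOY_CURRENT = new Set(['gemma3', 'llama3.2']);
const TOY_ID_RE = /\b(?:text-bison|davinci-\d{3}|gemma-?\d+|llama-?\d+(?:\.\d+)?)\b/;

function toyClassify(text) {
  const m = text.match(TOY_ID_RE);
  if (!m) return 'NO MATCH';
  if (TOY_LEGACY.has(m[0])) return 'LEGACY';
  if (TOY_CURRENT.has(m[0])) return 'CURRENT';
  return 'INFO'; // matched the regex but is in neither list
}

const SMOKE_CASES = [
  ['text-bison', 'LEGACY'],
  ['davinci-002', 'LEGACY'],
  ['Leonardo da Vinci', 'NO MATCH'],
  ['gemma3', 'CURRENT'],
  ['llama3.2', 'CURRENT'],
  ['gemma9', 'INFO'],
];
```

Calling runSmokeCases(toyClassify, SMOKE_CASES) returns an empty array when every case classifies as expected; a failure entry carries the input together with the expected and actual labels.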
Pass-1 tooling-SME (vendor-source verification — one-off prompt adapted from sme-pass-1.md; 0 P0 / 0 P1 / 3 P2, all 15 regex coverage trace cases PASS):
- ✓ Ollama no-hyphen tag form confirmed via two independent sources (library page + GitHub README HEAD).
- ✓ Vertex AI bare text-bison/chat-bison confirmed canonical retired model IDs per Google's own lifecycle page (retired 21 April 2025, upgrade target gemini-2.0-flash-lite).
- ✓ code-bison/codechat-bison confirmed via googleapis/python-aiplatform SDK (CodeGenerationModel.from_pretrained("code-bison@001") / CodeChatModel.from_pretrained("codechat-bison@001")).
- ✓ gemma4 confirmed RELEASED — ollama.com/library/gemma4 is live with size variants (2B/4B/26B/31B). Pre-listing concern does not apply.
- ✓ OpenAI mandatory-suffix design for davinci/cushman correct — openai-python HEAD only exposes davinci-002/babbage-002 in Completions API Literal.
- ✓ 15/15 regex coverage trace PASS (gemma3 → CURRENT, gemma-3 → CURRENT, gemma3:4b → CURRENT, gemma3.5 → INFO, text-bison → LEGACY, text-bison-001 → LEGACY, chat-bison → LEGACY, text-davinci → NO MATCH, text-davinci-003 → LEGACY, code-cushman → NO MATCH, code-cushman-001 → LEGACY, davinci-002 → LEGACY, Marie Curie → NO MATCH, Leonardo da Vinci → NO MATCH, Charles Babbage → NO MATCH).
- P2 #5 (ADOPTED): the bison -001 comment said "canonical API-ID forms", which conflated Vertex AI versioning with OpenAI's hyphen convention; corrected to note that Vertex AI uses @001 (at-sign). Functional behaviour unchanged.
- P2 #7 (DEFERRED to Session 8): gemma3n family undetectable (\b fails between digit 3 and letter n — both are \w).
- P2 deferred item: code-bison/codechat-bison not explicitly on Google Cloud model-versioning page (page scopes to "Gemini and embedding models"). SDK confirmation is authoritative; canonical-doc citation deferred.
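Several of the trace cases above, and the deferred gemma3n gap, can be reproduced by comparing the pre-fix and post-fix gemma branches. A minimal sketch: the two patterns are the ones quoted for Fix C1.4, while the `\b` anchors and the names `GEMMA_S6_RE`, `GEMMA_S7_RE`, and `firstGemmaMatch` are assumptions. Only extraction is shown; classification into CURRENT or INFO happens afterwards against the model lists.

```javascript
const GEMMA_S6_RE = /\bgemma-\d+(?:-[a-z0-9]+)*\b/;             // mandatory hyphen
const GEMMA_S7_RE = /\bgemma-?\d+(?:\.\d+)?(?:-[a-z0-9]+)*\b/;  // optional hyphen

// Returns the first extracted token, or null when nothing matches.
function firstGemmaMatch(re, text) {
  const m = text.match(re);
  return m ? m[0] : null;
}
```

With the Session 7 pattern, gemma3, gemma-3, and gemma3.5 are extracted whole and gemma3:4b yields gemma3. The input gemma3n yields null, because there is no word boundary between the digit 3 and the letter n.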
Pass-2 tooling-SME (internal consistency + sibling coupling + strict-mode invariant + backward-compat — adapted from sme-pass-2.md; 0 P0 / 0 P1 / 1 P2 + 2 obs):
- ✓ Sweep 1 sibling-script coupling: 0 hits across 6 sibling scripts (voice-lint, audit-claims, audit-verification-states, integrity-check, smoke-check, deploy). No LEGACY_MODELS/KNOWN_CURRENT_MODELS/API_ID_RE/sidecar/--scope= references anywhere else.
- ✓ Sweep 2 strict-mode invariant: contentLegacyHits never appears in process.exit(1) condition (line 771 only checks legacyHits). New LEGACY entries do not regress strict mode — grep of 12 default-scope files for bison|gemma|llama4|qwen3|llama3\.3 returns 0 hits.
- ✓ Sweep 3 sidecar interaction: all 5 existing entries traced and still suppressing correctly under new regex/classification.
- ✓ Sweep 4 code-vs-comment consistency: all 4 comment blocks verified (header + LEGACY_MODELS Bison carve-out + KNOWN_CURRENT_MODELS Ollama + API_ID_RE Bison/OpenAI split). qwen3/llama3.3 INFO claim verified by trace.
- ✓ Sweep 5 backward-compatibility: inline audit-blurbs:ignore, sidecar context-substring guard, --strict exit semantics, --scope=content advisory-only — all 4 contracts preserved. claude-sonnet-4-6/gpt-5.5/gemini-3.1-pro/llama3.2/qwen2.5 all still classify as CURRENT. gpt-4o-audio-preview suffix-strip → gpt-4o still works.
- P2-TOOLING-01 ADOPTED (1-line fix in same commit): llama4 no-hyphen form was missing while llama-4 was present — broke the two-forms convention the comment explicitly establishes. Added 'llama4' to KNOWN_CURRENT_MODELS.
- P2-TOOLING-02 DEFERRED to Session 8: --scope=all alias supported at line 75 but undocumented in header Modes block.
- P2-TOOLING-03 DEFERRED to Session 8: HUMAN_RE has no Gemma\s+\d+ branch — "Gemma 4" prose in updates.ts + comparisons.ts (both default-scope) is invisible to audit. Coverage gap, not false positive risk.
Lessons captured (apply to Session 8+):
- The two-forms Ollama convention is now a maintenance contract. Every Ollama-tagged family in KNOWN_CURRENT_MODELS gets BOTH the hyphenated canonical AND no-hyphen tag form listed. Pass-2 caught llama4 missing because the comment explicitly established the convention. Future additions: always add both forms; the Pass-2 sibling sweep will catch single-form omissions.
- Rubber-duck #2 caught a real over-broadening. The initial Fix C2.5 proposal made (?:-\d{3})? optional for ALL prefixed legacy roots. That would have introduced new INFO noise for partials like text-davinci/code-cushman/text-curie. Duck narrowed to Bison-only with the alternation pattern (?:(?:davinci|curie|babbage|ada|cushman)-\d{3}|bison(?:-\d{3})?), which keeps OpenAI strictness and relaxes only where Vertex AI convention demands it. This is exactly the "design-stage critique" win from Session 5 generalised.
- Vendor convention asymmetry matters in regex design. OpenAI uses -\d{3} hyphen-suffix versioning. Vertex AI uses @001 at-sign versioning AND canonical bare forms (no suffix at all). A regex that pretends both vendors follow the same convention either over-matches (Session 6 pre-fix bare branch) or under-matches (Session 7 pre-fix prefixed branch). The Bison-only carve-out pattern is reusable for any future per-vendor asymmetry.
- Pre-listing unreleased models is a false-signal trap (duck #1). KNOWN_CURRENT means "verified current", not "plausible future". When unreleased models surface as INFO, that IS the audit working as intended — surfacing for human review and explicit add to the right list. Don't pre-add to suppress.
- Tooling-SME prompts can be safely informal at session-state scope for now. Two tooling sessions (6 + 7) have reused the adapted sme-pass-1/2.md structure successfully. Session 6's lesson "defer formalisation until 2+ tooling sessions reuse" was the right call — Session 7 was the 2nd, and the prompts adapted cleanly. Session 8 candidates include formalisation IF a 3rd tooling session ships.
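The two-forms convention from the first lesson could be checked mechanically. A minimal sketch under stated assumptions: `missingTwinForms` is hypothetical, the family list is supplied by the caller, and the sets in the usage note are illustrative rather than the real KNOWN_CURRENT_MODELS contents.

```javascript
// For each ID in a listed family, compute its twin ("llama-4" <-> "llama4")
// and report twins that are absent from the set.
function missingTwinForms(knownCurrent, families) {
  const missing = [];
  for (const id of knownCurrent) {
    const m = id.match(/^([a-z]+)(-?)(\d.*)$/);
    if (!m || !families.includes(m[1])) continue;
    const twin = m[2] === '-' ? `${m[1]}${m[3]}` : `${m[1]}-${m[3]}`;
    if (!knownCurrent.has(twin)) missing.push(twin);
  }
  return missing;
}
```

For example, missingTwinForms(new Set(['llama-4', 'gemma-3', 'gemma3']), ['llama', 'gemma']) reports llama4 as the only missing twin, which is the same omission Pass-2 caught by hand.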
~~Phase 1.1 Session 8 candidates (queued by Session 7)~~ — ✅ DONE 16 May 2026 PM4 (see Session 8 entry below)¶
Session 8 (audit-blurbs polish closure + tooling-SME template formalisation) — ✅ DONE 16 May 2026 PM4¶
Shipped in TWO commits per Session 8 rubber-duck #7 (split clean for review/rollback):
- Commit 1 8dbca28 (1 file · +42/-7) — audit-blurbs polish (Tracks A+B+C).
- Commit 2 a43fac0 (3 files · +295/-3) — tooling-SME template formalisation (Track D).
5 CI gates green locally + GHA Auto-deploy + Integrity check both green on first push (single GHA run covered both commits). Live smoke 45/45 + 5 vendor-hub spot-checks (openclaw/anthropic/openai/google/microsoft) all HTTP 200. 9-batch + 4-tooling-session two-pass-SME streak intact (Session 5 single-pass infra; Sessions 6 + 7 + 8 tooling two-pass; Session 8 hits the 3-session reuse threshold and ships the formalised templates in the same session).
Commit 1 — audit-blurbs polish (3 deferred Session 7 P2s closed):
- Track A: --scope=all alias removed (Session 7 Pass-2 P2-TOOLING-02). Line 75 was INCLUDE_CONTENT = process.argv.includes('--scope=content') || process.argv.includes('--scope=all');. Now simply process.argv.includes('--scope=content'). Verified zero stale references across package.json, README.md, .github/workflows/, and entire repo grep; the only remaining scope=all mention is the changelog comment in the header revision note (correct).
- Track B: HUMAN_RE Gemma prose branch added (Session 7 Pass-2 P2-TOOLING-03). New Gemma\s+\d+(?:(?:\.\d+)|n)? branch between Gemini and PaLM. Catches "Gemma 4 enabled by default" prose in src/data/updates.ts:44,46 and src/data/comparisons.ts:693 (both default-scope files previously invisible to audit). Tight n-suffix grouping mirrors Bison-style alternation from Session 7: accepts dotted-decimal OR single n suffix but not both stacked.
- Track C: gemma3n family detection (Session 7 Pass-1 P2 #7). API_ID_RE gemma branch broadened from gemma-?\d+(?:\.\d+)?(?:-[a-z0-9]+)* to gemma-?\d+(?:(?:\.\d+)|n)?(?:-[a-z0-9]+)* with the same tight grouping. gemma3n + gemma-3n added to KNOWN_CURRENT_MODELS (two-forms Ollama convention from Session 7 lesson). Verified against THREE sources: ollama.com/library/gemma3n + 9 published tags (gemma3n:e2b, gemma3n:e4b, instruction-tuned + quantised variants), huggingface.co/google/gemma-3n-e4b-it (HF/API-ID form), ai.google.dev/gemma/docs/gemma-3n (vendor docs). Selective-parameter-activation family at 2B/4B effective sizes for on-device use.
- Comment block updates. Header revision note for Session 8. KNOWN_CURRENT_MODELS gemma comment block extended with Session 8 gemma3n rationale + forward-looking-not-pre-listed note for qwen3 / llama3.3 / gemma5 / gemma4n. API_ID_RE comment block documents tight n-suffix grouping. HUMAN_RE gains its own comment paragraph for the Gemma branch. Pass-1 P2 #6 adopted in same commit: comment originally claimed gemma3.5n would be NOT MATCHED; Pass-1 trace showed regex backtracks the optional alternation to empty after the \b boundary fails on .5n and extracts partial match gemma3 (→ KNOWN_CURRENT silent). Updated both API_ID_RE and HUMAN_RE comments to accurately describe the backtrack behaviour (acceptable since gemma3.5n is not a real family form as of 2026-05).
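The n-suffix grouping and the backtrack behaviour described for Tracks B and C can be traced with two small patterns. A minimal sketch: the pattern bodies are the ones quoted in the text, while the `\b` anchors and the names `GEMMA_ID_RE`, `GEMMA_PROSE_RE`, and `firstMatchS8` are assumptions, not the script's actual API_ID_RE or HUMAN_RE.

```javascript
// Assumption: \b anchors on both sides, as the backtrack note implies.
const GEMMA_ID_RE = /\bgemma-?\d+(?:(?:\.\d+)|n)?(?:-[a-z0-9]+)*\b/;
const GEMMA_PROSE_RE = /\bGemma\s+\d+(?:(?:\.\d+)|n)?\b/;

// Returns the first extracted token, or null when nothing matches.
function firstMatchS8(re, text) {
  const m = text.match(re);
  return m ? m[0] : null;
}
```

Here gemma3n, gemma-3n, and gemma-3n-e4b-it are extracted whole, and "Gemma 4 enabled by default" yields Gemma 4. For gemma3.5n the trailing boundary fails after .5, the optional group backtracks to empty, and the extracted token is gemma3.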
Commit 2 — Tooling-SME template formalisation (Track D — 3rd reuse threshold):
- docs/sme-templates/sme-pass-1-tooling.md (~150 LOC): Pass-1 tooling template. Forks content sme-pass-1.md structure, inverts "do not propose tooling-level changes" anti-pattern. Three Pass-1-tooling sweeps: (1) regex coverage trace table BLOCKING with ≥15 positive/negative/edge cases; (2) vendor-source primary citation per list addition; (3) documentation consistency. Cites Session 6/7/8 lessons inline as canonical examples.
- docs/sme-templates/sme-pass-2-tooling.md (~230 LOC): Pass-2 tooling template. Forks content sme-pass-2.md structure. Five Pass-2-tooling sweeps: (1) sibling-script coupling grep; (2) strict-mode invariant + exit-condition trace; (3) sidecar/suppression interaction; (4) code-vs-comment consistency (HIGHEST-VALUE; Session 6 lesson, Pass-2-defining axis); (5) backward-compatibility contracts. The davinci-002/babbage-002 coverage gap (Session 6) and llama4 missing form (Session 7) cited as canonical examples of Pass-2-tooling catching what Pass-1 doesn't.
- docs/sme-templates/README.md: Updated table to include both tooling templates. Replaced obsolete "These are not a tooling-design SME" disclaimer with "Content vs tooling SME: pick the right pair" section explaining when to use each pair.
Rubber-duck dispatched BEFORE implementing (9 findings; all adopted):
- #1 (critical adopt): Track C regex use tight (?:(?:\.\d+)|n)? alternation, NOT broad [a-z]* tail. Broad tail would match foo-gemma3bar as gemma3bar (false-positive class). Tight grouping rejects the bad cases at regex level.
- #2 (adopt): Track B HUMAN_RE grouping mirrors Track C alternation — prevents weird Gemma 3.5n outer-token match.
- #3 (adopt): exactly two KNOWN_CURRENT entries (gemma3n + gemma-3n) — variants() does NOT bridge no-hyphen and hyphen forms, so both required.
- #4 (note): Track A removal safe — no documented intent for --scope=all beyond undocumented synonym.
- #5 (adopt): don't move Gemma 1/2 to LEGACY this session — no concrete visible-citation policy. Let them surface as INFO.
- #6 (adopt): Track D forked-structure templates (not standalone). Mirrors existing templates' contract; replaces content-specific anti-patterns with tooling-specific sweeps.
- #7 (adopt): split Track A+B+C and Track D into TWO commits. Cleaner review/rollback boundary. Audit-blurbs polish + SME templates are heterogeneous.
- #8 (note): sidecar impact nil — Gemma 4 already in KNOWN_CURRENT; HUMAN_RE addition just makes prose silent (was invisible before, now classified silent).
- #9 (suggest adopt): keep smoke harness as session-state one-off (smoke-audit-blurbs-s8.mjs in session files/); don't commit scripts/test-audit-blurbs.mjs until a stable fixture matrix justifies it as a CI surface.
Inline 17-case smoke harness (duck #9): all PASS. Covers Track B HUMAN_RE Gemma classifications (Gemma 4, Gemma 3, Gemma 3n → CURRENT silent; Gemma 2 → INFO advisory), Track C gemma3n + gemma-3n forms, Session 7 regression (text-bison/chat-bison/davinci-002 → LEGACY; Marie Curie/Leonardo da Vinci/text-davinci → NO MATCH; llama3.2/qwen2.5/gemma3/gemma4 → CURRENT silent).
Pass-1 tooling-SME (3rd reuse — adapted one-off prompts; templates formalised same session in Commit 2): 0 P0 / 0 P1 / 3 P2. 13/13 regex coverage trace cases PASS. gemma3n confirmed real Ollama family with 9 published tags. gemma-3n confirmed HF canonical form. Google DeepMind, Hugging Face, and Ollama all agree on dual-form naming. gemma3.5 and gemma4n correctly classified as INFO (no such models — forward-proofing gaps acceptable). P2 #6 (gemma3.5n comment accuracy) adopted in same commit. Verdict YELLOW (one P2 fix in same commit; no functional defects).
Pass-2 tooling-SME (3rd reuse): 0 P0 / 0 P1 / 3 informational P2 observations only. All 5 sweeps clean:
- ✓ Sweep 1: 0 sibling-script coupling across 11 sibling .mjs files. Zero gemma3n/Gemma/--scope=all external references.
- ✓ Sweep 2: MODE === 'strict' exit path only gates on legacyHits; contentLegacyHits never in exit condition. Empirical trace: Gemma 4 in updates.ts + comparisons.ts classifies CURRENT silent → 0 hits in default scope.
- ✓ Sweep 3: all 5 existing sidecar entries re-traced under new regex; all still suppress correctly. No new default-scope hits requiring sidecar entries.
- ✓ Sweep 4: all 4 comment blocks (header revision + KNOWN_CURRENT gemma + API_ID_RE Session 8 + HUMAN_RE) accurately describe code. No stale line-number references anywhere.
- ✓ Sweep 5: all 6 prior contracts preserved (Session 4 inline-ignore + strict exit; Session 6 sidecar context guard + content advisory; Session 7 two-forms Ollama convention). All gemma pairs present (3 hyphenated + 3 no-hyphen). Spot-checked 8 CURRENT + 5 LEGACY entries — all classify correctly.
Verdict GREEN.
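The Sweep 2 invariant can be written down as a tiny function. This is a minimal sketch, not the script's actual code: the names MODE, legacyHits and contentLegacyHits come from the notes above, while the Mode type, the AuditResult shape (counts rather than arrays) and exitCodeFor are assumptions made for illustration.

```typescript
// Hypothetical shapes: the real audit script may store hits differently.
type Mode = "strict" | "default";

interface AuditResult {
  legacyHits: number;        // hits that are allowed to gate the run
  contentLegacyHits: number; // advisory hits, reported but never gating
}

// Only legacyHits can fail the run, and only in strict mode.
// contentLegacyHits is deliberately absent from the condition.
function exitCodeFor(mode: Mode, result: AuditResult): number {
  return mode === "strict" && result.legacyHits > 0 ? 1 : 0;
}

console.log(exitCodeFor("strict", { legacyHits: 0, contentLegacyHits: 5 }));
console.log(exitCodeFor("strict", { legacyHits: 2, contentLegacyHits: 0 }));
```

A trace of this shape is what the sweep checks: any edit that adds contentLegacyHits to the returned condition would break the strict-mode contract.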
Lessons captured (apply to Session 9+):
- Forked-structure templates outperform full standalone or generic shared. Track D's sme-pass-{1,2}-tooling.md borrow the contract shape from content templates (same output format, same dispatcher mental model) but replace content-specific axes (sibling-page coupling, adjacent-cell scan, cloud-platform feature scope) with tooling-specific axes (sibling-script coupling, regex coverage trace, strict-mode invariant, code-vs-comment, backward-compat). Standalone would have duplicated structural decisions; generic shared would have preserved the wrong anti-pattern ("do not propose tooling-level changes"). The fork is the right level of abstraction.
- Split commits when changes are heterogeneous. Tracks A+B+C (script polish) and Track D (documentation/process) are different review surfaces. Combining them in one commit makes review noisier and rollback riskier. The split-commit pattern is reusable: when one session ships both code AND process/docs, prefer two clean commits over one mixed.
- Pass-1 caught a code-vs-comment issue this session — Pass-2 axis traditionally. Session 6 + 7 had Pass-2 catch code-vs-comment drift while Pass-1 verified vendor sources. Session 8 Pass-1 caught the gemma3.5n backtrack-vs-comment mismatch as part of its regex-coverage trace. Both passes can find code-vs-comment issues — Pass-1 via the trace table revealing actual regex behaviour, Pass-2 via comment block reading. Don't assume one pass owns the axis exclusively; the trace table is itself a code-vs-comment consistency check.
- The 3-session formalisation threshold paid off. Session 5 duck said "defer formalisation until 2+ tooling sessions reuse". Session 6 was 1st, Session 7 was 2nd, Session 8 was 3rd — the adapted prompts converged on a stable shape, and the formalised templates capture that stability cleanly. The threshold prevents premature abstraction (had we formalised after Session 6, the templates would have been single-data-point guesses); it also prevents over-deferral (3 sessions is enough signal). Reusable pattern: don't formalise at first reuse; don't defer past 3rd.
- Tight regex alternation > broad character classes (duck #1 / #2). (?:(?:\.\d+)|n)? over [a-z]* reduces false-positive surface area substantially. This pattern (alternation in optional non-capturing group) is the Bison-style template from Session 7 — now confirmed reusable for any per-vendor naming-convention asymmetry. Add to the template library when a 3rd vendor needs it.
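The tight-versus-broad point from duck #1 / #2 is easy to check by hand. The two patterns below are illustrative only: they reuse the (?:(?:\.\d+)|n)? group and the [a-z]* tail quoted above, but the surrounding \bgemma-?\d+ ... \b frame is an assumption, not the script's actual API_ID_RE or HUMAN_RE.

```typescript
// Illustrative patterns; the real regexes in audit-blurbs.mjs may differ.
const TIGHT_RE = /\bgemma-?\d+(?:(?:\.\d+)|n)?\b/i; // optional ".N" or "n" suffix only
const BROAD_RE = /\bgemma-?\d+[a-z]*\b/i;           // any trailing letters

// Returns the matched text, or null when the pattern does not match.
function firstMatch(re: RegExp, text: string): string | null {
  const m = re.exec(text);
  return m?.[0] ?? null;
}

const samples = ["gemma3n", "gemma-3n", "gemma3.5", "foo-gemma3bar"];
for (const s of samples) {
  console.log(s, "| tight:", firstMatch(TIGHT_RE, s), "| broad:", firstMatch(BROAD_RE, s));
}
```

Under these assumptions the broad tail accepts foo-gemma3bar as gemma3bar, while the tight group rejects it and still accepts both the hyphenated and no-hyphen 3n forms.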
~~Phase 1.1 Session 9 candidates (queued by Session 8)~~ — ✅ DONE 16 May 2026 PM4 (see Session 9 entry below — Tracks 1, 2, 4 shipped; Track 3 cut per duck)
Session 9 (cross-link audit + voice polish) — ✅ DONE 16 May 2026 PM4
Shipped in TWO commits per Session 8's split-commit lesson + Session 9 rubber-duck #7:
- Commit 1 b630bc0 (7 files · +97/-7) — Track 1 cross-link audit + Pass-2 ⚠ adoption + pre-existing azure.mdx slug leftover fix.
- Commit 2 119b0d0 (2 files · +4/-3) — Tracks 2 + 4 pure code (Header empty-state + numeric sort).
GHA Auto-deploy + Integrity check both green on first push for both commits. Live smoke 45/45 + 6 spot-checks (5 vendor hubs + /compare/) all HTTP 200. 9-batch + 4-tooling + 1-content-session two-pass-SME streak intact. Track 3 (/openai/api/ page) was DEFERRED per rubber-duck #3 BLOCKING — would create broken href stub OR contradict 7.4's explicit OpenAI exclusion.
Commit 1 — Cross-link audit (Track 1):
- src/components/VendorHub.astro — new CompareLink interface + optional compareLinks?: CompareLink[] prop. Renders as .vh-compare-links sub-section in the banner after .vh-meta, visually separated from blurb narrative (border-top, mono font, scannable § X.Y labels + contextual one-liner). Per duck #1+#2: structured field over inline blurb prose (better auditability + scannability than embedding <a> tags in vendor.blurb HTML strings).
- 5 vendor hub index.astro files — compareLinks populated:
  - openclaw/index.astro → § 7.1 (openclaw-vs-mcp-stacks). 1 link.
  - anthropic/index.astro → § 7.2 (cli-coding-agents — Claude Code) + § 7.4 (direct-model-apis — Claude API). 2 links.
  - openai/index.astro → § 7.2 ONLY (Codex CLI). Deliberately excludes § 7.4 because the page explicitly puts OpenAI direct API out of scope ("Foundry covers the Azure-governed path to GPT-5"). Per duck #3 BLOCKING — don't link a hub to a compare where the vendor is explicitly excluded.
  - google/index.astro → § 7.2 (Gemini CLI) + § 7.4 (Gemini API). 2 links.
  - microsoft/index.astro → § 7.2 (Copilot CLI) + § 7.3 (M365 paths) + § 7.4 (Foundry, with Azure-governed-platform qualifier per Pass-2 ⚠ framing tightening). 3 links.
- Pass-2 ⚠ adoption (Microsoft § 7.4 framing tightening): Original title was 'Direct model APIs — Foundry (vs Claude API vs Gemini API)'. Pass-2 traced into the compare body and surfaced direct-model-apis.mdx:58: "Microsoft Foundry — not really a model API. It's a platform (PaaS)." Tightened to 'Direct model APIs — Foundry (Azure-governed platform, vs Claude API vs Gemini API)' so the hub label reflects the compare body's own framing. Pass-2's semantic finding supersedes Pass-1's cosmetic format-inconsistency P2.
- Pre-existing leftover fix (src/content/setups/azure.mdx:152): Surfaced by Pass-2 adjacent-page sweep — line read [§7.1 OpenClaw vs MCP-based stacks](/compare/) with bare /compare/ href instead of the full slug. Not introduced by Session 9 — pre-existing slip. Fixed in same commit (5-second change, resolves a content-tree inconsistency).
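For orientation, the structured-field idea from duck #1+#2 might look roughly like this. Only the CompareLink name and the optional compareLinks prop are confirmed by the notes; the field names (section, slug, label), the VendorHubProps wrapper and the sectionsFor helper are assumptions for illustration.

```typescript
// Field names are hypothetical; the notes only confirm that CompareLink
// exists and that compareLinks is an optional prop on VendorHub.
interface CompareLink {
  section: string; // e.g. "7.2"
  slug: string;    // compare page slug
  label: string;   // contextual one-liner shown next to the section label
}

interface VendorHubProps {
  compareLinks?: CompareLink[];
}

// Anthropic hub: two links, per the mapping above.
const anthropicHub: VendorHubProps = {
  compareLinks: [
    { section: "7.2", slug: "cli-coding-agents", label: "Claude Code" },
    { section: "7.4", slug: "direct-model-apis", label: "Claude API" },
  ],
};

// OpenAI hub: 7.2 only; 7.4 is deliberately left out.
const openaiHub: VendorHubProps = {
  compareLinks: [{ section: "7.2", slug: "cli-coding-agents", label: "Codex CLI" }],
};

// Scannable "§ X.Y" labels; an omitted prop yields an empty list.
function sectionsFor(props: VendorHubProps): string[] {
  return (props.compareLinks ?? []).map((link) => `§ ${link.section}`);
}

console.log(sectionsFor(anthropicHub), sectionsFor(openaiHub), sectionsFor({}));
```

Keeping the links as data rather than inline anchors is what makes exclusions like OpenAI's missing § 7.4 auditable with a simple filter or grep.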
Commit 2 — Code polish (Tracks 2 + 4):
- src/components/Header.astro (Track 2): Replaced const dateText = lastUpdated ?? '—'; with conditional Astro render {lastUpdated?.trim() && <span>...</span>}. The whole .meta-updated span (green dot + label + date) is hidden when lastUpdated is undefined or blank. Voice-duck Session 2 P2 alternative. Verified: home + 5 vendor hubs (no lastUpdated) correctly omit meta-updated; compare detail pages (have lastReviewedAt) still render it. Other .header-meta items (edit on GitHub, colophon, theme toggle) unaffected.
- src/pages/compare/index.astro:10 (Track 4): Added the arguments undefined, { numeric: true } to localeCompare. Sorts "7.10" after "7.2" in natural numeric order (was lexicographic, where "7.10" < "7.2" because '1' < '2'). Node 20+ supported. Future-proofing — bites only when 10+ compare sections exist (currently 4). All current sections (7.1, 7.2, 7.3, 7.4) still sort correctly.
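The Track 4 change is small enough to show end to end. This sketch uses made-up section numbers, including a "7.10" that does not exist yet, to show the difference between the default comparison and the numeric option of String.prototype.localeCompare.

```typescript
// Hypothetical section numbers; "7.10" is included to show the future case.
const sectionNumbers = ["7.10", "7.2", "7.1", "7.4"];

// Default comparison: character by character, so "7.10" lands before "7.2".
const lexicographic = [...sectionNumbers].sort((a, b) => a.localeCompare(b));

// Numeric collation: digit runs compare as numbers, so "7.10" lands last.
const numericOrder = [...sectionNumbers].sort((a, b) =>
  a.localeCompare(b, undefined, { numeric: true }),
);

console.log(lexicographic.join(" < "));
console.log(numericOrder.join(" < "));
```

Passing undefined as the locale keeps the runtime's default locale and only switches on numeric collation, which matches the arguments described for the compare index sort.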
Rubber-duck dispatched BEFORE implementing (11 findings; 8 adopted, 1 noted, 1 BLOCKING-cut, 1 SUGGEST adopted):
- #1 (adopt): structured compareLinks field over inline blurb prose.
- #2 (adopt): place section in banner after .vh-meta, not after Products grid (above-the-fold but visually distinct from narrative).
- #3 (BLOCKING — CUT Track 3): defer /openai/api/ entirely. Would create broken href stub OR force a contradictory 7.4 rewrite.
- #4 (adopt): keep OpenAI excluded from 7.4 — sticking with the plan.
- #5 (adopt): reciprocal check uses title + description + against, not against alone. Several compares put one side in title.
- #6 (note): mappings substantively valid; Microsoft 7.4 phrased as Foundry-specific.
- #7 (adopt): Header conditional + ?.trim() polish.
- #8 (adopt): localeCompare with numeric: true (Option C — concise, Node 20 supported).
- #9 (adopt): SME only on Track 1 (content); skip for Tracks 2+4 (pure code).
- #10 (suggest adopt): inline-only <a> if staying in blurb prose; structured field avoids the issue (taken via #1).
- #11 (adopt): smoke = build + integrity + spot-check 5 hubs + compare index + one page with lastUpdated, one without.
Content-SME Pass-1 (using formalised docs/sme-templates/sme-pass-1.md — first content-pair use since Session 8 formalisation): 0 P0 / 0 P1 / 1 P2. 9/9 cross-links content-verified. Reciprocal check passes for each via combined title+description+against (duck #5). OpenAI exclusion clean across whole tree — grep openai.*direct-model returns 0 matches. P2 was cosmetic label format inconsistency (Microsoft em-dash vs other hubs' parenthetical) — superseded by Pass-2's semantic finding below. Verdict GREEN.
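The reciprocal check from duck #5 could be sketched as follows. The three field names (title, description, against) come from the notes; treating against as a string array, the CompareFrontmatter interface, the sample values and the mentionsProduct helper are all assumptions.

```typescript
// Hypothetical frontmatter shape; only the three field names are from the notes.
interface CompareFrontmatter {
  title: string;
  description: string;
  against: string[];
}

// A hub link is reciprocal if the product appears in ANY of the three fields.
function mentionsProduct(fm: CompareFrontmatter, product: string): boolean {
  const haystack = [fm.title, fm.description, ...fm.against].join(" ").toLowerCase();
  return haystack.includes(product.toLowerCase());
}

// Checking `against` alone misses compares that name one side in the title.
function mentionsInAgainstOnly(fm: CompareFrontmatter, product: string): boolean {
  return fm.against.some((a) => a.toLowerCase().includes(product.toLowerCase()));
}

// Illustrative values, not copied from the real MDX frontmatter.
const sampleCompare: CompareFrontmatter = {
  title: "OpenClaw vs MCP-based stacks",
  description: "How the two approaches differ.",
  against: ["MCP-based stacks"],
};

console.log(mentionsProduct(sampleCompare, "OpenClaw"));
console.log(mentionsInAgainstOnly(sampleCompare, "OpenClaw"));
```

The second helper is there only to show the failure mode duck #5 warned about: a compare that puts one side in its title would look non-reciprocal if against were the only field consulted.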
Content-SME Pass-2 (using formalised docs/sme-templates/sme-pass-2.md): 9 verified by Source B (compare body + product MDX files); 0 wrong; 1 ⚠ partially correct (framing tightening); 1 sibling-page leftover. Pass-2 escalation pattern in action:
- Pass-1 saw Microsoft 7.4 label as cosmetic format-inconsistency (em-dash style vs other hubs' parenthetical).
- Pass-2 traced INTO the compare body and found direct-model-apis.mdx:58 explicitly says "Microsoft Foundry — not really a model API. It's a platform (PaaS)". The label was semantically misleading by calling Foundry a "model API" without qualification.
- Pass-2's semantic finding supersedes Pass-1's cosmetic one — adopted "(Azure-governed platform, ...)" qualifier in same commit.
- Pass-2 sibling-page sweep also found setups/azure.mdx:152 with bare /compare/ href — pre-existing leftover, fixed in same commit.
Verdict GREEN. 9-batch + 4-tooling + 1-content-session two-pass-SME streak intact.
Lessons captured (apply to Session 10+):
- Pass-2 catches semantic findings Pass-1 misses by tracing INTO compare bodies (not just frontmatter). Pass-1 worked off title + description + against and saw the Microsoft label as cosmetic. Pass-2 read the compare body's own qualifier ("not really a model API. It's a platform (PaaS)") and saw a semantic mismatch. Same Pass-2-deeper-than-Pass-1 dynamic as Sessions 6 + 7 + 8, just on content axis instead of tooling. The pattern generalises: Pass-1 verifies surface claims; Pass-2 verifies the deep claims those surface claims rest on.
- Adjacent-page sweep finds pre-existing leftovers that no individual session would have caught. setups/azure.mdx:152 had a bare /compare/ href for an entire session-cycle (Session 0b URL migration didn't pick it up). Pass-2's adjacent-page grep across all /compare/ outbound links is institutional housekeeping — keep doing it every content-SME session.
- Cut > defer when scope would create broken state. Duck #3 BLOCKING saved the session — adding /openai/api/ as a "stub" would either ship a clickable broken href OR force a 7.4 rewrite that wasn't planned. Cleaner to cut entirely and revisit when real content is ready.
- Split commits when heterogeneous (Session 8 lesson reused). Content commit + code commit kept each surface reviewable independently. Session 9 confirmed the pattern works the second time.
~~Phase 1.1 Session 10 candidates (queued by Session 9)~~ — ✅ DONE 16 May 2026 PM5 (B2 path — /openai/api/ scope-lock shipped; see Session 10 entry below)
Session 10 (B2 /openai/api/ scope-lock) — ✅ DONE 16 May 2026 PM5
Shipped as single commit 93b2f60 (3 files · +6/-6 wording deltas across 8 distinct edits). GHA Auto-deploy + Integrity check both green on first push (1m23s + 41s). Live smoke 45/45 + 3 spot-checks HTTP 200 + 11/11 content-render checks PASS (new wording rendering live; old "we'll add"/"yet" deferral language confirmed gone). 9-batch + 4-tooling + 2-content-session two-pass-SME streak intact.
Scope: Graduated Session 9 duck-#3 BLOCKING deferral from "we'll add later" to "decided: out of scope; published external pointer to platform.openai.com/docs". Locked in 3 places that must remain consistent: §7.4 compare page MDX + comparisons.ts data intro (ComparisonMatrix render) + /openai/ vendor hub blurb.
Single commit 93b2f60 — scope-lock (3 files):
- src/content/compares/direct-model-apis.mdx (6 edits):
  - description: tightened to "Three direct model API surfaces a production builder picks between today: Claude API · Gemini API · Microsoft Foundry. OpenAI's own API is deliberately outside Claw's scope; use Foundry for Azure-governed GPT-5, or OpenAI's docs for api.openai.com."
  - lastReviewedAt: 2026-05-15 → 2026-05-16.
  - verificationNote: prefixed with "Scope-wording reviewed 16 May 2026; vendor data fetched 15 May 2026." per Pass-1 C5 P2 adoption — makes the date split explicit so future readers don't misread lastReviewedAt: 2026-05-16 as implying all Claude/Gemini/Foundry facts were re-verified today.
  - sources: frontmatter: added https://platform.openai.com/docs for traceability of the new cited URL.
  - Body callout (line 45): rewritten — old final 2 sentences "We'll add an OpenAI direct-API page if that becomes a primary surface Sush works in. Until then, this is honest about scope." replaced with permanent-scope wording naming Foundry + OpenAI's docs as the two redirect paths, and explaining "A thin page that mostly repeats OpenAI's docs would add noise, not help."
  - "What we are NOT claiming" (line 139): "(which we don't cover on Claw yet)" → "(which Claw deliberately doesn't cover — use OpenAI's docs for that path)".
  - "What to read next" (line 160): added [OpenAI vendor hub on Claw](/openai/) — Agents SDK, Codex CLI, Apps SDK, Atlas, and Custom GPTs. Not OpenAI's own API; use OpenAI's docs for that. — asymmetric cross-link (Session 9 deliberately excluded the reverse direction; the asymmetry is correct because each page has different scope responsibilities).
- src/data/comparisons.ts:51 intro field: ComparisonMatrix banner text aligned with new wording — "OpenAI's own API at api.openai.com is deliberately outside this comparison: Claw doesn't publish an /openai/api/ page. Use Foundry for Azure-governed OpenAI models, or OpenAI's docs for the direct API."
- src/pages/openai/index.astro blurb: extended with <br><br> block at end — "Not covered here: OpenAI's own API (api.openai.com). Use OpenAI's docs for that path; use Foundry when you need GPT-5 under Azure governance. Claw covers the five surfaces above because they're where Sush actually works." compareLinks array UNCHANGED — Session 9's deliberate omission of §7.4 from /openai/'s outbound compareLinks remains correct.
Rubber-duck dispatched BEFORE implementing (9 findings; all 9 adopted):
- #1 P1 ADOPT: "won't be added" too absolute → "is deliberately outside this comparison" (less petulant, ages better).
- #2 P1 ADOPT: reduce platform.openai.com/docs mentions from 4 → 2 (callout + hub); elsewhere say "OpenAI's docs" — defensive repetition reads mechanical.
- #3 P1 ADOPT: less argumentative callout — replace "pretending otherwise would mean shipping a thin hub" → "A thin page that mostly repeats OpenAI's docs would add noise, not help."
- #4 P1 ADOPT: "raw OpenAI on Azure-governed infra" muddy ("raw" implies api.openai.com; Azure-governed = Foundry/Azure OpenAI; mixing them weakens distinction) — duck's shape adopted verbatim in comparisons.ts.
- #5 P1 ADOPT: add platform.openai.com/docs to sources: frontmatter for traceability.
- #6 P1 ADOPT: hub addition uses <br><br> separator NOT new VendorHub notCoveredHere prop (avoid 4th-file/component scope creep).
- #7 P2 ADOPT: less apology-heavy "What to read next" wording per duck's exact draft.
- #8 P2 ADOPT: lastReviewedAt bump justified for scope-wording review only; keep verificationNote 15-May fetch date intact (later this became Pass-1 C5 — adopted the date-split prefix in same commit).
- #9 P2 ADOPT: final audit grep before SME — confirmed only 2 intentional /openai/api/ hits, no leftover "we'll add"/"yet" anywhere in src/.
Content-SME Pass-1 (using docs/sme-templates/sme-pass-1.md — second content-pair use after Session 9): 0 P0 / 0 P1 / 1 P2 (C5 date-split adopted in-commit). 4/5 claims verified, 1 deferral (platform.openai.com/docs returned 403 to static fetch — cross-verified internally via /openai/index.astro officialUrl). All 3 sweeps clean: only 2 intentional /openai/api/ hits, no adjacent-page contradictions, 0 voice-lint matches on new wording.
Content-SME Pass-2 (using docs/sme-templates/sme-pass-2.md): 5/5 Pass-1 claims verified by independent Source B. Source B picks proven valuable (different-doc-surface discipline):
- V1 (platform.openai.com/docs canonical) — verified via openai/openai-python README at HEAD which independently routes readers to that URL as the documentation entry point.
- V2 (api.openai.com hostname) — verified by reading the SDK's _client.py:213-215 and :719-721 default base_url resolution (hardcoded fallback https://api.openai.com/v1).
- V3 (/openai/ hub covers exactly 5 surfaces) — verified by cross-checking against toolRegistry.ts and updates.ts (zero entries imply a 6th surface).
- V4 (Foundry brand currency) — verified via the developer-focused MS Learn SDK overview (a different page than Pass-1's product overview) which uses "Foundry" exclusively in SDK tables and RBAC role names.
- V5 (verificationNote date-split safe) — verified by reading Astro content schema content.config.ts:49 (z.string().optional() — open string field), YAML parsability (no unescaped quotes), and grammar.
- 0 sibling-page leftovers; 0 Pass-1-induced regressions.
Verdict GREEN. 9-batch + 4-tooling + 2-content-session two-pass-SME streak intact.
Lessons captured (apply to Session 11+):
- Source B = SDK source code (not vendor marketing doc) is a high-signal pick. When the claim is about a URL/hostname/canonical reference, the official SDK at HEAD is the most authoritative independent source — it's what consumers actually run, and the source-of-truth-by-execution. Session 10 used OpenAI Python SDK _client.py as Source B for the api.openai.com hostname claim; this should be the default Source-B picking heuristic when a "canonical" URL/hostname is at stake.
- Sibling-script vs sibling-page sweeps both pay. Session 10's Pass-2 cross-checked against toolRegistry.ts + updates.ts (independent data layers) and found zero contradictions. Sibling-DATA-FILE sweep is the content-axis analog of Session 8's sibling-script-coupling sweep. Add to the content-pair toolkit.
- Scope-decision metadata IS scope-decision content. Pass-1 C5 flagged the lastReviewedAt/verificationNote date-split as "out of scope, defer". But adding 7 words to verificationNote to make the split explicit IS scope-decision metadata, which is exactly what this session was about. Adopted in-commit (free clarity, ~30s edit, prevents future-reader confusion). Pattern: when an SME flags "out of scope, defer", check if the fix is itself in scope under a broader reading.
- Single-commit when changes serve one coherent scope decision (Session 9 split-commit lesson refined). Session 9 split because content + code were heterogeneous concerns. Session 10's three files (MDX + data + page) all serve one logical edit — the permanent scope-lock of /openai/api/ — so single commit was correct. Heuristic: split when concerns are heterogeneous AND independently reviewable; combine when concerns are tightly coupled to one decision.
~~Phase 1.1 Session 11 candidates (queued by Session 10)~~ — ✅ DONE 16 May 2026 PM5-PM6 (Priority C path — 6 /updates/ entries shipped; see Session 11 entry below)
Session 11 (Priority C — 5-6 /updates/ entries) — ✅ DONE 16 May 2026 PM5-PM6
Shipped as single commit 209c073 (3 files · 6 new entries + chronology fix + UpdateRow.astro type widening + 4 sidecar suppressions). GHA Auto-deploy + Integrity check both green on first push (1m23s + 39s). Live smoke 45/45 + 8/8 content-render checks on /updates/ page + 6/6 RSS feed checks PASS (all 6 new entries rendering live; DEPRECATED tag styling working; relocated 2026-04-17 entry preserved). 9-batch + 4-tooling + 3-content-session two-pass-SME streak intact.
Scope: Added 5 vendor RELEASE/NEWS + 1 DEPRECATED entry from the Batch E candidate pool. Pre-research agent verified candidates externally (Anthropic docs, Google Gemini API changelog, GitHub release pages); 2 misremembered candidates SKIPPED ("Claude Code Routines" — feature doesn't exist per CHANGELOG search; "GHC June 1 billing transition" — not publicly announced). 1 candidate HELD (Microsoft Foundry brand rename — date too vague, ~Nov 2025 outside research window, no canonical announcement URL reachable).
Single commit 209c073 — 6 entries + infrastructure (3 files):
- src/data/updates.ts (6 new entries inserted; 1 existing entry relocated):
  - 2026-05-11 NEWS — Claude Platform on AWS (Anthropic-hosted Claude, billed through AWS — distinct from Bedrock; full direct-API parity; x-amzn-requestid header). Source: docs.anthropic.com/en/build-with-claude/claude-platform-on-aws.
  - 2026-05-07 RELEASE — Gemini 3.1 Flash-Lite GA (1M/64K context; $0.25/$1.50 per MTok; first Gemini 3 family member to reach GA). Source: ai.google.dev/gemini-api/docs/changelog (May 7 entry).
  - 2026-05-04 NEWS — Anthropic WIF lands (keyless OIDC auth via POST /v1/oauth/token; Python SDK 0.98.0 added support; sk-ant-oat01-… tokens 60s-24h lifetime; AWS IAM / GCP / GitHub Actions / K8s SA / Entra ID / Okta / SPIFFE / any OIDC issuer; direct Claude API only). Source: docs.anthropic.com/en/manage-claude/workload-identity-federation.
  - 2026-04-22 RELEASE — Gemini Embedding 2 GA (first cross-modal embedding model — text/images/video/audio/PDF in one vector space; 8192-token input; flexible 128-3072 output dim). Source: ai.google.dev/gemini-api/docs/models/gemini-embedding-2.
  - 2026-04-16 RELEASE — Claude Opus 4.7 ships (model ID claude-opus-4-7; API breaking changes vs 4.6; cyber safeguards; Cyber Verification Program; same-day Claude in Amazon Bedrock self-serve via /anthropic/v1/messages across 27 AWS regions). Source: anthropic.com/news/claude-opus-4-7.
  - 2026-04-14 DEPRECATED — Claude Sonnet 4 + Opus 4 originals (claude-sonnet-4-20250514 + claude-opus-4-20250514 deprecated; API access ends 15 June 2026; Sonnet 4.6 ← Sonnet 4 / Opus 4.7 ← Opus 4; Haiku 3 retired April 20 in same cleanup wave). Source: docs.anthropic.com/en/release-notes/api.
  - Chronology fix: existing 2026-04-17 M365 ATK 6.8.0 entry was MISPLACED (sat before existing 2026-05-08 entries — pre-existing bug, not introduced by Session 11). Relocated to correct chronological slot between 2026-04-22 (Embedding 2) and 2026-04-16 (Opus 4.7). Fixed in same commit per duck #2 P0 BLOCKING.
- src/components/UpdateRow.astro (type widening + CSS for new tags):
  - Props.tag widened from inline 'NEWS' | 'RELEASE' | 'ADVISORY' literal to imported UpdateTag (full union incl. CVE + DEPRECATED). DEPRECATED entry would have type-errored without this fix per duck #1 P0 BLOCKING.
  - tagClass = tag.toLowerCase() (matches RecentUpdates pattern; eliminates inline branching).
  - CSS adds .update-tag.cve (claw red, matches advisory) + .update-tag.deprecated (ink-mute) — explicit even though deprecated inherits the base color.
- audit-blurbs.ignore.json (4 new sidecar entries for E6 deprecation entry intentional legacy citations):
  - Opus 4 at line 135 (title) — context: "Sonnet 4 + Opus 4 originals deprecated"
  - Opus 4 at line 137 (meta) — context: "Opus 4.7 replaces Opus 4"
  - Sonnet 4 at line 137 (meta) — context: "Sonnet 4.6 replaces Sonnet 4"
  - claude-3-haiku-20240307 at line 137 (meta) — context: "claude-3-haiku-20240307" (verbatim)
  - All 4 are intentional legacy citations in the deprecation announcement entry — the deprecation IS the news, not stale references to replace. Learning: sidecar context field must be a verbatim substring of the line text, NOT a description of the context. First sidecar attempt used descriptive contexts ("Claude 4 originals deprecation meta — migration path explanation") which failed to match any hits; corrected to actual line substrings. The script's context guard is strict-substring on the current-line-text.
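The UpdateRow type widening can be pictured with a short sketch. The five tag names are the ones mentioned in these notes; whether the real UpdateTag union has further members is not stated, and UpdateRowProps and the standalone tagClass function are illustrative stand-ins for the component's Props and inline expression.

```typescript
// Assumed union: the notes mention NEWS, RELEASE, ADVISORY, CVE and DEPRECATED.
// The real shared UpdateTag type may contain more members.
type UpdateTag = "NEWS" | "RELEASE" | "ADVISORY" | "CVE" | "DEPRECATED";

// Before the fix the prop was typed as the narrower inline literal
// 'NEWS' | 'RELEASE' | 'ADVISORY', so a DEPRECATED entry did not type-check.
interface UpdateRowProps {
  tag: UpdateTag;
}

// One lowercase transform replaces per-tag branching when picking a CSS class.
function tagClass(tag: UpdateTag): string {
  return tag.toLowerCase();
}

const deprecatedRow: UpdateRowProps = { tag: "DEPRECATED" };
console.log(`update-tag ${tagClass(deprecatedRow.tag)}`);
```

Sharing one union between the data file and the component means a new tag added to the data fails at compile time until the component and its CSS are updated to handle it.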
Rubber-duck dispatched BEFORE implementing (19 findings; all adopted — including 2 P0 BLOCKINGs that prevented broken-state deployment):
- #1 P0 ADOPT (BLOCKING): UpdateRow.astro Props.tag was narrower than the data UpdateTag union — DEPRECATED entry would type-error or render with default styling without explicit handling. Fix: import shared UpdateTag, simplify tagClass, add CSS for .cve + .deprecated.
- #2 P0 ADOPT (BLOCKING): Existing file chronology bug — 2026-04-17 entry sat BEFORE 2026-05-08 entry. If I just inserted new entries without fixing the misplaced one, the chronology stays wrong. Fix in same commit since touching the file anyway.
- #3 P1 PARTIAL ADOPT: 6 entries acceptable but Anthropic-heavy (4/6); kept all 6 since all SOLID. Made E6 deprecation entry very practical ("if you pinned old IDs, change them before June 15").
- #4 P1 ADOPT: WIF as NEWS (not RELEASE) — quietly shipped, no formal release event.
- #6 P1 ADOPT: No markdown links in meta strings (would render as plain text in RSS).
- #7 P1 ADOPT: WIF href is the docs page; Source B verified via SDK CHANGELOG (Python SDK 0.98.0 = 2026-05-04 exact match).
- #9-14 P2 ADOPT: Title polish — dropped redundant "stable GA"; less implementation-heavy WIF title; "Anthropic-hosted" over "Anthropic-operated"; "ships" instead of "breaks Opus 4.6 API contract" (alarm in meta not title); replaced banned "multimodal" with explicit-modality list; "API access ends" over "API retires".
- #15 P2 ADOPT: Same-date subordering = vendor news above Claw entries (2026-05-07 has Gemini Flash-Lite first, then Claw goes live + primitive map).
- #17 P2 ADOPT: Hold Microsoft Foundry rename (date too vague). Skip "Claude Code Routines" (feature doesn't exist).
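The chronology bug from duck #2 is the kind of thing a small order check would surface. This is a sketch under the assumption that entries are kept newest-first and carry an ISO YYYY-MM-DD date string; the DatedEntry shape and the orderBreaks helper are hypothetical and not part of the repo.

```typescript
// Hypothetical entry shape; only the date field matters for the check.
interface DatedEntry {
  date: string; // ISO YYYY-MM-DD, so plain string comparison orders correctly
}

// In a newest-first list, each entry must be no newer than the one before it.
// Returns the indexes where that rule is broken.
function orderBreaks(entries: DatedEntry[]): number[] {
  const breaks: number[] = [];
  for (let i = 1; i < entries.length; i++) {
    const previous = entries[i - 1];
    const current = entries[i];
    if (previous && current && current.date > previous.date) {
      breaks.push(i);
    }
  }
  return breaks;
}

// The misplaced 2026-04-17 entry sitting above a 2026-05-08 entry.
const beforeFix: DatedEntry[] = [
  { date: "2026-05-11" },
  { date: "2026-04-17" },
  { date: "2026-05-08" },
  { date: "2026-05-07" },
];

console.log(orderBreaks(beforeFix));
```

The check reports the position where the order breaks rather than naming the misplaced entry, which is enough to prompt a manual look before inserting new rows.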
Content-SME Pass-1 (using docs/sme-templates/sme-pass-1.md — third content-pair use): 4/6 entries fully clean; 2 P1 factual fixes adopted in-commit before Pass-2 dispatch:
- E3 (Claude Platform on AWS): Pass-1 caught "Zero operator access" phrase was borrowed from a DIFFERENT product (Bedrock Mantle April 7 launch language), not from Claude Platform on AWS docs. Removed in-commit; paragraph still reads cleanly.
- E4 (Claude Opus 4.7): Pass-1 caught "Same day, Claude Managed Agents on Bedrock went fully self-serve" was WRONG. What actually went self-serve April 16 was "Claude in Amazon Bedrock" (the Bedrock Mantle /anthropic/v1/messages endpoint), not Claude Managed Agents (which went beta April 8 — separate event). Rewrote to source-accurate wording: "Same day, Claude in Amazon Bedrock went self-serve for all Bedrock customers via the new /anthropic/v1/messages endpoint across 27 AWS regions."
Content-SME Pass-2 (using docs/sme-templates/sme-pass-2.md): 2/4 verified by independent Source B + 2 structural deferrals (Anthropic news + AWS blog URLs all 404). Verified:
- V3 (WIF date 4 May 2026 + Python SDK 0.98.0): VERIFIED via anthropics/anthropic-sdk-python CHANGELOG at tag v0.98.0 — exact date match (2026-05-04) + WIF feature confirmed. (Session 10's "SDK source as Source B" heuristic applied successfully — second confirmation that this is a high-signal pick.)
- V4 (Gemini 3.1 Flash-Lite $0.25/$1.50 + 1M/64K context): VERIFIED via ai.google.dev pricing page + model spec page — all four values confirmed independent of the changelog Pass-1 cited.
- V1 (E3 zero-op-access removal) + V2 (E4 Bedrock self-serve rewrite): Structural deferrals — anthropic.com/news/* URLs (4 attempted) all 404; AWS What's New / AWS Blogs URLs (5 attempted) all 404. No contradiction found in any reachable alternative surface across all src/** files.
- All 3 sweeps clean: no sibling-page contradictions, 0 Pass-1-induced regressions, all content-tree consistency checks pass against direct-model-apis.mdx + connections/models.mdx + comparisons.ts.
Verdict GREEN. 9-batch + 4-tooling + 3-content-session two-pass-SME streak intact.
Lessons captured (apply to Session 12+):
- Sidecar context field is a verbatim substring match on the line text, NOT a description. First attempt used descriptive contexts that didn't match anything; had to correct to actual line substrings. The audit-blurbs script's sidecarSuppresses function (line 613) does a strict substring check — e.context must appear in the hit's current-line-text. Pattern to remember: when writing a new sidecar entry, copy a literal substring from the line containing the hit, not a meta-description of it. Documenting this in the script's header comment would help future sessions; Session 12 candidate: add an example line to the schema comment in audit-blurbs.mjs:27-37 showing the context-must-be-verbatim-substring rule.
- Pre-research before duck = duck critiques a fact-grounded plan. Session 11 dispatched ONE research agent before rubber-ducking to verify the 3 named candidates externally + scout 3-4 additional. Outcome: 2 candidates skipped before they made it to the duck (would have wasted duck cycles on hallucinated topics). Pattern: for content-heavy sessions with multiple candidates, pre-research is the right pre-duck step. Adds ~5-10 min, prevents ducking-on-fiction.
- SDK CHANGELOG as Source B confirmed reusable. Session 10 used OpenAI Python SDK source code as Source B for hostname claim; Session 11 used Anthropic Python SDK CHANGELOG as Source B for ship-date claim. Both reduced "unverifiable from docs" to "verified via SDK at HEAD". This Source-B-as-SDK heuristic is now confirmed on 2 sessions — promote to a default-first-Source-B-pick for any URL/hostname/version/feature-ship-date claim.
- Two P0 BLOCKINGs in one duck pass demonstrates the safety value. Session 11's duck caught both the type-narrowness bug AND the pre-existing chronology bug BEFORE I dispatched any SME or wrote any commit. If I'd shipped E6 without the type widening, the entire DEPRECATED entry would have either type-errored or rendered as default-styled NEWS. If I'd shipped without the chronology fix, the rendered /updates/ page would have had the same bug for another session-cycle.
- Single commit when changes serve one coherent goal (Session 10 lesson refined). Session 11 had 3 files but all served "ship 6 entries (one of which requires rendering + suppression infrastructure)". Coherent goal = single commit. Heuristic: if pulling out any of the 3 files would leave the others in broken state (type-error, suppressed-but-not-cited, etc.), they belong in one commit.
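The sidecar lesson above reduces to a strict substring test. The sketch below is a guess at the shape, not the script's sidecarSuppresses function: the SidecarEntry and Hit interfaces, the term field, the suppresses helper and the sample line text are all assumptions, and only the context-must-be-a-verbatim-substring rule comes from the notes.

```typescript
// Hypothetical shapes; the real sidecar schema may carry more fields.
interface SidecarEntry {
  term: string;    // the flagged legacy term, e.g. "Opus 4"
  context: string; // must appear verbatim in the hit's line text
}

interface Hit {
  term: string;
  lineText: string;
}

// Suppress only when the term matches and the context is a literal substring.
function suppresses(entry: SidecarEntry, hit: Hit): boolean {
  return entry.term === hit.term && hit.lineText.includes(entry.context);
}

// Illustrative line text, not copied from updates.ts.
const sampleHit: Hit = {
  term: "Opus 4",
  lineText: "Sonnet 4.6 replaces Sonnet 4; Opus 4.7 replaces Opus 4.",
};

const verbatimEntry: SidecarEntry = { term: "Opus 4", context: "Opus 4.7 replaces Opus 4" };
const descriptiveEntry: SidecarEntry = { term: "Opus 4", context: "migration path explanation" };

console.log(suppresses(verbatimEntry, sampleHit), suppresses(descriptiveEntry, sampleHit));
```

Copying the context straight out of the flagged line is the safe habit: a description of the line reads well in review but never matches.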
~~Phase 1.1 Session 12 candidates (queued by Session 11)~~ — ✅ DONE 16 May 2026 PM10 (Priority C path — compare/hosted-agent-platforms/ shipped; see Session 12 entry below)
Session 12 (Priority C — compare/hosted-agent-platforms/) — ✅ DONE 16 May 2026 PM10
Shipped as single commit c4077e7 (9 files · +614/-8 · 1 new MDX, 8 updates). GHA Auto-deploy + Integrity check both green on first push. Live smoke 45/45 + 5 spot-checks HTTP 200 + 11/11 content-render checks PASS + 4/4 cross-link surface checks PASS (/compare/ index has "WHO RUNS YOUR AGENT?" card · /microsoft/ shows §7.5 link + "Microsoft Foundry" rename · /google/ shows "Agent Runtime" + "Gemini Enterprise" renames). 10-batch + 4-tooling + 4-content-session two-pass-SME streak intact.
Scope: §7.5 cross-vendor agent platform compare — Foundry Agent Service · Vertex AI Agents · OpenAI Agents SDK · Copilot Studio. Central asymmetry called out: 3 of 4 are managed runtimes; OpenAI Agents SDK is a framework you host yourself. Page title acknowledges this: "Agent platforms — managed runtimes and open frameworks". 14-row matrix (hosting-model, build-surface, package-release, local-dev, model-surface, tool-catalog, mcp-support, state-sessions, memory, tracing, identity, channels, pricing-model, ga-status) with 21 source URLs. Compare card label: "WHO RUNS YOUR AGENT?".
Single commit c4077e7 — §7.5 + 3 vendor-hub coupling fixes (9 files):
- NEW: src/content/compares/hosted-agent-platforms.mdx (~22 KB): frontmatter (§7.5, status published, vendor cross-vendor, seeAlso [7.3, 7.4]) · "what this is + asymmetry" intro · "Where is Assistants API?" callout (scope-lock per Session 10 pattern) · "Name map — what old names mean now" table · "Microsoft Foundry vs Foundry Agent Service" ASCII shape block · §7.3 vs §7.5 axis-fork explanation · ComparisonMatrix slug · Where each wins (4 platforms × 4 strengths) · Where each lags (4 × 3-4 weaknesses) · Decision sketch ASCII tree · "Most teams end up using two" patterns · "What we are NOT claiming" · "Honest take" first-person · "What to read next" cross-links.
- src/data/comparisons.ts (+400): new 'hosted-agent-platforms' entry with 4 tools, 21 source URLs, 14 rows × 4 cells each.
- src/data/toolRegistry.ts (+5): 3 new keys (foundry-agent-service, vertex-ai-agents, openai-agents-sdk) + reuse existing copilot-studio.
- src/data/coverage.json (compares.current 4→5, lastUpdated bump).
- src/pages/microsoft/index.astro: blurb "Azure AI Foundry" → "Microsoft Foundry (formerly Azure AI Foundry; URL path still /azure/ai-foundry/)" + foundry product description rename + §7.5 added to compareLinks.
- src/pages/google/index.astro: blurb "Agent Engine" → "Agent Runtime", "Agentspace" → "Gemini Enterprise" with (formerly ...) qualifiers + product description rename + §7.5 added to compareLinks.
- src/pages/openai/index.astro: §7.5 added to compareLinks.
- src/pages/google/[product]/index.astro: Pass-1 P1 fix: vertex-ai-agents product lede + meta description "Agent Engine"/"Agentspace" → "Agent Runtime (formerly Agent Engine)"/"Gemini Enterprise (formerly Agentspace)".
- src/pages/compare/index.astro: '7.5': 'WHO RUNS YOUR AGENT?' added to angleByNumber.
Rubber-duck dispatched BEFORE implementing (15 findings; all adopted):
- #1 P0 ADOPT: page title "Hosted agent platforms" misleading when OpenAI Agents SDK is in the matrix — renamed to "Agent platforms — managed runtimes and open frameworks".
- #2 P0 ADOPT (CRITICAL): 3 fact-pack vs existing-claw-page contradictions — Copilot Studio MCP prompts (existing wins: Tools+Resources yes, Prompts no), M365 zero-rating scope (existing wins: ALL meters), Vertex Agent Runtime pricing (existing wins: has full $0.0864/vCPU-hr table). Aligned the new matrix to existing pages. The pre-research fact pack had hit a redirect for Vertex pricing and missed that Copilot Studio docs are more specific than the broad billing page.
- #3 P0 ADOPT: brand-chain table renamed "Name map: what old names mean now"; OpenAI lineage rephrased as Assistants/Swarm/Responses API/Agents SDK NOT a clean rename chain.
- #4 P1 ADOPT: visible "Where is Assistants API?" callout near top — scope-lock + 403-blocked-from-static-fetch honest note.
- #5 P1 PARTIAL ADOPT: vendor-hub blurb update PARTIAL — fixed /microsoft/index.astro + /google/index.astro + /google/[product]/index.astro (tight coupling) only; deferred broader Sidebar/MobileDrawer/[product]/explainers Azure-AI-Foundry rename sweep (9+ files) to Session 13 as separate goal per Session 9 cut > defer lesson.
- #6 P1 ADOPT: Vertex column label "Vertex AI Agents" (continuity with existing href); brand map shows umbrella "Gemini Enterprise Agent Platform".
- #7 P1 ADOPT: MCP cell standardised pattern Client: yes · Server: partial/no/yes · Transport: ... · Status: GA/preview.
- #8 P1 ADOPT: "SDK release" row split into "Primary build surface" + "Current package · release cadence"; Copilot Studio = "N/A — SaaS release notes".
- #9 P1 ADOPT: Foundry shape block (ASCII tree showing Microsoft Foundry → Foundry Agent Service).
- #10 P1 ADOPT: explicit §7.3 vs §7.5 fork sentence.
- #11-15 P2 ADOPT: §7.5 confirmed · pricing verbal-except-Vertex (existing-page-quoted-values OK) · seeAlso: ['7.3', '7.4'] narrowed · voice-lint substitution crib sheet pre-staged · no /anthropic/ /openclaw/ hub links (silent omission per §7.4 pattern).
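The standardised MCP cell pattern from finding #7 lends itself to a typed formatter, so every column renders the same four fields in the same order. A minimal sketch: the McpSupport type and formatMcpCell name are hypothetical (not taken from comparisons.ts), and the sample values are placeholders rather than the page's actual claims.

```typescript
type Support = "yes" | "partial" | "no";
type Status = "GA" | "preview";

// Hypothetical structured form of one MCP cell.
interface McpSupport {
  client: Support;
  server: Support;
  transport: string;
  status: Status;
}

// Render the fixed "Client · Server · Transport · Status" pattern.
function formatMcpCell(m: McpSupport): string {
  return [
    `Client: ${m.client}`,
    `Server: ${m.server}`,
    `Transport: ${m.transport}`,
    `Status: ${m.status}`,
  ].join(" · ");
}

// Placeholder values for illustration only.
const exampleCell = formatMcpCell({
  client: "yes",
  server: "no",
  transport: "Streamable HTTP only",
  status: "GA",
});
console.log(exampleCell);
```

Keeping the fields in a type rather than free text makes a missing Transport or Status a compile-time error instead of something an SME pass has to spot.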
Content-SME Pass-1 (using docs/sme-templates/sme-pass-1.md — fourth content-pair use): 27 claims verified across F1-F7 · V1-V7 · O1-O7 · C1-C6 — 0 P0 · 2 P1 · 9 P2 · 4 deferred. Both P1s adopted in-commit before Pass-2:
- P1 #1 (sibling-page-coupling): src/pages/google/[product]/index.astro:45-46 still used "Agent Engine"/"Agentspace" as current names — directly contradicted the new §7.5 name-map. Fixed with (formerly Agent Engine) / (formerly Agentspace) qualifiers.
- P1 #2 (version attribution): Pass-1 caught the matrix said "Session rewind to before previous invocation added in ADK v1.32.0" but the ADK CHANGELOG shows the v1.32.0 entries are under Bug Fixes ("fix rewind to preserve initial session state") — rewind was pre-existing, v1.32.0 fixed it. Corrected wording.
- 6 P2s adopted in-commit: ADK beta version "2.0.0-beta" → "2.0.0-beta.1" precision · Memory Bank retrieval fee ($0.50/1K) added to memory + pricing rows · fdy-ai-foundry-pricing source URL added for the ACU Pre-Purchase Plan claim (was citing the wrong page) · Foundry MCP transport clarification.
- 3 P2s DEFERRED to Session 13 (Azure AI Foundry → Microsoft Foundry rename sweep, 9+ files: microsoft-foundry-overview.mdx title/section/description, microsoft/[product]/index.astro title map, microsoft-mcp-overview.mdx 3 occurrences, Sidebar.astro, MobileDrawer.astro, plus comparisons.ts §7.3 cells).
- 4 DEFERRED: V5 Vertex pricing URL (current vai-pricing ref points to AutoML page; canonical Agent Runtime pricing URL not findable from static fetch) · C1 "MCP added Aug 2025" verbatim (verified Aug timing from SSE drop date) · C6 PVA→Copilot Studio Nov 2023 rename verbatim · O6 Langfuse/Logfire as named processors.
Content-SME Pass-2 (using docs/sme-templates/sme-pass-2.md — Source B = SDK source/CHANGELOG/PyPI at HEAD): 4/6 Pass-1 fixes verified by independent Source B + 2 structural deferrals (pricing pages all 404). Source B picks proven valuable on third consecutive session:
- P1 #1 verified via ADK source code at HEAD (SHA bd062ec9) — class still named AgentEngineSandboxCodeExecutor AND PyPI v1.33.0 README still says "Vertex AI Agent Engine". The (formerly ...) qualifier validated as actively load-bearing (SDK lags Cloud docs on the rename).
- P1 #2 verified via CHANGELOG.md at HEAD (SHA baa2a92d) — both v1.32.0 rewind entries are under "Bug Fixes" section, not "Features". Independent confirmation.
- P2 #3 verified via PyPI 2.0.0b1 page (pip install google-adk==2.0.0b1).
- P2 #11 verified via Foundry MCP how-to page (different surface from overview Pass-1 used) — MCPTool(server_url="https://...") HTTP throughout, zero SSE mentions.
- C1 verified via Copilot Studio What's New August 2025 entry — explicitly names MCP GA addition.
- O6 verified via SDK source docs/tracing.md SHA 04e121af — Pydantic Logfire + Langfuse explicitly named in "External tracing processors list".
- 2 deferrals: P2 #4 pricing URL (all Agent Runtime pricing URLs 404 from static fetch); P2 #5 ACU Pre-Purchase Plan FAQ (not on learn.microsoft.com; only on azure.microsoft.com pricing page).
Pass-2 caught 3 new P2 internal-consistency bugs before publish (verdict YELLOW → fixed → GREEN):
- P2-A: comparisons.ts:1098 ga-status vertex note still said "2.0.0-beta" — Pass-1's beta-version fix was applied to package-release row but missed the ga-status row. Two cells for same product disagreed.
- P2-B: MDX prose at line 94 ("Where each wins > Google") still said "$0.25 per 1,000 memories per month" — Pass-1's retrieval-fee fix was applied to the matrix note but not the body prose. Now consistent.
- P2-C: comparisons.ts:910 Foundry MCP transport cell said "Streamable HTTP" without "only" — asymmetry with Copilot Studio cell "Streamable HTTP only" implied (wrongly) Foundry might have other transports. Pass-2 Source B confirmed Foundry is HTTP-exclusive. Added "only".
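All three Pass-2 catches share one shape: a fix landed in one cell while the stale wording survived in another. A sweep for that can be sketched as below. The Row and Cell shapes, the findStaleCells name, and the sample notes are assumptions for illustration; the real comparisons.ts structure may differ.

```typescript
// Hypothetical matrix shapes for the sketch.
interface Cell {
  tool: string;
  value: string;
  note?: string;
}

interface Row {
  id: string;
  cells: Cell[];
}

// Return the ids of rows where a cell for `tool` still carries the stale
// wording. The fixed form is stripped first so that "2.0.0-beta.1" is not
// misread as a stale "2.0.0-beta".
function findStaleCells(rows: Row[], tool: string, stale: string, fixed: string): string[] {
  const hits: string[] = [];
  for (const row of rows) {
    for (const cell of row.cells) {
      if (cell.tool !== tool) continue;
      const text = `${cell.value} ${cell.note ?? ""}`;
      const withoutFixed = text.split(fixed).join("");
      if (withoutFixed.includes(stale)) hits.push(row.id);
    }
  }
  return hits;
}

// Made-up cell text mirroring the P2-A situation.
const sampleRows: Row[] = [
  { id: "package-release", cells: [{ tool: "vertex-ai-agents", value: "ADK", note: "2.0.0-beta.1 on PyPI" }] },
  { id: "ga-status", cells: [{ tool: "vertex-ai-agents", value: "GA", note: "ADK 2.0.0-beta in preview" }] },
];

const staleRows = findStaleCells(sampleRows, "vertex-ai-agents", "2.0.0-beta", "2.0.0-beta.1");
console.log(staleRows);
```

Running a sweep like this after each Pass-1 fix would surface the second cell before Pass-2 has to.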
All 3 sweeps clean: 14-row × 4-column matrix internally consistent (no Microsoft Entra/Azure AD drift, no agentic/multimodal/ecosystem drift) · strict-mode invariants hold · zero voice-lint forbidden-word matches across 198 MDX lines · Sush-voice first-person Honest take confirmed.
Verdict GREEN. 10-batch + 4-tooling + 4-content-session two-pass-SME streak intact.
Lessons captured (apply to Session 13+):
- Pre-research can be wrong; existing claw pages are often more accurate than fact-pack-from-canonical-docs. The Session 12 duck caught 3 fact-pack-vs-existing-page contradictions where the existing page won every time. Reason: existing pages have been read+verified+SME'd over multiple sessions; new research agents hit redirect pages, JS-rendered values, or wrong URL paths and report "not findable". Pattern: when fact pack says "not findable" but a sibling claw page has the value, trust the sibling claw page. Add explicit "check sibling claw pages" to the pre-research prompt for content-heavy sessions where the topic overlaps existing coverage.
- SDK source/CHANGELOG as Source B confirmed reusable on 3rd consecutive session (Sessions 10 → 11 → 12). Promote to canonical default-Source-B-pick for any URL/hostname/version/feature-ship-date/release-attribution claim. The Pass-2 SDK-source verification of P1 #1 ("formerly Agent Engine") was the most useful: it confirmed the qualifier is actively load-bearing because the SDK lags the docs on the rename.
- Pass-1 finds the obvious; Pass-2 finds the inter-row-inconsistency. All 3 Pass-2 new P2s were "Pass-1 applied a fix to one row but forgot the same fix in another row". Pattern: Pass-2's "internal consistency sweep across cells" is genuinely valuable; routine even after a clean Pass-1.
- Cut > defer when scope creep is real (Session 9 lesson reused). Pass-1 flagged 9+ files with Azure AI Foundry → Microsoft Foundry leftovers, a tempting "fix it all in this commit". Resisted in favor of "tight coupling only" (3 hub files) and split the rest as a Session 13 goal. Goals: "ship §7.5" stayed coherent; rename sweep is a separable coherent goal.
- Scope-lock pattern reusable across sessions (Session 10 OpenAI /api/ scope-lock → Session 12 Assistants API scope-lock). The "Where is X?" callout near the top + scope reasoning + honest-about-what-couldn't-be-fetched pattern works on both pages. Codify as a compare-page convention.
Session 13 (Priority A + C parallel) — ✅ DONE 17 May 2026¶
Shipped as 2 split commits: 99a93b4 (content: Foundry rename sweep · 11 files · +27/-27) + f32d1d1 (tooling: audit-blurbs schema comment polish · 1 file · +21). Both pushed in single git push. GHA Auto-deploy + Integrity check both green on first push (~1m18s + 36s). Smoke 45/45 + 6/6 spot-checks HTTP 200 + lede content-render verified live (full rebrand chain + SDK adoption detail). 10-batch + 5-tooling + 5-content-session two-pass-SME streak intact (one of each in S13).
Scope: Two independent priorities completed in parallel. Both shipped clean — zero P0/P1 findings across all 4 SME passes.
Priority A — Azure AI Foundry → Microsoft Foundry rename sweep (commit 99a93b4):
- Pre-research = grep (not web-fetch). New pattern for rename sweeps: one grep "Azure AI Foundry" src/ inventory gave the 13-file footprint with 24 rename targets + 10 historical-keep occurrences. No web fetch needed; Source A (Microsoft Learn) used at Pass-1 SME, not pre-research.
- Rubber-duck adopted 4 recommendations: (1) expand foundry hub lede to preserve full two-step rebrand chain (the duck's biggest catch: a simple-rename would have flattened the rebrand history); (2) smooth comparisons.ts:415 wording from generic "Foundry agents" to "Foundry Agent Service agents" then later refined to "Microsoft Foundry agents (foundry-agent-to-m365)" per Pass-2; (3) accept the ### Foundry Agent Service H3 (unambiguous in MCP page context since sibling H3 is ### Windows AI Foundry — On-device Agent Registry); (4) add post-edit verification grep as a Pass-1 sweep, which counted 8 remaining occurrences against the expected historical-keep set.
- 24 rename edits across 11 files: comparisons.ts (3 cells) · m365-extensibility-paths.mdx · [product]/[slug].astro (title map) · [product]/index.astro (foundry hub lede + description + mcp lede) · microsoft/index.astro (mcp slug card + meta description) · Sidebar.astro (FDY entry + comment) · MobileDrawer.astro (FDY entry + comment) · microsoft-agents-toolkit-overview.mdx (foundry-agent-to-m365 template) · microsoft-declarative-agents-overview.mdx (comparison cell) · microsoft-foundry-overview.mdx (3 frontmatter fields) · microsoft-mcp-overview.mdx (4 places incl. H3).
- Historical-keep (intentionally not renamed): updates.ts:116 (ATK 6.8.0 changelog quote, a historical fact about what shipped) · direct-model-apis.mdx:43 (rebrand context with old-brand in quotes) · microsoft/index.astro:32, 44 (hub "formerly" prose) · microsoft-foundry-overview.mdx:27, 31×2, 41 (rebrand chain narrative + table row).
- Pass-1 SME (content-pair, Source A = learn.microsoft.com/azure/ai-foundry/what-is-azure-ai-foundry + agents/overview): 0 P0 · 0 P1 · 1 P2. The P2 was a stale page-title citation at microsoft-foundry-overview.mdx:65: the Learn page is now titled "Microsoft Foundry" (only the URL slug what-is-azure-ai-foundry survived). Fixed in-commit: ("What is Azure AI Foundry") → (the what-is-azure-ai-foundry page; the URL slug kept the old name).
- Pass-2 SME (Source B = azure-ai-projects SDK GitHub + CHANGELOG at HEAD + ATK template README + Build/Ignite archives): 0 P0 · 0 P1 · 2 P2 + 1 P3 + 1 leftover-reject + 1 deferral. Most important finding: the "Build 2025" event pin in the lede was unsupported by SDK timeline. The CHANGELOG dates the Microsoft Foundry brand adoption to azure-ai-projects 2.0.0b1 on 2025-11-11 (Ignite 2025 window), with 1.x through 1.0.0 (2025-07-31), six weeks post-Build, still using "AI Foundry". Tightened to "late 2025" + SDK adoption detail in 3 places (lede + rebrand chain narrative + rebrand timeline table). Vocabulary mismatch fix: comparisons.ts:415 → "Microsoft Foundry agents integratable via ATK template (foundry-agent-to-m365)", which matches ATK template README's own wording ("Microsoft Foundry agent") not the abstract product name.
- Rejected Pass-2 finding: updates.ts:116 "Azure AI Foundry proxy agent template" Pass-2 wanted renamed → kept as-is. Pass-1's C5 classification stands: ATK CHANGELOG SHA 2534f12 ships with that brand at 17 April 2026; retroactively renaming would distort the timestamped historical record. The /updates/ feed is a release log, not a current-state index.
- Deferral disclosure: azure-ai-agents SDK (the package that exports MCPTool) still uses "Azure AI Agents Service" branding at HEAD (SHA 57cead35), which confirms the SDK lag pattern from Sessions 10-12. Monitor for SDK catch-up. Not actionable in S13 scope.
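The post-edit verification grep boils down to a set comparison: what still matches the old brand versus what was meant to stay. A minimal sketch, assuming the grep output has already been reduced to "file:line" strings. The verifyRenameSweep name is hypothetical, the lists are a small illustrative subset, and the Sidebar.astro line number is invented for the example.

```typescript
// Compare remaining old-brand occurrences against the expected historical-keep set.
function verifyRenameSweep(remaining: string[], historicalKeep: string[]) {
  return {
    // Old name still present where it should have been renamed.
    underRenamed: remaining.filter((loc) => historicalKeep.indexOf(loc) === -1),
    // Old name gone from a place that was meant to preserve it.
    overRenamed: historicalKeep.filter((loc) => remaining.indexOf(loc) === -1),
  };
}

// Illustrative subset of the historical-keep set from the notes.
const expectedKeep = [
  "updates.ts:116",
  "direct-model-apis.mdx:43",
  "microsoft/index.astro:32",
];

// Pretend grep output after the edits (Sidebar.astro:12 is a made-up location).
const stillMatching = [
  "updates.ts:116",
  "microsoft/index.astro:32",
  "Sidebar.astro:12",
];

const sweep = verifyRenameSweep(stillMatching, expectedKeep);
console.log(sweep);
```

Both lists coming back empty is the clean result; anything in either list is worth a look before committing.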
Priority C — audit-blurbs schema comment polish (commit f32d1d1):
- Comment-only docstring change. Added a worked CORRECT/WRONG example clarifying that context is a case-sensitive verbatim substring of the source line (not a description). The matcher is lineText.includes(e.context) at line 637: case-sensitive, no normalisation.
- Origin: Session 11 lesson. Descriptive contexts silently fail to suppress because the matcher returns false, but the LEGACY hit fires the CI failure first. Diagnostic catches it at end-of-run, but only after the failure is sealed.
- Tooling Pass-1 SME (Source A = the script's own code): 0 findings. 6 verifications + 3 sweeps all clean. The new comment accurately describes lines 634-641 (matcher), 834-839 (strict-mode exit), and the 6 code-vs-comment consistency locations.
- Tooling Pass-2 SME (Source B = production audit-blurbs.ignore.json sidecar reality check): 0 findings. All 9 production sidecar entries use the ✓ CORRECT pattern: every context value is a literal verbatim substring of the actual source line. None are description-style or wrong-case. The new example is consistent with, and accurately represents, every real-world sidecar entry. Pass-2's unique axis (which Pass-1 didn't fully exercise) revealed that the schema doc reflects how the sidecar is actually used in practice.
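The Pass-2 "production reality check" can be approximated mechanically: for each sidecar entry, confirm that some line of the referenced file contains the context string exactly. The sketch below assumes a file + context entry shape and takes file contents as an in-memory map; both are illustrative assumptions, not the real audit-blurbs.ignore.json schema, and the sample file is made up.

```typescript
// Hypothetical sidecar entry shape.
interface IgnoreEntry {
  file: string;
  context: string;
}

// Return entries whose context never appears verbatim on any line of the file.
// `sources` maps file path to file contents; a real run would read from disk.
function findDeadEntries(entries: IgnoreEntry[], sources: Record<string, string>): IgnoreEntry[] {
  return entries.filter((e) => {
    const text = sources[e.file];
    if (text === undefined) return true; // file missing: entry cannot match
    return !text.split("\n").some((line) => line.includes(e.context));
  });
}

// Made-up source file and entries for illustration.
const sources: Record<string, string> = {
  "src/example.mdx": "Intro line\nThe Legacy Widget API is retired.\nOutro line",
};

const entries: IgnoreEntry[] = [
  { file: "src/example.mdx", context: "Legacy Widget API" },       // verbatim
  { file: "src/example.mdx", context: "legacy widget api" },       // wrong case
  { file: "src/example.mdx", context: "mention of the old API" },  // description
];

const dead = findDeadEntries(entries, sources);
console.log(dead.map((e) => e.context));
```

An empty result corresponds to the clean outcome reported for the 9 production entries.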
Priority E — vai-pricing URL fix (DEFERRED — needs live browser):
Static fetch confirmed Session 12's finding: no canonical Agent Runtime pricing URL accessible. /products/.../pricing returns AutoML legacy; /agent-runtime/pricing and /scale/runtime/pricing both 404; /scale/runtime/optimize-and-scale has performance numbers (cold start ~4.7s, container_concurrency=9) but not pricing. The Agent Runtime pricing data ($0.0864/vCPU-hr · $0.009/GiB-hr · etc. in comparisons.ts:1071) must come from a dynamically-rendered React page or the Cloud Console pricing calculator. Need live browser to capture canonical URL. Status: blocked across two consecutive sessions.
Priority B — GHC June 1 2026 billing transition (CHECKED, NO EDIT):
docs.github.com/en/copilot/concepts/billing/copilot-requests does NOT mention "GitHub AI Credits", "1 June 2026", or the credits transition as of 17 May 2026. Our microsoft-github-copilot-overview.mdx:55 has a forward-looking specific claim (1 credit = $0.01 USD, 3,000/7,000 promotional bumps). No public source contradicts it but no public source confirms it. Decision: NO edit Session 13. Re-check post-June-1 — likely the GitHub announcement lands week of June 1 → Session 14 Priority A.
Lessons captured (apply to Session 14+):
- Pre-research as grep for rename sweeps (NEW pattern S13). Different from content sessions where pre-research is web-fetch. The "13 files" inventory came from a single command; the classification (24 rename targets vs 10 historical-keep) was clear before duck. Saved ~20 min vs dispatching a research agent.
- Post-edit verification grep as a Pass-1 sweep (NEW S13 duck adoption). After all rename edits, grep the OLD pattern and verify only the expected historical-keep set remains. Cheap routine catch for over-rename (renamed something we shouldn't) and under-rename (missed something we should have). Sush-side cost ~10 seconds; finding cost would be much higher post-deploy.
- Substantive expansion inside a rename sweep can be the right call. Duck flagged that simple-renaming the foundry hub lede would flatten the two-step rebrand chain. Expanded the lede with full chain + dates. Trade-off: scope-creep risk vs internal-consistency win. Won. Pattern reusable: when a rename touches a lede/intro that asserts the rebrand framing, audit that framing — don't just swap the name.
- Comment-only tooling changes still get two-pass SME for streak discipline. Pass-1 tooling can be scoped narrowly to code-vs-comment consistency (no regex coverage trace needed for pure docstring changes). Pass-2's unique axis becomes "production reality check" — read the actual sidecar/config/data file and verify the schema doc matches usage in the wild. Found nothing this session, but the pattern itself is valuable for future schema-evolution sessions.
- Static-fetch-can't-find-the-canonical-URL is itself a deferrable finding (S12+S13 pattern). Don't force a bad URL swap. The current vai-pricing URL → AutoML pricing is "imperfect but at least Google Cloud pricing context". Swapping to optimize-and-scale (which has performance numbers, not pricing) would be a regression. Defer to live browser until Sush has one open.
- Time-stamped historical records (/updates/, changelog quotes) should preserve brand-at-time-of-shipping. Pass-2 wanted to rename updates.ts:116 from "Azure AI Foundry proxy agent template" to "Microsoft Foundry proxy agent template". Rejected: the ATK CHANGELOG (April 17 2026) shipped with the "Azure AI Foundry" brand verbatim. Retroactively renaming would distort the historical record. Pattern: distinguish between "this is what was announced/shipped" (preserve) and "this is what the product currently is" (rename).
- SDK CHANGELOG dates are stronger evidence than marketing event-pinning. Pass-2 flagged the "Build 2025" rebrand pin as unsupported, using azure-ai-projects CHANGELOG dates: 2.0.0b1 shipped 2025-11-11 with the new brand; 1.0.0 shipped 2025-07-31 still using the old brand, six weeks AFTER Build 2025. The marketing announcement timing may differ from SDK adoption, but for a developer-facing claw page, the SDK is the cited evidence surface.
- SDK source/CHANGELOG as default Source B — 4th consecutive session as of S13 (S10 → S11 → S12 → S13). Officially canonical pick now. Anyone running Pass-2 on Microsoft / OpenAI / Google / Anthropic claims involving URLs, package versions, brand attribution, or release timing should default to "SDK source + CHANGELOG at HEAD" as Source B unless there's an explicit reason to pick differently.
Session 14 (Priority B) — ✅ DONE 17 May 2026¶
Shipped as single commit 407c4f0 (9 files · +484/-7) — new /compare/mcp-clients/ (§7.6) page + 4 supporting toolRegistry keys + sibling-page seeAlso reciprocals + Pass-2 sibling-page cleanup in microsoft-mcp-overview.mdx. Live at https://claw.aguidetocloud.com/compare/mcp-clients/ since 17 May 2026 ~22:55 UTC. GHA Auto-deploy + Integrity check both green on first push (~3 min). Smoke 45/45 + 5/5 spot-checks HTTP 200 + content-render verified (atrium card · 8 columns · approval-asymmetry callout · Cursor "Not confirmed" cells). 11-batch + 5-tooling + 6-content-session two-pass-SME streak intact (Pass-2 caught 1 regression + 1 sibling leftover — the methodology held; clean-correction record is broken for this session). Compares: 5 → 6 (target 10).
Scope: §7.6 MCP hosts and clients — where you plug an MCP server in.
Eight-column matrix across Claude Desktop · Claude Code · VS Code · Cursor · Copilot Studio · Copilot CLI · Foundry Agent Service · GHC cloud agent. Ten rows: license · acts-as (MCP shape) · client transports · MCP capabilities · remote auth · config location + scope · approval/autonomy · platform availability · GA/preview status · main tripwire.
Decision angle: "Where do I plug my MCP server in?" — distinct from §7.2 (which CLI?) and §7.5 (who hosts the agent loop?). Page's strongest practical insight: the approval/autonomy row — seven of the eight require some form of human approval; the GHC cloud agent calls tools autonomously with no per-call approval. Sourced verbatim from GitHub's own docs.
Pre-research: dispatched a research agent (Source A = canonical vendor docs) before duck. Returned a ~32 KB fact pack with all 8 clients across 11 dimensions including the 10 gotchas worth callout-box treatment. Strong source-grounding throughout; Cursor docs unparseable (JS-rendered) flagged honestly.
Rubber-duck adopted 7 recommendations: (1) keep 8 columns, cut rows to 10 (drop pricing — heterogeneous, overlaps §7.5; combine config + scope into one row); (2) rename "MCP role" → "Acts as" (MCP shape) — host/client have specific spec meanings; lede explicitly says "applications and runtimes that connect to MCP servers"; (3) approval as a dedicated row — don't bury the GHC cloud agent's no-approval design in a gotcha; (4) Cursor stays in matrix with explicit "Not confirmed" cells + Cursor caveat callout; (5) angle = "WHERE TO PLUG IN MCP?"; (6) cc-hint forward-looking rotation (framework approaches · MCP auth patterns · cost calculators); (7) update toolRegistry.ts comment to allow external hrefs only when Claw has no product page yet.
Implementation — 9 files, 484 insertions, 7 deletions:
- NEW: src/content/compares/mcp-clients.mdx (~440 lines: lede + 3 asymmetry callouts + name map + matrix + 8 where-each-wins sections + 8 where-each-lags sections + decision sketch + "Most teams use two" pattern + what-we-are-NOT-claiming + honest take + what-to-read-next)
- src/data/comparisons.ts (+206 lines: mcp-clients slug · 20 source URLs · 10 rows × 8 cells · validator passes)
- src/data/toolRegistry.ts (+4 keys + comment): claude-desktop/vs-code/cursor/ghc-cloud-agent — external canonical-doc URLs for clients without Claw pages (toolRegistry comment now documents this exception)
- src/data/coverage.json: compares.current 5 → 6 + _lastUpdated: 2026-05-17
- src/pages/compare/index.astro: angleByNumber['7.6'] = 'WHERE TO PLUG IN MCP?'
- src/components/CompareCards.astro: cc-hint line forward-looking rotation
- src/content/compares/cli-coding-agents.mdx + hosted-agent-platforms.mdx: reciprocal seeAlso to 7.6
- src/content/explainers/microsoft-mcp-overview.mdx:131: Pass-2 sibling-page cleanup — require_approval="always" framing now correctly says "defaults to" (was "Recommended for write operations")
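A shape check for a matrix like this one can be sketched in a few lines: every row needs a cell for every column, and any "Not confirmed" cell needs a note explaining why. The Comparison, MatrixRow, and MatrixCell types and the validateComparison name are hypothetical, not the validator that ships in the repo, and the sample cells are placeholders.

```typescript
// Hypothetical matrix shapes for the sketch.
interface MatrixCell {
  value: string;
  note?: string;
}

interface MatrixRow {
  id: string;
  cells: Record<string, MatrixCell>;
}

interface Comparison {
  tools: string[];
  rows: MatrixRow[];
}

// Collect human-readable problems instead of throwing on the first one.
function validateComparison(c: Comparison): string[] {
  const problems: string[] = [];
  for (const row of c.rows) {
    for (const tool of c.tools) {
      const cell = row.cells[tool];
      if (!cell) {
        problems.push(`${row.id}: missing cell for ${tool}`);
      } else if (cell.value === "Not confirmed" && !cell.note) {
        problems.push(`${row.id}: "Not confirmed" for ${tool} needs a note`);
      }
    }
  }
  return problems;
}

// Two-column, two-row toy matrix with placeholder values.
const toy: Comparison = {
  tools: ["claude-code", "cursor"],
  rows: [
    { id: "license", cells: { "claude-code": { value: "placeholder" }, cursor: { value: "Not confirmed" } } },
    { id: "transports", cells: { "claude-code": { value: "placeholder" } } },
  ],
};

const problems = validateComparison(toy);
console.log(problems);
```

Returning a list rather than failing fast means one run reports every gap in the matrix at once.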
Pass-1 SME (content pair, Source A = canonical vendor docs): 25 claims verified across 3 sweeps. 1 P0 · 1 P1 · 5 P2 · 1 deferred. All applied:
- P0 — C23 workspace "silently skipped" was WRONG — Anthropic docs say "Claude Code skips it at load time and shows a warning asking you to rename it." Fixed: "Not silent, but easy to overlook in CI or a busy terminal."
- P1 — C18 require_approval="always" framed as "documented pattern" reverses opt-in/opt-out — Foundry docs say it's the DEFAULT. Fixed across 5 occurrences (3 mdx + 2 ts cells).
- P2s: PKCE removed from Claude Code remote-auth (not explicitly named in Anthropic docs); type: "stdio" added as preferred portability alternative to type: "local" for Copilot CLI; chat.mcp.access setting name replaced with "MCP servers in Copilot" org policy (sibling page wording); MCP Apps note added for VS Code capabilities; Copilot Studio plan-scope note (Teams plan = Classic-only, no MCP).
- Deferral: VS Code Windows-unsandboxed claim couldn't be confirmed from fetched v1_101 ranges but 3 sibling Claw pages assert it consistently.
Pass-2 SME (content pair, Source B = MCP spec at modelcontextprotocol.io/specification + azure-ai-agents SDK + GitHub org-policy docs + Claw sibling pages — 5th consecutive default-Source-B-pattern use): 7 corrections reviewed across 3 sweeps. 0 P0 · 0 P1 · 2 P2 (1 regression + 1 sibling leftover) · 1 partial deferral. Both applied:
- R1 regression — "Microsoft-specific extension" label on MCP Apps note was WRONG. Claw's own sourced openai-apps-sdk-widget-rendering.mdx:27 says MCP Apps is "an open standard... ChatGPT was the first host." Fixed: "an open standard not in the core MCP spec, also supported by ChatGPT". Also dropped unsupported "GA Jan 2026" date (sourceRef vsc-v101 is May 2025, doesn't mention MCP Apps).
- V1 tightening — /mcp panel location qualifier for workspace warning is inferred, not stated in Anthropic docs. Removed from 3 places (2 ts cells + mdx).
- Sweep 1 leftover: the pre-existing framing at microsoft-mcp-overview.mdx:131 described require_approval="always" as "Recommended for write operations"; now updated to reflect DEFAULT framing (per S13 Pass-2 internal-consistency pattern).
- Deferral disclosure — MCP Apps GA date + correct VS Code version anchor: devblogs.microsoft.com/visualstudio/mcp-apps/ and variants all 404. Recheck ~30d for working URL.
- Pre-existing PKCE inconsistency in hosted-agent-platforms.mdx:110 noted by Pass-2 but not classified as regression (predates this session). Leave for a future tightening pass.
Lessons captured (apply to Session 15+):
- Hybrid internal/external hrefs in toolRegistry are pragmatic when scope-control matters. When a compare page genuinely needs columns for clients/tools that Claw doesn't yet have product pages for, external canonical-doc URLs in toolRegistry.ts are honest. integrity-check only validates internal links, so external hrefs are safe for the gate. Document the exception in the registry's comment so future maintainers don't see it as accidental drift. S15 candidate work: promote external-href clients (Claude Desktop · VS Code · GHC cloud agent) to actual Claw product pages and swap registry hrefs to internal paths.
- "Acts as" / "MCP shape" beats "MCP role" or "MCP client". In the MCP spec, host and client have specific spec meanings. A page comparing 8 hosts/clients/runtimes/services needs taxonomy framing that doesn't pretend they're all the same shape. Pattern: in any spec-grounded cross-vendor matrix, prefer the page's own taxonomy ("Acts as") over the spec's overloaded terms when the column set spans multiple shapes.
- Pre-research before duck is non-optional for big-lift content sessions. S11 lesson reaffirmed. The 32 KB fact pack from a single research dispatch (~7 min) saved an estimated 2+ hours of in-session web fetching. Duck got the fact pack as input, gave a focused critique on framing + scope + row count (not on facts). Order: web-fetch → fact pack → duck → implement.
- Sibling Claw pages catch SME regressions that vendor-source verification can't. Pass-2's R1 (MCP Apps "Microsoft-specific extension") was caught by reading src/content/explainers/openai-apps-sdk-widget-rendering.mdx:27, not by reading the MCP spec or Microsoft docs. Pattern: Pass-2's "adjacent-page leftover" sweep is genuinely valuable for catching directional errors in editorial framing. Make it a routine final sweep across src/content/**/*.mdx for any concept-framing claim in a new compare cell.
- Sources-without-confirmable-dates should drop the date, not invent one. R1 included "GA Jan 2026" pulled from a sibling page that's itself derived from devblogs URLs that 404 today. The right move is to keep the fact (MCP Apps exists in VS Code) and drop the unsupported date. Pattern: when the canonical date URL is unreachable, the claim survives without the date.
- "Not confirmed" is the honest cell value when canonical docs are JS-rendered. Cursor's docs are a Next.js SPA with zero parseable text. The matrix has 5 Cursor cells reading "Not confirmed" with a note pointing at the doc-rendering limitation. Readers get the truth + the why; SME passes get a clean deferral surface. Don't use "?" (opaque) or "See Cursor docs" (evasive).
- External docs in tool registry don't break integrity-check (verified). The audit-claims script wraps internal-only link validation; external hrefs are out of scope by design. Confirmed empirically with 4 external URLs in toolRegistry.ts and integrity-check reporting 0 broken / 10889 scanned.
- Pass-2 "scope drift" check earns its keep on plan-tier claims. When a feature is available in some plan tiers but not others (Copilot Studio MCP requires standalone subscription; Teams plan is Classic-only), Pass-1 finds the feature claim; Pass-2 verifies the plan-scope. Specifically: "MCP requires the standalone Copilot Studio subscription. The Teams plan is Classic-orchestration only." This is the second consecutive session (S12, S14) where plan-tier scoping needed a Pass-2 catch.
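The hybrid-href rule from the lessons above (internal links are validated, external canonical-doc links are skipped by design) can be sketched as follows. The ToolEntry shape, the brokenInternalLinks name, the sample paths, and the example.com URL are all assumptions for illustration; the real toolRegistry.ts and integrity-check may be structured differently.

```typescript
// Hypothetical registry entry shape.
interface ToolEntry {
  label: string;
  href: string; // internal site path, or external canonical-doc URL
}

// Treat anything with an http(s) scheme as external.
const isExternal = (href: string): boolean => /^https?:\/\//.test(href);

// Report registry keys whose internal href does not resolve to a known page.
function brokenInternalLinks(registry: Record<string, ToolEntry>, knownPaths: string[]): string[] {
  const broken: string[] = [];
  for (const key of Object.keys(registry)) {
    const href = registry[key].href;
    if (isExternal(href)) continue; // out of scope by design
    if (knownPaths.indexOf(href) === -1) broken.push(key);
  }
  return broken;
}

// Placeholder registry: one valid internal, one external, one broken internal.
const registry: Record<string, ToolEntry> = {
  "copilot-studio": { label: "Copilot Studio", href: "/microsoft/copilot-studio/" },
  cursor: { label: "Cursor", href: "https://example.com/cursor-docs" },
  ghost: { label: "Ghost", href: "/missing/" },
};

const broken = brokenInternalLinks(registry, ["/microsoft/copilot-studio/"]);
console.log(broken);
```

Promoting an external-href client to a real product page then becomes a one-line href swap that this check starts covering automatically.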
Phase 1.1 Session 15 candidates (queued by Session 14)¶
Open watch items (rolling — date-gated):
- 🔴 GitHub Copilot June 1 2026 billing transition (TIME-CRITICAL once past June 1 — likely Session 15 if today is on/after 2026-06-01) — audit microsoft-github-copilot-overview.mdx:55, 61-76 against docs.github.com/en/copilot/concepts/billing/copilot-requests + github.blog announcement. Forward-looking specific claim in our page (1 credit = $0.01 USD, 3,000/7,000 promotional bumps) likely needs rewrite.
- MCP Apps GA date + VS Code version anchor (NEW S14) — find working canonical Microsoft source for MCP Apps (devblogs/learn). Currently comparisons.ts:1227 vs-code capabilities note describes MCP Apps as "an open standard not in the core MCP spec, also supported by ChatGPT" but has no date or VS Code version pin. ~30-day recheck.
- @github/copilot npm README (NEW S14) — Pass-2 deferral; npmjs.com/package/@github/copilot returned HTTP 403 on the SME run. Re-verify the type value list (local vs stdio semantics) when the npm README becomes fetchable.
- VS Code Windows-unsandboxed canonical citation (NEW S14) — Pass-1 couldn't locate the specific v1_101 section that documents Windows-unsandboxed behavior; sibling Claw pages assert it consistently. Find the canonical VS Code source.
- gpt-4o-audio-preview shutdown + gpt-audio-1.5 (re-verify ~2026-06-15)
- gpt-4-turbo SDK presence (re-verify ~2026-06-15)
- GPT-4o soft-deprecation watch (~2026-06-15)
- Microsoft Foundry brand rename announcement URL (HELD from S11 — still no canonical URL)
- azure-ai-agents SDK brand catch-up (S13 HOLD — still uses "Azure AI Agents Service")
- code-bison/codechat-bison canonical doc citation
- gemma3.5n regex backtrack behaviour
Session 14 deferrals (clean-up scope for Session 15):
- vai-pricing URL fix — needs live browser (deferred across S12 + S13 + S14).
- Foundry ACU Pre-Purchase Plan FAQ spot-check — deferred from S12, still pending.
- PVA → Copilot Studio Nov 2023 rename citation — deferred from S12, still pending.
- hosted-agent-platforms.mdx:110 "authorisation code with PKCE" naming — Pass-2 S14 flagged as pre-existing consistency gap (Microsoft Copilot Studio docs call this mode "Dynamic", not "PKCE"). Tighten when next touching §7.5.
Tooling / docs polish (still pending):
- Inline audit-blurbs smoke harness as committed CI artefact — threshold still not met; 6 consecutive sessions (S9-S14) without matcher changes.
Content rolling:
- Backfill 2-3 more /updates/ entries when vendor news appears.
- Phase 2 vendor seeding — Meta/Llama, Mistral, xAI, Perplexity. Each = vendor hub + 4-6 product overviews. Big lift.
- Promote external-href clients to Claw product pages (NEW S14 — natural next move):
- Claude Desktop under /anthropic/claude-desktop/ (Anthropic vendor, has the chat-host story to tell)
- VS Code (with Copilot extension) under /microsoft/vs-code/ — well-warranted given GitHub Copilot's importance; VS Code is the highest-traffic MCP host of the eight
- GHC cloud agent under /microsoft/github-copilot-cloud-agent/ OR as a sub-section of the existing /microsoft/github-copilot/ page — the autonomy-without-approval design is the standout story
- Cursor stays external (third-party, outside 5-vendor universe)
- After each new page lands: swap toolRegistry.ts href from external to internal.
P2 polish (still deferred):
- Sonnet 4.5 beta-header technical-precision (S4 Pass-2).
Infrastructure helpers still deferred:
- scripts/sme-validation.mjs (templates remain the right level).
- Extend voice-lint .ts scan to string/template-literal positions only.
- Shared src/lib/model-registry.mjs.
~~Phase 1.1 Session 14 candidates (queued by Session 13)~~ — ✅ DONE 17 May 2026 (see Session 14 entry above)¶
Open watch items (rolling — date-gated):
- 🔴 GitHub Copilot June 1 2026 billing transition (TIME-CRITICAL once past June 1) — audit microsoft-github-copilot-overview.mdx:55, 61-76 against docs.github.com/en/copilot/concepts/billing/copilot-requests + github.blog announcement. As of 17 May 2026, no public source confirms or contradicts the claw page's forward-looking claim. Likely significant rewrite once announcement lands.
- gpt-4o-audio-preview shutdown + gpt-audio-1.5 (re-verify ~2026-06-15)
- gpt-4-turbo SDK presence (re-verify ~2026-06-15)
- GPT-4o soft-deprecation watch (~2026-06-15)
- Microsoft Foundry brand rename announcement URL (HELD from S11 — still no canonical Microsoft announcement post URL)
- code-bison/codechat-bison canonical doc citation
- gemma3.5n regex backtrack behaviour
- azure-ai-agents SDK brand catch-up (NEW S13) — package that exports MCPTool still uses "Azure AI Agents Service" at HEAD (SHA 57cead35). Monitor for rename to "Foundry Agent Service" or "Microsoft Foundry agents" branding. When it lands, update editorial notes in microsoft-mcp-overview.mdx § Foundry Agent Service.
Session 13 deferrals (clean-up scope for Session 14):
- vai-pricing URL fix — needs live browser (deferred across S12 + S13).
- Foundry ACU Pre-Purchase Plan FAQ spot-check — deferred from S12, still pending.
- PVA → Copilot Studio Nov 2023 rename citation — deferred from S12, still pending.
Tooling / docs polish (still pending):
- Inline audit-blurbs smoke harness as committed CI artefact — threshold still not met; 5 consecutive sessions (S9-S13) without matcher changes.
Content rolling:
- compare-mcp-clients.mdx (§7.6) — MCP support matrix across 7+ clients. Big-lift (~6-10h with two-pass content SME). Natural companion to §7.5.
- Backfill 2-3 more /updates/ entries when vendor news appears.
- Phase 2 vendor seeding — Meta/Llama, Mistral, xAI, Perplexity. Each = vendor hub + 4-6 product overviews. Big lift; defer until §7.6 lands.
P2 polish (still deferred):
- Sonnet 4.5 beta-header technical-precision (S4 Pass-2).
Infrastructure helpers still deferred:
- scripts/sme-validation.mjs (templates remain the right level).
- Extend voice-lint .ts scan to string/template-literal positions only.
- Shared src/lib/model-registry.mjs.
~~Phase 1.1 Session 13 candidates (queued by Session 12)~~ — ✅ DONE 17 May 2026 (see Session 13 entry above)¶
Open watch items (rolling — date-gated):
- GitHub Copilot June 1 2026 billing model transition (TIME-CRITICAL once past June 1) — audit microsoft-github-copilot-overview.mdx lines 61–76 against docs.github.com/.../copilot-requests. Likely significant rewrite. As of 16 May 2026, Session 11 pre-research couldn't find a public announcement yet.
- gpt-4o-audio-preview shutdown + gpt-audio-1.5 (re-verify ~2026-06-15) — re-verify against OpenAI deprecations page + SDK at HEAD. If confirmed: update comparisons.ts:264 Foundry audio cell + audit-blurbs lists.
- gpt-4-turbo SDK presence (re-verify ~2026-06-15) — confirm still in SDK + still classified LEGACY.
- GPT-4o soft-deprecation watch (~2026-06-15) — verify retirement status.
- Microsoft Foundry brand rename (HELD from Session 11) — watch for canonical announcement URL.
- code-bison/codechat-bison canonical doc citation — flag if internal audit trail needs URL.
- gemma3.5n regex backtrack behaviour — if Google releases gemma3.5n, add to KNOWN_CURRENT explicitly.
Tooling / docs polish (queued from Session 11 — still pending):
- audit-blurbs.mjs:27-37 schema comment — add example showing context must be verbatim substring of line text. Low-lift tooling-pair SME. ~30 min.
- Inline audit-blurbs smoke harness as committed CI artefact — Sessions 7+8 used one-off pattern; Sessions 9-12 didn't change matchers. Threshold for promotion to scripts/test-audit-blurbs.mjs hasn't moved.
Session 12 deferrals (clean-up scope for Session 13):
- 🔴 Azure AI Foundry → Microsoft Foundry rename sweep — 9+ files have leftover old brand. Per Pass-1 Sweep 1: microsoft-foundry-overview.mdx:2-5 (title, section, description), src/pages/microsoft/[product]/index.astro:21 (title map), microsoft-mcp-overview.mdx:5,29,115 (3 occurrences), src/components/Sidebar.astro:171, src/components/MobileDrawer.astro:168, plus comparisons.ts:334,374,415 (§7.3 m365-extensibility cells) and microsoft-declarative-agents-overview.mdx:197 ("Azure AI Foundry" in comparison cell). Coherent goal: "make Microsoft Foundry the canonical brand across the site; keep historical mentions in rebrand-chain tables only". Single-commit, ~15 line-edits across 7 files. Eligible for content-pair SME but probably overkill — adjacent-page-coupling grep + verify check:links pass should be enough.
- vai-pricing URL fix — current sourceRef points to AutoML pricing page. Pass-1 + Pass-2 both 404'd on every Agent Runtime pricing URL attempted (.../scale/runtime/pricing, .../scale/pricing, .../pricing). Live-browser session needed to find canonical URL. If still unfindable, swap vai-pricing to point at cloud.google.com/gemini-enterprise-agent-platform/scale/runtime/optimize-and-scale (which has actual Agent Runtime numbers) and add verificationNote flag.
- Foundry ACU Pre-Purchase Plan FAQ spot-check — verify 5/10/15% at 20K/100K/500K ACU tiers from azure.microsoft.com/pricing/details/ai-foundry/ directly in a live browser (Pass-1 verified from text; Pass-2 couldn't independently confirm from learn.microsoft.com).
- PVA → Copilot Studio Nov 2023 rename citation — search microsoft.com/en-us/microsoft-365/blog/ for "Power Virtual Agents renamed" OR the Microsoft Ignite November 2023 announcement archive. Add the canonical URL to the compare/hosted-agent-platforms/ sources if found.
Content rolling:
- compare-mcp-clients.mdx (§7.6) — MCP support matrix across Anthropic Claude Desktop · Claude Code · VS Code · Cursor · Copilot Studio · GitHub Copilot CLI · Foundry Agent Service · cloud agent. Bigger lift (~6-10h with full two-pass content SME). Natural companion to §7.5. Sets up cross-link pattern: §7.5 covers "which platform hosts your agent?" and §7.6 covers "which client speaks MCP to my server?".
- Backfill 2-3 more /updates/ entries when next vendor news appears (rolling — feed is healthy at 17 entries).
- Audit-blurbs content-scope sweep — npm run audit:blurbs:content over src/content/**/*.mdx. Lower priority unless freshness audit wanted.
P2 polish (still deferred):
- Sonnet 4.5 beta-header technical-precision — defer until Anthropic publishes the context-1m-* beta-header name (Session 4 Pass-2).
Infrastructure helpers still deferred:
- scripts/sme-validation.mjs — defer indefinitely; templates are the right level of abstraction.
- Extend voice-lint .ts scan to string/template-literal positions only — implement when noise materialises.
- Shared src/lib/model-registry.mjs — none today.
Phase 1.1 leftovers (rolling — most low-priority now):
- LinkedIn post draft about Batch D + E regression-catch story (still queued from Session 1).
- §7.10 numeric section sort helper — ✅ DONE Session 9 Track 4.
- Header.astro empty-state — ✅ DONE Session 9 Track 2.
- Cross-link audit — ✅ DONE Session 9 Track 1.
~~Phase 1.1 Session 4 candidates (queued by Session 3 Pass-2)~~ — ✅ ALL DONE (see Session 4 entry above)¶
Optional Phase 1.1 compare pages (deferred from Batch E)¶
- compare-hosted-agent-platforms.mdx — Foundry Agent Service vs Vertex AI Agents vs OpenAI Assistants/Agents SDK vs Copilot Studio runtime.
- compare-mcp-clients.mdx — MCP support matrix across Anthropic Claude Desktop / Claude Code / VS Code / Cursor / Copilot Studio / GitHub Copilot CLI / Foundry Agent Service / cloud agent.
Phase 2 vendors (later)¶
- Meta/Llama, Mistral, xAI, Perplexity.
Other Phase 1.1 candidates¶
- LinkedIn post draft about Batch D + E regression-catch story (still queued from Session 1).
- Cross-link audit from 5 vendor hubs back to /compare/ (every vendor hub's "Why pick this" should link to a /compare/ page).
- Header.astro: hide the "last updated" meta-item entirely when lastUpdated is undefined (voice-duck Session 2 P2 — "last updated —" reads broken on hub pages; alternative to Session 2's em-dash choice).
- Decide whether to add /openai/api/ direct-API overview page or accept the Foundry-or-api.openai.com framing in /compare/direct-model-apis/.
- §7.10 numeric section sort helper — string localeCompare misorders double-digit section numbers ("7.10" < "7.2"). Not blocking until 7+ compares exist.
- Add 5-6 more /updates/ entries from candidates surfaced by Batch E research (Gemini 3.1 Flash-Lite GA May 7, Claude Code Routines arc, Anthropic Workload Identity Federation, etc.).
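The §7.10 ordering problem in the list above can be reproduced in a few lines. This is a sketch only: `compareSections` is a hypothetical name, and the helper that actually shipped in Session 9 is not shown in this doc.

```javascript
// Hypothetical helper: compare dotted section numbers part by part as numbers.
function compareSections(a, b) {
  const partsA = a.split(".").map(Number);
  const partsB = b.split(".").map(Number);
  const length = Math.max(partsA.length, partsB.length);
  for (let i = 0; i < length; i++) {
    // Missing parts count as 0, so "7" sorts before "7.1".
    const diff = (partsA[i] ?? 0) - (partsB[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

const sections = ["7.10", "7.2", "7.1", "7.9"];

// Plain string comparison puts "7.10" ahead of "7.2".
const byString = [...sections].sort((a, b) => a.localeCompare(b));
console.log(byString); // ["7.1", "7.10", "7.2", "7.9"]

// Numeric comparison restores reading order.
const byNumber = [...sections].sort(compareSections);
console.log(byNumber); // ["7.1", "7.2", "7.9", "7.10"]
```

Splitting on the dot keeps the comparison explicit and independent of locale collation settings.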
What's queued for the next session (UPDATED 15 May 2026 PM2)¶
Phase 1 of v0b is closed. Next session picks up at Phase 1.1 — ongoing rolling content + the cleanup items above.
Suggested scope for first Phase 1.1 session:
- Decide priority: vendor-page consistency fixes (5 items above) vs new compare pages (2 deferred) vs sme-validation.mjs helper script vs Phase 2 vendor expansion.
- The vendor-page fixes are the lowest-effort + highest-cross-page-consistency-value path; recommended first.
Required reading before starting¶
- plan.md in the session workspace — full plan with risk register
- learning-docs/docs/reference/planet-pivot-playbook.md — the meta-playbook (this captures the 6-turn requirements arc + rubber-duck pattern + sourced-seed lifecycle + Cat A/B problem + Batch 0 migration phases)
- learning-docs/docs/cosmos/claw/playbook.md — the planet's own playbook (the v0b admonition section at the top documents what changed)
- learning-docs/docs/reference/parallel-git-rules.md — explicit-paths discipline (the URL migration will touch many files; don't git add .)
- ~/.copilot/copilot-instructions.md sections on Deployment Discipline + Stash Discipline
What to ask Sush first (if anything is unclear)¶
- Confirm priority: Batch 0b (URL migration) first, OR Batch A (Anthropic content) first?
- For Batch 0b: Are you happy with /openclaw/setup/laptop/ as the new pattern? (Alternative: /openclaw/section/setup/laptop/ to mirror the §-structure visibly.)
- For Batch A: Do you want me to focus on Claude Code first (you're using it daily via GitHub Copilot CLI so will upgrade to tried fastest), or seed everything in one batch?
SQL todo state¶
Expected:
- batch-0-migration → done
- batch-a-anthropic → pending
- batch-b-openai → pending (depends on batch-a)
- batch-c-google → pending (depends on batch-a)
- batch-d-microsoft → pending (depends on batch-a)
- batch-e-compare-updates → pending (depends on all batches a-d)
- phase-1-1-ongoing → pending (depends on batch-e)
Update batch-0-migration description to note "Batch 0b URL migration deferred" if you finish that here, or add a new todo batch-0b-url-migration if you split.
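The dependency chain in the expected todo state can be checked with a small sketch. The ids, statuses, and dependencies are copied from the list above; the object layout and the `readyTodos` name are assumptions, not the real SQL schema.

```javascript
// Ids, statuses and dependencies mirror the expected state above.
// The object layout is hypothetical; the real todos live in SQL.
const todos = [
  { id: "batch-0-migration", status: "done", dependsOn: [] },
  { id: "batch-a-anthropic", status: "pending", dependsOn: [] },
  { id: "batch-b-openai", status: "pending", dependsOn: ["batch-a-anthropic"] },
  { id: "batch-c-google", status: "pending", dependsOn: ["batch-a-anthropic"] },
  { id: "batch-d-microsoft", status: "pending", dependsOn: ["batch-a-anthropic"] },
  {
    id: "batch-e-compare-updates",
    status: "pending",
    dependsOn: ["batch-a-anthropic", "batch-b-openai", "batch-c-google", "batch-d-microsoft"],
  },
  { id: "phase-1-1-ongoing", status: "pending", dependsOn: ["batch-e-compare-updates"] },
];

// A pending todo is ready when every dependency is already done.
function readyTodos(list) {
  const done = new Set(list.filter((t) => t.status === "done").map((t) => t.id));
  return list
    .filter((t) => t.status === "pending" && t.dependsOn.every((d) => done.has(d)))
    .map((t) => t.id);
}

console.log(readyTodos(todos)); // ["batch-a-anthropic"]
```

With this state, only batch-a-anthropic is unblocked; batches B, C and D open up together once it is marked done.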
Working pattern reminders¶
Before any git push (per deployment playbook)¶
cd C:\ssClawy\claw-planet
node scripts/voice-lint.mjs
node scripts/integrity-check.mjs
node scripts/audit-claims.mjs
node scripts/audit-verification-states.mjs
node scripts/smoke-check.mjs
npm run build:no-search
All must pass. Then:
git status --short # confirm what's M and what's ??
git add <explicit-paths> # NOT git add . or -A
git commit -m "..." # include Co-authored-by trailer
git pull --rebase origin main # before push
git push origin main
gh run list --repo susanthgit/claw-planet --limit 3 # verify auto-deploy starts
# wait ~90s
# smoke-test live URLs with Invoke-WebRequest
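The manual live-URL smoke test boils down to two kinds of expectation, a 200 for new paths and a 301 for old ones. The sketch below models that without making network requests. The record shape, the `passes` function, and the Location-header check are assumptions; the real scripts/smoke-check.mjs format is not shown in this doc.

```javascript
// Hypothetical expectation records using the expect200 / expect301 kinds.
const expectations = [
  { path: "/openclaw/faq/", kind: "expect200" },
  { path: "/faq/", kind: "expect301", location: "/openclaw/faq/" },
];

// Decide whether one observed response satisfies one expectation.
function passes(expectation, response) {
  if (expectation.kind === "expect200") return response.status === 200;
  if (expectation.kind === "expect301") {
    return response.status === 301 && response.location === expectation.location;
  }
  return false; // unknown kinds fail rather than pass silently
}

// Stubbed responses stand in for live HTTP results.
const observed = {
  "/openclaw/faq/": { status: 200 },
  "/faq/": { status: 301, location: "/openclaw/faq/" },
};

const failures = expectations.filter((e) => !passes(e, observed[e.path]));
console.log(failures.length === 0 ? "all checks pass" : failures);
```

Checking the redirect target as well as the status code would catch a 301 that points at the wrong namespace.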
Voice rules (per Sush's voice + Claw scope guardrails)¶
- Plain English. Honest take. Examples · scenarios · comparisons.
- "Why this matters" for important concepts.
- NO marketing voice (voice-lint enforces 16 forbidden words).
- NO out-of-scope topics (prompt engineering · benchmarks · ethics commentary · general AI explainers · vendor PR · news/punditry · customer engagement detail).
- Microsoft entries MUST cite at least one public Microsoft domain (learn.microsoft.com, github.com/microsoft, devblogs.microsoft.com, azure.microsoft.com, techcommunity.microsoft.com) — voice-lint enforces.
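The Microsoft source-citation rule can be sketched as a URL check. The domain list is copied from the rule above; the function names and matching logic are assumptions, and the real voice-lint.mjs may implement this differently.

```javascript
// Allowed sources, copied from the voice rule. One entry carries a path prefix.
const MICROSOFT_SOURCES = [
  "learn.microsoft.com",
  "github.com/microsoft",
  "devblogs.microsoft.com",
  "azure.microsoft.com",
  "techcommunity.microsoft.com",
];

// Hypothetical check: host must match exactly, path prefix must match on a segment boundary.
function isPublicMicrosoftSource(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  return MICROSOFT_SOURCES.some((allowed) => {
    const [host, ...pathParts] = allowed.split("/");
    if (url.hostname !== host) return false;
    if (pathParts.length === 0) return true;
    const prefix = "/" + pathParts.join("/");
    return url.pathname === prefix || url.pathname.startsWith(prefix + "/");
  });
}

// An entry passes when at least one of its sources qualifies.
function citesMicrosoft(sources) {
  return sources.some(isPublicMicrosoftSource);
}

console.log(citesMicrosoft(["https://example.com/post", "https://learn.microsoft.com/docs"])); // true
console.log(citesMicrosoft(["https://github.com/microsoftx/repo"])); // false
```

Matching the github.com entry on a path-segment boundary keeps look-alike organisations such as `microsoftx` from satisfying the rule.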
Verification states (new vocabulary)¶
| State | Use when |
|---|---|
| planned | Stub. Not in nav. |
| sourced | Researched from public docs. Sush hasn't personally run it. Most Day-1 entries. |
| tried | Sush (or contributor) ran it. Add verificationContext with tool/version/platform/accountTier/testedBy/testedAt. |
| verified | Tried + considered correct. No surprises, no caveats. |
| disputed | A reader raised a correctness issue. Linked discussion in sources. |
Sush will upgrade entries from sourced → tried over time as he experiments. Don't backfill claims that haven't been tested.
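A minimal sketch of how the five states and the verificationContext requirement might be checked on a single entry. The state and field names come from the table above. The `validateEntry` function is hypothetical, and requiring context only for tried entries is an assumption; the real rules live in content.config.ts and audit-verification-states.mjs.

```javascript
const STATES = ["planned", "sourced", "tried", "verified", "disputed"];
const CONTEXT_FIELDS = ["tool", "version", "platform", "accountTier", "testedBy", "testedAt"];

// Hypothetical validator: returns a list of problems, empty when the entry is fine.
function validateEntry(entry) {
  const problems = [];
  if (!STATES.includes(entry.verificationState)) {
    problems.push(`unknown verificationState: ${entry.verificationState}`);
  }
  // Assumption: only "tried" entries must carry the full context block.
  if (entry.verificationState === "tried") {
    const context = entry.verificationContext ?? {};
    for (const field of CONTEXT_FIELDS) {
      if (!context[field]) problems.push(`verificationContext.${field} is missing`);
    }
  }
  return problems;
}

console.log(validateEntry({ verificationState: "sourced" })); // []
console.log(validateEntry({ verificationState: "tried" }).length); // 6
```

Returning a list of problems rather than throwing lets an audit report every gap in one pass.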
File map quick reference¶
C:\ssClawy\claw-planet\
├── src\
│ ├── components\
│ │ ├── Banner.astro ← Front-matter card (multi-vendor reframed)
│ │ ├── Header.astro ← Brand + search + theme toggle
│ │ ├── VendorTabs.astro ← NEW. Sticky vendor-tabs sub-nav
│ │ ├── VendorHub.astro ← NEW. Reusable hub page (banner + product cards)
│ │ ├── VerificationBanner.astro ← Entry-page contract banner (5 variants)
│ │ ├── Sidebar.astro ← Left TOC (badge fn updated to new vocab)
│ │ └── ...
│ ├── content.config.ts ← Schema: vendor enum + product + verificationContext + 5-state verificationState
│ ├── content\
│ │ ├── setups\ (6 mdx, all tagged vendor: openclaw)
│ │ ├── connections\ (5)
│ │ ├── explainers\ (4)
│ │ ├── plugins\ (1)
│ │ ├── use-cases\ (5)
│ │ ├── gotchas\ (4)
│ │ └── compares\ (1)
│ ├── pages\
│ │ ├── openclaw\index.astro ← NEW. OpenClaw vendor hub
│ │ ├── anthropic\index.astro ← NEW. Anthropic hub (placeholder for Batch A)
│ │ ├── openai\index.astro ← NEW. OpenAI hub (Batch B)
│ │ ├── google\index.astro ← NEW. Google hub (Batch C)
│ │ ├── microsoft\index.astro ← NEW. Microsoft hub (Batch D)
│ │ ├── setup\ (still at /setup/, move to /openclaw/setup/ in Batch 0b)
│ │ ├── overview\ (still at /overview/, move in Batch 0b)
│ │ ├── ...
│ │ └── index.astro ← Home (new tagline)
│ └── layouts\
│ └── BaseLayout.astro ← Now includes <VendorTabs />
├── scripts\
│ ├── voice-lint.mjs ← Extended: OUT-list + Microsoft source-citation
│ ├── audit-verification-states.mjs ← Updated: new 5-state enum
│ ├── migrate-frontmatter-multivendor.mjs ← NEW. Idempotent migration script
│ └── ... (integrity-check, audit-claims, smoke-check, deploy, etc.)
└── public\
└── llms.txt ← Updated: 5-vendor scope, new vocab
See also¶
- Planet Pivot Playbook — meta-playbook capturing the patterns used in v0b
- Claw Playbook — Claw-specific decisions, audience, brand
- Cosmos Philosophy — universal laws
- Session plan.md — full plan with risk register