Phase D Playbook: Question-Bank Mashup Cleanup (ongoing)

Portal doc. This is the master doc every Phase D session reads at start. Self-contained kickoff with state, plan, workflow, discovery infrastructure, and self-learning protocol. History detail is in the linked session handoffs (read only if needed).

Last updated: 2026-06-03 NZT by session 7bb2af7a (P10 H2 reclassification shipped + P11 wontfix per probe evidence + P12 NEW dead-link finding via learnlink_validator: 1,293 dead URLs / 2,691 Qs affected portfolio-wide; 4 commits).

Update this doc at the end of every Phase D session (bump the date, move shipped items, tighten estimates).

Persistence model: bug capture lives in C:\ssClawy\guided\files\discovered_bugs.json (git-tracked, cross-session). Helper scripts in C:\ssClawy\guided\scripts\bugs\ (load_bugs.py + save_bugs.py). The per-session SQLite discovered_bugs table is loaded from / written back to this JSON at session boundaries. See "Discovery infrastructure" below.

TL;DR

You're continuing the Phase D portfolio mashup cleanup. 22 commits shipped so far (83cb705 + 3234ca0 + 91fe2cb + cc01005 + 2e75b81 + 3ebee72 + 49a668d + 80bc25c + d45e4d3 + 5992fa6 + d2b69e9 + af7fb22 + ea752df + 49f0cf6 + fa885af + 66934b5 + bcefb0a + 71f3668 + fcde55c + 3543d17 + c06cc4a + P12 commits pending).
- P1+P7+P8+P9+P10 DONE. P2 essentially complete (~98% portfolio).
- P10 shipped: H2 reclassified HIGH→MEDIUM in fcde55c (portfolio HIGH dropped 320→6 as predicted).
- P11 RESOLVED AS WONTFIX: the calibration probe (5 known-real drift + 50 H2FP + 200 HONEST Qs, 5 token-overlap metrics tested) showed no natural cliff exists; REAL drift cases sit inside the HONEST distribution. The ab731-d2-053 subclass is now covered by the P3-time manual SME audit.
- 🚨 P12 NEW (this session): built learnlink_validator.py (async HEAD-check across 7,153 unique learnLinks in ~5 min); 1,293 dead URLs (18%) affect 2,691 Qs (~8.6% of bank). Top concentration: 200 Qs hit ONE dead CompTIA URL; ~30 Cisco/Juniper/Microsoft URL clusters of 10-30 Qs each. Stop-criteria triggered: Sush call needed on remediation scope. See bug #21 in discovered_bugs.json + files/learnlink-summary-full.md.

Current state

What's shipped (live on prod, SLA green)

Commit	Author session	What
`83cb705`	9e5cc23f	5 H1 mashup fixes (eccouncil-cnd-v3 ×4 + hashicorp-vault ×1)
`3234ca0`	9e5cc23f	SME-flagged factual fix in vault d4-008 (correct=b → c per Vault docs)
`91fe2cb`	89327d9c	B-Lite strip on 45 Qs + 1 deletion (juniper-jncis-ent + 4 cross-cert) + new `portfolio_mashup_scan_v2.py` with H4 heuristic
`cc01005`	884a05e7	P1 DONE — wired mashup scanner into `inject_phase_b.py` as pre-commit gate. H1+H4 hard fail; H2+H3 warn. `--strict-mashup` elevates H2. `--check-only` for CI. Tested with 4 synthetic payloads.
`2e75b81`	7a664e44	3 H4-only HIGH fixes (gcp-pca-d3-033, gcp-pde-d1-026, juniper-jncia-junos-d3-023) + new `template_boilerplate_check.py` (probe-only). 5 H4-only HIGH confirmed as FPs (62.5% FP rate cross-cert).
`3ebee72`	7a664e44	P7 portfolio strip DONE — 1572 template boilerplate fragments stripped from 1221 Qs across 57 certs (~4% of 31,212-Q bank). New `strip_template_boilerplate.py` script. Mashup scanner re-run: 0 new mashup hits introduced; cleared 2 H4 HIGH + 36 H4m MEDIUM incidentally.
`49a668d`	7a664e44	P8 template-boilerplate prevention hook DONE (B1-B6) — wired `template_boilerplate_check.scan_question` into `inject_phase_b.py` alongside the mashup hook. All B1-B6 hits HARD-FAIL. Tested with 3 synthetic payloads (clean / B1+B2+B3 / B4+B5+B6).
`80bc25c`	c0952d49	P8.5 scanner+stripper extension — added B7-B12 phrase entries to `template_boilerplate_check.py` + `strip_template_boilerplate.py`. New `strip_question_stem()` code path for qstem_only entries (end-anchored regex prevents FPs on honest mid-sentence uses). Help text in `inject_phase_b.py` updated to list B1-B12.
`d45e4d3`	c0952d49	P8.5 content strip on az-104 — 288 mechanical text removals across 48 Qs in az-104-domain-3.json: 268 field-level (B7-B11) + 20 question-stem (B12). All pure text deletion; no semantic content modified. Mashup scanner delta: HIGH 328→328, MEDIUM 6466→6464 (2 incidental cleanups + 1 ratio-drift artifact on d3-045 H4m).
`5992fa6`	c0952d49	data: discovered_bugs.json bugs 9+10 (az-104 B7-B11 cluster + B12 qstem) marked fixed.
`d2b69e9`	c0952d49	mb-800 d4-034 + d4-035 deletions — both corrupt-options mashup (Q stem swapped, options/correct/explanation/whyWrong tell different stories). Same class as juniper-jncis-ent-d2-008. Per Phase D rule (under-representation > misleading content): deleted. d4 Qs 67 → 65.
`af7fb22`	c0952d49	data: discovered_bugs.json bugs 11+12 (mb-800 deletions) marked fixed.
`ea752df`	1ef77611	P9 marker-swap scanner + prevention hook DONE — new `marker_swap_check.py` detects whyWrong[X] (X NOT in correct) containing verbatim CAPS "This IS the correct" / "This IS the right" — author-artifact signal that correct marker was changed without updating whyWrong. Calibrated 100% real-bug rate (4/4 portfolio hits real). Wired into `inject_phase_b.py` as HARD-FAIL gate.
`49f0cf6`	1ef77611	pl-300 marker-swap + enrichment-contamination fixes — 4 Power BI authoring Qs (d1-012, d1-013, d1-016, d1-021) had correct marker pointing to wrong option while explanation + whyWrong supported the actual right answer. All 4 ALSO had hint/examTip/realWorld copy-pasted from totally unrelated Qs. Fix per Q: swap correct marker (no content invented — the right answer was already in the options), rewrite whyWrong for the now-wrong option, B-Lite strip off-topic enrichment. Scanner delta: pl-300 HIGH 6→2 (-4), portfolio HIGH 326→322 (-4).
`fa885af`	1ef77611	data: discovered_bugs.json bugs 13-16 (pl-300 marker-swap class) marked fixed.
`66934b5`	3352b87d	ab-731 enrichment-contamination-without-marker-swap fix — d2-053 (Copilot Studio + Power Automate). Marker was correct ([b]), but whyWrong[a]/[c]/[d] + examTip were copy-pasted from a different Q discussing Power BI / Azure DevOps / SharePoint workflows (distractors that aren't even in this Q's options). NEW SUBCLASS distinct from pl-300 marker-swap (no marker swap needed; just contaminated enrichment). Escapes H4 because vocabulary overlap (workflow / automation / integration) keeps H4 ratio above 0.07. Rewrite whyWrong + examTip from clean source (option text + explanation, same Voice precedent as pl-300 rewrites).
`bcefb0a`	3352b87d	gcp-pcdb-d2-048 deletion: merged mashup of "DR compliance" Q (scenario + Q stem + explanation + hint + examTip + realWorld + learnLink) + "Table bloat" Q (options + correct=[a] + whyWrong[b/c/d]). Right answer for "DR compliance" not in options; right answer for "table bloat" not asked. Same class as juniper-jncis-ent-d2-008 + mb-800-d4-034/035. DELETED per corrupt_options rule. gcp-database-engineer-d2 61→60 Qs (was 61 in scan; off-by-one in scan output but verified via raw file inspection: 61 was correct pre-delete, 60 post-delete).
`71f3668`	3352b87d	data: discovered_bugs.json bugs 17-18 (this session's finds) + bugs 19-20 (P10/P11 tooling candidates) + hygiene fix on bugs 13-16 (fix_commit "pending" → "49f0cf6").

Scope numbers (refined)

126 certs · 31,209 questions (was 31,210; -1 from gcp-pcdb-d2-048 deletion) · 247 Qs/cert avg
Scanner-flagged HIGH (current, post-P10): 6 (was 320 before P10 H2 reclassification — only H4-only stragglers remain)
Scanner-flagged MEDIUM (current, post-P10): 6,781 (was 6,467 — gained ~314 from H2 demotions)
Template boilerplate (B1-B12): 0 remaining (1944 fragments cleared across 3ebee72 + d45e4d3)
Marker-swap (CAPS "This IS"): 0 remaining portfolio-wide (4 fixed in 49f0cf6)
🚨 NEW (P12): Dead learnLinks: 1,293 unique URLs affecting 2,691 Qs (~8.6% of bank). See bug 21.
P2 essentially complete — ~98% portfolio coverage. 97.5% portfolio-wide FP rate confirmed.
H4 cross-cert FP rate: 62.5% (5/8 in 7a664e44 batch) — high-precision relative to H1/H2 but still mostly cross-cert FPs

Triage progress (P2: 320/316 H1/H2 HIGH = ~100% portfolio-wide, 97 certs done. Remaining 29 certs have ZERO H1/H2 HIGH and are pristine.)

Per-cert triage table archived (was getting long). Summary:

Tier	Certs triaged	H1/H2 HIGH	Real bugs
18-cert first wave (sessions 9e5cc23f → 1ef77611)	18	165	6 (2 mb-800 deletions + 4 pl-300 marker-swap fixes)
4-HIGH tier (this session 3352b87d)	7	28	0
3-HIGH tier (this session 3352b87d)	15	45	0
2-HIGH tier (this session 3352b87d)	25	50	2 (ab731-d2-053 enrichment rewrite + gcp-pcdb-d2-048 deletion)
1-HIGH long tail (this session 3352b87d)	32	32	0
TOTAL	97	320	8 real + 2 cross-cert template strips from earlier sessions = 10 portfolio-wide
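The TOTAL row and the portfolio-wide FP rate can be re-derived from the per-tier HIGH and real-bug counts in the table. A small cross-check sketch (the dictionary layout and variable names are illustrative, not part of any repo script):

```python
# (H1/H2 HIGH findings, real bugs) per tier, copied from the summary table.
tiers = {
    "first wave": (165, 6),
    "4-HIGH tier": (28, 0),
    "3-HIGH tier": (45, 0),
    "2-HIGH tier": (50, 2),
    "1-HIGH long tail": (32, 0),
}

total_high = sum(high for high, _ in tiers.values())
total_real = sum(real for _, real in tiers.values())
# FP rate = share of triaged HIGH findings that were not real bugs.
fp_rate = 100 * (total_high - total_real) / total_high

print(total_high, total_real, f"{fp_rate:.1f}%")  # 320 8 97.5%
```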

Per-cert detail (which 4-/3-/2-/1-HIGH certs were triaged this session) in ~/.copilot/session-state/3352b87d-144b-4d45-8149-1087b14853f9/files/triage-*.txt if forensic detail is ever needed. Headline: every batch this session was overwhelmingly H2 contrast-to-right-answer FPs.

What's open (priorities in order)

🚨 P12 NEW (from session 7bb2af7a): Dead-link remediation strategy. 1,293 dead URLs / 2,691 Qs affected. STOP-CRITERIA TRIGGERED — Sush call needed on remediation scope. Three paths in bug 21's proposed_fix: (A) sweep-fix top-N high-leverage URLs (200-Q CompTIA URL alone unblocks the whole DY0-001 cert), (B) per-cert sweep during P3 (slower but couples to P3), (C) accept current state + frontend "doc may have moved" helper (cheapest). Scanner = learnlink_validator.py (5-min full pass, monthly recommended). Full data: files/learnlink-results-full.json + files/learnlink-summary-full.md.
P3: Build remaining new discovery scanners — factual_check.py (LLM-based), pricing_check.py, deprecated_check.py. Each ~2-4h, pays back across all 126 certs. (learnlink_validator.py DONE this session as part of P3.)
P4: Universal per-cert SME audit as standard step during P3 triage — catches non-mashup bugs that no heuristic can detect. Now also covers the ab731-d2-053 subclass (enrichment_contamination_without_marker_swap) that P11 could not detect heuristically.
P5 (defer): Juniper re-author 41 stripped Qs.
P6 (defer): MEDIUM triage timeboxed on top-20 GSC. Likely ~95% FP.

What's DONE (shipped this week)

✅ ~~P1: Mashup prevention hook in inject_phase_b.py~~ — commit cc01005 (884a05e7 session)
✅ ~~Scanner refinement v2 (H4 heuristic)~~ — commit 91fe2cb (89327d9c session)
✅ ~~Juniper B-Lite enrichment strip (45 Qs)~~ — commit 91fe2cb (89327d9c session)
✅ ~~Top-20 GSC HIGH triage (78 findings)~~ — commit 83cb705 + decisions (9e5cc23f session)
✅ ~~Vault d4-008 SME-flagged factual fix~~ — commit 3234ca0 (9e5cc23f session)
✅ ~~H4-only HIGH triage (8 findings)~~ — commit 2e75b81 (7a664e44 session)
✅ ~~P7: Portfolio strip of Phase B template boilerplate B1-B6 (1572 fragments)~~ — commit 3ebee72
✅ ~~P8: Template-boilerplate prevention hook B1-B6~~ — commit 49a668d
✅ ~~P8.5: Scanner+stripper extension to B7-B12 + az-104 secondary strip (288 fragments)~~ — commits 80bc25c + d45e4d3 (c0952d49 session)
✅ ~~P2 round 1: First 6 certs triaged (63 HIGH, 96.8% FP rate, 2 real bugs deleted in mb-800)~~ — commits d2b69e9 (c0952d49 session)
✅ ~~P9: Marker-swap scanner + prevention hook + 4 pl-300 fixes~~ — commits ea752df + 49f0cf6 + fa885af (1ef77611 session)
✅ ~~P2 round 2: 12 more certs triaged (102 HIGH, 96.1% FP rate, 4 marker-swap real bugs fixed in pl-300)~~ — commits ea752df + 49f0cf6 + fa885af (1ef77611 session)
✅ ~~P2 mass-triage (essentially complete): 79 more certs across 4/3/2/1-HIGH tiers + long tail (155 H1/H2 HIGH, 98.7% FP rate this batch; portfolio cumulative 320/316 H1/H2 HIGH triaged = ~100%, 97.5% portfolio-wide FP). 2 real bugs fixed: ab731-d2-053 enrichment-contamination-without-marker-swap rewrite + gcp-pcdb-d2-048 corrupt-options deletion.~~ — commits 66934b5 + bcefb0a + 71f3668 (3352b87d session)
✅ ~~P10: H2 mashup hits reclassified HIGH→MEDIUM (320→6 portfolio HIGH; calibrated 97.5% FP rate over 320 hits / 97 certs). Frees triage cycles for P3.~~ — commit fcde55c + 3543d17 (7bb2af7a session)
✅ ~~P11: Marked WONTFIX with probe evidence (5 known-real drift + 50 H2FP + 200 HONEST samples × 5 metrics tested). Token-overlap cannot discriminate drift from honest divergence. ab731-d2-053 subclass now covered by P3-time manual SME audit.~~ — commit c06cc4a (7bb2af7a session)
✅ ~~P12 tooling (learnlink_validator.py): async portfolio HEAD-checker — 7,153 unique URLs in ~5 min, classifies dead/redirect_minor/redirect_major/server_error/network_error/client_error/ok. Run portfolio-wide; surfaced 1,293 dead URLs / 2,691 Qs affected as bug 21 for strategic call.~~ — commit pending this session (7bb2af7a)

Per-cert workflow (canonical, post-session-2)

For each cert in P2:
1. Run python portfolio_mashup_scan_v2.py (latest scanner) filtered to cert → fresh finding list
2. Triage each HIGH finding by opening the Q (use triage_q.py <cert> <qid> or batch_triage.py <cert> HIGH)
3. Classify: REAL_BUG vs FALSE_POSITIVE (per the rules below)
4. For REAL_BUG: fix via edit tool (sacred-field rewrite if needed), OR if it's H4 enrichment-block contamination, apply B-Lite strip pattern (wipe whyWrong, delete hint + examTip; keep Q + correct + explanation)
5. Launch per-cert SME audit (research sub-agent): read 5 random Qs + the rewritten Qs, validate against authoritative docs (use the learnLink field as source pointer). Capture any non-mashup bugs surfaced to discovered_bugs SQL table.
6. Apply SME-flagged HIGH+MEDIUM fixes
7. Build clean: python -m json.tool each modified file → npm run build → verify dist artifact has expected Q count (rubber-duck rule: build silently skips invalid JSON)
8. Commit with EXPLICIT paths (parallel-safe git rule, never git add .)
9. git pull --rebase && git push
10. SLA smoke: az-900 + checkout + practice page + per-cert questions.json + per-cert practice page
11. Update certs SQL: status='done', high_fixed=N, high_fp=M, notes=...
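Because the build silently skips invalid JSON, step 7 amounts to "every modified file must parse before building". A minimal sketch of that check, assuming only that the modified files are known by path (the function name is hypothetical, and the session's own files/validate_json.py may work differently):

```python
import json
from pathlib import Path


def validate_json_files(paths):
    """Return {path: error message} for every file that fails to parse as JSON."""
    failures = {}
    for path in paths:
        try:
            json.loads(Path(path).read_text(encoding="utf-8"))
        except (OSError, ValueError) as exc:
            # ValueError covers json.JSONDecodeError and bad UTF-8 alike.
            failures[str(path)] = str(exc)
    return failures
```

An empty result means every file parsed; anything else should block the build and the commit.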

Triage rules (proven from sessions 1+2)

H1 (whyWrong has key for correct option):
- Always real. If text reinforces correct → move to explanation. If text contradicts stem → mashup evidence. If meta-commentary → delete.
- TF Qs with whyWrong[true] when correct=true: replace with whyWrong[false] explaining why False is wrong.
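The H1 signal is a pure set intersection, which is why it has no false positives. A minimal sketch, assuming `whyWrong` is a dict keyed by option key and `correct` is either a single key or a list of keys (both shapes are assumptions; the function name is hypothetical and this is not the scanner's actual code):

```python
def h1_hits(question):
    """Return option keys that have a whyWrong entry even though they are marked correct."""
    correct = question.get("correct", [])
    # Assumption: `correct` is a single key ("b") or a list of keys (["a", "c"]).
    correct_keys = {correct} if isinstance(correct, str) else set(correct)
    why_wrong = question.get("whyWrong") or {}
    return sorted(correct_keys & set(why_wrong))
```

For example, a question with `correct` set to "b" and whyWrong entries for "a" and "b" yields `["b"]`.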

H2 (whyWrong text contains "is the correct/right"):
- ~95% are false positives. Apply rule: "Does whyWrong claim THIS wrong option is correct (real mashup) OR does it correctly contrast the wrong option against the right answer (FP)?"
- FP categories to record: contrast-to-right-answer, correct-in-other-context, correct-behavior-not-correct-option, partial-credit-not-complete

H4 (enrichment-block divergence, ratio ≤ 0.07):
- ~100% real inside juniper, ~60% real cross-cert in sessions 1+2; the later 7a664e44 H4-only batch measured only 37.5% real cross-cert (3 of 8; 62.5% FP)
- Pattern: Q + correct + explanation on topic A; whyWrong + hint + examTip on completely different topic B
- Fix: B-Lite strip (whyWrong: {}, delete hint + examTip). Keep Q + correct + explanation. Re-author later.
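The B-Lite strip is a purely mechanical edit, sketched below on a single question dict. This illustrates the rule as stated above and is not the contents of files/b_lite_strip.py; the function name is hypothetical and the field names are taken from this doc.

```python
import copy


def b_lite_strip(question):
    """Return a copy with contaminated enrichment removed; Q, correct and explanation untouched."""
    stripped = copy.deepcopy(question)
    stripped["whyWrong"] = {}      # wipe per-option rationales
    stripped.pop("hint", None)     # delete hint if present
    stripped.pop("examTip", None)  # delete examTip if present
    return stripped
```

Working on a deep copy keeps the original dict available for a before/after diff during triage.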

H3 MEDIUM (token overlap heuristic):
- ~95% FP on scenario-based Qs where stem uses high-level vocab and correct option has technical specifics
- Triage by semantic incoherence, not token overlap

H2 triage shortcut (calibrated 2026-06-03 sessions c0952d49 + 1ef77611 over 165 Qs / 18 certs):
- 96.4% FP rate confirmed. Sush's authoring style produces "contrast-to-right-answer" whyWrong text that triggers H2 systematically.
- FP signature: whyWrong[X] (where X is a WRONG option) contains phrases like "the right tool", "the correct approach", "the proper way", "is indeed the correct", and the sentence is describing what the correct option does, by contrast.
- REAL signature: whyWrong[X] claims the wrong option itself is correct, OR options/correct/explanation tell different stories (corrupt_options mashup, same class as juniper-jncis-ent-d2-008 + mb-800 d4-034/d4-035).
- Fast path: when batch-triaging a cert with all-H2 hits, read the EXPLANATION first. If explanation supports the marked correct option and whyWrong sentences contrast WRONG-vs-CORRECT, it's an FP. If explanation supports a DIFFERENT option than marked correct, or describes a different topic than Q stem, it's a corrupt_options mashup: delete.

Corrupt-options mashup (real bug class at the H1/H2 layer: DELETE):
- Signature: Q stem says topic A; options are about topic B (or generic admin pages); correct marker points at something that doesn't match either; explanation describes the right answer but it's NOT in the options
- Examples: juniper-jncis-ent-d2-008 (STP Q with BGP options), mb800-d4-034 (customer return Q with admin-page options), mb800-d4-035 (vendor credit Q with inventory journal explanation)
- Fix: DELETE the Q. Cannot be repaired without inventing content (which would need Sush's voice approval per Voice Rule). Under-representation > misleading content per Phase D rule.

Marker-swap mashup (real bug class: REPAIR by marker swap; calibrated 2026-06-03 session 1ef77611):
- Signature: whyWrong[X] (where X NOT in correct) contains VERBATIM CAPS "This IS the correct" or "This IS the right". This is an author-artifact signal that the option marker was changed without updating whyWrong, leaving the now-wrong option's whyWrong text claiming it IS the right answer
- Discriminator: capital "IS" near sentence start (case-sensitive). Lowercase "this is the correct" is the standard H2 contrast pattern (FP; 6 portfolio hits all confirmed FP)
- Examples: pl300-d1-012, d1-013, d1-016, d1-021 (all repaired in 49f0cf6)
- Often paired with off-topic enrichment: all 4 pl-300 marker-swap Qs ALSO had hint/examTip/realWorld copy-pasted from completely unrelated Qs (H4 didn't catch because vocabulary overlap kept ratio above 0.07 threshold)
- Fix: REPAIR by swapping correct marker to the actual right answer (the right answer IS in the options, so no content is invented). Rewrite whyWrong for the now-wrong option. B-Lite strip off-topic hint/examTip/realWorld if present. NO Voice Rule blocker: author intent was already to mark the actual right answer; we're correcting a marker bookkeeping bug.
- Detection: python marker_swap_check.py (portfolio-wide, ~5s). Also wired as HARD-FAIL pre-commit hook in inject_phase_b.py.
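The marker-swap signature reduces to a case-sensitive regex applied only to whyWrong entries of options that are not marked correct. A minimal sketch under the same assumed question shape (`correct` as a key or list of keys, `whyWrong` as a dict); it is not the code of marker_swap_check.py, the names are hypothetical, and it does not enforce the "near sentence start" position:

```python
import re

# Case-sensitive on purpose: lowercase "this is the correct" is the H2 contrast pattern (FP).
MARKER_SWAP = re.compile(r"This IS the (?:correct|right)")


def marker_swap_hits(question):
    """Return keys of wrong options whose whyWrong text claims to be the right answer."""
    correct = question.get("correct", [])
    correct_keys = {correct} if isinstance(correct, str) else set(correct)
    hits = []
    for key, text in (question.get("whyWrong") or {}).items():
        if key not in correct_keys and MARKER_SWAP.search(text or ""):
            hits.append(key)
    return sorted(hits)
```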

Discovery infrastructure: capture EVERYTHING

You will find bugs that don't fit the mashup class. Capture them, don't forget them.

Persistence model (cross-session)

The session SQLite is wiped on every new session. The persistent store is a git-tracked JSON file:
- Location: C:\ssClawy\guided\files\discovered_bugs.json
- Shape: array of bug objects matching the SQL schema below
- Lifecycle:
  - Session start: read JSON → load into session SQLite discovered_bugs table via SQL INSERTs
  - Work in session: insert + update via SQL (SQLite is the fast queryable working copy)
  - Session end: export SQL table back to JSON → commit JSON with explicit path

Helper scripts

In C:\ssClawy\guided\scripts\bugs\ (create if missing):
- load_bugs.py: reads discovered_bugs.json → emits SQL INSERT statements (run at session start)
- save_bugs.py: reads SQLite discovered_bugs → writes JSON (run at session end)

If those scripts don't exist yet, create them — first session that needs cross-session capture builds the tooling.
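If the helpers need to be rebuilt, the round trip could look roughly like the sketch below. It is an assumption-laden illustration, not the existing scripts: it writes to a `sqlite3` connection directly instead of emitting INSERT statements, the function names are hypothetical, and it expects the discovered_bugs table (see the schema section) to exist already.

```python
import json
import sqlite3

COLUMNS = [
    "id", "cert", "qid", "bug_class", "severity", "source",
    "discovered_session", "discovered_at", "description", "proposed_fix",
    "status", "fixed_session", "fix_commit", "fixed_at",
]


def _row(bug):
    values = {col: bug.get(col) for col in COLUMNS}
    values["status"] = values["status"] or "open"  # mirror the schema default
    return tuple(values[col] for col in COLUMNS)


def load_bugs(conn, json_path):
    """Session start: insert every bug object from the JSON array into discovered_bugs."""
    with open(json_path, encoding="utf-8") as fh:
        bugs = json.load(fh)
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(
        f"INSERT INTO discovered_bugs ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [_row(bug) for bug in bugs],
    )
    conn.commit()
    return len(bugs)


def save_bugs(conn, json_path):
    """Session end: export the table back to a JSON array, ordered by id."""
    rows = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM discovered_bugs ORDER BY id"
    ).fetchall()
    bugs = [dict(zip(COLUMNS, row)) for row in rows]
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(bugs, fh, indent=2, ensure_ascii=False)
        fh.write("\n")
    return len(bugs)
```

Ordering the export by id keeps the git diff of discovered_bugs.json small from session to session.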

Schema (session SQLite + JSON shape)

CREATE TABLE discovered_bugs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  cert TEXT NOT NULL,
  qid TEXT,                        -- nullable for cert-level issues
  bug_class TEXT NOT NULL,         -- e.g. 'factual_error', 'deprecated_ref', 'typographic_artifact', 'corrupt_options', 'enrichment_stripped'
  severity TEXT,                   -- HIGH / MEDIUM / LOW
  source TEXT,                     -- 'h1_scanner', 'h2_scanner', 'h4_scanner', 'sme_audit', 'manual_triage', 'sush_flag'
  discovered_session TEXT,         -- session ID
  discovered_at TEXT,              -- ISO date
  description TEXT,                -- what's wrong
  proposed_fix TEXT,               -- how to fix
  status TEXT DEFAULT 'open',      -- open / in_progress / fixed / wontfix / duplicate
  fixed_session TEXT,
  fix_commit TEXT,
  fixed_at TEXT
);

When to insert

SME audit flags a factual error (vault d4-008 class) → insert immediately
You spot a typo / deprecated reference / pricing claim during triage → insert
You find a Q with no good fix in current scope → insert with status='open'
A bug class repeats 2+ times across certs → insert + propose building a scanner for it

When to query

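Start of every session: read JSON + run the open-bugs query from "Reading the queue at session start" to surface the queue. Use its CASE ordering rather than a plain ORDER BY severity, which sorts the text alphabetically and puts LOW ahead of MEDIUM.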
After each cert triage: check if any open bugs relate to the cert you just worked on → bundle the fix

Reading the queue at session start

SELECT id, cert, qid, bug_class, severity, description
FROM discovered_bugs
WHERE status = 'open'
ORDER BY
  CASE severity WHEN 'HIGH' THEN 1 WHEN 'MEDIUM' THEN 2 WHEN 'LOW' THEN 3 ELSE 4 END,
  discovered_at;

Self-learning protocol: improve the system as you go

Rule: when you see a bug class 2+ times, build a scanner for it

The portfolio has 31,209 Qs. Manual inspection cannot find systemic bugs. Each scanner is a force multiplier.

When you discover a new bug class (e.g., SME catches "Q references deprecated az vm extension set syntax"):
1. Note the pattern + its detection signal (e.g., regex matching deprecated CLI syntax)
2. If you see it AGAIN in another cert during the same session OR a previous session: build a scanner
3. Scanner template: extend portfolio_mashup_scan_v2.py or create <bug_class>_check.py in C:\ssClawy\guided\
4. Calibrate per Rule #6 (data-first sequence): probe 1000+ honest Qs to find natural threshold cliff before deciding HIGH/MEDIUM/LOW
5. Run portfolio-wide → log all hits to discovered_bugs
6. Triage + fix the real bugs
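A new `<bug_class>_check.py` can start as little more than one regex applied to every text field. The sketch below borrows the example phrase from the paragraph above as a stand-in detection signal; the pattern, the field list, and the function signature are all assumptions, and a real scanner would still need the calibration step before any severity is assigned.

```python
import re

# Hypothetical detection signal, taken from the example above; needs calibration before use.
PATTERN = re.compile(r"\baz vm extension set\b")
# Assumed plain-text fields; whyWrong (a dict keyed by option) is handled separately.
TEXT_FIELDS = ("question", "explanation", "hint", "examTip", "realWorld")


def scan_question(question):
    """Return the names of the fields whose text matches PATTERN."""
    hits = [field for field in TEXT_FIELDS if PATTERN.search(question.get(field) or "")]
    for key, text in (question.get("whyWrong") or {}).items():
        if PATTERN.search(text or ""):
            hits.append(f"whyWrong[{key}]")
    return hits
```

Returning field names rather than a boolean makes each hit easy to log to discovered_bugs with a precise description.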

Rule: when heuristic over-fires, recalibrate

If you triage 20+ findings and ≥80% are FPs, the heuristic is too loose. Either:
- Tighten the threshold (use the data: check ratio distribution of confirmed-real vs confirmed-FP cases)
- Reclassify (HIGH → MEDIUM, MEDIUM → advisory) so triage focuses on high-precision signals first
- Add a secondary filter that eliminates the FP class (e.g., "exclude H2 hits where wrong-option-whyWrong references correct-option text")
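The over-firing rule is a two-part threshold test, sketched here with the 20-finding and 80% limits as default parameters (the function name is hypothetical):

```python
def heuristic_over_fires(triaged, false_positives, min_sample=20, fp_threshold=0.80):
    """True when at least `min_sample` findings were triaged and the FP share is >= `fp_threshold`."""
    if triaged < min_sample:
        return False  # not enough evidence yet to judge the heuristic
    return false_positives / triaged >= fp_threshold
```

Applied to the H2 calibration in this doc (320 findings triaged, 312 of them FPs), the test returns True, which matches the P10 decision to reclassify H2 from HIGH to MEDIUM.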

Rule: workflow friction → propose an update

If a step in the per-cert workflow takes >3× the expected time or causes a real mistake, propose a workflow update at end of session. Update THIS file before committing it. Future sessions inherit the improvement.

Rule: SME audit is your secondary scanner

Per kickoff doc, SME audit is mandatory after rewrites. It also catches all bug classes no heuristic detects (factual errors, deprecated refs, soft leaks, cross-doc inconsistencies). Run SME audit liberally during P2 triage — capture EVERYTHING it surfaces to discovered_bugs.

Tooling (in repo, ready to use)

C:\ssClawy\guided\portfolio_mashup_scan_v2.py — mashup scanner (H1/H2/H3/H4). P10 update (2026-06-03 session 7bb2af7a): H2 reclassified HIGH→MEDIUM based on 320-hit / 97-cert calibration (97.5% FP).
C:\ssClawy\guided\portfolio_mashup_scan.py — v1 deprecated, kept for reference
C:\ssClawy\guided\template_boilerplate_check.py — Phase B template-boilerplate scanner (B1-B12 phrases). Probe-only; supports --cert <slug>, --json <path>, --strict-fail for CI. B7-B12 added 2026-06-03 session c0952d49 for the az-104 secondary cluster. B12 uses qstem_only=True flag — only scans question field with end-anchored regex.
C:\ssClawy\guided\strip_template_boilerplate.py — Phase B template-boilerplate stripper. Pairs with the scanner. Modes: --dry-run, --apply, --cert <slug>, --sample <N>. Two code paths: strip_field() for B1-B11 (with trailing-period restoration) and strip_question_stem() for B12 (no period restoration; question stems already have their own terminator). Preserves JSON formatting matching b_lite_strip.py.
C:\ssClawy\guided\inject_phase_b.py — Phase B authoring tool. Pre-flight gates: schema check + banned-phrase scan + mashup heuristic (cc01005) + template-boilerplate heuristic B1-B12 (49a668d auto-extended by 80bc25c) + marker-swap heuristic (ea752df). --check-only for CI; --strict-mashup for H2 hard-fail (now opt-in maximum strictness since H2 is MEDIUM by default post-P10).
C:\ssClawy\guided\marker_swap_check.py — marker-swap scanner (whyWrong[X] for wrong option claims it IS correct). Case-sensitive verbatim CAPS "This IS the correct/right". 100% real-bug rate on calibration sample. Probe-only; --cert <slug>, --json <path>, --strict-fail for CI. Also imported by inject_phase_b.py as a HARD-FAIL gate.
C:\ssClawy\guided\learnlink_validator.py — async portfolio dead-link scanner (NEW 2026-06-03 session 7bb2af7a). Walks all learnLink fields (top-level + subQuestions), dedupes, HEAD-checks (retries GET on 403/405) with 30-concurrent + 30s-timeout + 2 retries. Classifies: ok / redirect_minor / redirect_major / dead / server_error / client_error / network_error / unknown_error. Args: --sample N / --cert <slug> / --out-json / --out-md. Exit code 1 if any dead links found (CI-friendly). Periodic scanner (NOT a pre-commit gate — link liveness drifts independently of authoring). Recommended monthly run.
C:\ssClawy\guided\leak_check.py — soft answer leak heuristic
C:\ssClawy\guided\test-guided-qa.cjs — full Playwright QA suite (MANDATORY before pushing PracticeQuiz changes; OPTIONAL for JSON-only changes)
C:\ssClawy\guided\test-banned-phrases.cjs — 17 specific template-engine artifact patterns
C:\ssClawy\guided\hugo-safe.ps1 (in aguidetocloud-revamp) — NOT relevant here (guided uses Astro)
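The first stage described for learnlink_validator.py (walk every learnLink, including those under subQuestions, then dedupe) can be sketched without any network code. This is an illustration, not the validator's implementation; it assumes each question is a dict with a string `learnLink`, an optional `subQuestions` list of dicts, and an `id` key (the `id` name is hypothetical).

```python
def collect_learn_links(questions):
    """Map each unique learnLink URL to the set of question ids that use it."""
    usage = {}
    for question in questions:
        # Top-level link plus any links nested under subQuestions.
        candidates = [question] + list(question.get("subQuestions") or [])
        for item in candidates:
            url = (item.get("learnLink") or "").strip()
            if url:
                usage.setdefault(url, set()).add(question.get("id"))
    return usage
```

Keeping the affected question ids per URL is what allows a report like "200 Qs hit ONE dead CompTIA URL": sort the result by set size once the liveness check has marked which URLs are dead.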

Session-state tooling (reusable)

Session 9e5cc23f files:
- files/triage_q.py <cert> <qid>: single-Q triage view
- files/batch_triage.py <cert> HIGH|MEDIUM: compact batch view
- files/all_h1.py: H1 portfolio dump
- files/gsc-cert-top20.py: GSC traffic ranking
- files/gsc-cert-top30.json: cached GSC top-30 cert ranking
- files/load_certs.sql: re-load tracker table on next session (126 certs)

Session 89327d9c files:
- files/h4_calibrate.py: H4 calibration probe (reuse for new scanners)
- files/inspect_q.py: single-Q dump
- files/b_lite_strip.py: B-Lite stripper (reusable for any cert with H4 contamination)
- files/validate_json.py: JSON parse validator
- files/scan-final/: pre-fix v2 scan baseline
- files/scan-post-fix/: post-91fe2cb v2 scan
- files/phase-d-scanner-findings.md: H4 discovery report
- files/phase-d-session2-handoff.md: session 2 full handoff (read if you need deep detail)

Session 7bb2af7a files (P10/P11/P12):
- files/p11_probe.py: calibration probe template (REUSE for any new heuristic), a 5 metrics × 3 populations (REAL/H2FP/HONEST) cliff analysis. Demonstrated that no token-overlap metric discriminates whyWrong-vs-Qstem drift; pattern is reusable for future scanner candidates per Rule #6 data-first sequence.
- files/p11_probe_data.json: raw drift-metric data (5 REAL + 50 H2FP + 200 HONEST samples). Future session: do NOT re-invent token-overlap drift detection; see this data for why.
- files/prefix/: pre-fix versions (via git cat-file blob) of pl-300-d1 + ab-731-d2 used to reconstruct known-real drift cases for calibration.
- files/learnlink_probe.py: learnLink field-shape probe (Rule #6 pre-build probe for P12).
- files/learnlink-results-full.json: full portfolio scan output (7,153 URLs).
- files/learnlink-summary-full.md: human-readable findings for Sush strategic call.

Stop criteria: when to ping Sush via `ask_user`

Per the kickoff: "Don't ping me unless C6 SME surfaces something needing my call."

Translate to:
- SME audit surfaces a content scope question (e.g., "should this Q be rewritten or deleted entirely?")
- Architectural choice with significant cost (e.g., a new scanner type would take >8h to build, or a fix pattern would touch >50 certs)
- Practice exam SLA smoke fails post-deploy and revert isn't trivial
- A bug class repeats so often it changes the program's scope (e.g., "factual errors are 10× more common than mashup bugs; should we pivot Phase D into Phase F factual-correctness?")
- Per-cert effort blows up 3× the estimate
- Sush-voice content question (any customer-facing rewrite needs his voice approval per Voice Rule)

Otherwise: full autonomy. Log everything to discovered_bugs + session journal + this doc.

Session end checklist

Update certs SQL: mark each cert worked status='done' / 'high_done' / 'blocked'
Update discovered_bugs SQL: any new bugs found go in open; bugs you fixed go to fixed with fix_commit
Update THIS doc:
Bump the "Last updated" line at top
Move completed P-items from "What's open" into "What's shipped"
Add any new P-items discovered
Tighten the hours estimate if reality differed
Update ~/.copilot/session-journal.md with per-session entry
SLA smoke green (3 curls)
Commit + git pull --rebase + push (rebase before pushing, same order as per-cert workflow step 9)

Resume one-liner (give this to Sush for next session)

Hey Atlas — continue Phase D. Full autonomy. Stop-criteria only.

Reference links (deep detail; read if needed)

Original kickoff: C:\ssClawy\guided\files\portfolio-mashup-cleanup-kickoff.md
Session 1 handoff (top-20 triage + 5 mashup fixes): ~/.copilot/session-state/9e5cc23f-37cd-4b5e-b796-29e715835fb4/files/phase-d-session1-handoff.md
Session 2 handoff (H4 scanner + juniper B-Lite): ~/.copilot/session-state/89327d9c-7309-438d-a8f6-3f257a278c82/files/phase-d-session2-handoff.md
Session 2 scanner findings (H4 calibration + cross-cert discovery): ~/.copilot/session-state/89327d9c-7309-438d-a8f6-3f257a278c82/files/phase-d-scanner-findings.md
Session 3 (7a664e44) template-boilerplate decision doc + scan: ~/.copilot/session-state/7a664e44-92eb-44f6-904e-89078e4d33ab/files/template-boilerplate-decision.md + template-boilerplate-scan.json
mb-800 Phase D deferred (12 sacred-needed Qs from before Phase D portfolio scope): C:\ssClawy\guided\files\mb-800-phase-d-followup.md