# Phase D Playbook: Question-Bank Mashup Cleanup (ongoing)
Portal doc. This is the master doc every Phase D session reads at start. Self-contained kickoff with state, plan, workflow, discovery infrastructure, and self-learning protocol. History detail is in the linked session handoffs (read only if needed).
Last updated: 2026-06-03 NZT by session 7bb2af7a (P10 H2 reclassification shipped + P11 wontfix per probe evidence + P12 NEW dead-link finding via learnlink_validator: 1,293 dead URLs / 2,691 Qs affected portfolio-wide; 4 commits). Update this doc at the end of every Phase D session (bump the date, move shipped items, tighten estimates).
Persistence model: bug capture lives in `C:\ssClawy\guided\files\discovered_bugs.json` (git-tracked, cross-session). Helper scripts live in `C:\ssClawy\guided\scripts\bugs\` (`load_bugs.py` + `save_bugs.py`). The per-session SQLite `discovered_bugs` table is loaded from / written back to this JSON at session boundaries. See "Discovery infrastructure" below.
## TL;DR
You're continuing the Phase D portfolio mashup cleanup. 21 commits shipped so far (83cb705 + 3234ca0 + 91fe2cb + cc01005 + 2e75b81 + 3ebee72 + 49a668d + 80bc25c + d45e4d3 + 5992fa6 + d2b69e9 + af7fb22 + ea752df + 49f0cf6 + fa885af + 66934b5 + bcefb0a + 71f3668 + fcde55c + 3543d17 + c06cc4a); the P12 commits are still pending. P1+P7+P8+P9+P10 DONE. P2 essentially complete (~98% portfolio). P10 shipped: H2 reclassified HIGH→MEDIUM in fcde55c (portfolio HIGH dropped 320→6 as predicted). P11 RESOLVED AS WONTFIX: the calibration probe (5 known-real drift + 50 H2FP + 200 HONEST Qs, 5 token-overlap metrics tested) showed that no natural cliff exists, because REAL drift cases sit inside the HONEST distribution. The ab731-d2-053 subclass is now covered by P3-time manual SME audit. 🚨 P12 NEW (this session): built learnlink_validator.py (async HEAD-check across 7,153 unique learnLinks in ~5 min); 1,293 dead URLs (18%) affect 2,691 Qs (~8.6% of bank). Top concentration: 200 Qs hit ONE dead CompTIA URL; ~30 Cisco/Juniper/Microsoft URL clusters of 10-30 Qs each. Stop criteria triggered: Sush call needed on remediation scope. See bug #21 in discovered_bugs.json + files/learnlink-summary-full.md.
## Current state
### What's shipped (live on prod, SLA green)
| Commit | Author session | What |
|---|---|---|
| 83cb705 | 9e5cc23f | 5 H1 mashup fixes (eccouncil-cnd-v3 ×4 + hashicorp-vault ×1) |
| 3234ca0 | 9e5cc23f | SME-flagged factual fix in vault d4-008 (correct=b → c per Vault docs) |
| 91fe2cb | 89327d9c | B-Lite strip on 45 Qs + 1 deletion (juniper-jncis-ent + 4 cross-cert) + new portfolio_mashup_scan_v2.py with H4 heuristic |
| cc01005 | 884a05e7 | P1 DONE: wired mashup scanner into inject_phase_b.py as pre-commit gate. H1+H4 hard fail; H2+H3 warn. --strict-mashup elevates H2. --check-only for CI. Tested with 4 synthetic payloads. |
| 2e75b81 | 7a664e44 | 3 H4-only HIGH fixes (gcp-pca-d3-033, gcp-pde-d1-026, juniper-jncia-junos-d3-023) + new template_boilerplate_check.py (probe-only). 5 H4-only HIGH confirmed as FPs (62.5% FP rate cross-cert). |
| 3ebee72 | 7a664e44 | P7 portfolio strip DONE: 1572 template boilerplate fragments stripped from 1221 Qs across 57 certs (~4% of the 31,212-Q bank at the time). New strip_template_boilerplate.py script. Mashup scanner re-run: 0 new mashup hits introduced; cleared 2 H4 HIGH + 36 H4m MEDIUM incidentally. |
| 49a668d | 7a664e44 | P8 template-boilerplate prevention hook DONE (B1-B6): wired template_boilerplate_check.scan_question into inject_phase_b.py alongside the mashup hook. All B1-B6 hits HARD-FAIL. Tested with 3 synthetic payloads (clean / B1+B2+B3 / B4+B5+B6). |
| 80bc25c | c0952d49 | P8.5 scanner+stripper extension: added B7-B12 phrase entries to template_boilerplate_check.py + strip_template_boilerplate.py. New strip_question_stem() code path for qstem_only entries (end-anchored regex prevents FPs on honest mid-sentence uses). Help text in inject_phase_b.py updated to list B1-B12. |
| d45e4d3 | c0952d49 | P8.5 content strip on az-104: 288 mechanical text removals across 48 Qs in az-104-domain-3.json, split 268 field-level (B7-B11) + 20 question-stem (B12). All pure text deletion; no semantic content modified. Mashup scanner delta: HIGH 328→328, MEDIUM 6466→6464 (2 incidental cleanups + 1 ratio-drift artifact on d3-045 H4m). |
| 5992fa6 | c0952d49 | data: discovered_bugs.json bugs 9+10 (az-104 B7-B11 cluster + B12 qstem) marked fixed. |
| d2b69e9 | c0952d49 | mb-800 d4-034 + d4-035 deletions: both corrupt-options mashups (Q stem swapped; options/correct/explanation/whyWrong tell different stories). Same class as juniper-jncis-ent-d2-008. Per Phase D rule (under-representation > misleading content): deleted. d4 Qs 67 → 65. |
| af7fb22 | c0952d49 | data: discovered_bugs.json bugs 11+12 (mb-800 deletions) marked fixed. |
| ea752df | 1ef77611 | P9 marker-swap scanner + prevention hook DONE: new marker_swap_check.py detects whyWrong[X] (X NOT in correct) containing verbatim CAPS "This IS the correct" / "This IS the right", an author-artifact signal that the correct marker was changed without updating whyWrong. Calibrated 100% real-bug rate (4/4 portfolio hits real). Wired into inject_phase_b.py as HARD-FAIL gate. |
| 49f0cf6 | 1ef77611 | pl-300 marker-swap + enrichment-contamination fixes: 4 Power BI authoring Qs (d1-012, d1-013, d1-016, d1-021) had the correct marker pointing to the wrong option while explanation + whyWrong supported the actual right answer. All 4 ALSO had hint/examTip/realWorld copy-pasted from totally unrelated Qs. Fix per Q: swap correct marker (no content invented; the right answer was already in the options), rewrite whyWrong for the now-wrong option, B-Lite strip off-topic enrichment. Scanner delta: pl-300 HIGH 6→2 (-4), portfolio HIGH 326→322 (-4). |
| fa885af | 1ef77611 | data: discovered_bugs.json bugs 13-16 (pl-300 marker-swap class) marked fixed. |
| 66934b5 | 3352b87d | ab-731 enrichment-contamination-without-marker-swap fix: d2-053 (Copilot Studio + Power Automate). Marker was correct ([b]), but whyWrong[a]/[c]/[d] + examTip were copy-pasted from a different Q discussing Power BI / Azure DevOps / SharePoint workflows (distractors that aren't even in this Q's options). NEW SUBCLASS distinct from pl-300 marker-swap (no marker swap needed; just contaminated enrichment). Escapes H4 because vocabulary overlap (workflow / automation / integration) keeps the H4 ratio above 0.07. Rewrote whyWrong + examTip from clean source (option text + explanation; same Voice precedent as the pl-300 rewrites). |
| bcefb0a | 3352b87d | gcp-pcdb-d2-048 deletion: merged mashup of a "DR compliance" Q (scenario + Q stem + explanation + hint + examTip + realWorld + learnLink) and a "Table bloat" Q (options + correct=[a] + whyWrong[b/c/d]). The right answer for "DR compliance" is not in the options; the right answer for "table bloat" is not asked. Same class as juniper-jncis-ent-d2-008 + mb-800-d4-034/035. DELETED per corrupt_options rule. gcp-database-engineer-d2 61→60 Qs (verified via raw file inspection). |
| 71f3668 | 3352b87d | data: discovered_bugs.json bugs 17-18 (that session's finds) + bugs 19-20 (P10/P11 tooling candidates) + hygiene fix on bugs 13-16 (fix_commit "pending" → "49f0cf6"). |
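
The gate semantics recorded in the table (cc01005: H1+H4 hard fail, H2+H3 warn, `--strict-mashup` elevates H2; 49a668d/80bc25c: template boilerplate B1-B12 hard-fails; ea752df: marker-swap hard-fails) fit in a small decision function. This is a hypothetical sketch, not the code in `inject_phase_b.py`. The `Finding` type and the heuristic labels (`"MARKER_SWAP"`, `"B7"`, ...) are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical finding record; inject_phase_b.py's real data structures are not shown in this doc.
@dataclass(frozen=True)
class Finding:
    heuristic: str  # illustrative labels: "H1".."H4", "B1".."B12", "MARKER_SWAP"

HARD_FAIL = {"H1", "H4", "MARKER_SWAP"}
WARN_ONLY = {"H2", "H3"}

def gate(findings, strict_mashup=False):
    """Return (passed, errors, warnings) for one authoring payload.

    Labels outside the sets above are ignored in this sketch.
    """
    errors, warnings = [], []
    for f in findings:
        h = f.heuristic
        if h in HARD_FAIL or h.startswith("B"):  # template boilerplate B1-B12 always hard-fails
            errors.append(h)
        elif h == "H2" and strict_mashup:         # --strict-mashup elevates H2 to a hard fail
            errors.append(h)
        elif h in WARN_ONLY:
            warnings.append(h)
    return not errors, errors, warnings
```

Since P10, H2 defaults to a warning. Passing `strict_mashup=True` is the opt-in maximum-strictness mode.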
### Scope numbers (refined)
- 126 certs · 31,209 questions (was 31,210; -1 from gcp-pcdb-d2-048 deletion) · 247 Qs/cert avg
- Scanner-flagged HIGH (current, post-P10): 6 (was 320 before P10 H2 reclassification — only H4-only stragglers remain)
- Scanner-flagged MEDIUM (current, post-P10): 6,781 (was 6,467 — gained ~314 from H2 demotions)
- Template boilerplate (B1-B12): 0 remaining (1,860 fragments cleared: 1,572 in 3ebee72 + 288 in d45e4d3)
- Marker-swap (CAPS "This IS"): 0 remaining portfolio-wide (4 fixed in 49f0cf6)
- 🚨 NEW (P12): Dead learnLinks: 1,293 unique URLs affecting 2,691 Qs (~8.6% of bank). See bug 21.
- P2 essentially complete — ~98% portfolio coverage. 97.5% portfolio-wide FP rate confirmed.
- H4 cross-cert FP rate: 62.5% (5/8 in 7a664e44 batch): higher precision than H2, but cross-cert hits are still mostly FPs
### Triage progress (P2: 320/316 H1/H2 HIGH = ~100% portfolio-wide, 97 certs done; the remaining 29 certs have ZERO H1/H2 HIGH and are pristine)
Per-cert triage table archived (was getting long). Summary:
| Tier | Certs triaged | H1/H2 HIGH | Real bugs |
|---|---|---|---|
| 18-cert first wave (sessions 9e5cc23f → 1ef77611) | 18 | 165 | 6 (2 mb-800 deletions + 4 pl-300 marker-swap fixes) |
| 4-HIGH tier (this session 3352b87d) | 7 | 28 | 0 |
| 3-HIGH tier (this session 3352b87d) | 15 | 45 | 0 |
| 2-HIGH tier (this session 3352b87d) | 25 | 50 | 2 (ab731-d2-053 enrichment rewrite + gcp-pcdb-d2-048 deletion) |
| 1-HIGH long tail (this session 3352b87d) | 32 | 32 | 0 |
| TOTAL | 97 | 320 | 8 real + 2 cross-cert template strips from earlier sessions = 10 portfolio-wide |
Per-cert detail (which 4-/3-/2-/1-HIGH certs were triaged this session) in ~/.copilot/session-state/3352b87d-144b-4d45-8149-1087b14853f9/files/triage-*.txt if forensic detail is ever needed. Headline: every batch this session was overwhelmingly H2 contrast-to-right-answer FPs.
### What's open (priorities in order)
- 🚨 P12 NEW (from session 7bb2af7a): Dead-link remediation strategy. 1,293 dead URLs / 2,691 Qs affected. STOP CRITERIA TRIGGERED: Sush call needed on remediation scope. Bug 21's `proposed_fix` lists three paths: (A) sweep-fix the top-N high-leverage URLs (the 200-Q CompTIA URL alone unblocks the whole DY0-001 cert), (B) per-cert sweep during P3 (slower, but couples to P3), (C) accept current state + add a frontend "doc may have moved" helper (cheapest). Scanner = `learnlink_validator.py` (5-min full pass, monthly run recommended). Full data: `files/learnlink-results-full.json` + `files/learnlink-summary-full.md`.
- P3: Build the remaining new discovery scanners: `factual_check.py` (LLM-based), `pricing_check.py`, `deprecated_check.py`. Each takes ~2-4h and pays back across all 126 certs. (`learnlink_validator.py` DONE this session as part of P3.)
- P4: Universal per-cert SME audit as a standard step during P3 triage; it catches non-mashup bugs that no heuristic can detect. It now also covers the ab731-d2-053 subclass (enrichment_contamination_without_marker_swap), which P11 could not detect heuristically.
- P5 (defer): Juniper re-author 41 stripped Qs.
- P6 (defer): MEDIUM triage timeboxed on top-20 GSC. Likely ~95% FP.
### What's DONE (shipped this week)
- ✅ ~~P1: Mashup prevention hook in `inject_phase_b.py`~~: commit `cc01005` (884a05e7 session)
- ✅ ~~Scanner refinement v2 (H4 heuristic)~~: commit `91fe2cb` (89327d9c session)
- ✅ ~~Juniper B-Lite enrichment strip (45 Qs)~~: commit `91fe2cb` (89327d9c session)
- ✅ ~~Top-20 GSC HIGH triage (78 findings)~~: commit `83cb705` + decisions (9e5cc23f session)
- ✅ ~~Vault d4-008 SME-flagged factual fix~~: commit `3234ca0` (9e5cc23f session)
- ✅ ~~H4-only HIGH triage (8 findings)~~: commit `2e75b81` (7a664e44 session)
- ✅ ~~P7: Portfolio strip of Phase B template boilerplate B1-B6 (1572 fragments)~~: commit `3ebee72`
- ✅ ~~P8: Template-boilerplate prevention hook B1-B6~~: commit `49a668d`
- ✅ ~~P8.5: Scanner+stripper extension to B7-B12 + az-104 secondary strip (288 fragments)~~: commits `80bc25c` + `d45e4d3` (c0952d49 session)
- ✅ ~~P2 round 1: First 6 certs triaged (63 HIGH, 96.8% FP rate, 2 real bugs deleted in mb-800)~~: commit `d2b69e9` (c0952d49 session)
- ✅ ~~P9: Marker-swap scanner + prevention hook + 4 pl-300 fixes~~: commits `ea752df` + `49f0cf6` + `fa885af` (1ef77611 session)
- ✅ ~~P2 round 2: 12 more certs triaged (102 HIGH, 96.1% FP rate, 4 marker-swap real bugs fixed in pl-300)~~: commits `ea752df` + `49f0cf6` + `fa885af` (1ef77611 session)
- ✅ ~~P2 mass-triage (essentially complete): 79 more certs across 4/3/2/1-HIGH tiers + long tail (155 H1/H2 HIGH, 98.7% FP rate this batch; portfolio cumulative 320/316 H1/H2 HIGH triaged = ~100%, 97.5% portfolio-wide FP). 2 real bugs fixed: ab731-d2-053 enrichment-contamination-without-marker-swap rewrite + gcp-pcdb-d2-048 corrupt-options deletion.~~: commits `66934b5` + `bcefb0a` + `71f3668` (3352b87d session)
- ✅ ~~P10: H2 mashup hits reclassified HIGH→MEDIUM (320→6 portfolio HIGH; calibrated 97.5% FP rate over 320 hits / 97 certs). Frees triage cycles for P3.~~: commits `fcde55c` + `3543d17` (7bb2af7a session)
- ✅ ~~P11: Marked WONTFIX with probe evidence (5 known-real drift + 50 H2FP + 200 HONEST samples × 5 metrics tested). Token-overlap cannot discriminate drift from honest divergence. ab731-d2-053 subclass now covered by P3-time manual SME audit.~~: commit `c06cc4a` (7bb2af7a session)
- ✅ ~~P12 tooling (`learnlink_validator.py`): async portfolio HEAD-checker covering 7,153 unique URLs in ~5 min; classifies dead/redirect_minor/redirect_major/server_error/network_error/client_error/ok. Run portfolio-wide; surfaced 1,293 dead URLs / 2,691 Qs affected as bug 21 for a strategic call.~~: commit pending this session (7bb2af7a)
## Per-cert workflow (canonical, post-session-2)
For each cert in P2:
1. Run python portfolio_mashup_scan_v2.py (latest scanner) filtered to cert → fresh finding list
2. Triage each HIGH finding by opening the Q (use triage_q.py <cert> <qid> or batch_triage.py <cert> HIGH)
3. Classify: REAL_BUG vs FALSE_POSITIVE (per the rules below)
4. For REAL_BUG: fix via edit tool (sacred-field rewrite if needed) — OR if it's H4 enrichment-block contamination, apply B-Lite strip pattern (wipe whyWrong, delete hint + examTip; keep Q + correct + explanation)
5. Launch per-cert SME audit (research sub-agent) — read 5 random Qs + the rewritten Qs, validate against authoritative docs (use the learnLink field as source pointer). Capture any non-mashup bugs surfaced to discovered_bugs SQL table.
6. Apply SME-flagged HIGH+MEDIUM fixes
7. Build clean: python -m json.tool each modified file → npm run build → verify dist artifact has expected Q count (rubber-duck rule: build silently skips invalid JSON)
8. Commit with EXPLICIT paths (parallel-safe git rule, never git add .)
9. git pull --rebase && git push
10. SLA smoke: az-900 + checkout + practice page + per-cert questions.json + per-cert practice page
11. Update certs SQL: status='done', high_fixed=N, high_fp=M, notes=...
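Step 7's rubber-duck rule exists because the build skips invalid JSON without complaint. Below is a minimal pre-build check that does the same parse as `python -m json.tool` and adds a per-file Q count. It assumes a domain file is either a bare JSON array of questions or an object with a `questions` array. That shape is an assumption, so adjust `count_questions` to the real files. The `expected` counts come from your own triage notes.

```python
import json
from pathlib import Path

def count_questions(payload) -> int:
    # Assumption: a domain file is a bare list of questions or {"questions": [...]}.
    if isinstance(payload, list):
        return len(payload)
    return len(payload.get("questions", []))

def validate_files(paths, expected):
    """Parse each modified file and compare its Q count with `expected` (filename -> count).

    Returns a list of problems; an empty list means the files are safe to hand to the build.
    """
    problems = []
    for path in map(Path, paths):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg} at line {exc.lineno})")
            continue
        want = expected.get(path.name)
        got = count_questions(data)
        if want is not None and got != want:
            problems.append(f"{path.name}: {got} Qs, expected {want}")
    return problems
```

After a deletion, set `expected` to the post-delete count (e.g. 65 for mb-800 d4). That way a silently dropped file and a miscount both show up before `npm run build`.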
## Triage rules (proven from sessions 1+2)
H1 (whyWrong has key for correct option):
- Always real. If text reinforces correct → move to explanation. If text contradicts stem → mashup evidence. If meta-commentary → delete.
- TF Qs with whyWrong[true] when correct=true: replace with whyWrong[false] explaining why False is wrong.
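Here is a sketch of the H1 test. It assumes `correct` holds option keys (a list, or a scalar such as `true` for TF Qs) and that `whyWrong` is a dict keyed by option. Both field shapes are assumptions, and `h1_hits` is a hypothetical name, not a function from `portfolio_mashup_scan_v2.py`.

```python
def h1_hits(q):
    """Return option keys present both in `correct` and in whyWrong (H1: always a real bug)."""
    correct = q.get("correct", [])
    if not isinstance(correct, list):
        correct = [correct]  # single-answer and TF Qs may store a scalar
    correct_keys = {str(c).lower() for c in correct}  # True -> "true" so TF keys line up
    return sorted(k for k in q.get("whyWrong", {}) if str(k).lower() in correct_keys)
```

A TF hit such as `["true"]` is the case that calls for replacing `whyWrong[true]` with `whyWrong[false]`.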
H2 (whyWrong text contains "is the correct/right"):
- ~97.5% are false positives (calibrated over 320 hits / 97 certs; H2 has been MEDIUM since P10). Apply rule: "Does whyWrong claim THIS wrong option is correct (real mashup) OR does it correctly contrast the wrong option against the right answer (FP)?"
- FP categories to record: contrast-to-right-answer, correct-in-other-context, correct-behavior-not-correct-option, partial-credit-not-complete
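H2 only surfaces candidates; the claim-vs-contrast call above stays a human judgement. The sketch below approximates the match with a case-insensitive regex over whyWrong entries for wrong options. The exact pattern used by `portfolio_mashup_scan_v2.py` is not documented here, so `H2_RE` is an assumption.

```python
import re

# Assumed approximation of the H2 phrase; the real scanner's regex may differ.
H2_RE = re.compile(r"\bis\s+the\s+(?:correct|right)\b", re.IGNORECASE)

def h2_hits(q):
    """Return wrong-option keys whose whyWrong text contains "is the correct/right"."""
    correct = q.get("correct", [])
    correct = {str(c).lower() for c in (correct if isinstance(correct, list) else [correct])}
    return sorted(k for k, text in q.get("whyWrong", {}).items()
                  if str(k).lower() not in correct and H2_RE.search(text))
```

Most hits read like the contrast case in the first test below ("Azure Policy is the right tool... option A only audits"), which is why the FP rate is so high.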
H4 (enrichment-block divergence, ratio ≤ 0.07):
- ~100% real inside juniper, ~40% real cross-cert (3/8 in the 7a664e44 batch, i.e. 62.5% FP)
- Pattern: Q + correct + explanation on topic A; whyWrong + hint + examTip on completely different topic B
- Fix: B-Lite strip (whyWrong: {}, delete hint + examTip). Keep Q + correct + explanation. Re-author later.
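This doc does not spell out how the H4 ratio is computed. The sketch assumes a token-set Jaccard overlap between the core block (question + explanation) and the enrichment block (whyWrong + hint + examTip). Treat the metric, the field names and the function names as assumptions: the 0.07 cut-off was calibrated against the real scanner's metric, not this one. The sketch also shows the B-Lite strip as a non-mutating dict transform.

```python
import re

TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokens(text):
    # Lowercased alphanumeric tokens; stop-word handling is deliberately omitted.
    return set(TOKEN_RE.findall(text.lower()))

def h4_ratio(q):
    """Assumed H4 metric: Jaccard overlap of core-block vs enrichment-block tokens."""
    core = " ".join([q.get("question", ""), q.get("explanation", "")])
    enrich = " ".join(list(q.get("whyWrong", {}).values())
                      + [q.get("hint", ""), q.get("examTip", "")])
    a, b = tokens(core), tokens(enrich)
    return len(a & b) / len(a | b) if a | b else 1.0

def b_lite_strip(q):
    """B-Lite: wipe whyWrong, drop hint + examTip; keep question, correct, explanation."""
    out = dict(q)  # shallow copy so the caller's dict is untouched
    out["whyWrong"] = {}
    out.pop("hint", None)
    out.pop("examTip", None)
    return out
```

In the test, an STP question carrying BGP enrichment scores 0.0, while same-topic enrichment scores well above 0.07. This also explains the pl-300 and ab-731 escapes: shared vocabulary lifts the overlap even when the topics differ.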
H3 MEDIUM (token overlap heuristic):
- ~95% FP on scenario-based Qs where the stem uses high-level vocab and the correct option has technical specifics
- Triage by semantic incoherence, not token overlap
H2 triage shortcut (calibrated 2026-06-03 sessions c0952d49 + 1ef77611 over 165 Qs / 18 certs):
- 96.4% FP rate confirmed. Sush's authoring style produces "contrast-to-right-answer" whyWrong text that triggers H2 systematically.
- FP signature: whyWrong[X] (where X is a WRONG option) contains phrases like "the right tool", "the correct approach", "the proper way", "is indeed the correct", and the sentence describes, by contrast, what the correct option does.
- REAL signature: whyWrong[X] claims the wrong option itself is correct, OR options/correct/explanation tell different stories (corrupt_options mashup; same class as juniper-jncis-ent-d2-008 + mb-800 d4-034/d4-035).
- Fast path: when batch-triaging a cert with all-H2 hits, read the EXPLANATION first. If the explanation supports the marked correct option and the whyWrong sentences contrast WRONG vs CORRECT, it's an FP. If the explanation supports a DIFFERENT option that is in the options, apply the marker-swap rules below (repair, don't delete). If the explanation describes a different topic than the Q stem, or its right answer is not among the options, it's a corrupt_options mashup: delete.
Corrupt-options mashup (real bug class at the H1/H2 layer; DELETE):
- Signature: Q stem says topic A; options are about topic B (or generic admin pages); the correct marker points at something that matches neither; the explanation describes the right answer, but it's NOT in the options
- Examples: juniper-jncis-ent-d2-008 (STP Q with BGP options), mb800-d4-034 (customer return Q with admin-page options), mb800-d4-035 (vendor credit Q with inventory journal explanation)
- Fix: DELETE the Q. It cannot be repaired without inventing content (which would need Sush's voice approval per Voice Rule). Under-representation > misleading content per Phase D rule.
Marker-swap mashup (real bug class — REPAIR by marker swap; calibrated 2026-06-03 session 1ef77611):
- Signature: whyWrong[X] (where X NOT in correct) contains VERBATIM CAPS "This IS the correct" or "This IS the right" — author-artifact signal that the option marker was changed without updating whyWrong, leaving the now-wrong option's whyWrong text claiming it IS the right answer
- Discriminator: capital "IS" near sentence start (case-sensitive). Lowercase "this is the correct" is the standard H2 contrast pattern (FP — 6 portfolio hits all confirmed FP)
- Examples: pl300-d1-012, d1-013, d1-016, d1-021 (all repaired in 49f0cf6)
- Often paired with off-topic enrichment — all 4 pl-300 marker-swap Qs ALSO had hint/examTip/realWorld copy-pasted from completely unrelated Qs (H4 didn't catch because vocabulary overlap kept ratio above 0.07 threshold)
- Fix: REPAIR by swapping correct marker to the actual right answer (the right answer IS in the options — no content invented). Rewrite whyWrong for the now-wrong option. B-Lite strip off-topic hint/examTip/realWorld if present. NO Voice Rule blocker — author intent was already to mark the actual right answer; we're correcting a marker bookkeeping bug.
- Detection: python marker_swap_check.py (portfolio-wide, ~5s). Also wired as HARD-FAIL pre-commit hook in inject_phase_b.py.
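Here is a sketch of the marker-swap signal, assuming the same `correct`/`whyWrong` shapes as the H1 sketch. The regex is deliberately case-sensitive, because the lowercase form is the ordinary H2 contrast pattern. `marker_swap_hits` is a hypothetical name; the real logic lives in `marker_swap_check.py`.

```python
import re

# Case-sensitive on purpose: lowercase "this is the correct" is the standard H2 contrast FP.
MARKER_SWAP_RE = re.compile(r"\bThis IS the (?:correct|right)\b")

def marker_swap_hits(q):
    """Return wrong-option keys whose whyWrong claims, in CAPS, that the option IS correct."""
    correct = q.get("correct", [])
    correct = {str(c).lower() for c in (correct if isinstance(correct, list) else [correct])}
    return sorted(k for k, text in q.get("whyWrong", {}).items()
                  if str(k).lower() not in correct and MARKER_SWAP_RE.search(text))
```

A hit such as `["c"]` points to the option the author originally meant to mark correct. The repair swaps the marker to it and rewrites whyWrong for the option that becomes wrong.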
## Discovery infrastructure: capture EVERYTHING
You will find bugs that don't fit the mashup class. Capture them, don't forget them.
### Persistence model (cross-session)
The session SQLite is wiped on every new session. The persistent store is a git-tracked JSON file:
- Location: C:\ssClawy\guided\files\discovered_bugs.json
- Shape: array of bug objects matching the SQL schema below
- Lifecycle:
- Session start: read JSON → load into session SQLite discovered_bugs table via SQL INSERTs
- Work in session: insert + update via SQL (SQLite is the fast queryable working copy)
- Session end: export SQL table back to JSON → commit JSON with explicit path
### Helper scripts
In C:\ssClawy\guided\scripts\bugs\ (create if missing):
- load_bugs.py — reads discovered_bugs.json → emits SQL INSERT statements (run at session start)
- save_bugs.py — reads SQLite discovered_bugs → writes JSON (run at session end)
If those scripts don't exist yet, create them — first session that needs cross-session capture builds the tooling.
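If you do have to build them, here is a minimal sketch of the round trip under one assumption that differs from the description above: it inserts rows with the standard `sqlite3` module instead of emitting INSERT statements as text. Column names follow the schema below; the function names mirror the script names but are illustrative.

```python
import json
import sqlite3

# Column order mirrors the discovered_bugs schema; JSON objects use the same keys.
COLUMNS = ["id", "cert", "qid", "bug_class", "severity", "source", "discovered_session",
           "discovered_at", "description", "proposed_fix", "status", "fixed_session",
           "fix_commit", "fixed_at"]
DEFAULTS = {"status": "open"}  # mirror the column DEFAULT when a JSON object omits status

def load_bugs(conn: sqlite3.Connection, json_path: str) -> int:
    """Session start: JSON array -> rows in an existing discovered_bugs table."""
    with open(json_path, encoding="utf-8") as fh:
        bugs = json.load(fh)
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(
        f"INSERT INTO discovered_bugs ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [tuple(b.get(c, DEFAULTS.get(c)) for c in COLUMNS) for b in bugs],
    )
    conn.commit()
    return len(bugs)

def save_bugs(conn: sqlite3.Connection, json_path: str) -> int:
    """Session end: discovered_bugs rows -> JSON array ordered by id (stable git diffs)."""
    rows = conn.execute(f"SELECT {', '.join(COLUMNS)} FROM discovered_bugs ORDER BY id").fetchall()
    bugs = [dict(zip(COLUMNS, row)) for row in rows]
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(bugs, fh, indent=2, ensure_ascii=False)
        fh.write("\n")
    return len(bugs)
```

Keeping explicit `id` values on load preserves the "bug 21" style references across sessions. Sorting by `id` on save keeps the committed JSON diff small.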
### Schema (session SQLite + JSON shape)
```sql
CREATE TABLE discovered_bugs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  cert TEXT NOT NULL,
  qid TEXT,                     -- nullable for cert-level issues
  bug_class TEXT NOT NULL,      -- e.g. 'factual_error', 'deprecated_ref', 'typographic_artifact', 'corrupt_options', 'enrichment_stripped'
  severity TEXT,                -- HIGH / MEDIUM / LOW
  source TEXT,                  -- 'h1_scanner', 'h2_scanner', 'h4_scanner', 'sme_audit', 'manual_triage', 'sush_flag'
  discovered_session TEXT,      -- session ID
  discovered_at TEXT,           -- ISO date
  description TEXT,             -- what's wrong
  proposed_fix TEXT,            -- how to fix
  status TEXT DEFAULT 'open',   -- open / in_progress / fixed / wontfix / duplicate
  fixed_session TEXT,
  fix_commit TEXT,
  fixed_at TEXT
);
```
### When to insert
- SME audit flags a factual error (vault d4-008 class) → insert immediately
- You spot a typo / deprecated reference / pricing claim during triage → insert
- You find a Q with no good fix in current scope → insert with `status='open'`
- A bug class repeats 2+ times across certs → insert + propose building a scanner for it
### When to query
- Start of every session: read JSON + run the query in "Reading the queue at session start" below to surface the queue (HIGH → MEDIUM → LOW, then by `discovered_at`). Don't use a plain `ORDER BY severity`: it sorts the labels alphabetically (HIGH, LOW, MEDIUM).
- After each cert triage: check if any open bugs relate to the cert you just worked on → bundle the fix
### Reading the queue at session start
```sql
SELECT id, cert, qid, bug_class, severity, description
FROM discovered_bugs
WHERE status = 'open'
ORDER BY
  CASE severity WHEN 'HIGH' THEN 1 WHEN 'MEDIUM' THEN 2 WHEN 'LOW' THEN 3 ELSE 4 END,
  discovered_at;
```
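Why the `CASE` expression matters: SQLite compares TEXT values with the BINARY collation by default, so a plain `ORDER BY severity` returns HIGH, LOW, MEDIUM. This self-contained demo runs both orderings against an in-memory table with a reduced, illustrative schema.

```python
import sqlite3

RANKED_SQL = """
SELECT severity FROM discovered_bugs
WHERE status = 'open'
ORDER BY
  CASE severity WHEN 'HIGH' THEN 1 WHEN 'MEDIUM' THEN 2 WHEN 'LOW' THEN 3 ELSE 4 END,
  discovered_at
"""

def compare_orderings():
    """Return (alphabetical order, ranked order) for three open bugs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE discovered_bugs (id INTEGER PRIMARY KEY, severity TEXT, "
                 "status TEXT DEFAULT 'open', discovered_at TEXT)")
    conn.executemany("INSERT INTO discovered_bugs (id, severity, discovered_at) VALUES (?, ?, ?)",
                     [(1, "LOW", "2026-05-01"), (2, "MEDIUM", "2026-05-02"), (3, "HIGH", "2026-05-03")])
    naive = [r[0] for r in conn.execute(
        "SELECT severity FROM discovered_bugs WHERE status = 'open' ORDER BY severity, discovered_at")]
    ranked = [r[0] for r in conn.execute(RANKED_SQL)]
    return naive, ranked
```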
## Self-learning protocol: improve the system as you go
### Rule: when you see a bug class 2+ times, build a scanner for it
The portfolio has 31,209 Qs. Manual inspection cannot find systemic bugs. Each scanner is a force multiplier.
When you discover a new bug class (e.g., SME catches "Q references deprecated az vm extension set syntax"):
1. Note the pattern + its detection signal (e.g., regex matching deprecated CLI syntax)
2. If you see it AGAIN in another cert during the same session OR a previous session: build a scanner
3. Scanner template: extend portfolio_mashup_scan_v2.py or create <bug_class>_check.py in C:\ssClawy\guided\
4. Calibrate per Rule #6 (data-first sequence): probe 1000+ honest Qs to find natural threshold cliff before deciding HIGH/MEDIUM/LOW
5. Run portfolio-wide → log all hits to discovered_bugs
6. Triage + fix the real bugs
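Step 4's cliff search can be made concrete for a metric where lower values are more suspicious (as with the H4 ratio). This is a hypothetical helper, not `files/p11_probe.py`. A cliff exists only if every known-real value sits strictly below every honest value. Otherwise the populations overlap, which is what P11 found for all 5 token-overlap metrics.

```python
def find_cliff(real, honest):
    """Return a threshold t with every REAL value <= t and every HONEST value > t,
    or None when the populations overlap (no natural cliff)."""
    if not real or not honest:
        raise ValueError("need samples from both populations")
    hi_real, lo_honest = max(real), min(honest)
    if hi_real >= lo_honest:
        return None
    return (hi_real + lo_honest) / 2  # midpoint of the gap

def overlap_fraction(real, honest):
    """Share of HONEST samples flagged by the loosest threshold that still catches every REAL case."""
    t = max(real)
    return sum(v <= t for v in honest) / len(honest)
```

When `find_cliff` returns `None`, `overlap_fraction` shows how many honest Qs a "catch everything real" threshold would drag into triage. A large value is the data-first argument for WONTFIX over shipping a noisy scanner.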
### Rule: when a heuristic over-fires, recalibrate
If you triage 20+ findings and ≥80% are FPs, the heuristic is too loose. Either:
- Tighten the threshold (use the data: check the ratio distribution of confirmed-real vs confirmed-FP cases)
- Reclassify (HIGH → MEDIUM, MEDIUM → advisory) so triage focuses on high-precision signals first
- Add a secondary filter that eliminates the FP class (e.g., "exclude H2 hits where wrong-option-whyWrong references correct-option text")
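The over-fire rule reduces to a two-line check; `needs_recalibration` is a hypothetical name. Plugging in the P10 evidence (312 FPs out of 320 triaged H2 HIGH) trips it easily, while the 8-finding H4-only batch is below the 20-finding floor.

```python
def needs_recalibration(n_triaged: int, n_fp: int,
                        min_sample: int = 20, fp_ceiling: float = 0.80) -> bool:
    """True when a heuristic has enough triaged findings and too many of them are FPs."""
    if n_triaged < min_sample:
        return False  # below min_sample findings: not enough evidence either way
    return n_fp / n_triaged >= fp_ceiling
```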
### Rule: workflow friction → propose an update
If a step in the per-cert workflow takes >3× the expected time or causes a real mistake, propose a workflow update at end of session. Update THIS file before committing it. Future sessions inherit the improvement.
### Rule: SME audit is your secondary scanner
Per the kickoff doc, SME audit is mandatory after rewrites. It also catches the bug classes that no heuristic detects (factual errors, deprecated refs, soft leaks, cross-doc inconsistencies). Run SME audit liberally during P2 triage and capture EVERYTHING it surfaces to discovered_bugs.
## Tooling (in repo, ready to use)
- `C:\ssClawy\guided\portfolio_mashup_scan_v2.py`: mashup scanner (H1/H2/H3/H4). P10 update (2026-06-03 session 7bb2af7a): H2 reclassified HIGH→MEDIUM based on the 320-hit / 97-cert calibration (97.5% FP).
- `C:\ssClawy\guided\portfolio_mashup_scan.py`: v1, deprecated, kept for reference.
- `C:\ssClawy\guided\template_boilerplate_check.py`: Phase B template-boilerplate scanner (B1-B12 phrases). Probe-only; supports `--cert <slug>`, `--json <path>`, `--strict-fail` for CI. B7-B12 added 2026-06-03 session c0952d49 for the az-104 secondary cluster. B12 uses the `qstem_only=True` flag, so it only scans the question field, with an end-anchored regex.
- `C:\ssClawy\guided\strip_template_boilerplate.py`: Phase B template-boilerplate stripper; pairs with the scanner. Modes: `--dry-run`, `--apply`, `--cert <slug>`, `--sample <N>`. Two code paths: `strip_field()` for B1-B11 (with trailing-period restoration) and `strip_question_stem()` for B12 (no period restoration; question stems already have their own terminator). Preserves JSON formatting matching `b_lite_strip.py`.
- `C:\ssClawy\guided\inject_phase_b.py`: Phase B authoring tool. Pre-flight gates: schema check + banned-phrase scan + mashup heuristic (cc01005) + template-boilerplate heuristic B1-B12 (49a668d, auto-extended by 80bc25c) + marker-swap heuristic (ea752df). `--check-only` for CI; `--strict-mashup` for H2 hard-fail (now opt-in maximum strictness, since H2 is MEDIUM by default post-P10).
- `C:\ssClawy\guided\marker_swap_check.py`: marker-swap scanner (whyWrong[X] for a wrong option claims it IS correct). Case-sensitive verbatim CAPS "This IS the correct/right". 100% real-bug rate on the calibration sample. Probe-only; `--cert <slug>`, `--json <path>`, `--strict-fail` for CI. Also imported by `inject_phase_b.py` as a HARD-FAIL gate.
- `C:\ssClawy\guided\learnlink_validator.py`: async portfolio dead-link scanner (NEW 2026-06-03 session 7bb2af7a). Walks all `learnLink` fields (top-level + subQuestions), dedupes, and HEAD-checks them (retrying with GET on 403/405) with 30 concurrent requests, a 30s timeout and 2 retries. Classifies: ok / redirect_minor / redirect_major / dead / server_error / client_error / network_error / unknown_error. Args: `--sample N` / `--cert <slug>` / `--out-json` / `--out-md`. Exits with code 1 if any dead links are found (CI-friendly). Periodic scanner, NOT a pre-commit gate, because link liveness drifts independently of authoring. Recommended monthly run.
- `C:\ssClawy\guided\leak_check.py`: soft answer-leak heuristic.
- `C:\ssClawy\guided\test-guided-qa.cjs`: full Playwright QA suite (MANDATORY before pushing PracticeQuiz changes; OPTIONAL for JSON-only changes).
- `C:\ssClawy\guided\test-banned-phrases.cjs`: 17 specific template-engine artifact patterns.
- `C:\ssClawy\guided\hugo-safe.ps1` (in aguidetocloud-revamp): NOT relevant here (guided uses Astro).
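The network half of `learnlink_validator.py` (async HEAD with 30-way concurrency, GET fallback, retries) is left out here. The sketch covers only the two pure steps: walking `learnLink` fields (top-level + `subQuestions`) into a deduplicated URL → qids map, and bucketing one response into the categories above. The bucket rules are assumptions, not the tool's documented logic: `dead` is taken to mean 404/410, and `redirect_minor` vs `redirect_major` is guessed as a same-host vs cross-host redirect.

```python
from urllib.parse import urlsplit

def collect_links(questions):
    """Map each unique learnLink URL to the qids citing it (top-level + subQuestions)."""
    links = {}

    def visit(q, qid):
        url = q.get("learnLink")
        if url:
            bucket = links.setdefault(url, [])
            if qid not in bucket:
                bucket.append(qid)
        for sub in q.get("subQuestions", []):
            visit(sub, qid)  # attribute sub-question links to the parent qid

    for q in questions:
        visit(q, q.get("id"))
    return links

def classify(status, url, final_url=None):
    """Bucket one check result; status None means the request never completed."""
    if status is None:
        return "network_error"
    if status in (404, 410):  # assumption: these two codes count as "dead"
        return "dead"
    if 500 <= status < 600:
        return "server_error"
    if 400 <= status < 500:
        return "client_error"
    if final_url and final_url != url:
        same_host = urlsplit(final_url).netloc.lower() == urlsplit(url).netloc.lower()
        return "redirect_minor" if same_host else "redirect_major"
    return "ok" if 200 <= status < 300 else "unknown_error"
```

Deduplicating before any HTTP call is what makes a single dead URL show up once while still carrying its full qid list, as with the 200-Q CompTIA cluster.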
## Session-state tooling (reusable)
Session 9e5cc23f files:
- files/triage_q.py <cert> <qid> — single-Q triage view
- files/batch_triage.py <cert> HIGH|MEDIUM — compact batch view
- files/all_h1.py — H1 portfolio dump
- files/gsc-cert-top20.py — GSC traffic ranking
- files/gsc-cert-top30.json — cached GSC top-30 cert ranking
- files/load_certs.sql — re-load tracker table on next session (126 certs)
Session 89327d9c files:
- files/h4_calibrate.py — H4 calibration probe (reuse for new scanners)
- files/inspect_q.py — single-Q dump
- files/b_lite_strip.py — B-Lite stripper (reusable for any cert with H4 contamination)
- files/validate_json.py — JSON parse validator
- files/scan-final/ — pre-fix v2 scan baseline
- files/scan-post-fix/ — post-91fe2cb v2 scan
- files/phase-d-scanner-findings.md — H4 discovery report
- files/phase-d-session2-handoff.md — session 2 full handoff (read if you need deep detail)
Session 7bb2af7a files (P10/P11/P12):
- files/p11_probe.py — calibration probe template (REUSE for any new heuristic): 5 metrics × 3 populations (REAL/H2FP/HONEST) cliff analysis. Demonstrated that no token-overlap metric discriminates whyWrong-vs-Qstem drift; pattern is reusable for future scanner candidates per Rule #6 data-first sequence.
- files/p11_probe_data.json — raw drift-metric data (5 REAL + 50 H2FP + 200 HONEST samples). Future session: do NOT re-invent token-overlap drift detection — see this data for why.
- files/prefix/ — pre-fix versions (via git cat-file blob) of pl-300-d1 + ab-731-d2 used to reconstruct known-real drift cases for calibration.
- files/learnlink_probe.py — learnLink field-shape probe (Rule #6 pre-build probe for P12).
- files/learnlink-results-full.json — full portfolio scan output (7,153 URLs).
- files/learnlink-summary-full.md — human-readable findings for Sush strategic call.
## Stop criteria: when to ping Sush via ask_user
Per the kickoff: "Don't ping me unless C6 SME surfaces something needing my call."
Translate to:
- SME audit surfaces a content scope question (e.g., "should this Q be rewritten or deleted entirely?")
- Architectural choice with significant cost (e.g., a new scanner type would take >8h to build, or a fix pattern would touch >50 certs)
- Practice exam SLA smoke fails post-deploy and revert isn't trivial
- A bug class repeats so often it changes the program's scope (e.g., "factual errors are 10× more common than mashup bugs; should we pivot Phase D into Phase F factual-correctness?")
- Per-cert effort blows up to 3× the estimate
- Sush-voice content question (any customer-facing rewrite needs his voice approval per Voice Rule)
Otherwise: full autonomy. Log everything to discovered_bugs + session journal + this doc.
## Session end checklist
- Update `certs` SQL: mark each cert worked status='done' / 'high_done' / 'blocked'
- Update `discovered_bugs` SQL: any new bugs found go in `open`; bugs you fixed go to `fixed` with `fix_commit`
- Update THIS doc:
  - Bump the "Last updated" line at top
  - Move completed P-items from "What's open" into "What's DONE", and add their commits to "What's shipped"
  - Add any new P-items discovered
  - Tighten the hours estimate if reality differed
- Update `~/.copilot/session-journal.md` with a per-session entry
- SLA smoke green (3 curls)
- Commit + git pull --rebase + push
## Resume one-liner (give this to Sush for next session)
## Reference links (deep detail; read if needed)
- Original kickoff: `C:\ssClawy\guided\files\portfolio-mashup-cleanup-kickoff.md`
- Session 1 handoff (top-20 triage + 5 mashup fixes): `~/.copilot/session-state/9e5cc23f-37cd-4b5e-b796-29e715835fb4/files/phase-d-session1-handoff.md`
- Session 2 handoff (H4 scanner + juniper B-Lite): `~/.copilot/session-state/89327d9c-7309-438d-a8f6-3f257a278c82/files/phase-d-session2-handoff.md`
- Session 2 scanner findings (H4 calibration + cross-cert discovery): `~/.copilot/session-state/89327d9c-7309-438d-a8f6-3f257a278c82/files/phase-d-scanner-findings.md`
- Session 3 (7a664e44) template-boilerplate decision doc + scan: `~/.copilot/session-state/7a664e44-92eb-44f6-904e-89078e4d33ab/files/template-boilerplate-decision.md` + `template-boilerplate-scan.json`
- mb-800 Phase D deferred (12 sacred-needed Qs from before Phase D portfolio scope): `C:\ssClawy\guided\files\mb-800-phase-d-followup.md`