🏥 Microsoft Service Health Tracker
Status: ✅ Complete
Priority: 📌 Important (Tier 2)
Category: 🎬 Content / 💼 Official
Created: 2026-04-10
Completed: 2026-04-11
Part of: Free Tools Section
Live at: aguidetocloud.com/service-health/
The Problem
"Was Teams down last Tuesday?" — everyone asks this, nobody has a good answer. Microsoft's Service Health dashboard is locked behind the M365 Admin Center with no public searchable archive.
The Solution
A searchable incident timeline + reliability dashboard at aguidetocloud.com/service-health/ with:
- 🏥 32 M365 service status cards (real-time from Graph API)
- 📊 103+ incidents tracked with full communication timelines
- ☁️ Azure incidents merged from Azure Status history (PIR scraping)
- 🌍 Region badges extracted from incident text (30% of issues)
- 🏷️ Feature tags (Mailflow, Authentication, etc.)
- 📈 Monthly trend chart — incidents per month
- 🏆 Most affected leaderboard — top 5 services
- 📋 Quick stats scorecard — total, avg resolution, trend
- 📅 Date lookup — "Was Teams down on March 5?"
- 📥 CSV export for reports
- 🔗 Deep linking: `?service=Teams&status=active`
- 📡 RSS feed for subscribers
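The date lookup and deep-link filters can be thought of as simple predicates over incident records. The following is a minimal sketch, not the site's actual implementation (which runs in vanilla JS): the record fields `id`, `service`, `status`, `start`, and `end`, the sample data, and both function names are assumptions made for illustration.

```python
from datetime import date, datetime
from urllib.parse import parse_qs

# Hypothetical slimmed incident records; field names and values are assumptions.
INCIDENTS = [
    {"id": "TM0001", "service": "Teams", "status": "resolved",
     "start": "2026-03-04T22:10:00Z", "end": "2026-03-05T03:40:00Z"},
    {"id": "EX0002", "service": "Exchange Online", "status": "active",
     "start": "2026-03-09T08:00:00Z", "end": None},
]


def _day(ts):
    """Convert an ISO 8601 UTC timestamp string to a calendar date."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).date()


def was_down(incidents, service, day):
    """Return incidents for `service` whose start..end window covers `day` (UTC)."""
    hits = []
    for inc in incidents:
        if inc["service"].lower() != service.lower():
            continue
        start = _day(inc["start"])
        # An incident with no end time is treated as still ongoing.
        end = _day(inc["end"]) if inc["end"] else date.max
        if start <= day <= end:
            hits.append(inc)
    return hits


def filter_by_query(incidents, query):
    """Apply a deep-link query string such as '?service=Teams&status=active'."""
    params = {k: v[0].lower() for k, v in parse_qs(query.lstrip("?")).items()}
    return [inc for inc in incidents
            if all(str(inc.get(k, "")).lower() == v for k, v in params.items())]
```

With the sample data, `was_down(INCIDENTS, "Teams", date(2026, 3, 5))` returns the Teams incident, while the deep link `?service=Teams&status=active` returns nothing because that incident is already resolved.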
Architecture
┌──────────────────────────────────────────────────────────┐
│ GitHub Actions (Every 2 hours — 0 */2 * * *) │
│ │
│ 1. fetch_health.py → Graph API: M365 service health │
│ 2. fetch_azure.py → Azure Status history scraping │
│ 3. generate_data.py → Merge, slim JSON, RSS, stats │
│ 4. Push to main site repo → Hugo build → Deploy │
└──────────────────────────────────────────────────────────┘
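Step 3 of the pipeline (merge and stats) could look roughly like the sketch below. It assumes both fetchers have already normalized their output to dicts with `id`, `service`, `start`, and `end` keys; those field names and the functions `merge_incidents` and `summarize` are hypothetical and are not taken from `generate_data.py`.

```python
from collections import Counter
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 UTC timestamp such as '2026-03-01T00:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def merge_incidents(m365, azure):
    """Merge both feeds, de-duplicate by id, and sort newest first."""
    by_id = {}
    for inc in list(m365) + list(azure):
        by_id[inc["id"]] = inc  # a later record replaces an earlier one with the same id
    return sorted(by_id.values(), key=lambda i: parse_ts(i["start"]), reverse=True)


def summarize(incidents, top_n=5):
    """Compute the scorecard, monthly trend, and most-affected leaderboard."""
    per_month = Counter(parse_ts(i["start"]).strftime("%Y-%m") for i in incidents)
    per_service = Counter(i["service"] for i in incidents)
    # Only resolved incidents (those with an end time) count toward resolution time.
    durations = [
        (parse_ts(i["end"]) - parse_ts(i["start"])).total_seconds() / 3600
        for i in incidents if i.get("end")
    ]
    avg = round(sum(durations) / len(durations), 1) if durations else None
    return {
        "total": len(incidents),
        "per_month": dict(sorted(per_month.items())),
        "top_services": per_service.most_common(top_n),
        "avg_resolution_hours": avg,
    }
```

Keeping the merge keyed by `id` means a re-run every 2 hours simply overwrites updated incidents instead of duplicating them.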
Tech Stack
| Component | Tech | Cost |
|---|---|---|
| M365 data source | Microsoft Graph Service Communications API | $0 |
| Azure data source | Azure Status history page scraping (BeautifulSoup) | $0 |
| Auth | Dedicated Azure AD app (`service-health`), `ServiceHealth.Read.All` | $0 |
| Region extraction | Regex on impactDescription + posts (15 region patterns) | $0 |
| Scheduling | GitHub Actions every 2h | $0 |
| Frontend | Hugo + vanilla JS, coral/orange (#F97316) theme | $0 |
| Hosting | Azure SWA (existing) | $0 |
| Total | | $0 |
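The region extraction row can be illustrated with a small sketch. The patterns below are examples only and are not the project's actual 15 patterns; the function name `extract_regions` is hypothetical. The input shape assumes a Graph service health issue with an `impactDescription` string and `posts` whose text sits under `description.content`.

```python
import re

# Illustrative patterns only; the project's real 15 region patterns are not shown here.
REGION_PATTERNS = {
    "North America": r"\bNorth America\b",
    "Europe": r"\b(?:Europe|EMEA)\b",
    "Asia Pacific": r"\b(?:Asia[- ]Pacific|APAC)\b",
    "United Kingdom": r"\b(?:United Kingdom|UK)\b",
    "Australia": r"\bAustralia\b",
}
_COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in REGION_PATTERNS.items()}


def extract_regions(issue):
    """Return sorted region names mentioned in the impact text or update posts."""
    texts = [issue.get("impactDescription") or ""]
    for post in issue.get("posts") or []:
        texts.append((post.get("description") or {}).get("content") or "")
    blob = " ".join(texts)
    return sorted(name for name, rx in _COMPILED.items() if rx.search(blob))
```

An issue whose text mentions no region returns an empty list, which is why only part of the incident set ends up with region badges.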
Key Decisions & Learnings
- Dedicated app registration: A rubber-duck critique caught that adding `ServiceHealth.Read.All` to the shared ainews app would broaden its blast radius, so a separate `service-health` app was created.
- Graph API history depth: The API returns ~100 issues going back to Jan 2025 for the lab tenant. That is not a guaranteed retention period, so we archive from day 1.
- Region data is text-only: The Graph API has no structured `impactedRegions` field, so regions are extracted by regex from `impactDescription` plus update posts. This finds geographic data in 30% of issues.
- Azure Status has no API: Only HTML (history page) and RSS (active only, often empty) are available. A BeautifulSoup scraper backfills PIRs, with tracking IDs extracted from `aka.ms/air/{id}` URLs.
- Incremental feature delivery: Launched an MVP (cards + timeline), then layered 8 analytics features in subsequent deploys. Each deploy was independently useful.
- `allow-no-subscriptions: true`: OIDC login in GitHub Actions needs this when the service principal only needs Graph API access, not Azure subscription access.
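Pulling tracking IDs out of `aka.ms/air/{id}` links can be sketched as follows. This version runs a plain regular expression over the raw HTML instead of using BeautifulSoup, so that it needs only the standard library. The function name, the allowed ID characters, and the sample markup in the usage note are assumptions.

```python
import re

# Assumes tracking IDs consist of letters, digits, underscores, and hyphens.
AIR_LINK = re.compile(r"aka\.ms/air/([A-Za-z0-9_-]+)", re.IGNORECASE)


def extract_tracking_ids(html):
    """Return unique tracking IDs found in aka.ms/air links, in first-seen order."""
    seen = []
    for match in AIR_LINK.finditer(html):
        tracking_id = match.group(1)
        if tracking_id not in seen:
            seen.append(tracking_id)
    return seen
```

For example, a page fragment containing `<a href="https://aka.ms/air/ABCD-123">PIR</a>` twice yields a single `ABCD-123` entry, which can then serve as the incident's unique key when merging with the M365 data.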
Data Coverage
| Source | Items | Services | History |
|---|---|---|---|
| Microsoft Graph | ~100 | 32 M365 services | Jan 2025+ |
| Azure Status scrape | 3 PIRs | Azure infra | Feb 2026+ |
| Total | 103 | 33 | 15+ months |
Repos
- Pipeline: `susanthgit/service-health` (private)
- Frontend: Part of `susanthgit/aguidetocloud-revamp`
- App registration: `service-health` (client ID: `a038898a-5e14-4d55-9d26-341d6013a436`)