
🏥 Microsoft Service Health Tracker

Status: ✅ Complete
Priority: 📌 Important (Tier 2)
Category: 🎬 Content / 💼 Official
Created: 2026-04-10
Completed: 2026-04-11
Part of: Free Tools Section
Live at: aguidetocloud.com/service-health/


The Problem

"Was Teams down last Tuesday?" — everyone asks this, nobody has a good answer. Microsoft's Service Health dashboard is locked behind the M365 Admin Center with no public searchable archive.

The Solution

A searchable incident timeline + reliability dashboard at aguidetocloud.com/service-health/ with:

  • 🏥 32 M365 service status cards (real-time from Graph API)
  • 📊 103+ incidents tracked with full communication timelines
  • ☁️ Azure incidents merged from Azure Status history (PIR scraping)
  • 🌍 Region badges extracted from incident text (30% of issues)
  • 🏷️ Feature tags (Mailflow, Authentication, etc.)
  • 📈 Monthly trend chart — incidents per month
  • 🏆 Most affected leaderboard — top 5 services
  • 📋 Quick stats scorecard — total, avg resolution, trend
  • 📅 Date lookup — "Was Teams down on March 5?"
  • 📥 CSV export for reports
  • 🔗 Deep linking via query string: ?service=Teams&status=active
  • 📡 RSS feed for subscribers
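
The date lookup in the list above ("Was Teams down on March 5?") comes down to an interval-overlap check. Below is a minimal sketch of that check, written in Python to match the pipeline scripts; the live site's frontend is vanilla JS, so this is not the site's actual code. The field names `service`, `start`, and `end` are assumptions about the slim JSON, not its confirmed schema.

```python
from datetime import date, datetime, timedelta, timezone


def parse_utc(ts):
    """Parse an ISO 8601 timestamp such as '2026-03-05T09:30:00Z'; None passes through."""
    if ts is None:
        return None
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def incidents_on(incidents, service, day):
    """Return incidents for `service` whose impact window overlaps the UTC day `day`.

    Assumed (hypothetical) incident shape: {"service": str, "start": str, "end": str or None}.
    An `end` of None is treated as "still active".
    """
    day_start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    day_end = day_start + timedelta(days=1)
    hits = []
    for inc in incidents:
        if inc["service"].lower() != service.lower():
            continue
        start = parse_utc(inc["start"])
        end = parse_utc(inc.get("end"))
        # Two intervals overlap when each one starts before the other ends.
        if start < day_end and (end is None or end > day_start):
            hits.append(inc)
    return hits
```

An incident that starts late on March 4 and ends early on March 5 counts as a hit for March 5, which is usually what someone asking the question wants.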

Architecture

┌──────────────────────────────────────────────────────────┐
│  GitHub Actions (Every 2 hours — 0 */2 * * *)            │
│                                                          │
│  1. fetch_health.py  → Graph API: M365 service health    │
│  2. fetch_azure.py   → Azure Status history scraping     │
│  3. generate_data.py → Merge, slim JSON, RSS, stats      │
│  4. Push to main site repo → Hugo build → Deploy         │
└──────────────────────────────────────────────────────────┘
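
Step 3 of the pipeline (merge, slim JSON, stats) could look roughly like the sketch below. It is an illustration only, not the contents of generate_data.py: the function names and the incident fields (`id`, `source`, `service`, `title`, `start`, `end`) are assumptions, and it relies on every `start` value being an ISO 8601 UTC string in one consistent format so that string sorting matches time order.

```python
from collections import Counter

# Hypothetical set of fields kept in the slim JSON served to the frontend.
SLIM_FIELDS = ("id", "source", "service", "title", "start", "end")


def merge_incidents(m365, azure):
    """Merge both sources, de-duplicating by id (a later entry replaces an earlier one)."""
    by_id = {}
    for inc in list(m365) + list(azure):
        by_id[inc["id"]] = inc
    # Newest first, relying on consistently formatted ISO 8601 strings.
    return sorted(by_id.values(), key=lambda inc: inc["start"], reverse=True)


def slim(inc):
    """Drop bulky fields (for example full post bodies) before publishing."""
    return {field: inc.get(field) for field in SLIM_FIELDS}


def monthly_counts(incidents):
    """Count incidents per 'YYYY-MM' month, in chronological order, for the trend chart."""
    counts = Counter(inc["start"][:7] for inc in incidents)
    return dict(sorted(counts.items()))
```

Keying the merge on `id` also makes repeated runs idempotent, which matters for a job that fires every 2 hours against overlapping data.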

Tech Stack

| Component | Tech | Cost |
|---|---|---|
| M365 data source | Microsoft Graph Service Communications API | $0 |
| Azure data source | Azure Status history page scraping (BeautifulSoup) | $0 |
| Auth | Dedicated Azure AD app (service-health), ServiceHealth.Read.All | $0 |
| Region extraction | Regex on impactDescription + posts (15 region patterns) | $0 |
| Scheduling | GitHub Actions every 2h | $0 |
| Frontend | Hugo + vanilla JS, coral/orange (#F97316) theme | $0 |
| Hosting | Azure SWA (existing) | $0 |
| Total | | $0 |
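
For the M365 data source, a minimal sketch of paging through the Graph service health issues endpoint is shown below. It is not the contents of fetch_health.py. It assumes an access token with ServiceHealth.Read.All has already been obtained; `http_get_json` and `fetch_all_issues` are hypothetical helper names, and the HTTP getter is injectable so the paging logic can be exercised without network access.

```python
import json
import urllib.request

# Microsoft Graph v1.0 endpoint that lists service health issues.
ISSUES_URL = "https://graph.microsoft.com/v1.0/admin/serviceAnnouncement/issues"


def http_get_json(url, token):
    """GET `url` with a bearer token and decode the JSON response body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_all_issues(token, get_json=http_get_json):
    """Collect every issue by following '@odata.nextLink' until no further page is returned."""
    issues = []
    url = ISSUES_URL
    while url:
        page = get_json(url, token)
        issues.extend(page.get("value", []))
        url = page.get("@odata.nextLink")  # absent on the last page
    return issues
```

A real run would also want retry and throttling handling, which this sketch leaves out.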

Key Decisions & Learnings

  1. Dedicated app registration — Rubber-duck critique caught that adding ServiceHealth.Read.All to the shared ainews app would broaden blast radius. Created separate service-health app.

  2. Graph API history depth — Returns ~100 issues back to Jan 2025 for the lab tenant. Not a guaranteed retention period, so we archive from day 1.

  3. Region data is text-only: the Graph API has no structured impactedRegions field, so regions are extracted by regex from impactDescription and the update posts. This yields region badges for about 30% of issues.

  4. Azure Status has no API — Only HTML (history page) and RSS (active only, often empty). Built BeautifulSoup scraper for PIR backfill. Tracking IDs extracted from aka.ms/air/{id} URLs.

  5. Incremental feature delivery — Launched MVP (cards + timeline), then layered 8 analytics features in subsequent deploys. Each deploy was independently useful.

  6. Set allow-no-subscriptions: true on the azure/login step. The OIDC login in GitHub Actions needs this input when the service principal (SP) only needs Graph API access, not Azure subscription access.
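
Items 3 and 4 above both rely on text extraction, which can be sketched as follows. This is an illustration under assumptions, not the project's code: the three region patterns are a made-up subset (the source mentions 15 without listing them), and the helper names are hypothetical. The post text is read from `description.content`, following the shape of Graph's service health issue posts.

```python
import re

# Hypothetical subset of region patterns; the real pipeline uses 15.
REGION_PATTERNS = {
    "North America": re.compile(r"\b(North America|United States|Canada)\b", re.I),
    "Europe": re.compile(r"\b(Europe|EMEA|United Kingdom)\b", re.I),
    "Asia Pacific": re.compile(r"\b(Asia Pacific|APAC|Australia|Japan|India)\b", re.I),
}

# Tracking IDs appear in post-incident review links of the form aka.ms/air/{id}.
AIR_LINK = re.compile(r"aka\.ms/air/([A-Za-z0-9_-]+)", re.I)


def extract_regions(issue):
    """Return region names mentioned in impactDescription or in any update post."""
    texts = [issue.get("impactDescription") or ""]
    for post in issue.get("posts") or []:
        texts.append((post.get("description") or {}).get("content") or "")
    blob = " ".join(texts)
    return [name for name, pattern in REGION_PATTERNS.items() if pattern.search(blob)]


def extract_tracking_ids(html):
    """Return unique tracking IDs from aka.ms/air/{id} links, in first-seen order."""
    seen = []
    for tracking_id in AIR_LINK.findall(html):
        if tracking_id not in seen:
            seen.append(tracking_id)
    return seen
```

Short abbreviations such as "US" are deliberately left out of the patterns here, because a case-insensitive match would also hit the ordinary word "us".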

Data Coverage

| Source | Items | Services | History |
|---|---|---|---|
| Microsoft Graph | ~100 | 32 M365 services | Jan 2025+ |
| Azure Status scrape | 3 PIRs | Azure infra | Feb 2026+ |
| Total | 103 | 33 | 15+ months |

Repos

  • Pipeline: susanthgit/service-health (private)
  • Frontend: Part of susanthgit/aguidetocloud-revamp
  • App registration: service-health (client ID: a038898a-5e14-4d55-9d26-341d6013a436)