
πŸ₯ Service Health Dashboard

Live at: aguidetocloud.com/service-health/
Repo: susanthgit/service-health (pipeline)
Built: April 2026
Cost: $0/month (GitHub Actions Free + Azure SWA Free)


What It Does

A unified dashboard that monitors Microsoft 365 service health and Azure incident history. It pulls data from the Graph API and the Azure Status page every 2 hours, then displays it with status cards, active incident tracking, trend analytics, and per-incident detail modals.

Think of it as a personal command centre for Microsoft service outages: see what's down, what's recovering, and what happened last month.


Architecture Overview

┌──────────────────────────────────────────────────────────┐
│              GitHub Actions (every 2 hours)              │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────┐ │
│  │fetch_health  │  │ fetch_azure  │  │ generate_data   │ │
│  │  .py         │  │  .py         │  │   .py           │ │
│  │ Graph API    │  │ Azure Status │  │ merge + stats   │ │
│  └──────────────┘  └──────────────┘  └─────────────────┘ │
│         │                 │                   │          │
│  health.json        azure.json          latest.json      │
│  previous_state                         stats.json       │
│                                         incidents/*.json │
│                                         archive/*.json   │
│                                         feed.xml         │
├──────────────────────────────────────────────────────────┤
│    Copy data to main site → hugo build → swa deploy      │
└──────────────────────────────────────────────────────────┘
         │
         ▼
┌──────────────────────────────────────────────────────────┐
│            aguidetocloud.com/service-health/             │
│                                                          │
│  ┌─────────────────┐  ┌──────────────────┐               │
│  │service-health.js│  │service-health.css│               │
│  └─────────────────┘  └──────────────────┘               │
│                                                          │
│  Summary → Status Grid → Active Incidents → Trends →     │
│  Date Lookup → Timeline → CSV Export → Incident Modal    │
└──────────────────────────────────────────────────────────┘

Pipeline Scripts

1. fetch_health.py

| Feature | Detail |
| --- | --- |
| Data source | Microsoft Graph API: /admin/serviceAnnouncement/healthOverviews + /issues |
| Auth | Azure AD token (client credentials or Azure CLI) |
| Change detection | Compares against site/previous_state.json |
| Output | site/raw/health.json, site/previous_state.json |
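
The change-detection step can be sketched as a diff between the previous and current status maps. This is a minimal illustration, not the code in fetch_health.py: it assumes previous_state.json holds a flat {service: status} map, and the function names load_previous, detect_changes, and save_state are hypothetical.

```python
import json
from pathlib import Path


def load_previous(path: Path) -> dict:
    """Return the last saved {service: status} map, or {} on the first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def detect_changes(previous: dict, current: dict) -> list:
    """List services whose status differs from the previous run."""
    changes = []
    for service, status in sorted(current.items()):
        old = previous.get(service)  # None means the service is new
        if old != status:
            changes.append({"service": service, "from": old, "to": status})
    return changes


def save_state(path: Path, current: dict) -> None:
    """Persist the current map so the next run can diff against it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(current, indent=2, sort_keys=True), encoding="utf-8")
```

A run would call load_previous, fetch the current statuses, pass both maps to detect_changes, and finish with save_state. Services that appear for the first time show up with "from" set to None.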

2. fetch_azure.py

| Feature | Detail |
| --- | --- |
| Data source | Scrapes azure.status.microsoft/en-us/status/history/ |
| Extracts | Tracking IDs, regions, service names, dates, PIR URLs, video URLs |
| Error handling | Non-blocking on failure (Azure data is supplementary) |
| Output | site/raw/azure.json |
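
One way to keep a supplementary fetch non-blocking is to wrap it so that any failure is logged and replaced with an empty result. The sketch below shows that pattern under stated assumptions: fetch_non_blocking and the logger name are hypothetical, and the real script may handle errors differently.

```python
import logging
from typing import Callable

log = logging.getLogger("service-health")


def fetch_non_blocking(fetch: Callable[[], list], label: str) -> list:
    """Run an optional fetch; on any failure, log it and return an empty list."""
    try:
        return fetch()
    except Exception as exc:  # deliberately broad: this data source is optional
        log.warning("%s fetch failed, continuing without it: %s", label, exc)
        return []
```

Because the wrapper always returns a list, the merge step downstream can treat "no Azure data" and "Azure scrape failed" the same way, and the M365 half of the dashboard still renders.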

3. generate_data.py

| Feature | Detail |
| --- | --- |
| Merge | Combines M365 health + Azure incidents into unified dataset |
| Output files | site/latest.json, site/stats.json, site/feed.xml |
| Incident detail | Per-incident JSON at site/incidents/{id}.json |
| Monthly archive | site/archive/YYYY-MM.json |
| Stats | Per-service metrics + monthly aggregates |
| latest.json shape | generated_at, totals, services[], issues[] |
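
The merge and the monthly archive can be sketched as follows. Only the four top-level keys of latest.json come from the table above; the per-issue fields (start, resolved, source), the contents of totals, and the function names build_latest and archive_buckets are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def build_latest(services: list, m365_issues: list, azure_incidents: list) -> dict:
    """Merge both sources into the latest.json shape described above."""
    issues = (
        [dict(i, source="m365") for i in m365_issues]
        + [dict(i, source="azure") for i in azure_incidents]
    )
    # Newest first; ISO 8601 strings in one format sort chronologically.
    issues.sort(key=lambda i: i["start"], reverse=True)
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "totals": {
            "services": len(services),
            "issues": len(issues),
            "active": sum(1 for i in issues if not i.get("resolved", False)),
        },
        "services": services,
        "issues": issues,
    }


def archive_buckets(issues: list) -> dict:
    """Group issues by the YYYY-MM prefix of their start timestamp."""
    buckets = defaultdict(list)
    for issue in issues:
        buckets[issue["start"][:7]].append(issue)
    return dict(buckets)
```

Each key returned by archive_buckets maps naturally onto one site/archive/YYYY-MM.json file, and each issue could be written on its own to site/incidents/{id}.json.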

GitHub Actions Workflow

File: .github/workflows/service-health.yml (~91 lines)

Schedule: Every 2 hours + manual dispatch
Auth: OIDC Azure login

Steps:
1. Checkout
2. Setup Python
3. Fetch M365 health (Graph API)
4. Fetch Azure history (web scrape)
5. Generate data files
6. Copy to aguidetocloud-revamp/static/data/service-health/
7. Hugo build + swa deploy

Frontend Implementation

Hugo Template: list.html (~127 lines)

Renders the full dashboard UI:

| Section | What it shows |
| --- | --- |
| Summary banner | Overall health status (healthy/degraded/etc.) |
| Status grid | Per-service status cards with health indicators |
| Active incidents | Currently open issues with severity + details |
| Trends & insights | Most affected services, monthly scorecard |
| Date lookup | Search incidents by date range |
| Timeline | Chronological incident history with load-more |
| Export | CSV download + RSS feed + admin links |
| Incident modal | Click any incident for full detail overlay |
| JSON-LD | WebApplication schema |

JavaScript: service-health.js

| Feature | Detail |
| --- | --- |
| Data | Fetches /data/service-health/latest.json |
| sessionStorage | Caches for 10 minutes |
| Summary render | Overall status banner + freshness indicator |
| Service cards | Per-service health status with issue counts |
| Active incidents | Highlighted cards for current issues |
| Stats | Async-loads stats.json for trend chart + "most affected" + scorecard |
| Filters | URL state: ?q=, ?service=, ?status= |
| Incident detail | Loads per-incident JSON from /data/service-health/incidents/{id}.json |
| CSV export | Client-side CSV generation |
| Date lookup | Filter incidents by date range |
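
The URL filter state can be read as three optional predicates combined with AND. The real logic lives in service-health.js; the sketch below restates the idea in Python, to match the pipeline scripts, and is not a port of that file. The issue fields (title, service, status) and the function names are hypothetical.

```python
from urllib.parse import parse_qs

FILTER_KEYS = ("q", "service", "status")


def parse_filters(query: str) -> dict:
    """Extract the supported filter keys from a query string such as '?q=mail'."""
    params = parse_qs(query.lstrip("?"))  # blank values are dropped by default
    return {key: params[key][0] for key in FILTER_KEYS if key in params}


def apply_filters(issues: list, filters: dict) -> list:
    """Keep only issues that satisfy every filter that is present."""
    text = filters.get("q", "").lower()
    result = []
    for issue in issues:
        if text and text not in issue["title"].lower():
            continue  # free-text search is a case-insensitive substring match
        if "service" in filters and issue["service"] != filters["service"]:
            continue
        if "status" in filters and issue["status"] != filters["status"]:
            continue
        result.append(issue)
    return result
```

Keeping the filters in the URL means a filtered view can be bookmarked or shared, and an empty query string simply returns every issue.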

CSS: service-health.css

| Element | Detail |
| --- | --- |
| Accent | Sky #38BDF8 |
| CSS namespace | .shealth-* classes throughout |
| Status colours | Green (healthy), yellow (advisory), red (incident) |
| Cards | Service cards with status badges |
| Modal | Full-screen incident detail overlay |
| Analytics widgets | Trend chart, scorecard, most-affected |
| Timeline | Chronological incident history |
| Mobile | Responsive cards, stacked layout |

Data Files Served

| Path | Purpose | Cache |
| --- | --- | --- |
| /data/service-health/latest.json | Current health status | 30 min (SWA config) |
| /data/service-health/stats.json | Historical analytics | 30 min |
| /data/service-health/feed.xml | RSS feed | 30 min |
| /data/service-health/incidents/{id}.json | Per-incident detail | 30 min |
| /data/service-health/archive/YYYY-MM.json | Monthly archives | 30 min |

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Every 2 hours | Service health changes frequently; a 2-hour cadence is responsive without being wasteful |
| Graph API + Azure scrape | Graph gives M365 status; Azure Status page adds PIR/incident history |
| Per-incident JSON | Lazy-loads detail only when the user clicks, which keeps the initial payload small |
| sessionStorage 10 min | Prevents re-fetching during tab navigation while staying reasonably fresh |
| Non-blocking Azure fetch | The Azure scrape is supplementary; if it fails, M365 data still renders |

Maintenance

| Task | How |
| --- | --- |
| Force a refresh | Trigger service-health.yml manually via GitHub Actions |
| Check pipeline | Review GitHub Actions logs for Graph API auth or scrape failures |
| Update Graph permissions | Service principal needs ServiceHealth.Read.All in Graph API |
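
When debugging auth failures, it can help to call the health endpoint directly with a token. The sketch below uses only the standard library and assumes a bearer token has already been obtained (client credentials or Azure CLI, as the pipeline does). The helper names are hypothetical; the endpoint is the Graph v1.0 healthOverviews path named in the pipeline section.

```python
import json
import urllib.request

GRAPH_URL = "https://graph.microsoft.com/v1.0/admin/serviceAnnouncement/healthOverviews"


def build_health_request(token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for the health overviews."""
    return urllib.request.Request(
        GRAPH_URL,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def fetch_health_overviews(token: str) -> list:
    """Send the request and return the 'value' array of the Graph response."""
    with urllib.request.urlopen(build_health_request(token), timeout=30) as resp:
        return json.load(resp)["value"]
```

urlopen raises urllib.error.HTTPError on non-2xx responses. A 401 usually points at the token itself, while a 403 often means the service principal lacks ServiceHealth.Read.All or admin consent has not been granted.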

Last updated: 11 April 2026