
🏥 Service Health Dashboard

Live at: aguidetocloud.com/service-health/
Repo: susanthgit/service-health (pipeline)
Built: April 2026
Cost: $0/month (GitHub Actions free tier + Azure Static Web Apps Free plan)


What It Does

A unified dashboard that monitors Microsoft 365 service health and Azure incident history. Every 2 hours it pulls data from the Microsoft Graph API and the Azure Status history page, then displays it with status cards, active-incident tracking, trend analytics, and per-incident detail modals.

Think of it as a personal command centre for Microsoft service outages: see what's down, what's recovering, and what happened last month.


Architecture Overview

┌───────────────────────────────────────────────────────────┐
│              GitHub Actions (every 2 hours)               │
├───────────────────────────────────────────────────────────┤
│                                                           │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐  │
│ │fetch_health.py │ │fetch_azure.py  │ │generate_data.py│  │
│ │Graph API       │ │Azure Status    │ │merge + stats   │  │
│ └────────────────┘ └────────────────┘ └────────────────┘  │
│         │                  │                  │           │
│ health.json        azure.json         latest.json         │
│ previous_state.json                   stats.json          │
│                                       incidents/*.json    │
│                                       archive/*.json      │
│                                       feed.xml            │
├───────────────────────────────────────────────────────────┤
│     Copy data to main site → hugo build → swa deploy      │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│             aguidetocloud.com/service-health/             │
│                                                           │
│  ┌─────────────────┐  ┌──────────────────┐                │
│  │service-health.js│  │service-health.css│                │
│  └─────────────────┘  └──────────────────┘                │
│                                                           │
│  Summary → Status Grid → Active Incidents → Trends →      │
│  Date Lookup → Timeline → CSV Export → Incident Modal     │
└───────────────────────────────────────────────────────────┘

Pipeline Scripts

1. fetch_health.py

| Feature | Detail |
|---|---|
| Data source | Microsoft Graph API: /admin/serviceAnnouncement/healthOverviews and /admin/serviceAnnouncement/issues |
| Auth | Azure AD token (client credentials or Azure CLI) |
| Change detection | Compares the current state against site/previous_state.json |
| Output | site/raw/health.json, site/previous_state.json |
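A minimal sketch of the two core steps in fetch_health.py. It assumes the pipeline already holds a Graph access token, for example one obtained through the Azure CLI after login. Graph collection responses can be split into pages linked by @odata.nextLink, so the sketch follows those links. Change detection then diffs a {service: status} map against the previous run. The helper names and the flat {service: status} shape of previous_state.json are hypothetical and are not taken from the repo.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, List

# Graph v1.0 service-announcement endpoints named in the table above.
GRAPH_BASE = "https://graph.microsoft.com/v1.0/admin/serviceAnnouncement"


def graph_get_json(url: str, token: str) -> dict:
    """GET one Graph URL with a bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_items(url: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item of a Graph collection, following @odata.nextLink pages."""
    next_url = url
    while next_url:
        page = get_json(next_url)
        yield from page.get("value", [])
        next_url = page.get("@odata.nextLink")


def detect_changes(previous: Dict[str, str], current: Dict[str, str]) -> List[dict]:
    """Diff two {service: status} maps (a hypothetical previous_state.json shape)."""
    changes = []
    for service in sorted(previous.keys() | current.keys()):
        old, new = previous.get(service), current.get(service)
        if old != new:
            changes.append({"service": service, "from": old, "to": new})
    return changes


# Example wiring (needs network access and a real token):
#   get = lambda u: graph_get_json(u, token)
#   overviews = iter_items(f"{GRAPH_BASE}/healthOverviews", get)
#   current = {o["service"]: o["status"] for o in overviews}
```

Passing get_json in as a parameter keeps the paging logic testable without network access.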

2. fetch_azure.py

| Feature | Detail |
|---|---|
| Data source | Scrapes azure.status.microsoft/en-us/status/history/ |
| Extracts | Tracking IDs, regions, service names, dates, post-incident review (PIR) URLs, video URLs |
| Error handling | Non-blocking: a failure does not stop the pipeline (Azure data is supplementary) |
| Output | site/raw/azure.json |
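One way to make the Azure step non-blocking, shown as a sketch. The scraper is passed in as a callable, because the HTML parsing in fetch_azure.py is not shown here and its field names are unknown. Any exception is logged rather than raised. Two details are assumptions: the last good azure.json is kept when a scrape fails, and the file uses an {"incidents": [...]} shape.

```python
import json
import sys
from pathlib import Path
from typing import Callable, List


def _write(path: Path, incidents: List[dict]) -> None:
    """Write azure.json in an assumed {"incidents": [...]} shape."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"incidents": incidents}, indent=2))


def fetch_azure_non_blocking(scrape: Callable[[], List[dict]], out_path: Path) -> bool:
    """Run the scraper, but never let a failure stop the pipeline.

    Returns True when fresh data was written and False when the fallback was used.
    """
    try:
        incidents = scrape()
    except Exception as exc:  # Azure data is supplementary, so log and carry on
        print(f"warning: Azure status scrape failed: {exc}", file=sys.stderr)
        if not out_path.exists():
            _write(out_path, [])  # no earlier file: publish an empty list
        return False  # otherwise keep the last good azure.json (an assumption)
    _write(out_path, incidents)
    return True
```

Because the function always returns normally, the workflow step exits with status 0 and generate_data.py still runs on the M365 data.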

3. generate_data.py

| Feature | Detail |
|---|---|
| Merge | Combines M365 health and Azure incidents into a unified dataset |
| Output files | site/latest.json, site/stats.json, site/feed.xml |
| Incident detail | Per-incident JSON at site/incidents/{id}.json |
| Monthly archive | site/archive/YYYY-MM.json |
| Stats | Per-service metrics + monthly aggregates |
| latest.json shape | Top-level keys: generated_at, totals, services[], issues[] |
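A sketch of how generate_data.py might assemble the top-level latest.json object and the monthly archive files. Only the four top-level keys come from the table above. The contents of totals and the normalised issue fields (id, start, is_resolved) are hypothetical. The archive writer merges by id into any existing month file, so a run that only sees recent issues does not wipe older entries. That merge policy is also an assumption.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List

# Hypothetical normalised issue fields: "id", "start" (ISO 8601 UTC string in a
# single consistent format, so string order == time order), "is_resolved".


def build_latest(services: List[dict], issues: List[dict], now: datetime) -> dict:
    """Assemble latest.json: generated_at, totals, services[], issues[]."""
    active = [i for i in issues if not i.get("is_resolved", False)]
    return {
        "generated_at": now.astimezone(timezone.utc).isoformat(timespec="seconds"),
        "totals": {"services": len(services), "issues": len(issues), "active": len(active)},
        "services": sorted(services, key=lambda s: s["service"]),
        "issues": sorted(issues, key=lambda i: i["start"], reverse=True),  # newest first
    }


def bucket_by_month(issues: List[dict]) -> Dict[str, List[dict]]:
    """Group issues by the 'YYYY-MM' prefix of their start timestamp."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for issue in issues:
        buckets[issue["start"][:7]].append(issue)
    return dict(buckets)


def write_archive(site_dir: Path, issues: List[dict]) -> List[str]:
    """Merge issues into site/archive/YYYY-MM.json by id and return the months touched."""
    archive = site_dir / "archive"
    archive.mkdir(parents=True, exist_ok=True)
    months = bucket_by_month(issues)
    for month, items in months.items():
        path = archive / f"{month}.json"
        existing = json.loads(path.read_text()) if path.exists() else []
        merged = {i["id"]: i for i in existing}
        merged.update({i["id"]: i for i in items})  # the newer copy of an issue wins
        ordered = sorted(merged.values(), key=lambda i: i["start"])
        path.write_text(json.dumps(ordered, indent=2))
    return sorted(months)
```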

GitHub Actions Workflow

File: .github/workflows/service-health.yml (~91 lines)

Schedule: Every 2 hours + manual dispatch
Auth: OIDC Azure login

Steps:
1. Checkout
2. Setup Python
3. Fetch M365 health (Graph API)
4. Fetch Azure history (web scrape)
5. Generate data files
6. Copy to aguidetocloud-revamp/static/data/service-health/
7. Hugo build + swa deploy
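Step 6 could be written in Python as follows. The sketch assumes the public outputs are latest.json, stats.json, feed.xml, incidents/ and archive/. It also assumes raw/ and previous_state.json are pipeline-internal and stay behind. The real workflow may copy files differently, for example with a shell cp. Hugo copies everything under static/ into the built site unchanged, so files placed in static/data/service-health/ are served at /data/service-health/.

```python
import shutil
from pathlib import Path
from typing import List

# Assumed public outputs; raw/ and previous_state.json are treated as internal.
PUBLISHED = ["latest.json", "stats.json", "feed.xml", "incidents", "archive"]


def publish(site_dir: Path, dest_dir: Path) -> List[str]:
    """Copy generated outputs into the Hugo site's static data folder."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in PUBLISHED:
        src = site_dir / name
        if src.is_dir():
            shutil.copytree(src, dest_dir / name, dirs_exist_ok=True)  # Python 3.8+
        elif src.is_file():
            shutil.copy2(src, dest_dir / name)
        else:
            continue  # output not produced this run: skip instead of failing
        copied.append(name)
    return copied


# Usage, relative to the workflow's working directory (paths assumed):
#   publish(Path("site"), Path("aguidetocloud-revamp/static/data/service-health"))
```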

Frontend Implementation

Hugo Template — list.html (~127 lines)

Renders the full dashboard UI:

| Section | What it shows |
|---|---|
| Summary banner | Overall health status (healthy, degraded, etc.) |
| Status grid | Per-service status cards with health indicators |
| Active incidents | Currently open issues with severity and details |
| Trends & insights | Most affected services, monthly scorecard |
| Date lookup | Search incidents by date range |
| Timeline | Chronological incident history with load-more |
| Export | CSV download, RSS feed, admin links |
| Incident modal | Click any incident for a full-detail overlay |
| JSON-LD | schema.org WebApplication markup |
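The date lookup and the timeline's load-more are driven by service-health.js. The Python sketch below only illustrates the logic. It assumes each incident carries an ISO 8601 start timestamp and an optional end timestamp; these field names are hypothetical. An incident matches when its active window overlaps the chosen date range. The page size of 20 is also an assumption.

```python
from datetime import date
from typing import List

# Hypothetical incident fields: "start" and optional "end" as ISO 8601 strings.


def lookup_by_date(issues: List[dict], first: date, last: date) -> List[dict]:
    """Return incidents whose active window overlaps the inclusive range [first, last]."""
    hits = []
    for issue in issues:
        began = date.fromisoformat(issue["start"][:10])
        ended = date.fromisoformat(issue["end"][:10]) if issue.get("end") else None
        # Overlap: it began before the range ended, and it is either still
        # open or it ended on or after the day the range starts.
        if began <= last and (ended is None or ended >= first):
            hits.append(issue)
    return hits


def load_more(timeline: list, shown: int, page_size: int = 20) -> list:
    """Return the next slice of a newest-first timeline for a "load more" button."""
    return timeline[shown:shown + page_size]
```

Checking for overlap, rather than testing only the start date, means a lookup still finds an incident that began before the range but was still open during it.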

JavaScript — service-health.js

| Feature | Detail |
|---|---|
| Data | Fetches /data/service-health/latest.json |
| sessionStorage | Caches the response for 10 minutes |
| Summary render | Overall status banner + freshness indicator |
| Service cards | Per-service health status with issue counts |
| Active incidents | Highlighted cards for current issues |
| Stats | Async-loads stats.json for the trend chart, "most affected" list, and scorecard |
| Filters | State kept in the URL: ?q=, ?service=, ?status= |
| Incident detail | Loads per-incident JSON from /data/service-health/incidents/{id}.json |
| CSV export | Client-side CSV generation |
| Date lookup | Filters incidents by date range |
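On the page, URL-state filtering and CSV export run client-side in JavaScript. To illustrate the same logic, here is a Python sketch that reads ?q=, ?service= and ?status= from a URL and serialises the result to CSV. Several details are assumptions: the incident fields (id, service, status, title), exact matching for service and status, and a case-insensitive substring match on the title for q.

```python
import csv
import io
from typing import List, Sequence
from urllib.parse import parse_qs, urlparse


def filter_from_url(url: str, issues: List[dict]) -> List[dict]:
    """Apply ?q=, ?service= and ?status= filters (field names are assumed)."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    q = params.get("q", "").lower()
    out = []
    for issue in issues:
        if "service" in params and issue["service"] != params["service"]:
            continue
        if "status" in params and issue["status"] != params["status"]:
            continue
        if q and q not in issue["title"].lower():
            continue
        out.append(issue)
    return out


def to_csv(issues: List[dict], columns: Sequence[str] = ("id", "service", "status", "title")) -> str:
    """Serialise issues to CSV text; the csv module quotes fields containing commas."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(issues)
    return buf.getvalue()
```

Keeping filter state in the query string means a filtered view can be bookmarked or shared as a plain link.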

CSS — service-health.css

| Element | Detail |
|---|---|
| Accent | Sky (#38BDF8) |
| CSS namespace | .shealth-* classes throughout |
| Status colours | Green (healthy), yellow (advisory), red (incident) |
| Cards | Service cards with status badges |
| Modal | Full-screen incident detail overlay |
| Analytics widgets | Trend chart, scorecard, most-affected |
| Timeline | Chronological incident history |
| Mobile | Responsive cards, stacked layout |
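A sketch of how incidents could be sorted into the three status colours. It uses real Microsoft Graph serviceHealthIssue properties: service, classification (either advisory or incident), and isResolved. The following parts are hypothetical: the rule itself (red for any open incident, yellow when only advisories are open, green otherwise), the worst-of-all-services summary, and the .shealth-status--* modifier class names.

```python
from typing import List

# Hypothetical modifier classes inside the .shealth-* namespace.
CLASS_FOR = {
    "healthy": "shealth-status--healthy",    # green
    "advisory": "shealth-status--advisory",  # yellow
    "incident": "shealth-status--incident",  # red
}


def service_bucket(service: str, issues: List[dict]) -> str:
    """Red if any open incident, yellow if only open advisories, otherwise green."""
    open_issues = [i for i in issues if i["service"] == service and not i["isResolved"]]
    if any(i["classification"] == "incident" for i in open_issues):
        return "incident"
    if open_issues:
        return "advisory"
    return "healthy"


def overall_status(services: List[str], issues: List[dict]) -> str:
    """Worst bucket across all services, for an overall summary banner."""
    rank = {"healthy": 0, "advisory": 1, "incident": 2}
    buckets = [service_bucket(s, issues) for s in services]
    return max(buckets, key=rank.__getitem__, default="healthy")
```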

Data Files Served

| Path | Purpose | Cache |
|---|---|---|
| /data/service-health/latest.json | Current health status | 30 min (SWA config) |
| /data/service-health/stats.json | Historical analytics | 30 min |
| /data/service-health/feed.xml | RSS feed | 30 min |
| /data/service-health/incidents/{id}.json | Per-incident detail | 30 min |
| /data/service-health/archive/YYYY-MM.json | Monthly archives | 30 min |
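The 30-minute cache is set in the Static Web Apps configuration. Below is a sketch of a matching staticwebapp.config.json route, generated from Python so the values live in one place. Two points are assumptions about how the real config is written: a single wildcard route covering every file under /data/service-health/, and the exact header value public, max-age=1800.

```python
import json

DATA_ROUTE = "/data/service-health/*"  # assumed to cover every data file below it
MAX_AGE_SECONDS = 30 * 60  # 30 minutes, matching the Cache column above


def swa_config() -> dict:
    """Build a staticwebapp.config.json fragment that sets Cache-Control on data files."""
    return {
        "routes": [
            {
                "route": DATA_ROUTE,
                "headers": {"Cache-Control": f"public, max-age={MAX_AGE_SECONDS}"},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(swa_config(), indent=2))
```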

Key Design Decisions

| Decision | Rationale |
|---|---|
| Every 2 hours | Service health changes often; a 2-hour cadence keeps the dashboard responsive without wasteful runs |
| Graph API + Azure scrape | Graph provides M365 service status; the Azure Status page adds Azure incident history and PIRs |
| Per-incident JSON | Incident detail is lazy-loaded only when a user opens it, which keeps the initial payload small |
| sessionStorage, 10 min | Avoids re-fetching while the user navigates within the same tab, while keeping data reasonably fresh |
| Non-blocking Azure fetch | The Azure scrape is supplementary; if it fails, M365 data still renders |
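The three freshness settings stack on top of each other. This sketch gives a rough worst-case bound on data age. It assumes a visitor receives a copy just before the next deploy lands and then relies first on the HTTP cache and then on sessionStorage. The bound is therefore: 2 h cadence + pipeline runtime + 30 min cache + 10 min sessionStorage. GitHub can also start scheduled runs late under heavy load, which would push the bound higher. The sketch also includes a 10-minute freshness check of the kind the sessionStorage row describes. The function names are hypothetical.

```python
from datetime import datetime, timedelta

PIPELINE_INTERVAL = timedelta(hours=2)  # cron cadence of service-health.yml
HTTP_CACHE = timedelta(minutes=30)      # Cache-Control lifetime set via SWA config
SESSION_TTL = timedelta(minutes=10)     # sessionStorage lifetime in the browser


def worst_case_age(pipeline_runtime: timedelta) -> timedelta:
    """Rough upper bound on the age of the data a visitor can be shown.

    A snapshot stays live until the next run deploys (interval + runtime). A copy
    fetched just before that can be reused for HTTP_CACHE and then SESSION_TTL.
    Late-starting scheduled runs would add to this bound.
    """
    return PIPELINE_INTERVAL + pipeline_runtime + HTTP_CACHE + SESSION_TTL


def is_fresh(cached_at: datetime, now: datetime, ttl: timedelta = SESSION_TTL) -> bool:
    """sessionStorage-style check: reuse the cached copy while it is younger than ttl."""
    return now - cached_at < ttl
```

With a 5-minute pipeline run, worst_case_age(timedelta(minutes=5)) comes to 2 h 45 min. That is the figure to compare against the freshness indicator in the summary banner.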

Maintenance

| Task | How |
|---|---|
| Force a refresh | Run service-health.yml manually (workflow_dispatch) from the GitHub Actions tab |
| Check the pipeline | Review the GitHub Actions logs for Graph API auth failures or scrape errors |
| Update Graph permissions | The service principal needs the ServiceHealth.Read.All permission on Microsoft Graph |
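Besides the Run workflow button in the Actions tab, you can trigger a manual refresh through GitHub's REST endpoint for workflow_dispatch events, or with gh workflow run service-health.yml. The sketch below builds that request with the standard library only and does not send it. It assumes three things: the workflow lives in susanthgit/service-health, the branch to run is main, and the token is allowed to run workflows on that repository.

```python
import json
import urllib.request

API = "https://api.github.com"


def dispatch_request(token: str, repo: str = "susanthgit/service-health",
                     workflow: str = "service-health.yml",
                     ref: str = "main") -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request; ref "main" is assumed."""
    url = f"{API}/repos/{repo}/actions/workflows/{workflow}/dispatches"
    body = json.dumps({"ref": ref}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


# To send it: urllib.request.urlopen(dispatch_request(token)); GitHub replies 204 No Content.
```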

Last updated: 11 April 2026