🏥 Service Health Dashboard

Live at: aguidetocloud.com/service-health/
Repo: susanthgit/service-health (pipeline)
Built: April 2026
Cost: $0/month (GitHub Actions Free + Azure SWA Free)

What It Does

A unified dashboard that monitors Microsoft 365 service health and Azure incident history. The pipeline pulls data from the Graph API and the Azure Status page every 2 hours, and the site displays it with status cards, active incident tracking, trend analytics, and per-incident detail modals.

Think of it as a personal command centre for Microsoft service outages: see what's down, what's recovering, and what happened last month.

Architecture Overview

┌───────────────────────────────────────────────────────────┐
│              GitHub Actions (every 2 hours)                │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────┐ │
│  │fetch_health  │  │ fetch_azure  │  │ generate_data   │ │
│  │  .py         │  │  .py         │  │   .py           │ │
│  │ Graph API    │  │ Azure Status │  │ merge + stats   │ │
│  └──────────────┘  └──────────────┘  └─────────────────┘ │
│        │                  │                   │           │
│  health.json        azure.json          latest.json      │
│  previous_state                         stats.json       │
│                                         incidents/*.json │
│                                         archive/*.json   │
│                                         feed.xml         │
├───────────────────────────────────────────────────────────┤
│    Copy data to main site → hugo build → swa deploy       │
└───────────────────────────────────────────────────────────┘
         │
         ▼
┌───────────────────────────────────────────────────────────┐
│          aguidetocloud.com/service-health/                 │
│                                                           │
│  ┌─────────────────┐  ┌──────────────────┐               │
│  │service-health.js│  │service-health.css│               │
│  └─────────────────┘  └──────────────────┘               │
│                                                           │
│  Summary → Status Grid → Active Incidents → Trends →      │
│  Date Lookup → Timeline → CSV Export → Incident Modal     │
└───────────────────────────────────────────────────────────┘

Pipeline Scripts

1. `fetch_health.py`

Feature	Detail
Data source	Microsoft Graph API: `/admin/serviceAnnouncement/healthOverviews` + `/issues`
Auth	Azure AD token (client credentials or Azure CLI)
Change detection	Compares against `site/previous_state.json`
Output	`site/raw/health.json`, `site/previous_state.json`
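
The change-detection step could look roughly like the sketch below. It assumes `previous_state.json` is a flat mapping of service name to Graph status string; that file layout and the helper names `load_previous` and `detect_changes` are illustrative assumptions, not taken from the repo.

```python
import json
from pathlib import Path


def load_previous(path: Path) -> dict[str, str]:
    """Read the saved state; a missing file is treated as a first run."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def detect_changes(previous: dict[str, str], current: dict[str, str]) -> list[dict]:
    """Return one record per service whose status differs from the last run.

    Both arguments map a service name to its status (hypothetical layout).
    A service that was not in the previous state is reported with from=None.
    """
    changes = []
    for service, status in sorted(current.items()):
        old = previous.get(service)
        if old != status:
            changes.append({"service": service, "from": old, "to": status})
    return changes
```

After the comparison, the script would write the current mapping back to `site/previous_state.json` so that the next run compares against it.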

2. `fetch_azure.py`

Feature	Detail
Data source	Scrapes `azure.status.microsoft/en-us/status/history/`
Extracts	Tracking IDs, regions, service names, dates, PIR URLs, video URLs
Error handling	Non-blocking on failure (Azure data is supplementary)
Output	`site/raw/azure.json`
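
One way to keep the Azure fetch non-blocking is to wrap the scrape in a guard that logs the failure and returns instead of raising. This is a minimal sketch: `fetch_non_blocking` and the `fetch` callable are hypothetical names, and the real script may handle errors differently.

```python
import json
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("fetch_azure")


def fetch_non_blocking(fetch: Callable[[], list[dict]], out_path: Path) -> bool:
    """Run fetch() and write its result; on failure, warn and return False.

    `fetch` stands in for the scraper (hypothetical). The output file is only
    written on success, so an earlier azure.json is left untouched on failure.
    """
    try:
        incidents = fetch()
    except Exception as exc:  # broad on purpose: a scrape can fail in many ways
        log.warning("Azure history fetch failed, continuing without it: %s", exc)
        return False
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(incidents, indent=2), encoding="utf-8")
    return True
```

Because the function returns a flag rather than raising, the calling script can exit with status 0 either way and the workflow moves on to the generate step.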

3. `generate_data.py`

Feature	Detail
Merge	Combines M365 health + Azure incidents into unified dataset
Output files	`site/latest.json`, `site/stats.json`, `site/feed.xml`
Incident detail	Per-incident JSON at `site/incidents/{id}.json`
Monthly archive	`site/archive/YYYY-MM.json`
Stats	Per-service metrics + monthly aggregates
latest.json shape	`generated_at`, totals, `services[]`, `issues[]`
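
As a rough illustration of the outputs above, the sketch below builds a `latest.json`-style payload and buckets issues into `YYYY-MM` archive groups. The totals field names (`total_services`, `total_issues`) and the per-issue `start` field are assumptions made for the example; the source only states that the file contains `generated_at`, totals, `services[]`, and `issues[]`.

```python
from collections import defaultdict
from datetime import datetime, timezone


def build_latest(services: list[dict], issues: list[dict], now: datetime) -> dict:
    """Assemble the top-level payload (totals field names are hypothetical)."""
    return {
        "generated_at": now.astimezone(timezone.utc).isoformat(),
        "total_services": len(services),
        "total_issues": len(issues),
        "services": services,
        "issues": issues,
    }


def group_by_month(issues: list[dict]) -> dict[str, list[dict]]:
    """Bucket issues by the YYYY-MM prefix of their ISO 8601 start time.

    Assumes every issue carries a `start` string such as
    "2026-04-02T09:00:00Z" (hypothetical field name).
    """
    buckets: dict[str, list[dict]] = defaultdict(list)
    for issue in issues:
        buckets[issue["start"][:7]].append(issue)
    return dict(buckets)
```

Each key returned by `group_by_month` would correspond to one file under `site/archive/`, for example `site/archive/2026-04.json`.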

GitHub Actions Workflow

File: .github/workflows/service-health.yml (~91 lines)

Schedule: Every 2 hours + manual dispatch
Auth: OIDC Azure login

Steps:
1. Checkout
2. Setup Python
3. Fetch M365 health (Graph API)
4. Fetch Azure history (web scrape)
5. Generate data files
6. Copy to aguidetocloud-revamp/static/data/service-health/
7. Hugo build + swa deploy

Frontend Implementation

Hugo Template: `list.html` (~127 lines)

Renders the full dashboard UI:

Section	What it shows
Summary banner	Overall health status (healthy/degraded/etc.)
Status grid	Per-service status cards with health indicators
Active incidents	Currently open issues with severity + details
Trends & insights	Most affected services, monthly scorecard
Date lookup	Search incidents by date range
Timeline	Chronological incident history with load-more
Export	CSV download + RSS feed + admin links
Incident modal	Click any incident for full detail overlay
JSON-LD	WebApplication schema

JavaScript: `service-health.js`

Feature	Detail
Data	Fetches `/data/service-health/latest.json`
Caching	Stores fetched data in sessionStorage for 10 minutes
Summary render	Overall status banner + freshness indicator
Service cards	Per-service health status with issue counts
Active incidents	Highlighted cards for current issues
Stats	Async-loads `stats.json` for trend chart + "most affected" + scorecard
Filters	URL state: `?q=`, `?service=`, `?status=`
Incident detail	Loads per-incident JSON from `/data/service-health/incidents/{id}.json`
CSV export	Client-side CSV generation
Date lookup	Filter incidents by date range

CSS: `service-health.css`

Element	Detail
Accent	Sky `#38BDF8`
CSS namespace	`.shealth-*` classes throughout
Status colours	Green (healthy), yellow (advisory), red (incident)
Cards	Service cards with status badges
Modal	Full-screen incident detail overlay
Analytics widgets	Trend chart, scorecard, most-affected
Timeline	Chronological incident history
Mobile	Responsive cards, stacked layout

Data Files Served

Path	Purpose	Cache
`/data/service-health/latest.json`	Current health status	30 min (SWA config)
`/data/service-health/stats.json`	Historical analytics	30 min
`/data/service-health/feed.xml`	RSS feed	30 min
`/data/service-health/incidents/{id}.json`	Per-incident detail	30 min
`/data/service-health/archive/YYYY-MM.json`	Monthly archives	30 min

Key Design Decisions

Decision	Rationale
Every 2 hours	Service health changes frequently; a 2-hour cadence is responsive without being wasteful
Graph API + Azure scrape	Graph gives M365 status; Azure Status page adds PIR/incident history
Per-incident JSON	Detail is lazy-loaded only when the user clicks an incident, which keeps the initial payload small
sessionStorage 10 min	Prevents re-fetching during tab navigation while staying reasonably fresh
Non-blocking Azure fetch	The Azure scrape is supplementary; if it fails, M365 data still renders

Maintenance

Task	How
Force a refresh	Trigger `service-health.yml` manually via GitHub Actions
Check pipeline	Review GitHub Actions logs for Graph API auth or scrape failures
Update Graph permissions	The service principal needs the `ServiceHealth.Read.All` application permission on Microsoft Graph
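
When debugging a Graph API auth failure, it can help to reproduce the two requests outside the pipeline. The sketch below only builds them (it sends nothing) and assumes the client-credentials path; the function names and the choice of the `v1.0` endpoint version are assumptions, and the workflow itself signs in through OIDC.

```python
from urllib.parse import urlencode
from urllib.request import Request

GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build the OAuth 2.0 client-credentials token request for Microsoft Graph."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": GRAPH_SCOPE,
    }).encode("ascii")
    return Request(url, data=body, method="POST")


def build_health_request(token: str) -> Request:
    """Build the GET request for the health overviews endpoint (v1.0 assumed)."""
    url = "https://graph.microsoft.com/v1.0/admin/serviceAnnouncement/healthOverviews"
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing either object to `urllib.request.urlopen` would send it. If the token request succeeds but the health request is refused, a missing or unconsented `ServiceHealth.Read.All` permission is a likely cause.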

Last updated: 11 April 2026