📰 AI News Pipeline

Live at: aguidetocloud.com/ai-news/
Repo: susanthgit/-ainews (V3)
Built: March 28, 2026 (V1) → April 10, 2026 (V3)
Cost: close to $0/month (GitHub Actions Free + Azure SWA Free + Azure OpenAI ~$0.02/day, roughly $0.60/month)


What It Does

An automated AI news aggregation system that collects articles from 34 RSS feeds, sorts them into 20 categories, summarises them with GPT-4o mini, and displays them in a 3-tier TLDR layout. It updates 4 times daily.

Think of it like building your own personalised AI news service that runs itself.


Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                   GitHub Actions (4x daily)              │
│   Cron: 5:00, 11:00, 17:00, 23:00 UTC                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐   ┌──────────────┐   ┌─────────────┐ │
│  │ fetch_news.py │──▶│ summarise.py │──▶│generate_page│ │
│  │  RSS + dedup  │   │ GPT-4o mini  │   │  .py        │ │
│  └──────────────┘   └──────────────┘   └─────────────┘ │
│         │                  │                   │        │
│    articles.json     summaries.json    latest/weekly/   │
│    broken_feeds.json                   monthly.json     │
│                                        feed.xml         │
├─────────────────────────────────────────────────────────┤
│  Clone main site repo → copy data → hugo build → deploy │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│              aguidetocloud.com/ai-news/                  │
│                                                         │
│  ┌─────────┐  ┌─────────────┐  ┌────────────────────┐  │
│  │ainews.js│  │ainews.css    │  │ai-news/list.html   │  │
│  │ 615 loc │  │ 837 loc     │  │Hugo template        │  │
│  └─────────┘  └─────────────┘  └────────────────────┘  │
│                                                         │
│  Tabs: Today | This Month                               │
│  Tiers: 🔥 Headlines | 🧠 Deep Dives | ⚡ Quick Links   │
└─────────────────────────────────────────────────────────┘

Pipeline Scripts

1. fetch_news.py (~490 lines)

| Feature | Detail |
| --- | --- |
| Input | 34 RSS feeds defined in scripts/feeds.json |
| Lookback | 72 hours (HOURS_LOOKBACK = 72) |
| Dedup (exact) | md5(url + title) hash |
| Dedup (fuzzy) | Jaccard word overlap > 0.75 threshold |
| Image extraction | RSS enclosures → HTML content → og:image backfill (threaded) |
| Keyword re-categorization | Maps articles to correct categories when RSS source doesn't match |
| Output | site/articles.json, site/broken_feeds.json |
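
The two dedup passes in the table can be sketched roughly as below. This is a minimal illustration, not the code in fetch_news.py: the article field names (url, title), the tokenisation, and the helper names are assumptions, and only the md5(url + title) key and the 0.75 Jaccard threshold come from the table.

```python
import hashlib
import re

FUZZY_THRESHOLD = 0.75  # Jaccard threshold quoted in the table above


def exact_key(article: dict) -> str:
    """md5 over url + title, used to drop identical repeats."""
    raw = (article["url"] + article["title"]).encode("utf-8")
    return hashlib.md5(raw).hexdigest()


def title_words(title: str) -> set:
    """Lower-cased word set; this tokenisation is an assumption."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dedupe(articles: list) -> list:
    """Keep the first article of every exact or near-duplicate group."""
    seen_keys = set()
    kept = []
    kept_words = []
    for article in articles:
        key = exact_key(article)
        if key in seen_keys:
            continue  # exact duplicate (same url + title)
        words = title_words(article["title"])
        if any(jaccard(words, other) > FUZZY_THRESHOLD for other in kept_words):
            continue  # near-duplicate headline from another source
        seen_keys.add(key)
        kept.append(article)
        kept_words.append(words)
    return kept
```

The fuzzy pass compares each new title against every title already kept, which is quadratic but unproblematic at the scale of a few hundred articles per run.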

Zero-feed categories

DeepSeek, Perplexity, and Rumours have no dedicated RSS feeds — they rely entirely on keyword re-categorization from general feeds (e.g., Top Stories, Industry).
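
A keyword pass of this kind might look like the sketch below. The keyword lists, the field names (title, description, category), and the first-match-wins rule are all hypothetical; the real mapping in fetch_news.py is not reproduced here.

```python
# Hypothetical keyword map for the three zero-feed categories.
KEYWORD_CATEGORIES = {
    "DeepSeek": ["deepseek"],
    "Perplexity": ["perplexity"],
    "Rumours": ["rumour", "rumor", "reportedly"],
}


def recategorise(article: dict) -> dict:
    """Return a copy of the article, moved to the first category whose keyword matches."""
    text = f"{article.get('title', '')} {article.get('description', '')}".lower()
    for category, keywords in KEYWORD_CATEGORIES.items():
        if any(keyword in text for keyword in keywords):
            return {**article, "category": category}
    return article  # no keyword hit: keep the category from the RSS source
```

For example, an article titled "DeepSeek releases open weights" that arrived through a Top Stories feed would come back with category set to DeepSeek.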

2. summarise.py (~211 lines)

| Feature | Detail |
| --- | --- |
| Model | Azure OpenAI gpt-4o-mini (deployment name) |
| Endpoint | ainews-openai.openai.azure.com |
| Auth | OIDC → Azure AD token (no API keys!) |
| Batch size | 10 articles per API call |
| Temperature | 0.2 |
| Output per article | summary, why_it_matters, tier, cluster |
| Tiers | headline, deep_dive, quick |
| Fallback | Single-item mode if batch parse fails |
| Cache-only mode | If no token available, uses cached summaries only |
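
The batch-then-fallback behaviour in the table could be structured along these lines. This is a sketch under assumptions: call_model stands in for the real Azure OpenAI call, the reply is assumed to be a JSON array with one object per article, and the function names are illustrative rather than taken from summarise.py.

```python
import json
from typing import Callable

BATCH_SIZE = 10  # "10 articles per API call" from the table above
REQUIRED_FIELDS = {"summary", "why_it_matters", "tier", "cluster"}


def chunked(items: list, size: int) -> list:
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parse_batch(raw: str, expected: int) -> list:
    """Parse a model reply; raise ValueError if it is not usable."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, list) or len(data) != expected:
        raise ValueError("unexpected reply shape")
    for item in data:
        if not isinstance(item, dict) or not REQUIRED_FIELDS.issubset(item):
            raise ValueError("missing fields")
    return data


def summarise_all(articles: list, call_model: Callable[[list], str]) -> list:
    """Summarise in batches, retrying article by article when a batch fails to parse."""
    results = []
    for batch in chunked(articles, BATCH_SIZE):
        try:
            results.extend(parse_batch(call_model(batch), len(batch)))
        except ValueError:
            # Single-item fallback: one bad reply should not lose the whole batch.
            for article in batch:
                try:
                    results.extend(parse_batch(call_model([article]), 1))
                except ValueError:
                    continue  # still unparseable: skip this article
    return results
```

Passing the model call in as a parameter keeps the batching logic testable without network access or a token.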

3. generate_page.py (~523 lines)

| Feature | Detail |
| --- | --- |
| Daily archive | site/archive/YYYY-MM-DD/index.html + JSON |
| Weekly digest | site/weekly.json with trending topics |
| Monthly digest | site/monthly.json |
| RSS feed | site/feed.xml |
| Breaking news | Cluster label + 3+ sources → is_breaking = true, tier forced to headline |
| Breaking detection | Fuzzy title word overlap > 0.5 for grouping |
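
The breaking-news rule can be illustrated with the sketch below. It covers only the "cluster label + 3+ sources" condition; the fuzzy title grouping (overlap > 0.5) is left out, and the source field name and the function name are assumptions.

```python
from collections import defaultdict

MIN_SOURCES = 3  # "3+ sources" from the table above


def flag_breaking(articles: list) -> list:
    """Mark articles whose cluster is covered by at least MIN_SOURCES distinct sources."""
    sources_by_cluster = defaultdict(set)
    for article in articles:
        cluster = article.get("cluster")
        if cluster:
            sources_by_cluster[cluster].add(article["source"])

    flagged = []
    for article in articles:
        cluster = article.get("cluster")
        is_breaking = bool(cluster) and len(sources_by_cluster[cluster]) >= MIN_SOURCES
        updated = {**article, "is_breaking": is_breaking}
        if is_breaking:
            updated["tier"] = "headline"  # breaking stories are promoted to the top tier
        flagged.append(updated)
    return flagged
```

Counting distinct sources rather than articles means three posts from the same outlet do not trigger the flag.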

GitHub Actions Workflow

File: .github/workflows/nightly-news.yml (~145 lines)

Schedule: 4x daily (5:00, 11:00, 17:00, 23:00 UTC)
Triggers: cron + manual dispatch

Steps:
1. Checkout repo
2. Setup Python 3.12
3. Install requirements
4. Run fetch_news.py
5. Feed health check → create GitHub Issue if feeds broken
6. Azure OIDC login (azure/login@v2)
7. Get Azure OpenAI token
8. Run summarise.py
9. Run generate_page.py
10. Commit site/ back to pipeline repo
11. Clone main site repo → copy data → hugo build → swa deploy
12. On failure → create GitHub Issue

OIDC Authentication

No stored secrets or API keys — the pipeline authenticates using OpenID Connect:

  1. GitHub Actions requests an OIDC token (id-token: write)
  2. Azure login via federated credential (azure/login@v2)
  3. Azure OpenAI token minted via az account get-access-token --resource https://cognitiveservices.azure.com
  4. Token passed to summarise.py as AZURE_OPENAI_TOKEN
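
On the Python side, step 4 amounts to reading the token from the environment and sending it as a bearer token. The sketch below builds the request without sending it. The api-version value, the function names, and the use of urllib are assumptions; the endpoint, deployment name, temperature, and AZURE_OPENAI_TOKEN variable come from this page.

```python
import json
import os
import urllib.request
from typing import Optional

ENDPOINT = "https://ainews-openai.openai.azure.com"
DEPLOYMENT = "gpt-4o-mini"
API_VERSION = "2024-06-01"  # assumption: substitute the api-version the deployment uses


def get_token() -> Optional[str]:
    """Return the token minted by the workflow, or None to signal cache-only mode."""
    return os.environ.get("AZURE_OPENAI_TOKEN") or None


def build_request(messages: list, token: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request authenticated with the token."""
    url = (
        f"{ENDPOINT}/openai/deployments/{DEPLOYMENT}"
        f"/chat/completions?api-version={API_VERSION}"
    )
    body = json.dumps({"messages": messages, "temperature": 0.2}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # Entra ID token instead of an api-key header
            "Content-Type": "application/json",
        },
    )
```

When get_token() returns None, the caller would skip the request entirely and fall back to cached summaries.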

Service principal: ainews app (client ID 51b70f10-db98-4c14-84ef-2ca661e51b6d) — shared with M365 Roadmap pipeline.

Feed Health Monitoring

  • After fetch, checks site/broken_feeds.json
  • If any feeds failed: creates a GitHub Issue with a markdown table of broken feeds
  • If all healthy: logs "All feeds healthy"
  • Issues are labeled bug for tracking
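
The check above could be sketched as follows. The shape of broken_feeds.json (a list of objects with name, url, and error keys) is an assumption, as are the function names; creating the GitHub Issue itself is left to the workflow.

```python
import json
from pathlib import Path
from typing import Optional


def load_broken_feeds(path: str = "site/broken_feeds.json") -> list:
    """Read the broken-feed report; a missing file is treated as no failures."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))


def issue_body(broken: list) -> Optional[str]:
    """Return a markdown table for the issue body, or None when all feeds are healthy."""
    if not broken:
        return None
    rows = ["| Feed | URL | Error |", "| --- | --- | --- |"]
    for feed in broken:
        # Escape pipes so an error message cannot break the table layout.
        error = str(feed.get("error", "")).replace("|", "\\|")
        rows.append(f"| {feed.get('name', '?')} | {feed.get('url', '')} | {error} |")
    return "\n".join(rows)
```

A None result maps to the "All feeds healthy" log line; any other result becomes the body of the issue labeled bug.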

Frontend Implementation

Hugo Templates

| Template | Purpose |
| --- | --- |
| layouts/ai-news/list.html (54 lines) | Main page: injects category config, renders tabs + RSS/CSV links |
| layouts/ai-news-category/list.html (64 lines) | Category SEO pages: adds __ainewsCategoryFilter, CollectionPage JSON-LD |

JavaScript — ainews.js (615 lines)

| Feature | How it works |
| --- | --- |
| Tab system | Today + This Month tabs; loads latest.json or monthly.json per tab |
| 3-tier layout | Separates articles by tier: headlines get hero cards, deep dives get full cards |
| Search | Debounced 200ms, matches against textContent, updates URL with ?q= |
| Category filtering | 20 categories from window.__ainewsCategoryConfig (injected by Hugo from TOML) |
| Always-show categories | M365 Copilot + Copilot Studio always visible in chip bar |
| sessionStorage cache | Key: ainews_ + URL; caches fetched JSON to avoid re-downloading |
| Source favicons | Google favicon service: google.com/s2/favicons?domain=...&sz=16 |
| URL state | Reads/writes ?cat= and ?q= for shareable filtered views |
| Category pages | Respects window.__ainewsCategoryFilter to pre-filter content |

CSS — ainews.css (837 lines)

| Element | Detail |
| --- | --- |
| Accent colour | Magenta #ff66ff |
| Hero gradient | color-mix(in srgb, #ff66ff 50%, #141425) |
| 3-tier styling | Headlines = cyan callouts, Deep dives = full cards, Quick links = compact rows |
| "Why it matters" | Enlarged callout: 0.9rem text, 4px border, uppercase label, prominent background |
| Mobile | @media (max-width: 768px): stacked toolbar, hidden deep dive summaries, compact tabs |
| CSS namespace | .ainews-* classes throughout |

Category Configuration

Defined in data/ainews_categories.toml (124 lines) — injected into JS via Hugo template as window.__ainewsCategoryConfig.

20 categories: Top Stories, Microsoft, M365 Copilot, Copilot Studio, GitHub Copilot, AI Foundry, OpenAI, Anthropic, Google, Meta, DeepSeek, NVIDIA, Apple, Amazon, Mistral, xAI, Perplexity, Open Source, Industry, Rumours

Each category has: name, emoji, color, display order.


SEO Category Landing Pages

13 pages under content/ai-news/<slug>/_index.md:

| Page | Filter |
| --- | --- |
| /ai-news/microsoft/ | Microsoft |
| /ai-news/m365-copilot/ | M365 Copilot |
| /ai-news/copilot-studio/ | Copilot Studio |
| /ai-news/github-copilot/ | GitHub Copilot |
| /ai-news/ai-foundry/ | AI Foundry |
| /ai-news/openai/ | OpenAI |
| /ai-news/anthropic/ | Anthropic |
| /ai-news/google/ | Google |
| /ai-news/meta/ | Meta |
| /ai-news/deepseek/ | DeepSeek |
| /ai-news/apple/ | Apple |
| /ai-news/amazon/ | Amazon |
| /ai-news/nvidia/ | NVIDIA |

Each sets type: "ai-news-category" in front matter rather than layout, because Hugo ignores layout for _index.md section pages.


RSS Feed Sources (34 feeds across 17 categories)

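| Category | Feed Count | Max Articles |
| --- | --- | --- |
| Top Stories | 6 | 10 |
| Microsoft | 3 | 8 |
| M365 Copilot | 2 | 8 |
| Copilot Studio | 2 | 6 |
| GitHub Copilot | 2 | 6 |
| AI Foundry | 3 | 6 |
| OpenAI | 1 | 8 |
| Google | 2 | 8 |
| Meta | 2 | 6 |
| Anthropic | 2 | 6 |
| Mistral | 1 | 4 |
| xAI | 1 | 4 |
| Apple | 1 | 6 |
| NVIDIA | 1 | 6 |
| Amazon | 2 | 6 |
| Open Source | 1 | 6 |
| Industry | 2 | 8 |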

Community proxy feeds

Anthropic, Mistral, and xAI use GitHub-hosted community proxy RSS feeds (no official RSS). Periodically verify these still work via feed health monitoring.


Data Flow Summary

RSS Feeds (34)
fetch_news.py → articles.json (raw articles, deduped)
summarise.py → summaries.json (GPT summaries + tiers)
generate_page.py → latest.json, weekly.json, monthly.json, feed.xml
GitHub Actions → copies to aguidetocloud-revamp/static/data/ainews/
Hugo build → Azure SWA deploy
ainews.js → loads JSON → renders 3-tier TLDR layout

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Subfolder not subdomain | /ai-news/ gets SEO juice from main domain vs standalone ainews.aguidetocloud.com |
| OIDC not API keys | No secrets to rotate; tokens are minted fresh each run |
| 3-tier TLDR | Inspired by TLDR AI newsletter (920K readers): scan headlines, read deep dives, skim quick links |
| "Why it matters" | The USP: readers love the human-readable context on each article |
| sessionStorage cache | Prevents re-fetching same JSON when navigating between tabs/pages |
| Category TOML config | Single source of truth for category names, emojis, colours; shared between Hugo + JS |
| 4x daily updates | Every 6 hours ensures fresh content throughout the day |

Maintenance

| Task | How |
| --- | --- |
| Add a new RSS feed | Edit scripts/feeds.json: add URL + category + optional max_articles |
| Add a new category | Add to feeds.json + data/ainews_categories.toml + create SEO page under content/ai-news/ |
| Check feed health | Look at GitHub Issues in -ainews repo (auto-created by pipeline) |
| Force a refresh | Trigger nightly-news.yml manually via GitHub Actions |
| Update OIDC | Service principal: ainews app → Federated credentials in Entra ID |
| Debug pipeline | Check GitHub Actions logs → fetch_news.py output → summaries.json → latest.json |
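
Before committing a new feeds.json entry, a quick sanity check like the one below can catch typos. It is a hypothetical helper, not part of the pipeline: the key names url and category are assumed, while max_articles is the optional field named in the table.

```python
from urllib.parse import urlparse


def validate_feed(entry: dict, known_categories: set) -> list:
    """Return a list of problems with a feed entry; an empty list means it looks fine."""
    problems = []
    parsed = urlparse(str(entry.get("url", "")))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must be an absolute http(s) URL")
    if entry.get("category") not in known_categories:
        problems.append("category is not defined in ainews_categories.toml")
    max_articles = entry.get("max_articles")  # optional field
    if max_articles is not None and (not isinstance(max_articles, int) or max_articles < 1):
        problems.append("max_articles must be a positive integer")
    return problems
```

The known_categories set would be the 20 names from data/ainews_categories.toml, so a feed cannot point at a category the frontend does not know about.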

Version History

| Version | Date | What Changed |
| --- | --- | --- |
| V1 | Mar 28 | Standalone ainews.aguidetocloud.com, basic HTML, nightly |
| V2 | Apr 1–2 | Moved to /ai-news/, 3-tier TLDR, OIDC auth, twice daily |
| V3 | Apr 10 | 4x daily, 20 categories, 34 feeds, fuzzy dedup, breaking detection, feed health monitoring, 13 SEO pages, keyword re-categorization |

Last updated: 11 April 2026