
📰 AI News Pipeline

Live at: aguidetocloud.com/ai-news/
Repo: susanthgit/-ainews (V3)
Built: March 28, 2026 (V1) → April 10, 2026 (V3)
Cost: ~$0/month (GitHub Actions Free + Azure SWA Free + Azure OpenAI ~$0.02/day, roughly $0.60/month)


What It Does

An automated AI news aggregation system that collects articles from 34 RSS feeds, sorts them into 20 categories, summarises them with GPT-4o mini, and displays them in a 3-tier TLDR layout, updated 4 times daily.

Think of it like building your own personalised AI news service that runs itself.


Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                GitHub Actions (4x daily)                │
│   Cron: 5:00, 11:00, 17:00, 23:00 UTC                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌───────────────┐   ┌──────────────┐   ┌─────────────┐ │
│  │ fetch_news.py │──▶│ summarise.py │──▶│generate_page│ │
│  │  RSS + dedup  │   │ GPT-4o mini  │   │  .py        │ │
│  └───────────────┘   └──────────────┘   └─────────────┘ │
│         │                  │                   │        │
│    articles.json     summaries.json    latest/weekly/   │
│    broken_feeds.json                   monthly.json     │
│                                        feed.xml         │
├─────────────────────────────────────────────────────────┤
│  Clone main site repo → copy data → hugo build → deploy │
└─────────────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────┐
│               aguidetocloud.com/ai-news/                │
│                                                         │
│  ┌─────────┐  ┌─────────────┐  ┌────────────────────┐   │
│  │ainews.js│  │ainews.css   │  │ai-news/list.html   │   │
│  │ 615 loc │  │ 837 loc     │  │Hugo template       │   │
│  └─────────┘  └─────────────┘  └────────────────────┘   │
│                                                         │
│  Tabs: Today | This Month                               │
│  Tiers: 🔥 Headlines | 🧠 Deep Dives | ⚡ Quick Links   │
└─────────────────────────────────────────────────────────┘

Pipeline Scripts

1. fetch_news.py (~490 lines)

| Feature | Detail |
| --- | --- |
| Input | 34 RSS feeds defined in scripts/feeds.json |
| Lookback | 72 hours (HOURS_LOOKBACK = 72) |
| Dedup (exact) | md5(url + title) hash |
| Dedup (fuzzy) | Jaccard word overlap, threshold > 0.75 |
| Image extraction | RSS enclosures → HTML content → og:image backfill (threaded) |
| Keyword re-categorization | Maps articles to the correct category when the RSS source's category doesn't match |
| Output | site/articles.json, site/broken_feeds.json |
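
The two dedup passes can be sketched roughly as follows. This is a minimal illustration, not the actual fetch_news.py code: the article field names (`url`, `title`), the word tokenisation, and the choice to compare titles for the fuzzy pass are all assumptions. Only the md5(url + title) key and the 0.75 threshold come from the table above.

```python
import hashlib
import re

FUZZY_THRESHOLD = 0.75  # threshold quoted in the table above


def exact_key(url: str, title: str) -> str:
    """md5 of url + title, used as the exact-duplicate key."""
    return hashlib.md5((url + title).encode("utf-8")).hexdigest()


def title_words(title: str) -> set[str]:
    """Lowercased word set (this tokenisation rule is an assumption)."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dedup(articles: list[dict]) -> list[dict]:
    """Keep the first article of each exact or fuzzy duplicate group."""
    seen_keys: set[str] = set()
    kept: list[dict] = []
    kept_words: list[set[str]] = []
    for art in articles:
        key = exact_key(art["url"], art["title"])
        if key in seen_keys:
            continue  # exact duplicate
        words = title_words(art["title"])
        if any(jaccard(words, other) > FUZZY_THRESHOLD for other in kept_words):
            continue  # near-duplicate of an article we already kept
        seen_keys.add(key)
        kept.append(art)
        kept_words.append(words)
    return kept
```

The fuzzy pass compares each new article against every kept one, which is quadratic but unproblematic at a few hundred articles per run.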

Zero-feed categories

DeepSeek, Perplexity, and Rumours have no dedicated RSS feeds; they rely entirely on keyword re-categorization from general feeds (e.g., Top Stories, Industry).
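
A keyword re-categorization pass could look something like the sketch below. The keyword lists, the `recategorise` helper, and the `title` / `summary` / `category` field names are hypothetical; the real rules in fetch_news.py may differ.

```python
# Hypothetical keyword rules for the three zero-feed categories.
KEYWORD_RULES: dict[str, list[str]] = {
    "DeepSeek": ["deepseek"],
    "Perplexity": ["perplexity"],
    "Rumours": ["rumour", "rumor", "reportedly", "leak"],
}


def recategorise(article: dict) -> dict:
    """Return a copy of the article moved to the first matching category."""
    text = f"{article.get('title', '')} {article.get('summary', '')}".lower()
    for category, keywords in KEYWORD_RULES.items():
        if any(keyword in text for keyword in keywords):
            return {**article, "category": category}
    return article  # no keyword matched: keep the feed's own category
```

Rule order matters in this sketch: an article mentioning both DeepSeek and a leak lands in DeepSeek because that rule is checked first.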

2. summarise.py (~211 lines)

| Feature | Detail |
| --- | --- |
| Model | Azure OpenAI gpt-4o-mini (deployment name) |
| Endpoint | ainews-openai.openai.azure.com |
| Auth | OIDC → Azure AD token (no API keys) |
| Batch size | 10 articles per API call |
| Temperature | 0.2 |
| Output per article | summary, why_it_matters, tier, cluster |
| Tiers | headline, deep_dive, quick |
| Fallback | Single-item mode if batch parse fails |
| Cache-only mode | If no token is available, uses cached summaries only |
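
The batch-then-fallback behaviour might be structured along these lines. This is a sketch under assumptions: `call_model` stands in for whatever function sends a batch to Azure OpenAI and returns the raw reply text, and the reply is assumed to be a JSON list with one object per article. Neither detail is confirmed by the source.

```python
import json
from typing import Callable

BATCH_SIZE = 10  # articles per API call, from the table above


def chunk(items: list, size: int = BATCH_SIZE) -> list[list]:
    """Split items into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parse_reply(raw: str, expected: int) -> list[dict]:
    """Parse a reply; raise ValueError unless it is a list of `expected` items."""
    parsed = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(parsed, list) or len(parsed) != expected:
        raise ValueError("unexpected reply shape")
    return parsed


def summarise_all(
    articles: list[dict],
    call_model: Callable[[list[dict]], str],  # hypothetical API wrapper
) -> list[dict]:
    """Summarise in batches, retrying one article at a time on a bad batch."""
    results: list[dict] = []
    for batch in chunk(articles):
        try:
            results.extend(parse_reply(call_model(batch), len(batch)))
        except ValueError:
            for article in batch:  # single-item fallback
                try:
                    results.extend(parse_reply(call_model([article]), 1))
                except ValueError:
                    continue  # skip an article whose reply is still unusable
    return results
```

Passing the API call in as a function keeps the batching logic testable without a token.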

3. generate_page.py (~523 lines)

| Feature | Detail |
| --- | --- |
| Daily archive | site/archive/YYYY-MM-DD/index.html + JSON |
| Weekly digest | site/weekly.json with trending topics |
| Monthly digest | site/monthly.json |
| RSS feed | site/feed.xml |
| Breaking news | Cluster label + 3+ sources → is_breaking = true, tier forced to headline |
| Breaking detection | Fuzzy title word overlap > 0.5 for grouping |
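
Breaking-news detection could be approximated as below. The 0.5 overlap threshold, the 3-source minimum, and the `is_breaking` / `tier` fields come from the table; the greedy grouping strategy, the use of Jaccard as the overlap measure, and the `title` / `source` field names are assumptions. The cluster label from summarise.py is left out to keep the sketch short.

```python
import re

OVERLAP_THRESHOLD = 0.5  # title word overlap needed to join a group
MIN_SOURCES = 3          # distinct sources needed to flag a story


def words(title: str) -> set[str]:
    """Lowercased word set of a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap, assumed here as the 'fuzzy title word overlap'."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mark_breaking(articles: list[dict]) -> list[dict]:
    """Group similar titles; flag groups covered by 3+ distinct sources."""
    groups: list[dict] = []  # each: {"words": set, "items": [articles]}
    for article in articles:
        w = words(article["title"])
        for group in groups:
            if overlap(w, group["words"]) > OVERLAP_THRESHOLD:
                group["items"].append(article)
                break
        else:
            groups.append({"words": w, "items": [article]})
    for group in groups:
        sources = {a["source"] for a in group["items"]}
        if len(sources) >= MIN_SOURCES:
            for a in group["items"]:
                a["is_breaking"] = True
                a["tier"] = "headline"  # tier is forced, whatever the model chose
    return articles
```

Each group is represented by its first title's words, so an article joins the first group it resembles; articles are updated in place.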

GitHub Actions Workflow

File: .github/workflows/nightly-news.yml (~145 lines)

Schedule: 4x daily (5:00, 11:00, 17:00, 23:00 UTC)
Triggers: cron + manual dispatch

Steps:
1. Checkout repo
2. Setup Python 3.12
3. Install requirements
4. Run fetch_news.py
5. Feed health check → create GitHub Issue if feeds broken
6. Azure OIDC login (azure/login@v2)
7. Get Azure OpenAI token
8. Run summarise.py
9. Run generate_page.py
10. Commit site/ back to pipeline repo
11. Clone main site repo → copy data → hugo build → swa deploy
12. On failure → create GitHub Issue

OIDC Authentication

No stored secrets or API keys. The pipeline authenticates using OpenID Connect:

  1. GitHub Actions requests an OIDC token (id-token: write)
  2. Azure login via federated credential (azure/login@v2)
  3. Azure OpenAI token minted via az account get-access-token --resource https://cognitiveservices.azure.com
  4. Token passed to summarise.py as AZURE_OPENAI_TOKEN
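
On the Python side, step 4 might be consumed roughly like this. It is a minimal sketch, assuming summarise.py reads the token from the environment and sends it as a bearer token; the `build_auth_headers` helper is hypothetical. Returning `None` models the cache-only mode described earlier.

```python
import os
from typing import Mapping, Optional


def build_auth_headers(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Build request headers from AZURE_OPENAI_TOKEN, or None if it is unset."""
    env = os.environ if env is None else env
    token = env.get("AZURE_OPENAI_TOKEN", "").strip()
    if not token:
        return None  # caller should fall back to cached summaries only
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Accepting an optional mapping instead of reading `os.environ` directly makes the function easy to exercise in tests.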

Service principal: ainews app (client ID 51b70f10-db98-4c14-84ef-2ca661e51b6d), shared with the M365 Roadmap pipeline.

Feed Health Monitoring

  • After fetch, checks site/broken_feeds.json
  • If any feeds failed: creates a GitHub Issue with a markdown table of broken feeds
  • If all healthy: logs "All feeds healthy"
  • Issues are labeled bug for tracking
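
The issue body could be generated along these lines. The shape of broken_feeds.json is not documented here, so the sketch assumes a JSON list of objects with `name`, `category`, and `error` keys; both helper functions are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def load_broken_feeds(path: str = "site/broken_feeds.json") -> list[dict]:
    """Read the broken-feeds file; a missing file counts as no failures."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))


def cell(value: object) -> str:
    """Escape pipes and newlines so a value fits in one table cell."""
    return str(value).replace("|", "\\|").replace("\n", " ")


def broken_feeds_issue_body(broken: list[dict]) -> Optional[str]:
    """Return a markdown table of broken feeds, or None if all are healthy."""
    if not broken:
        return None
    lines = ["| Feed | Category | Error |", "| --- | --- | --- |"]
    for feed in broken:
        lines.append(
            f"| {cell(feed.get('name', '?'))} "
            f"| {cell(feed.get('category', '?'))} "
            f"| {cell(feed.get('error', '?'))} |"
        )
    return "\n".join(lines)
```

A `None` result corresponds to the "All feeds healthy" log line; any other result would be posted as the issue body.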

Frontend Implementation

Hugo Templates

| Template | Purpose |
| --- | --- |
| layouts/ai-news/list.html (54 lines) | Main page: injects category config, renders tabs + RSS/CSV links |
| layouts/ai-news-category/list.html (64 lines) | Category SEO pages: adds __ainewsCategoryFilter, CollectionPage JSON-LD |

JavaScript: ainews.js (615 lines)

| Feature | How it works |
| --- | --- |
| Tab system | Today + This Month tabs; loads latest.json or monthly.json per tab |
| 3-tier layout | Separates articles by tier: headlines get hero cards, deep dives get full cards |
| Search | Debounced 200ms, matches against textContent, updates URL with ?q= |
| Category filtering | 20 categories from window.__ainewsCategoryConfig (injected by Hugo from TOML) |
| Always-show categories | M365 Copilot + Copilot Studio always visible in chip bar |
| sessionStorage cache | Key: ainews_ + URL; caches fetched JSON to avoid re-downloading |
| Source favicons | Google favicon service: google.com/s2/favicons?domain=...&sz=16 |
| URL state | Reads/writes ?cat= and ?q= for shareable filtered views |
| Category pages | Respects window.__ainewsCategoryFilter to pre-filter content |

CSS: ainews.css (837 lines)

| Element | Detail |
| --- | --- |
| Accent colour | Magenta #ff66ff |
| Hero gradient | color-mix(in srgb, #ff66ff 50%, #141425) |
| 3-tier styling | Headlines = cyan callouts, Deep dives = full cards, Quick links = compact rows |
| "Why it matters" | Enlarged callout: 0.9rem text, 4px border, uppercase label, prominent background |
| Mobile | @media (max-width: 768px): stacked toolbar, hidden deep dive summaries, compact tabs |
| CSS namespace | .ainews-* classes throughout |

Category Configuration

Defined in data/ainews_categories.toml (124 lines) and injected into JS via a Hugo template as window.__ainewsCategoryConfig.

20 categories: Top Stories, Microsoft, M365 Copilot, Copilot Studio, GitHub Copilot, AI Foundry, OpenAI, Anthropic, Google, Meta, DeepSeek, NVIDIA, Apple, Amazon, Mistral, xAI, Perplexity, Open Source, Industry, Rumours

Each category has: name, emoji, color, display order.


SEO Category Landing Pages

13 pages under content/ai-news/<slug>/_index.md:

| Page | Filter |
| --- | --- |
| /ai-news/microsoft/ | Microsoft |
| /ai-news/m365-copilot/ | M365 Copilot |
| /ai-news/copilot-studio/ | Copilot Studio |
| /ai-news/github-copilot/ | GitHub Copilot |
| /ai-news/ai-foundry/ | AI Foundry |
| /ai-news/openai/ | OpenAI |
| /ai-news/anthropic/ | Anthropic |
| /ai-news/google/ | Google |
| /ai-news/meta/ | Meta |
| /ai-news/deepseek/ | DeepSeek |
| /ai-news/apple/ | Apple |
| /ai-news/amazon/ | Amazon |
| /ai-news/nvidia/ | NVIDIA |

Each uses type: "ai-news-category" in front matter rather than layout, because Hugo ignores layout for _index.md section pages.


RSS Feed Sources (34 feeds across 17 categories)

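| Category | Feed Count | Max Articles |
| --- | --- | --- |
| Top Stories | 6 | 10 |
| Microsoft | 3 | 8 |
| M365 Copilot | 2 | 8 |
| Copilot Studio | 2 | 6 |
| GitHub Copilot | 2 | 6 |
| AI Foundry | 3 | 6 |
| OpenAI | 1 | 8 |
| Google | 2 | 8 |
| Meta | 2 | 6 |
| Anthropic | 2 | 6 |
| Mistral | 1 | 4 |
| xAI | 1 | 4 |
| Apple | 1 | 6 |
| NVIDIA | 1 | 6 |
| Amazon | 2 | 6 |
| Open Source | 1 | 6 |
| Industry | 2 | 8 |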

Community proxy feeds

Anthropic, Mistral, and xAI use GitHub-hosted community proxy RSS feeds (no official RSS). Periodically verify these still work via feed health monitoring.


Data Flow Summary

RSS Feeds (34)
    ↓
fetch_news.py → articles.json (raw articles, deduped)
    ↓
summarise.py → summaries.json (GPT summaries + tiers)
    ↓
generate_page.py → latest.json, weekly.json, monthly.json, feed.xml
    ↓
GitHub Actions → copies to aguidetocloud-revamp/static/data/ainews/
    ↓
Hugo build → Azure SWA deploy
    ↓
ainews.js → loads JSON → renders 3-tier TLDR layout

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Subfolder not subdomain | /ai-news/ inherits SEO authority from the main domain, unlike a standalone ainews.aguidetocloud.com |
| OIDC not API keys | No secrets to rotate: tokens are minted fresh each run |
| 3-tier TLDR | Inspired by TLDR AI newsletter (920K readers): scan headlines, read deep dives, skim quick links |
| "Why it matters" | The USP: readers love the human-readable context on each article |
| sessionStorage cache | Prevents re-fetching the same JSON when navigating between tabs/pages |
| Category TOML config | Single source of truth for category names, emojis, colours, shared between Hugo + JS |
| 4x daily updates | A run every 6 hours keeps content fresh throughout the day |

Maintenance

| Task | How |
| --- | --- |
| Add a new RSS feed | Edit scripts/feeds.json: add URL + category + optional max_articles |
| Add a new category | Add to feeds.json + data/ainews_categories.toml + create SEO page under content/ai-news/ |
| Check feed health | Look at GitHub Issues in -ainews repo (auto-created by pipeline) |
| Force a refresh | Trigger nightly-news.yml manually via GitHub Actions |
| Update OIDC | Service principal: ainews app → Federated credentials in Entra ID |
| Debug pipeline | Check GitHub Actions logs → fetch_news.py output → summaries.json → latest.json |

Version History

| Version | Date | What Changed |
| --- | --- | --- |
| V1 | Mar 28 | Standalone ainews.aguidetocloud.com, basic HTML, nightly |
| V2 | Apr 1–2 | Moved to /ai-news/, 3-tier TLDR, OIDC auth, twice daily |
| V3 | Apr 10 | 4x daily, 20 categories, 34 feeds, fuzzy dedup, breaking detection, feed health monitoring, 13 SEO pages, keyword re-categorization |

Last updated: 11 April 2026