📰 AI News Pipeline
Live at: aguidetocloud.com/ai-news/
Repo: susanthgit/-ainews (V3)
Built: March 28, 2026 (V1) → April 10, 2026 (V3)
Cost: $0/month (GitHub Actions Free + Azure SWA Free + Azure OpenAI ~$0.02/day)
What It Does
An automated AI news aggregation system that collects articles from 34 RSS feeds across 20 categories, summarises them with GPT-4o mini, and displays them in a 3-tier TLDR layout, updated 4 times daily.
Think of it like building your own personalised AI news service that runs itself.
Architecture Overview
┌───────────────────────────────────────────────────────────┐
│                 GitHub Actions (4x daily)                 │
│            Cron: 5:00, 11:00, 17:00, 23:00 UTC            │
├───────────────────────────────────────────────────────────┤
│                                                           │
│ ┌───────────────┐   ┌───────────────┐   ┌───────────────┐ │
│ │ fetch_news.py │──▶│ summarise.py  │──▶│ generate_page │ │
│ │ RSS + dedup   │   │ GPT-4o mini   │   │ .py           │ │
│ └───────────────┘   └───────────────┘   └───────────────┘ │
│         │                   │                   │         │
│  articles.json       summaries.json      latest/weekly/   │
│  broken_feeds.json                       monthly.json     │
│                                          feed.xml         │
├───────────────────────────────────────────────────────────┤
│ Clone main site repo → copy data → hugo build → deploy    │
└───────────────────────────────────────────────────────────┘
                              │
                              ▼
┌───────────────────────────────────────────────────────────┐
│                aguidetocloud.com/ai-news/                 │
│                                                           │
│ ┌─────────┐  ┌─────────────┐  ┌────────────────────┐      │
│ │ainews.js│  │ainews.css   │  │ai-news/list.html   │      │
│ │ 615 loc │  │ 837 loc     │  │Hugo template       │      │
│ └─────────┘  └─────────────┘  └────────────────────┘      │
│                                                           │
│ Tabs: Today | This Month                                  │
│ Tiers: 🔥 Headlines | 🧠 Deep Dives | ⚡ Quick Links      │
└───────────────────────────────────────────────────────────┘
Pipeline Scripts
1. fetch_news.py (~490 lines)
| Feature | Detail |
|---|---|
| Input | 34 RSS feeds defined in scripts/feeds.json |
| Lookback | 72 hours (HOURS_LOOKBACK = 72) |
| Dedup (exact) | md5(url + title) hash |
| Dedup (fuzzy) | Jaccard word overlap > 0.75 threshold |
| Image extraction | RSS enclosures → HTML content → og:image backfill (threaded) |
| Keyword re-categorization | Maps articles to correct categories when RSS source doesn't match |
| Output | site/articles.json, site/broken_feeds.json |
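The two dedup passes in the table can be sketched roughly as follows. This is an illustration rather than the actual fetch_news.py code: the function names, the `url`/`title` article keys, and the choice to compare titles (not full text) in the fuzzy pass are all assumptions.

```python
import hashlib
import re


def exact_key(url: str, title: str) -> str:
    """Exact-duplicate key: md5 over url + title, as listed in the table."""
    return hashlib.md5((url + title).encode("utf-8")).hexdigest()


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two strings."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def dedupe(articles: list[dict], threshold: float = 0.75) -> list[dict]:
    """Keep the first occurrence of each article; drop exact and fuzzy repeats."""
    seen_keys: set[str] = set()
    kept: list[dict] = []
    for art in articles:
        key = exact_key(art["url"], art["title"])
        if key in seen_keys:
            continue  # exact duplicate
        # Assumption: the fuzzy pass compares titles only.
        if any(jaccard(art["title"], k["title"]) > threshold for k in kept):
            continue  # near-duplicate of an article already kept
        seen_keys.add(key)
        kept.append(art)
    return kept
```

The fuzzy pass is quadratic in the number of kept articles, which is usually acceptable at the scale of a few hundred items per run.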
Zero-feed categories
DeepSeek, Perplexity, and Rumours have no dedicated RSS feeds; they rely entirely on keyword re-categorization from general feeds (e.g., Top Stories, Industry).
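Keyword re-categorization for these zero-feed categories might look something like the sketch below. The keyword lists, the `recategorise` name, and the `title`/`summary`/`category` keys are hypothetical; the real mapping in fetch_news.py is not documented here.

```python
# Hypothetical keyword map; the real keywords are not listed in this document.
KEYWORD_CATEGORIES = {
    "DeepSeek": ["deepseek"],
    "Perplexity": ["perplexity"],
    "Rumours": ["rumour", "rumor", "leak"],
}


def recategorise(article: dict) -> dict:
    """Return a copy of the article with its category replaced on a keyword hit."""
    text = f"{article.get('title', '')} {article.get('summary', '')}".lower()
    for category, keywords in KEYWORD_CATEGORIES.items():
        if any(kw in text for kw in keywords):
            return {**article, "category": category}
    return article  # no keyword matched; keep the feed's own category
```

Plain substring matching is the simplest option; it will also match inside longer words, so a real implementation may prefer word-boundary checks.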
2. summarise.py (~211 lines)
| Feature | Detail |
|---|---|
| Model | Azure OpenAI gpt-4o-mini (deployment name) |
| Endpoint | ainews-openai.openai.azure.com |
| Auth | OIDC → Azure AD token (no API keys!) |
| Batch size | 10 articles per API call |
| Temperature | 0.2 |
| Output per article | summary, why_it_matters, tier, cluster |
| Tiers | headline, deep_dive, quick |
| Fallback | Single-item mode if batch parse fails |
| Cache-only mode | If no token available, uses cached summaries only |
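The batch-then-fallback behaviour in the table could be structured roughly like this. `call_model` is a hypothetical stand-in for the Azure OpenAI call and is assumed to return the raw reply as a JSON array string with one object per article; the real summarise.py may parse replies differently.

```python
import json
from typing import Callable

BATCH_SIZE = 10  # from the table above


def summarise_all(articles: list[dict],
                  call_model: Callable[[list[dict]], str]) -> list[dict]:
    """Summarise in batches of 10, retrying one article at a time on parse failure."""
    results: list[dict] = []
    for start in range(0, len(articles), BATCH_SIZE):
        batch = articles[start:start + BATCH_SIZE]
        try:
            parsed = json.loads(call_model(batch))
            if not isinstance(parsed, list) or len(parsed) != len(batch):
                raise ValueError("unexpected batch shape")
            results.extend(parsed)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            # Single-item fallback: one bad reply should not lose the whole batch.
            for art in batch:
                try:
                    results.append(json.loads(call_model([art]))[0])
                except (ValueError, IndexError, KeyError, TypeError):
                    continue  # skip articles that still fail to parse
    return results
```

Passing the model call in as a function keeps the batching logic testable without network access.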
3. generate_page.py (~523 lines)
| Feature | Detail |
|---|---|
| Daily archive | site/archive/YYYY-MM-DD/index.html + JSON |
| Weekly digest | site/weekly.json with trending topics |
| Monthly digest | site/monthly.json |
| RSS feed | site/feed.xml |
| Breaking news | Cluster label + 3+ sources → is_breaking = true, tier forced to headline |
| Breaking detection | Fuzzy title word overlap > 0.5 for grouping |
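A minimal sketch of the breaking-news rule follows, under several assumptions: "fuzzy title word overlap" is taken to mean Jaccard overlap of title words, each article carries hypothetical `title`, `source`, and `cluster` keys, and a group counts as breaking when at least one member has a cluster label and the group spans three or more distinct sources. The actual grouping in generate_page.py may differ.

```python
import re


def overlap(a: str, b: str) -> float:
    """Word overlap between two titles (Jaccard is an assumption here)."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def mark_breaking(articles: list[dict], threshold: float = 0.5,
                  min_sources: int = 3) -> list[dict]:
    """Group similar titles, then flag groups covered by 3+ sources (in place)."""
    groups: list[list[dict]] = []
    for art in articles:
        for group in groups:
            if overlap(art["title"], group[0]["title"]) > threshold:
                group.append(art)
                break
        else:
            groups.append([art])  # no similar group found; start a new one
    for group in groups:
        sources = {a["source"] for a in group}
        has_cluster = any(a.get("cluster") for a in group)
        if has_cluster and len(sources) >= min_sources:
            for a in group:
                a["is_breaking"] = True
                a["tier"] = "headline"  # tier forced to headline
    return articles
```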
GitHub Actions Workflow
File: .github/workflows/nightly-news.yml (~145 lines)
Schedule: 4x daily (5:00, 11:00, 17:00, 23:00 UTC)
Triggers: cron + manual dispatch
Steps:
1. Checkout repo
2. Setup Python 3.12
3. Install requirements
4. Run fetch_news.py
5. Feed health check → create GitHub Issue if feeds broken
6. Azure OIDC login (azure/login@v2)
7. Get Azure OpenAI token
8. Run summarise.py
9. Run generate_page.py
10. Commit site/ back to pipeline repo
11. Clone main site repo → copy data → hugo build → swa deploy
12. On failure → create GitHub Issue
OIDC Authentication
No stored secrets or API keys. The pipeline authenticates using OpenID Connect (OIDC):
- GitHub Actions requests an OIDC token (id-token: write)
- Azure login via federated credential (azure/login@v2)
- Azure OpenAI token minted via az account get-access-token --resource https://cognitiveservices.azure.com
- Token passed to summarise.py as AZURE_OPENAI_TOKEN
Service principal: ainews app (client ID 51b70f10-db98-4c14-84ef-2ca661e51b6d), shared with the M365 Roadmap pipeline.
Feed Health Monitoring
- After fetch, checks site/broken_feeds.json
- If any feeds failed: creates a GitHub Issue with a markdown table of broken feeds
- If all healthy: logs "All feeds healthy"
- Issues are labeled bug for tracking
Frontend Implementation
Hugo Templates
| Template | Purpose |
|---|---|
| layouts/ai-news/list.html (54 lines) | Main page: injects category config, renders tabs + RSS/CSV links |
| layouts/ai-news-category/list.html (64 lines) | Category SEO pages: adds __ainewsCategoryFilter, CollectionPage JSON-LD |
JavaScript: ainews.js (615 lines)
| Feature | How it works |
|---|---|
| Tab system | Today + This Month tabs; loads latest.json or monthly.json per tab |
| 3-tier layout | Separates articles by tier: headlines get hero cards, deep dives get full cards |
| Search | Debounced 200ms, matches against textContent, updates URL with ?q= |
| Category filtering | 20 categories from window.__ainewsCategoryConfig (injected by Hugo from TOML) |
| Always-show categories | M365 Copilot + Copilot Studio always visible in chip bar |
| sessionStorage cache | Key: ainews_ + URL; caches fetched JSON to avoid re-downloading |
| Source favicons | Google favicon service: google.com/s2/favicons?domain=...&sz=16 |
| URL state | Reads/writes ?cat= and ?q= for shareable filtered views |
| Category pages | Respects window.__ainewsCategoryFilter to pre-filter content |
CSS: ainews.css (837 lines)
| Element | Detail |
|---|---|
| Accent colour | Magenta #ff66ff |
| Hero gradient | color-mix(in srgb, #ff66ff 50%, #141425) |
| 3-tier styling | Headlines = cyan callouts, Deep dives = full cards, Quick links = compact rows |
| "Why it matters" | Enlarged callout: 0.9rem text, 4px border, uppercase label, prominent background |
| Mobile | @media (max-width: 768px): stacked toolbar, hidden deep dive summaries, compact tabs |
| CSS namespace | .ainews-* classes throughout |
Category Configuration
Defined in data/ainews_categories.toml (124 lines) and injected into JS via Hugo template as window.__ainewsCategoryConfig.
20 categories: Top Stories, Microsoft, M365 Copilot, Copilot Studio, GitHub Copilot, AI Foundry, OpenAI, Anthropic, Google, Meta, DeepSeek, NVIDIA, Apple, Amazon, Mistral, xAI, Perplexity, Open Source, Industry, Rumours
Each category has: name, emoji, color, display order.
SEO Category Landing Pages
13 pages under content/ai-news/<slug>/_index.md:
| Page | Filter |
|---|---|
| /ai-news/microsoft/ | Microsoft |
| /ai-news/m365-copilot/ | M365 Copilot |
| /ai-news/copilot-studio/ | Copilot Studio |
| /ai-news/github-copilot/ | GitHub Copilot |
| /ai-news/ai-foundry/ | AI Foundry |
| /ai-news/openai/ | OpenAI |
| /ai-news/anthropic/ | Anthropic |
| /ai-news/google/ | Google |
| /ai-news/meta/ | Meta |
| /ai-news/deepseek/ | DeepSeek |
| /ai-news/apple/ | Apple |
| /ai-news/amazon/ | Amazon |
| /ai-news/nvidia/ | NVIDIA |
Each uses type: "ai-news-category" in front matter, because Hugo ignores layout for _index.md section pages.
RSS Feed Sources (34 feeds across 17 categories)
| Category | Feed Count | Max Articles |
|---|---|---|
| Top Stories | 6 | 10 |
| Microsoft | 3 | 8 |
| M365 Copilot | 2 | 8 |
| Copilot Studio | 2 | 6 |
| GitHub Copilot | 2 | 6 |
| AI Foundry | 3 | 6 |
| OpenAI | 1 | 8 |
| Google | 2 | 8 |
| Meta | 2 | 6 |
| Anthropic | 2 | 6 |
| Mistral | 1 | 4 |
| xAI | 1 | 4 |
| Apple | 1 | 6 |
| NVIDIA | 1 | 6 |
| Amazon | 2 | 6 |
| Open Source | 1 | 6 |
| Industry | 2 | 8 |
Community proxy feeds
Anthropic, Mistral, and xAI use GitHub-hosted community proxy RSS feeds (no official RSS). Periodically verify these still work via feed health monitoring.
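The Max Articles column in the table above implies a per-category cap. One way it could be applied is sketched here, assuming each article has hypothetical `category` and `published` (ISO 8601 string, same timezone) keys and that the newest articles win. Only a few limits from the table are reproduced, and `DEFAULT_MAX` is an assumption.

```python
from collections import defaultdict

# A few limits copied from the table; the full set would come from scripts/feeds.json.
MAX_ARTICLES = {"Top Stories": 10, "Microsoft": 8, "OpenAI": 8, "Mistral": 4, "xAI": 4}
DEFAULT_MAX = 6  # assumption for categories not listed above


def cap_per_category(articles: list[dict]) -> list[dict]:
    """Keep at most N articles per category, preferring the newest."""
    # ISO 8601 timestamps in one timezone sort correctly as plain strings.
    newest_first = sorted(articles, key=lambda a: a["published"], reverse=True)
    counts: dict[str, int] = defaultdict(int)
    kept: list[dict] = []
    for art in newest_first:
        category = art["category"]
        if counts[category] < MAX_ARTICLES.get(category, DEFAULT_MAX):
            counts[category] += 1
            kept.append(art)
    return kept
```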
Data Flow Summary
RSS Feeds (34)
    ↓
fetch_news.py → articles.json (raw articles, deduped)
    ↓
summarise.py → summaries.json (GPT summaries + tiers)
    ↓
generate_page.py → latest.json, weekly.json, monthly.json, feed.xml
    ↓
GitHub Actions → copies to aguidetocloud-revamp/static/data/ainews/
    ↓
Hugo build → Azure SWA deploy
    ↓
ainews.js → loads JSON → renders 3-tier TLDR layout
Key Design Decisions
| Decision | Rationale |
|---|---|
| Subfolder not subdomain | /ai-news/ gets SEO juice from the main domain, unlike a standalone ainews.aguidetocloud.com |
| OIDC not API keys | No secrets to rotate; tokens are minted fresh each run |
| 3-tier TLDR | Inspired by TLDR AI newsletter (920K readers): scan headlines, read deep dives, skim quick links |
| "Why it matters" | The USP: readers love the human-readable context on each article |
| sessionStorage cache | Prevents re-fetching same JSON when navigating between tabs/pages |
| Category TOML config | Single source of truth for category names, emojis, colours, shared between Hugo + JS |
| 4x daily updates | Every 6 hours ensures fresh content throughout the day |
Maintenance
| Task | How |
|---|---|
| Add a new RSS feed | Edit scripts/feeds.json: add URL + category + optional max_articles |
| Add a new category | Add to feeds.json + data/ainews_categories.toml + create SEO page under content/ai-news/ |
| Check feed health | Look at GitHub Issues in -ainews repo (auto-created by pipeline) |
| Force a refresh | Trigger nightly-news.yml manually via GitHub Actions |
| Update OIDC | Service principal: ainews app → Federated credentials in Entra ID |
| Debug pipeline | Check GitHub Actions logs → fetch_news.py output → summaries.json → latest.json |
Version History
| Version | Date | What Changed |
|---|---|---|
| V1 | Mar 28 | Standalone ainews.aguidetocloud.com, basic HTML, nightly |
| V2 | Apr 1–2 | Moved to /ai-news/, 3-tier TLDR, OIDC auth, twice daily |
| V3 | Apr 10 | 4x daily, 20 categories, 34 feeds, fuzzy dedup, breaking detection, feed health monitoring, 13 SEO pages, keyword re-categorization |
Last updated: 11 April 2026