🏭 Foundry Local¶
Status: 🔴 Not yet installed
What: Microsoft's on-device AI inference runtime — run AI models locally with hardware-optimised acceleration
Website: foundrylocal.ai
☕ Café Analogy¶
If Ollama is a home coffee machine with a massive bean library, Foundry Local is Microsoft's Nespresso machine — fewer pod choices, but perfectly engineered by Microsoft to work with their beans (Phi models) and optimised to squeeze every drop of performance from your hardware (GPU, NPU, Apple Silicon).
What is Foundry Local?¶
Foundry Local is Microsoft's answer to Ollama. It runs AI models entirely on your device — no cloud, no API keys, no cost after download. It's the local sibling of Azure AI Foundry (the cloud version).
┌──────────────── Microsoft AI Foundry Family ────────────────┐
│ │
│ ☁️ Azure AI Foundry 🏠 Foundry Local │
│ ├── Cloud-hosted ├── Runs on YOUR device │
│ ├── Pay per token ├── Free forever │
│ ├── Massive model catalog ├── Curated model catalog │
│ ├── Scale to millions ├── Single-user, private │
│ └── Needs Azure subscription └── No subscription needed │
│ │
└──────────────────────────────────────────────────────────────┘
Foundry Local vs Ollama¶
| Feature | Foundry Local | Ollama |
|---|---|---|
| Made by | Microsoft | Community (open source) |
| Install | `winget install Microsoft.FoundryLocal` | `winget install Ollama.Ollama` |
| CLI | `foundry model run phi-4-mini` | `ollama run phi4-mini` |
| API port | `localhost:5272` | `localhost:11434` |
| API format | OpenAI-compatible ✅ | OpenAI-compatible ✅ |
| NPU support | ✅ Intel, Qualcomm, Apple | ❌ GPU/CPU only |
| Model catalog | Smaller, Microsoft-curated | Massive community library |
| Audio (Whisper) | ✅ Built-in | Via community models |
| Status | 🔶 Preview | ✅ Stable |
| Works with Copilot CLI | ✅ via BYOK | ✅ via BYOK |
| Unique strength | NPU acceleration, Microsoft-optimised | Largest ecosystem |
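Because both runtimes expose an OpenAI-compatible API, the same client code can talk to either one; only the base URL and model name change. A minimal stdlib-only sketch (the `chat_payload` and `ask` helpers are illustrative, not part of any official SDK, and the ports are the defaults from the table):

```python
import json
import urllib.request

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(base_url: str, model: str, prompt: str) -> str:
    """POST to an OpenAI-compatible /chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Same client, different runtime -- only the URL and model name differ:
# ask("http://localhost:5272/v1", "phi-4-mini", "Hello")   # Foundry Local
# ask("http://localhost:11434/v1", "phi4-mini", "Hello")   # Ollama
```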
Installation¶
# Install
winget install Microsoft.FoundryLocal
# Verify
foundry --version
# See available models
foundry model list
# Run a model
foundry model run phi-4-mini
# Start as API server
foundry service start
# Check service status
foundry service status
# Manage cache
foundry cache list # See downloaded models
foundry cache location # Where models are stored
foundry cache remove <model> # Delete a model
Architecture¶
┌─────────────────────────────────┐
│ Your Laptop │
│ │
│ ┌──────────────────────────┐ │
│ │ Foundry Local Service │ │
│ │ (localhost:5272) │ │
│ │ │ │
│ │ ┌────────────────────┐ │ │
│ │ │ ONNX Runtime │ │ │
│ │ │ Auto-selects: │ │ │
│ │ │ • NVIDIA CUDA │ │ │
│ │ │ • AMD GPU │ │ │
│ │ │ • Intel NPU ←── │ │ │
│ │ │ • Qualcomm NPU │ │ │ Your Snapdragon X Elite
│ │ │ • Apple Silicon │ │ │ has a Qualcomm NPU! 🚀
│ │ │ • CPU fallback │ │ │
│ │ └────────────────────┘ │ │
│ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────┐ │
│ │ Local Model Cache │ │
│ │ ~/.foundry/cache/ │ │
│ └──────────────────────────┘ │
└─────────────────────────────────┘
Connect to Copilot CLI (BYOK)¶
# Start Foundry Local service
foundry service start
# Point Copilot CLI at Foundry Local
$env:COPILOT_PROVIDER_BASE_URL = "http://localhost:5272/v1"
$env:COPILOT_MODEL = "phi-4-mini"
$env:COPILOT_OFFLINE = "true"
copilot
This gives you 100% Microsoft, 100% offline, zero-cost agentic coding.
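Before launching `copilot`, it can help to confirm the service is actually answering. A small sketch, assuming the default port 5272 from the table above; `service_up` is a hypothetical helper, not part of any Foundry tooling:

```python
import urllib.error
import urllib.request

def service_up(base_url: str = "http://localhost:5272/v1",
               timeout: float = 2.0) -> bool:
    """Return True if an OpenAI-compatible endpoint answers GET /models."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    if service_up():
        print("Foundry Local is up -- safe to launch copilot")
    else:
        print("No response on :5272 -- try 'foundry service status'")
```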
Requirements¶
| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 x64 | Windows 11 x64/ARM |
| RAM | 8 GB | 16 GB+ |
| Disk | 3 GB free | 15 GB+ |
| GPU/NPU | None (CPU works) | NVIDIA RTX 2000+, Qualcomm NPU, Apple Silicon |
Planned Exercises¶
- [ ] Install Foundry Local (L61)
- [ ] Run Phi-4-mini and compare quality to Ollama version
- [ ] Test NPU acceleration on Snapdragon X Elite
- [ ] Connect to Copilot CLI via BYOK (L60)
- [ ] Try Whisper model for audio transcription
- [ ] Compare inference speed: Foundry Local vs Ollama vs Cloud
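The last exercise (inference-speed comparison) can be sketched as a tiny timing harness. This is a hedged sketch: the endpoint map assumes the default ports listed earlier, and `time_completion`/`rank` are illustrative helpers rather than any official tooling:

```python
import json
import time
import urllib.request

# Assumed defaults; adjust to your installs.
ENDPOINTS = {
    "foundry-local": ("http://localhost:5272/v1", "phi-4-mini"),
    "ollama": ("http://localhost:11434/v1", "phi4-mini"),
}

def time_completion(base_url: str, model: str,
                    prompt: str = "Reply with one word.") -> float:
    """Return wall-clock seconds for one chat completion."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        json.load(resp)
    return time.perf_counter() - start

def rank(results: dict) -> list:
    """Sort runtime names fastest-first from a name -> seconds dict."""
    return sorted(results, key=results.get)

# Usage (with both services running):
#   results = {name: time_completion(url, model)
#              for name, (url, model) in ENDPOINTS.items()}
#   print(rank(results))
```

Wall-clock timing over one request is crude (it ignores model load time and token count), but it is enough for a first Foundry Local vs Ollama comparison.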