🏭 Foundry Local

Status: 🔴 Not yet installed
What: Microsoft's on-device AI inference runtime — run AI models locally with hardware-optimised acceleration
Website: foundrylocal.ai

☕ Café Analogy

If Ollama is a home coffee machine with a massive bean library, Foundry Local is Microsoft's Nespresso machine — fewer pod choices, but each one engineered to work perfectly with Microsoft's own beans (Phi models) and optimised to squeeze every drop of performance from your hardware (GPU, NPU, Apple Silicon).

What is Foundry Local?

Foundry Local is Microsoft's answer to Ollama. It runs AI models entirely on your device — no cloud, no API keys, no cost after download. It's the local sibling of Azure AI Foundry (the cloud version).

┌──────────────── Microsoft AI Foundry Family ────────────────┐
│                                                              │
│  ☁️ Azure AI Foundry          🏠 Foundry Local               │
│  ├── Cloud-hosted             ├── Runs on YOUR device        │
│  ├── Pay per token            ├── Free forever               │
│  ├── Massive model catalog    ├── Curated model catalog      │
│  ├── Scale to millions        ├── Single-user, private       │
│  └── Needs Azure subscription └── No subscription needed     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Foundry Local vs Ollama

Feature                  Foundry Local                            Ollama
-------                  -------------                            ------
Made by                  Microsoft                                Community (open source)
Install                  winget install Microsoft.FoundryLocal    winget install Ollama.Ollama
CLI                      foundry model run phi-4-mini             ollama run phi4-mini
API port                 localhost:5272                           localhost:11434
API format               OpenAI-compatible ✅                     OpenAI-compatible ✅
NPU support              ✅ Intel, Qualcomm, Apple                ❌ GPU/CPU only
Model catalog            Smaller, Microsoft-curated               Massive community library
Audio (Whisper)          ✅ Built-in                              Via community models
Status                   🔶 Preview                               ✅ Stable
Works with Copilot CLI   ✅ via BYOK                              ✅ via BYOK
Unique strength          NPU acceleration, Microsoft-optimised    Largest ecosystem
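Because both runtimes expose an OpenAI-compatible API, the same request body works against either one; only the port and the model tag change. A minimal POSIX-shell sketch of the request shape, with the actual calls left commented out for when a service is running:

```shell
# OpenAI-style chat completion body; "phi-4-mini" is Foundry Local's tag,
# Ollama's equivalent tag is "phi4-mini" (see the table above).
BODY='{"model": "phi-4-mini", "messages": [{"role": "user", "content": "Say hello."}]}'
echo "$BODY"

# Against Foundry Local (default port 5272):
# curl -s http://localhost:5272/v1/chat/completions \
#      -H "Content-Type: application/json" -d "$BODY"

# Against Ollama (default port 11434), after swapping the model tag:
# curl -s http://localhost:11434/v1/chat/completions \
#      -H "Content-Type: application/json" -d "$BODY"
```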

Installation

# Install
winget install Microsoft.FoundryLocal

# Verify
foundry --version

# See available models
foundry model list

# Run a model
foundry model run phi-4-mini

# Start as API server
foundry service start

# Check service status
foundry service status

# Manage cache
foundry cache list          # See downloaded models
foundry cache location      # Where models are stored
foundry cache remove <model> # Delete a model
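To see how much disk the downloaded models are using without going through the CLI, you can inspect the cache directory directly. A POSIX-shell sketch (e.g. for Git Bash or WSL); the `~/.foundry/cache` path is an assumption taken from the architecture diagram below, so confirm yours with `foundry cache location` first:

```shell
# Report total disk used by the model cache, if it exists yet.
# CACHE_DIR is assumed, not confirmed; check with: foundry cache location
CACHE_DIR="$HOME/.foundry/cache"
if [ -d "$CACHE_DIR" ]; then
  du -sh "$CACHE_DIR"
else
  echo "no cache yet at $CACHE_DIR"
fi
```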

Architecture

┌──────────────────────────────────┐
│           Your Laptop            │
│                                  │
│  ┌────────────────────────────┐  │
│  │   Foundry Local Service    │  │
│  │     (localhost:5272)       │  │
│  │                            │  │
│  │  ┌──────────────────────┐  │  │
│  │  │     ONNX Runtime     │  │  │
│  │  │  Auto-selects:       │  │  │
│  │  │  • NVIDIA CUDA       │  │  │
│  │  │  • AMD GPU           │  │  │
│  │  │  • Intel NPU         │  │  │
│  │  │  • Qualcomm NPU  ←── │  │  │  Your Snapdragon X Elite
│  │  │  • Apple Silicon     │  │  │  has a Qualcomm NPU! 🚀
│  │  │  • CPU fallback      │  │  │
│  │  └──────────────────────┘  │  │
│  └────────────────────────────┘  │
│                                  │
│  ┌────────────────────────────┐  │
│  │    Local Model Cache       │  │
│  │    ~/.foundry/cache/       │  │
│  └────────────────────────────┘  │
└──────────────────────────────────┘

Connect to Copilot CLI (BYOK)

# Start Foundry Local service
foundry service start

# Point Copilot CLI at Foundry Local
$env:COPILOT_PROVIDER_BASE_URL = "http://localhost:5272/v1"
$env:COPILOT_MODEL = "phi-4-mini"
$env:COPILOT_OFFLINE = "true"
copilot

This gives you 100% Microsoft, 100% offline, zero-cost agentic coding.
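Before launching Copilot, it's worth confirming the local endpoint actually answers. A hedged POSIX-shell sketch: the `/v1/models` route is assumed here only because the API is OpenAI-compatible; this page doesn't document Foundry Local's exact route list.

```shell
# Probe the local endpoint before pointing Copilot CLI at it.
# The /v1/models path is an assumption based on OpenAI compatibility.
if curl -sf --max-time 2 http://localhost:5272/v1/models >/dev/null 2>&1; then
  FOUNDRY_STATUS="up"
else
  FOUNDRY_STATUS="down"
fi
echo "Foundry Local on :5272 is $FOUNDRY_STATUS"
```

If the probe reports "down", run `foundry service start` and try again before setting the Copilot environment variables.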

Requirements

Requirement   Minimum             Recommended
-----------   -------             -----------
OS            Windows 10 x64      Windows 11 x64/ARM
RAM           8 GB                16 GB+
Disk          3 GB free           15 GB+
GPU/NPU       None (CPU works)    NVIDIA RTX 2000+, Qualcomm NPU, Apple Silicon

Planned Exercises

  • [ ] Install Foundry Local
  • [ ] Run Phi-4-mini and compare quality to Ollama version
  • [ ] Test NPU acceleration on Snapdragon X Elite
  • [ ] Connect to Copilot CLI via BYOK
  • [ ] Try Whisper model for audio transcription
  • [ ] Compare inference speed: Foundry Local vs Ollama vs Cloud
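The last exercise can be sketched as a coarse timing harness. This POSIX-shell sketch assumes both runtimes expose the OpenAI-compatible chat endpoint on the default ports from the comparison table, uses an assumed `/v1/models` probe to skip endpoints that aren't running, and only measures whole seconds:

```shell
# Rough latency comparison between the two local runtimes.
# Ports and model tags come from the comparison table above;
# endpoints that are not running are skipped instead of failing.
BODY_TMPL='{"model":"%s","messages":[{"role":"user","content":"Hi"}]}'
for PAIR in "5272 phi-4-mini" "11434 phi4-mini"; do
  set -- $PAIR          # split "port model" into $1 and $2
  PORT=$1; MODEL=$2
  if curl -sf --max-time 2 "http://localhost:$PORT/v1/models" >/dev/null 2>&1; then
    BODY=$(printf "$BODY_TMPL" "$MODEL")
    START=$(date +%s)
    curl -s "http://localhost:$PORT/v1/chat/completions" \
      -H "Content-Type: application/json" -d "$BODY" >/dev/null
    echo "port $PORT ($MODEL): $(( $(date +%s) - START ))s"
  else
    echo "port $PORT ($MODEL): not running, skipped"
  fi
done
```

Whole-second resolution is only enough to compare models that differ noticeably; for finer timing, `curl`'s `-w '%{time_total}'` option reports per-request elapsed time.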