🧠 Models & Context Window¶
The Big Picture: Every time you chat with Copilot CLI, you're talking to an AI model — a specific "brain" — through a context window — a shared "whiteboard." Understanding these two concepts will help you get better results and spend your budget wisely.
What is a Model?¶
Think of each model as a different chef in a restaurant kitchen. They all cook food, but each has different strengths, speeds, and price tags.
When you open the /model menu, you're choosing which chef handles your order.
| Model | Provider | Personality | Context | Cost | Best For |
|---|---|---|---|---|---|
| Claude Sonnet 4.5 | Anthropic | 🧑🍳 Reliable head chef | 200k | 1x | Balanced everyday work |
| Claude Opus 4.6 | Anthropic | 👨🍳 Michelin-star chef | 200k | 1x | Hardest problems, best quality |
| Claude Haiku 4.5 | Anthropic | 🍔 Fast-food cook | 200k | 0.33x | Quick questions, saving budget |
| Claude Opus 4.6 1M | Anthropic | 👨🍳 Same Michelin chef, massive kitchen | 1,000k | 6x | Huge codebases (internal only) |
| GPT-5.1 | OpenAI | 🍝 Chef from a different restaurant chain | 200k | 1x | Different perspective |
| GPT-5.4 mini | OpenAI | 🥡 Their fast option | 200k | 0.33x | Quick tasks, budget-friendly |
Bottom Line
You don't need to memorise this table. Just remember: Opus 4.6 = smartest at standard cost, Haiku = cheapest, and 1M = huge but expensive.
The /model Menu Explained¶
Type /model and you'll see something like this:
┌─────────────────────────────────┬─────────┬──────┐
│ Model │ Context │ Cost │
├─────────────────────────────────┼─────────┼──────┤
│ Claude Opus 4.6 │ 200k │ 1x │
│ Claude Sonnet 4.5 │ 200k │ 1x │
│ Claude Haiku 4.5 │ 200k │ 0.33x│
│ Claude Opus 4.6 (1M context) │ 1000k │ 6x │
│ GPT-5.1 │ 200k │ 1x │
│ GPT-5.4 mini │ 200k │ 0.33x│
└─────────────────────────────────┴─────────┴──────┘
Two columns matter: Context and Cost. Let's break each one down.
Context Column (200k, 1000k)¶
This is the size of the whiteboard the AI uses to hold your entire conversation.
- 200k tokens ≈ 150,000 words ≈ roughly 2–3 full novels
- 1,000k tokens ≈ 750,000 words ≈ roughly 10–15 novels
What does 'k' mean?
The "k" stands for thousand. So 200k = 200,000 tokens. Think of it like kilometres — 200k is 200,000 of something.
The bigger the context, the more information the AI can "see" at once — your messages, files you've shared, its own instructions, and more.
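If you want to sanity-check these conversions yourself, here is a tiny Python sketch of the arithmetic. The words-per-token and words-per-novel figures are rough assumptions for illustration, not official numbers.

```python
# Rough back-of-envelope conversion from context size to word capacity.
# Both constants are assumptions: ~0.75 words per token, ~60k words per novel.
WORDS_PER_TOKEN = 0.75
WORDS_PER_NOVEL = 60_000

for label, tokens in [("200k", 200_000), ("1,000k", 1_000_000)]:
    words = tokens * WORDS_PER_TOKEN
    print(f"{label} context ≈ {words:,.0f} words ≈ {words / WORDS_PER_NOVEL:.1f} typical novels")
```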
Cost Column (1x, 0.33x, 6x)¶
This is how many premium requests each message costs from your monthly budget.
Think of it like a café budget:
The Café Budget Analogy ☕
Imagine you get $100/month to spend at the AI café.
| Model | Cost per "Coffee" | Coffees You Get |
|---|---|---|
| Haiku 4.5 / GPT-5.4 mini | $0.33 each | ~300 coffees ☕☕☕ |
| Sonnet 4.5 / Opus 4.6 / GPT-5.1 | $1.00 each | 100 coffees ☕ |
| Opus 4.6 1M | $6.00 each | ~16 coffees ☕ |
The cheap coffee is still good — it's just smaller and simpler. The $6 coffee is the same quality as the $1 coffee, but served on a massive table (1M context).
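If it helps to see the analogy as arithmetic, here is a minimal Python sketch. The $100 budget is part of the analogy, not a real plan price.

```python
# Café-budget arithmetic: how many "coffees" (messages) a fixed budget buys
# at each price point. Purely illustrative numbers from the analogy above.
MONTHLY_BUDGET = 100.00

price_per_message = {
    "Haiku 4.5 / GPT-5.4 mini": 0.33,
    "Sonnet 4.5 / Opus 4.6 / GPT-5.1": 1.00,
    "Opus 4.6 1M": 6.00,
}

for model, price in price_per_message.items():
    print(f"{model}: about {int(MONTHLY_BUDGET / price)} coffees")
```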
Which Model Should You Use?¶
Here's a simple decision guide:
What are you doing?
│
├── 💬 Quick question or simple task
│ └── → Haiku 4.5 (0.33x) — save your budget
│
├── 📚 Learning / daily work / writing
│ └── → Opus 4.6 (1x) — best brain at standard cost
│
├── 🏗️ Massive project with huge files
│ └── → Opus 4.6 1M (6x) — only when you truly need the space
│
└── 🔄 Want a different perspective or style
└── → GPT-5.1 (1x) — different "restaurant," different approach
Default recommendation
Start with Opus 4.6. It gives you the best quality at standard cost (1x). Switch to Haiku when you're doing simple things and want to stretch your budget.
What Are Tokens?¶
Tokens are the unit of measurement for the whiteboard. Everything — your messages, the AI's replies, files, instructions — gets converted into tokens.
Common misconception
One word ≠ one token. It's not a 1-to-1 relationship!
Rule of thumb: 1 token ≈ ¾ of a word (~4 characters)
Think of tokens like LEGO bricks:
- Short, common words (like "the", "hello") = 1 brick
- Longer or unusual words (like "PowerShell") = 2–3 bricks
- Technical strings get broken into many bricks
| Example Text | Approximate Tokens |
|---|---|
| `Hello` | 1 token |
| `Good morning` | 2 tokens |
| `PowerShell` | 2–3 tokens |
| `New-AzResourceGroup` | 5–7 tokens |
| `M365CPI52224224.onmicrosoft.com` | ~10 tokens |
| A full page of text (~400 words) | ~500–800 tokens |
| An entire novel (~80,000 words) | ~100,000 tokens |
Why does this matter?
Because the context window is measured in tokens. When you paste a long error message or a big file, it might eat up more whiteboard space than you'd expect — every character costs tokens.
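If you want a rough feel for how much a paste will cost before you send it, a character count gets you close. The sketch below uses the ~4-characters-per-token rule of thumb; it is only an approximation, and dense technical strings tend to cost more than it suggests.

```python
# Rough token estimate from character count, using the ~4-characters-per-token
# rule of thumb above. Not the real tokenizer; treat the result as a ballpark.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Hello"))                            # ~1
print(estimate_tokens("New-AzResourceGroup"))              # ~5
print(estimate_tokens("M365CPI52224224.onmicrosoft.com"))  # ~8 (real count is nearer 10)
```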
The Whiteboard (Context Window)¶
The context window is a shared whiteboard between you and the AI. Everything — your questions, the AI's answers, files, tool definitions — lives on this whiteboard.
Here's what it looks like as it fills up:
╔══════════════════════════════════════════════════╗
║ THE WHITEBOARD (200k) ║
╠══════════════════════════════════════════════════╣
║ ║
║ ██████████████████████░░░░░░░░░░░░░░░░░░░░░░░░ ║
║ ▲ Used: ~35% Free: ~60% ▲ ║
║ │ │ ║
║ System/Tools Buffer ║
║ (loaded before (5%) ║
║ you say hello!) ║
║ ║
║ Stage 1: 70% used — Everything is fine 😊 ║
║ ████████████████████████████████░░░░░░░░░░░░░░ ║
║ ║
║ Stage 2: 85% used — Warning appears ⚠️ ║
║ ██████████████████████████████████████░░░░░░░░ ║
║ ║
║ Stage 3: 95% used — Auto-compact kicks in 🔄 ║
║ █████████████████████████████████████████████░░ ║
║ ║
║ Stage 4: 100% full — Can't process! 🛑 ║
║ ████████████████████████████████████████████████ ║
║ ║
╚══════════════════════════════════════════════════╝
What's ON the Whiteboard?¶
Run /context and you'll see how the whiteboard is divided. Here's what each section means:
| Section | Typical Size | What It Is | Analogy |
|---|---|---|---|
| System / Tools | ~30–40% | The system prompt and all available tools (Azure, GitHub, MCP servers, skills) | The café's operating manual (menu, rules, procedures), always pinned to the whiteboard |
| Messages | Grows over time | The entire conversation history: every question and answer | The order history: every coffee you've ordered today |
| Free space | Shrinks over time | Room for more conversation | Empty whiteboard: space for new orders |
| Buffer | ~5% | Safety margin so a response isn't cut off mid-generation | Reserved space, like keeping the last page of a notebook blank |
Surprising Fact
System/Tools takes up 30–40% before you even say hello! That's like walking into a café and finding the whiteboard already half-full with the menu, health regulations, and staff procedures — before the first customer arrives.
This is why your 200k context window doesn't really give you 200k of conversation space. You effectively start with ~120–140k of usable space.
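The arithmetic behind that estimate looks roughly like this; the overhead percentages are the typical ranges quoted above, not fixed values, so run `/context` to see your real numbers.

```python
# Why a "200k" window feels smaller in practice: subtract System/Tools overhead.
# Percentages are typical ranges from above, not exact values.
CONTEXT = 200_000
BUFFER_SHARE = 0.05  # the safety buffer trims roughly another 10k on top of this

for system_share in (0.30, 0.40):
    conversation_space = CONTEXT * (1 - system_share)
    print(f"{system_share:.0%} System/Tools overhead -> ~{conversation_space:,.0f} tokens for conversation")
```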
Traffic Light System¶
Use this to decide when to take action:
- 🟢 GREEN (0–60% used): everything is fine, keep going.
- 🟡 YELLOW (60–80% used): consider `/compact` or `/new` soon.
- 🔴 RED (80%+ used): act now: run `/compact`, save your work, or `/clear`.
When you're in the RED zone
Don't ignore it. At 95%+, the system will try to auto-compact (summarise and shrink the conversation), but this can lose important details. It's better to manage it yourself before it gets critical.
What to do:

- Run `/compact` to summarise the conversation (frees up space)
- Save any important decisions to your instructions file
- If needed, run `/clear` or `/new` to start fresh
Managing Your Whiteboard¶
Five practical strategies to keep your whiteboard healthy:
| # | Strategy | What to Do | Why It Helps |
|---|---|---|---|
| 1 | Be selective with `@` | Don't `@` entire folders or huge files unless you need them | Every file you reference eats whiteboard space |
| 2 | Start new sessions for new topics | Use `/new` when switching to a completely different task | A fresh whiteboard = maximum space |
| 3 | Use `/compact` proactively | Run `/compact` when you hit the yellow zone (60–80%) | Summarises the conversation, frees up tokens |
| 4 | Keep thinking OFF for simple tasks | Toggle with Ctrl+T for simple questions | Extended thinking uses extra tokens |
| 5 | Save to instructions, then `/clear` | Put important context in `.github/copilot-instructions.md`, then clear | The "passport strategy": your context survives session resets because instructions are reloaded automatically |
The Passport Strategy 🛂
Think of your instructions file as a passport. When you `/clear` a session, the whiteboard is wiped clean, but your passport (instructions file) is always reloaded, so anything saved there survives across sessions.
This is perfect for things like the following (a sketch of such a file appears after this list):
- Your tenant IDs and environment details
- Preferred commands or workflows
- Project-specific context you always need
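As a concrete and entirely hypothetical illustration, such a file might look something like this; every name and value below is a placeholder, not a recommendation.

```markdown
<!-- .github/copilot-instructions.md : example only; every value is a placeholder -->

## Environment
- Tenant: contoso.onmicrosoft.com (hypothetical)
- Default subscription: Contoso-Dev
- Default region: australiaeast

## Preferences
- Use PowerShell (Az module) for Azure examples rather than Azure CLI
- Always ask before running destructive commands
```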
Model Switching Mid-Conversation¶
You can switch models at any time using /model. Here's what happens:
What carries over (stays on the whiteboard):
- ✅ Your entire conversation history
- ✅ Files that were read into context
- ✅ Decisions and plans already discussed
What changes (the new chef takes over):
- 🔄 The "brain" processing everything
- 🔄 Quality and style of responses
- 🔄 How the AI interprets nuance
It's like switching chefs mid-meal
The new chef reads all the previous chef's notes (your conversation), but they might interpret the recipe differently. The kitchen (whiteboard) stays the same — only the person cooking changes.
Best Practices for Model Switching¶
| Do | Don't |
|---|---|
| ✅ Start with Opus 4.6 for best quality at 1x cost | ❌ Don't switch models mid-complex-task — it can cause confusion |
| ✅ Switch to Haiku for simple follow-ups to save budget | ❌ Don't assume the new model "knows" implicit context |
| ✅ Run `/compact` before switching (clean whiteboard for the new brain) | ❌ Don't switch repeatedly back and forth |
| ✅ Brief the new model after switching ("I'm working on X, we decided Y") | ❌ Don't use 1M model for short conversations |
The "Internal Only" 1M Model¶
The Claude Opus 4.6 (1M context) model is special:
- Same brain as regular Opus 4.6 — same intelligence, same quality
- Bigger whiteboard — 1,000,000 tokens instead of 200,000 (5× larger)
- 6× the cost — each message costs 6 premium requests
- Available to Microsoft employees only (internal access)
Don't use it all the time
Using the 1M model for everyday questions is like renting a football stadium to cook dinner for two. You're paying 6× the price for space you'll never fill.
When to actually use 1M:
- You need to analyse a massive codebase (many large files at once)
- You're working with very large documents (100+ page specs)
- Your session has been going so long that 200k isn't enough
- `/compact` can't free up enough space for what you need
Smart Approach
Start with regular Opus 4.6 (1x cost)
│
├── Session getting long? → Run /compact
│
├── /compact isn't enough? → Try /new with key context
│
└── Still need more space? → NOW switch to 1M (6x cost)
This way, you only pay the 6× premium when you genuinely need the extra space.
Thinking Toggle¶
The thinking toggle controls whether you can see the AI's internal reasoning process.
| Setting | What Happens | When to Use |
|---|---|---|
| Thinking ON | You see the AI's step-by-step reasoning before the answer | Complex decisions, debugging, understanding why |
| Thinking OFF | You just see the final answer | Simple questions, saving tokens |
Toggle it with: Ctrl+T
Windows Terminal Conflict
In Windows Terminal, Ctrl+T opens a new tab instead of toggling thinking. This is a known conflict.
Workaround: Use a different terminal emulator, or check if your terminal lets you rebind the Ctrl+T shortcut.
Extended thinking is a separate feature — the model automatically "thinks harder" on complex problems, spending more tokens on reasoning before answering. You don't control this directly; it happens when the model detects a hard problem.
Quick Decision Flowchart¶
When in doubt, use this:
Need Copilot CLI help?
│
├── 💬 Quick question or lookup
│ └── → Haiku 4.5 (0.33x) 💰 Save budget
│
├── 📚 Learning / daily work / writing code
│ └── → Opus 4.6 (1x) ⭐ Best brain, standard price
│
├── 🏗️ Massive project, huge files
│ └── → Opus 4.6 1M (6x) 🏟️ Only when truly needed
│
└── 🔄 Want a different style or approach
└── → GPT-5.1 (1x) 🍝 Different restaurant
Check Your Budget
Run /usage at any time to see how many premium requests you have left this month. Plan accordingly!
Plans & Premium Request Budgets¶
Now that you understand models and their cost multipliers, let's talk about the real money — how many premium requests you get per month and what happens when you run out.
The Plans¶
Think of this like mobile phone plans — each tier gives you a different amount of "data" (premium requests), and once it's used up, you drop to a slower speed.
| Plan | Monthly Price | Premium Requests | Best For |
|---|---|---|---|
| Free | $0 | 50 | Trying it out, casual use |
| Pro | $10/month | 300 | Light personal use |
| Pro+ | $39/month | 1,500 | Heavy personal use (power users) |
| Business | $19/user/month | 300/user | Teams & organisations |
| Enterprise | $39/user/month | 1,000/user | Large organisations with governance needs |
Which plan for which user?
- 50 requests (Free) → You'll burn through this in a single learning session
- 300 requests (Pro) → Fine for light daily use, tight for heavy sessions
- 1,500 requests (Pro+) → A solid amount, but model multipliers can drain it fast (see below)
What Happens When You Hit the Limit?¶
This is the critical question. It's a hard limit with a graceful fallback:
graph TD
A([🎯 You send a message]) --> B{Premium requests remaining?}
B -->|✅ Yes| C([Uses premium model you selected])
B -->|❌ No, limit reached| D{Pay-per-use enabled?}
D -->|Yes, budget set| E([Continues with premium model<br/>Charges overage ~$0.04/request × multiplier])
D -->|No budget set| F([Falls back to GPT-4.1 / GPT-4o<br/>Unlimited but less capable])
style C fill:#1a5e2a,stroke:#66ff66,color:#fff
style E fill:#5e4a1a,stroke:#ffcc66,color:#fff
style F fill:#5e1a1a,stroke:#ff6666,color:#fff
| Scenario | What Happens |
|---|---|
| Within limit | ✅ Everything works normally with your chosen premium model |
| Hit the limit (no budget) | ⚠️ Falls back to GPT-4.1/GPT-4o — unlimited but less capable. Copilot CLI still works, just on a weaker model |
| Hit the limit (budget enabled) | 💰 Continues using premium models, charges ~$0.04/request × model multiplier |
| Quota reset | 🔄 1st of every month at midnight UTC (1 PM NZDT) — counter goes back to zero |
Copilot CLI doesn't stop working!
This is the key takeaway — when you exhaust your premium requests, Copilot does not shut down. It drops you to the free base model (GPT-4.1/GPT-4o). You lose access to Claude, GPT-4.5, and other premium models until the next billing cycle — unless you enable overage billing.
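If you do enable pay-per-use, the overage maths is simple: extra messages × model multiplier × roughly $0.04 per premium request. Here is a hedged sketch of that calculation; it is illustrative only, so confirm current pricing on your billing page.

```python
# Rough overage-cost estimate once the monthly allowance is gone, using the
# ~$0.04 per premium request figure quoted above. Illustrative only.
PRICE_PER_REQUEST = 0.04

def overage_cost(extra_messages: int, multiplier: float) -> float:
    return extra_messages * multiplier * PRICE_PER_REQUEST

print(f"${overage_cost(100, 1):.2f}")     # 100 extra Opus 4.6 messages   -> $4.00
print(f"${overage_cost(100, 0.33):.2f}")  # 100 extra Haiku messages      -> $1.32
print(f"${overage_cost(10, 6):.2f}")      # 10 extra 1M-context messages  -> $2.40
```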
Model Multipliers — The Hidden Cost¶
Not every message costs just "1" premium request. Different models have different multipliers — like ordering the most expensive coffee uses more of your café budget.
| Model | Multiplier | Requests Used Per Message | With Pro+ (1,500), You Get |
|---|---|---|---|
| GPT-4.1 / GPT-4o | 0× | 0 (unlimited, base) | ∞ unlimited |
| Claude Haiku 4.5 / GPT-5.4 mini | 0.33× | 0.33 | ~4,545 messages |
| Claude Sonnet 4.5 / GPT-5.1 | 1× | 1 | 1,500 messages |
| Claude Opus 4.6 | 1× | 1 | 1,500 messages |
| Claude Opus 4.6 (1M) | 6× | 6 | ~250 messages |
| GPT-4.5 | ~50× | 50 | ~30 messages 😱 |
Watch Out for GPT-4.5!
A single message using GPT-4.5 costs 50 premium requests. That means 30 messages would use your entire Pro+ budget for the month. Always check the multiplier before using expensive models.
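The last column of the table above is just the monthly allowance divided by the multiplier. Here is a quick sketch of that arithmetic, assuming the Pro+ allowance of 1,500 premium requests:

```python
# Messages per month ≈ premium-request allowance ÷ model multiplier.
PRO_PLUS_BUDGET = 1_500

multipliers = {
    "Haiku 4.5 / GPT-5.4 mini": 0.33,
    "Sonnet 4.5 / GPT-5.1 / Opus 4.6": 1,
    "Opus 4.6 (1M)": 6,
    "GPT-4.5": 50,
}

for model, mult in multipliers.items():
    print(f"{model}: ~{PRO_PLUS_BUDGET / mult:,.0f} messages on Pro+")
```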
Real-World Budget Example ☕
Let's say you're on Pro+ (1,500 requests/month) and you use Copilot CLI daily:
| Usage Pattern | Daily Messages | Monthly Total | Verdict |
|---|---|---|---|
| All Haiku (0.33×) | 75 | ~750 requests | ✅ Plenty of budget left |
| All Opus 4.6 (1×) | 75 | ~2,250 requests | ⚠️ Over budget by week 3 |
| Mix: 50 Haiku + 25 Opus | 75 | ~1,245 requests | ✅ Fits within budget |
| 10 GPT-4.5 messages | 10 | 500 requests | 😱 Third of budget gone in one session |
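The table above comes from the same kind of estimate: daily messages × multiplier × roughly 30 days, compared against the allowance. A small sketch, assuming a 30-day month:

```python
# Monthly consumption estimate: daily messages × multiplier × ~30 days.
PRO_PLUS_BUDGET, DAYS = 1_500, 30

patterns = {
    "All Haiku (0.33x)":       [(75, 0.33)],
    "All Opus 4.6 (1x)":       [(75, 1.0)],
    "Mix: 50 Haiku + 25 Opus": [(50, 0.33), (25, 1.0)],
}

for name, mix in patterns.items():
    monthly = sum(count * mult for count, mult in mix) * DAYS
    verdict = "fits the budget" if monthly <= PRO_PLUS_BUDGET else "over budget"
    print(f"{name}: ~{monthly:,.0f} requests/month ({verdict})")
```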
How to Monitor Your Usage¶
Three ways to keep an eye on your premium request balance:
| Method | How to Access | What You See |
|---|---|---|
| GitHub Billing Dashboard | github.com/settings/billing | Full analytics: usage by product, model, cost breakdown, trends |
| In IDE (VS Code etc.) | Click Copilot icon in status bar | Used / remaining / reset date |
| In Copilot CLI | Type `/usage` | Remaining premium requests and reset date |
Check your billing dashboard weekly
The GitHub Billing page has a Premium Request Analytics dashboard (GA since late 2025) that shows detailed breakdowns by model, product (Chat, CLI, Agent, etc.), and time period. This is the most accurate way to track your real consumption.
Budget Tips — Stretching Your Premium Requests¶
Here are practical strategies to make your monthly budget last:
| # | Strategy | Impact | How |
|---|---|---|---|
| 1 | Use Haiku for simple tasks | Save ~67% per message | Switch to Haiku via /model for quick questions, lookups, simple edits |
| 2 | Batch your questions | Fewer turns = fewer requests | Ask everything in one message instead of 5 separate questions |
| 3 | Enable pay-per-use as safety net | Never lose access unexpectedly | Set a small budget ($5–10) at github.com/settings/billing |
| 4 | Monitor weekly | Catch overspending early | Check /usage or the billing dashboard every Monday |
| 5 | Avoid GPT-4.5 | Save 50× per message | Use Opus 4.6 (1×) instead — similar quality, 50× cheaper |
| 6 | Use `/compact` regularly | Indirect savings | Shorter context = less token processing per message |
The Sweet Spot for Pro+
With 1,500 requests on Pro+ and a mix of Haiku (simple tasks) + Opus 4.6 (complex tasks), you can comfortably get through ~50–75 messages/day for a full month. The key is not using expensive models for everything.
Summary¶
| Concept | One-Sentence Explanation |
|---|---|
| Model | The AI "brain" — different models = different strengths, speeds, and costs |
| Token | The unit of measurement — roughly ¾ of a word (~4 characters) |
| Context window | The shared whiteboard — everything must fit on it |
| Premium requests | Your monthly AI budget — different plans give different amounts (50 to 1,500) |
| Model multiplier | How many premium requests a single message costs (0.33× to 50×) |
| Hard limit + fallback | When you run out, Copilot drops to GPT-4.1 (free, less capable) — doesn't stop working |
| Pay-per-use | Optional overage billing (~$0.04/request × multiplier) so you never get downgraded |
| Command | What It Does |
|---|---|
| `/model` | Switch which brain you're using |
| `/context` | Check how full your whiteboard is |
| `/compact` | Summarise conversation to free up whiteboard space |
| `/usage` | Check your remaining monthly budget |
| Ctrl+T | Toggle the AI's visible thinking process |
Next: Useful Commands → for a complete reference of every command mentioned here.