🧠 Models & Context Window¶
The Big Picture: Every time you chat with Copilot CLI, you're talking to an AI model — a specific "brain" — through a context window — a shared "whiteboard." Understanding these two concepts will help you get better results and spend your budget wisely.
What is a Model?¶
Think of each model as a different chef in a restaurant kitchen. They all cook food, but each has different strengths, speeds, and price tags.
When you open the /model menu, you're choosing which chef handles your order.
| Model | Provider | Personality | Context | Cost | Best For |
|---|---|---|---|---|---|
| Claude Sonnet 4.5 | Anthropic | 🧑🍳 Reliable head chef | 200k | 1x | Balanced everyday work |
| Claude Opus 4.6 | Anthropic | 👨🍳 Michelin-star chef | 200k | 1x | Hardest problems, best quality |
| Claude Haiku 4.5 | Anthropic | 🍔 Fast-food cook | 200k | 0.33x | Quick questions, saving budget |
| Claude Opus 4.6 1M | Anthropic | 👨🍳 Same Michelin chef, massive kitchen | 1,000k | 6x | Huge codebases (internal only) |
| GPT-5.1 | OpenAI | 🍝 Chef from a different restaurant chain | 200k | 1x | Different perspective |
| GPT-5.4 mini | OpenAI | 🥡 Their fast option | 200k | 0.33x | Quick tasks, budget-friendly |
Bottom Line
You don't need to memorise this table. Just remember: Opus 4.6 = smartest at standard cost, Haiku = cheapest, and 1M = huge but expensive.
The /model Menu Explained¶
Type /model and you'll see something like this:
┌─────────────────────────────────┬─────────┬──────┐
│ Model │ Context │ Cost │
├─────────────────────────────────┼─────────┼──────┤
│ Claude Opus 4.6 │ 200k │ 1x │
│ Claude Sonnet 4.5 │ 200k │ 1x │
│ Claude Haiku 4.5 │ 200k │ 0.33x│
│ Claude Opus 4.6 (1M context) │ 1000k │ 6x │
│ GPT-5.1 │ 200k │ 1x │
│ GPT-5.4 mini │ 200k │ 0.33x│
└─────────────────────────────────┴─────────┴──────┘
Two columns matter: Context and Cost. Let's break each one down.
Context Column (200k, 1000k)¶
This is the size of the whiteboard the AI uses to hold your entire conversation.
- 200k tokens ≈ 150,000 words ≈ roughly 2–3 full novels
- 1,000k tokens ≈ 750,000 words ≈ roughly 10–15 novels
What does 'k' mean?
The "k" stands for thousand. So 200k = 200,000 tokens. Think of it like kilometres — 200k is 200,000 of something.
The bigger the context, the more information the AI can "see" at once — your messages, files you've shared, its own instructions, and more.
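If you want to sanity-check these conversions yourself, here is a tiny Python sketch of the arithmetic. The words-per-token and words-per-novel figures are rough assumptions for illustration, not official numbers.

```python
# Rough back-of-envelope conversion from context size to word capacity.
# Both constants are assumptions: ~0.75 words per token, ~60k words per novel.
WORDS_PER_TOKEN = 0.75
WORDS_PER_NOVEL = 60_000

for label, tokens in [("200k", 200_000), ("1,000k", 1_000_000)]:
    words = tokens * WORDS_PER_TOKEN
    print(f"{label} context ≈ {words:,.0f} words ≈ {words / WORDS_PER_NOVEL:.1f} typical novels")
```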
Cost Column (1x, 0.33x, 6x)¶
This is how many premium requests each message costs from your monthly budget.
Think of it like a café budget:
The Café Budget Analogy ☕
Imagine you get $100/month to spend at the AI café.
| Model | Cost per "Coffee" | Coffees You Get |
|---|---|---|
| Haiku 4.5 / GPT-5.4 mini | $0.33 each | ~300 coffees ☕☕☕ |
| Sonnet 4.5 / Opus 4.6 / GPT-5.1 | $1.00 each | 100 coffees ☕ |
| Opus 4.6 1M | $6.00 each | ~16 coffees ☕ |
The cheap coffee is still good — it's just smaller and simpler. The $6 coffee is the same quality as the $1 coffee, but served on a massive table (1M context).
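If it helps to see the analogy as arithmetic, here is a minimal Python sketch. The $100 budget is part of the analogy, not a real plan price.

```python
# Café-budget arithmetic: how many "coffees" (messages) a fixed budget buys
# at each price point. Purely illustrative numbers from the analogy above.
MONTHLY_BUDGET = 100.00

price_per_message = {
    "Haiku 4.5 / GPT-5.4 mini": 0.33,
    "Sonnet 4.5 / Opus 4.6 / GPT-5.1": 1.00,
    "Opus 4.6 1M": 6.00,
}

for model, price in price_per_message.items():
    print(f"{model}: about {int(MONTHLY_BUDGET / price)} coffees")
```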
Which Model Should You Use?¶
Here's a simple decision guide:
What are you doing?
│
├── 💬 Quick question or simple task
│ └── → Haiku 4.5 (0.33x) — save your budget
│
├── 📚 Learning / daily work / writing
│ └── → Opus 4.6 (1x) — best brain at standard cost
│
├── 🏗️ Massive project with huge files
│ └── → Opus 4.6 1M (6x) — only when you truly need the space
│
└── 🔄 Want a different perspective or style
└── → GPT-5.1 (1x) — different "restaurant," different approach
Default recommendation
Start with Opus 4.6. It gives you the best quality at standard cost (1x). Switch to Haiku when you're doing simple things and want to stretch your budget.
What Are Tokens?¶
Tokens are the unit of measurement for the whiteboard. Everything — your messages, the AI's replies, files, instructions — gets converted into tokens.
Common misconception
One word ≠ one token. It's not a 1-to-1 relationship!
Rule of thumb: 1 token ≈ ¾ of a word (~4 characters)
Think of tokens like LEGO bricks:
- Short, common words (like "the", "hello") = 1 brick
- Longer or unusual words (like "PowerShell") = 2–3 bricks
- Technical strings get broken into many bricks
| Example Text | Approximate Tokens |
|---|---|
| `Hello` | 1 token |
| `Good morning` | 2 tokens |
| `PowerShell` | 2–3 tokens |
| `New-AzResourceGroup` | 5–7 tokens |
| `M365CPI52224224.onmicrosoft.com` | ~10 tokens |
| A full page of text (~400 words) | ~500–800 tokens |
| An entire novel (~80,000 words) | ~100,000 tokens |
Why does this matter?
Because the context window is measured in tokens. When you paste a long error message or a big file, it might eat up more whiteboard space than you'd expect — every character costs tokens.
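If you want a rough feel for how much a paste will cost before you send it, a character count gets you close. The sketch below uses the ~4-characters-per-token rule of thumb; it is only an approximation, and dense technical strings tend to cost more than it suggests.

```python
# Rough token estimate from character count, using the ~4-characters-per-token
# rule of thumb above. Not the real tokenizer; treat the result as a ballpark.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Hello"))                            # ~1
print(estimate_tokens("New-AzResourceGroup"))              # ~5
print(estimate_tokens("M365CPI52224224.onmicrosoft.com"))  # ~8 (real count is nearer 10)
```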
The Whiteboard (Context Window)¶
The context window is a shared whiteboard between you and the AI. Everything — your questions, the AI's answers, files, tool definitions — lives on this whiteboard.
Here's what it looks like as it fills up:
╔══════════════════════════════════════════════════╗
║ THE WHITEBOARD (200k) ║
╠══════════════════════════════════════════════════╣
║ ║
║ ██████████████████████░░░░░░░░░░░░░░░░░░░░░░░░ ║
║ ▲ Used: ~35% Free: ~60% ▲ ║
║ │ │ ║
║ System/Tools Buffer ║
║ (loaded before (5%) ║
║ you say hello!) ║
║ ║
║ Stage 1: 70% used — Everything is fine 😊 ║
║ ████████████████████████████████░░░░░░░░░░░░░░ ║
║ ║
║ Stage 2: 85% used — Warning appears ⚠️ ║
║ ██████████████████████████████████████░░░░░░░░ ║
║ ║
║ Stage 3: 95% used — Auto-compact kicks in 🔄 ║
║ █████████████████████████████████████████████░░ ║
║ ║
║ Stage 4: 100% full — Can't process! 🛑 ║
║ ████████████████████████████████████████████████ ║
║ ║
╚══════════════════════════════════════════════════╝
What's ON the Whiteboard?¶
Run /context and you'll see how the whiteboard is divided. Here's what each section means:
| Section | Typical Size | What It Is | Analogy |
|---|---|---|---|
| System / Tools | ~30–40% | The system prompt and all available tools (Azure, GitHub, MCP servers, skills) | The café's operating manual (menu, rules, procedures), always pinned to the whiteboard |
| Messages | Grows over time | The entire conversation history: every question and answer | The order history: every coffee you've ordered today |
| Free space | Shrinks over time | Room for more conversation | Empty whiteboard: space for new orders |
| Buffer | ~5% | Safety margin so a response isn't cut off mid-generation | Reserved space, like keeping the last page of a notebook blank |
Surprising Fact
System/Tools takes up 30–40% before you even say hello! That's like walking into a café and finding the whiteboard already half-full with the menu, health regulations, and staff procedures — before the first customer arrives.
This is why your 200k context window doesn't really give you 200k of conversation space. You effectively start with ~120–140k of usable space.
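The arithmetic behind that estimate looks roughly like this; the overhead percentages are the typical ranges quoted above, not fixed values, so run `/context` to see your real numbers.

```python
# Why a "200k" window feels smaller in practice: subtract System/Tools overhead.
# Percentages are typical ranges from above, not exact values.
CONTEXT = 200_000
BUFFER_SHARE = 0.05  # the safety buffer trims roughly another 10k on top of this

for system_share in (0.30, 0.40):
    conversation_space = CONTEXT * (1 - system_share)
    print(f"{system_share:.0%} System/Tools overhead -> ~{conversation_space:,.0f} tokens for conversation")
```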
Traffic Light System¶
Use this to decide when to take action:
- 🟢 GREEN (0–60% used): everything is fine, keep going.
- 🟡 YELLOW (60–80% used): consider `/compact` or `/new` soon.
- 🔴 RED (80%+ used): act now: run `/compact`, save your work, or `/clear`.
When you're in the RED zone
Don't ignore it. At 95%+, the system will try to auto-compact (summarise and shrink the conversation), but this can lose important details. It's better to manage it yourself before it gets critical.
What to do:

- Run `/compact` to summarise the conversation (frees up space)
- Save any important decisions to your instructions file
- If needed, run `/clear` or `/new` to start fresh
Managing Your Whiteboard¶
Five practical strategies to keep your whiteboard healthy:
| # | Strategy | What to Do | Why It Helps |
|---|---|---|---|
| 1 | Be selective with `@` | Don't `@` entire folders or huge files unless you need them | Every file you reference eats whiteboard space |
| 2 | Start new sessions for new topics | Use `/new` when switching to a completely different task | A fresh whiteboard = maximum space |
| 3 | Use `/compact` proactively | Run `/compact` when you hit the yellow zone (60–80%) | Summarises the conversation, frees up tokens |
| 4 | Keep thinking OFF for simple tasks | Toggle with Ctrl+T for simple questions | Extended thinking uses extra tokens |
| 5 | Save to instructions, then `/clear` | Put important context in `.github/copilot-instructions.md`, then clear | The "passport strategy": your context survives session resets because instructions are reloaded automatically |
The Passport Strategy 🛂
Think of your instructions file as a passport. When you `/clear` a session, the whiteboard is wiped clean, but your passport (instructions file) is always reloaded, so anything saved there survives across sessions.
This is perfect for things like the following (a sketch of such a file appears after this list):
- Your tenant IDs and environment details
- Preferred commands or workflows
- Project-specific context you always need
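As a concrete and entirely hypothetical illustration, such a file might look something like this; every name and value below is a placeholder, not a recommendation.

```markdown
<!-- .github/copilot-instructions.md : example only; every value is a placeholder -->

## Environment
- Tenant: contoso.onmicrosoft.com (hypothetical)
- Default subscription: Contoso-Dev
- Default region: australiaeast

## Preferences
- Use PowerShell (Az module) for Azure examples rather than Azure CLI
- Always ask before running destructive commands
```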
Model Switching Mid-Conversation¶
You can switch models at any time using /model. Here's what happens:
What carries over (stays on the whiteboard):
- ✅ Your entire conversation history
- ✅ Files that were read into context
- ✅ Decisions and plans already discussed
What changes (the new chef takes over):
- 🔄 The "brain" processing everything
- 🔄 Quality and style of responses
- 🔄 How the AI interprets nuance
It's like switching chefs mid-meal
The new chef reads all the previous chef's notes (your conversation), but they might interpret the recipe differently. The kitchen (whiteboard) stays the same — only the person cooking changes.
Best Practices for Model Switching¶
| Do | Don't |
|---|---|
| ✅ Start with Opus 4.6 for best quality at 1x cost | ❌ Don't switch models mid-complex-task — it can cause confusion |
| ✅ Switch to Haiku for simple follow-ups to save budget | ❌ Don't assume the new model "knows" implicit context |
| ✅ Run `/compact` before switching (clean whiteboard for the new brain) | ❌ Don't switch repeatedly back and forth |
| ✅ Brief the new model after switching ("I'm working on X, we decided Y") | ❌ Don't use 1M model for short conversations |
The "Internal Only" 1M Model¶
The Claude Opus 4.6 (1M context) model is special:
- Same brain as regular Opus 4.6 — same intelligence, same quality
- Bigger whiteboard — 1,000,000 tokens instead of 200,000 (5× larger)
- 6× the cost — each message costs 6 premium requests
- Available to Microsoft employees only (internal access)
Don't use it all the time
Using the 1M model for everyday questions is like renting a football stadium to cook dinner for two. You're paying 6× the price for space you'll never fill.
When to actually use 1M:
- You need to analyse a massive codebase (many large files at once)
- You're working with very large documents (100+ page specs)
- Your session has been going so long that 200k isn't enough
- `/compact` can't free up enough space for what you need
Smart Approach
Start with regular Opus 4.6 (1x cost)
│
├── Session getting long? → Run /compact
│
├── /compact isn't enough? → Try /new with key context
│
└── Still need more space? → NOW switch to 1M (6x cost)
This way, you only pay the 6× premium when you genuinely need the extra space.
Thinking Toggle¶
The thinking toggle controls whether you can see the AI's internal reasoning process.
| Setting | What Happens | When to Use |
|---|---|---|
| Thinking ON | You see the AI's step-by-step reasoning before the answer | Complex decisions, debugging, understanding why |
| Thinking OFF | You just see the final answer | Simple questions, saving tokens |
Toggle it with: Ctrl+T
Windows Terminal Conflict
In Windows Terminal, Ctrl+T opens a new tab instead of toggling thinking. This is a known conflict.
Workaround: Use a different terminal emulator, or check if your terminal lets you rebind the Ctrl+T shortcut.
Extended thinking is a separate feature — the model automatically "thinks harder" on complex problems, spending more tokens on reasoning before answering. You don't control this directly; it happens when the model detects a hard problem.
Quick Decision Flowchart¶
When in doubt, use this:
Need Copilot CLI help?
│
├── 💬 Quick question or lookup
│ └── → Haiku 4.5 (0.33x) 💰 Save budget
│
├── 📚 Learning / daily work / writing code
│ └── → Opus 4.6 (1x) ⭐ Best brain, standard price
│
├── 🏗️ Massive project, huge files
│ └── → Opus 4.6 1M (6x) 🏟️ Only when truly needed
│
└── 🔄 Want a different style or approach
└── → GPT-5.1 (1x) 🍝 Different restaurant
Check Your Budget
Run /usage at any time to see how many premium requests you have left this month. Plan accordingly!
Plans & Premium Request Budgets¶
Now that you understand models and their cost multipliers, let's talk about the real money — how many premium requests you get per month and what happens when you run out.
The Plans¶
Think of this like mobile phone plans — each tier gives you a different amount of "data" (premium requests), and once it's used up, you drop to a slower speed.
| Plan | Monthly Price | Premium Requests | Best For |
|---|---|---|---|
| Free | $0 | 50 | Trying it out, casual use |
| Pro | $10/month | 300 | Light personal use |
| Pro+ | $39/month | 1,500 | Heavy personal use (power users) |
| Business | $19/user/month | 300/user | Teams & organisations |
| Enterprise | $39/user/month | 1,000/user | Large organisations with governance needs |
Which plan for which user?
- 50 requests (Free) → You'll burn through this in a single learning session
- 300 requests (Pro) → Fine for light daily use, tight for heavy sessions
- 1,500 requests (Pro+) → A solid amount, but model multipliers can drain it fast (see below)
What Happens When You Hit the Limit?¶
This is the critical question. It's a hard limit with a graceful fallback:
graph TD
A([🎯 You send a message]) --> B{Premium requests remaining?}
B -->|✅ Yes| C([Uses premium model you selected])
B -->|❌ No, limit reached| D{Pay-per-use enabled?}
D -->|Yes, budget set| E([Continues with premium model<br/>Charges overage ~$0.04/request × multiplier])
D -->|No budget set| F([Falls back to GPT-4.1 / GPT-4o<br/>Unlimited but less capable])
style C fill:#1a5e2a,stroke:#66ff66,color:#fff
style E fill:#5e4a1a,stroke:#ffcc66,color:#fff
style F fill:#5e1a1a,stroke:#ff6666,color:#fff
| Scenario | What Happens |
|---|---|
| Within limit | ✅ Everything works normally with your chosen premium model |
| Hit the limit (no budget) | ⚠️ Falls back to GPT-4.1/GPT-4o — unlimited but less capable. Copilot CLI still works, just on a weaker model |
| Hit the limit (budget enabled) | 💰 Continues using premium models, charges ~$0.04/request × model multiplier |
| Quota reset | 🔄 1st of every month at midnight UTC (1 PM NZDT) — counter goes back to zero |
Copilot CLI doesn't stop working!
This is the key takeaway — when you exhaust your premium requests, Copilot does not shut down. It drops you to the free base model (GPT-4.1/GPT-4o). You lose access to Claude, GPT-4.5, and other premium models until the next billing cycle — unless you enable overage billing.
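If you do enable pay-per-use, the overage maths is simple: extra messages × model multiplier × roughly $0.04 per premium request. Here is a hedged sketch of that calculation; it is illustrative only, so confirm current pricing on your billing page.

```python
# Rough overage-cost estimate once the monthly allowance is gone, using the
# ~$0.04 per premium request figure quoted above. Illustrative only.
PRICE_PER_REQUEST = 0.04

def overage_cost(extra_messages: int, multiplier: float) -> float:
    return extra_messages * multiplier * PRICE_PER_REQUEST

print(f"${overage_cost(100, 1):.2f}")     # 100 extra Opus 4.6 messages   -> $4.00
print(f"${overage_cost(100, 0.33):.2f}")  # 100 extra Haiku messages      -> $1.32
print(f"${overage_cost(10, 6):.2f}")      # 10 extra 1M-context messages  -> $2.40
```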
Model Multipliers — The Hidden Cost¶
Not every message costs just "1" premium request. Different models have different multipliers — like ordering the most expensive coffee uses more of your café budget.
| Model | Multiplier | Requests Used Per Message | With Pro+ (1,500), You Get |
|---|---|---|---|
| GPT-4.1 / GPT-4o | 0× | 0 (unlimited, base) | ∞ unlimited |
| Claude Haiku 4.5 / GPT-5.4 mini | 0.33× | 0.33 | ~4,545 messages |
| Claude Sonnet 4.5 / GPT-5.1 | 1× | 1 | 1,500 messages |
| Claude Opus 4.6 | 1× | 1 | 1,500 messages |
| Claude Opus 4.6 (1M) | 6× | 6 | ~250 messages |
| GPT-4.5 | ~50× | 50 | ~30 messages 😱 |
Watch Out for GPT-4.5!
A single message using GPT-4.5 costs 50 premium requests. That means 30 messages would use your entire Pro+ budget for the month. Always check the multiplier before using expensive models.
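The last column of the table above is just the monthly allowance divided by the multiplier. Here is a quick sketch of that arithmetic, assuming the Pro+ allowance of 1,500 premium requests:

```python
# Messages per month ≈ premium-request allowance ÷ model multiplier.
PRO_PLUS_BUDGET = 1_500

multipliers = {
    "Haiku 4.5 / GPT-5.4 mini": 0.33,
    "Sonnet 4.5 / GPT-5.1 / Opus 4.6": 1,
    "Opus 4.6 (1M)": 6,
    "GPT-4.5": 50,
}

for model, mult in multipliers.items():
    print(f"{model}: ~{PRO_PLUS_BUDGET / mult:,.0f} messages on Pro+")
```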
Real-World Budget Example ☕
Let's say you're on Pro+ (1,500 requests/month) and you use Copilot CLI daily:
| Usage Pattern | Daily Messages | Monthly Total | Verdict |
|---|---|---|---|
| All Haiku (0.33×) | 75 | ~750 requests | ✅ Plenty of budget left |
| All Opus 4.6 (1×) | 75 | ~2,250 requests | ⚠️ Over budget by week 3 |
| Mix: 50 Haiku + 25 Opus | 75 | ~1,245 requests | ✅ Fits within budget |
| 10 GPT-4.5 messages | 10 | 500 requests | 😱 Third of budget gone in one session |
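The table above comes from the same kind of estimate: daily messages × multiplier × roughly 30 days, compared against the allowance. A small sketch, assuming a 30-day month:

```python
# Monthly consumption estimate: daily messages × multiplier × ~30 days.
PRO_PLUS_BUDGET, DAYS = 1_500, 30

patterns = {
    "All Haiku (0.33x)":       [(75, 0.33)],
    "All Opus 4.6 (1x)":       [(75, 1.0)],
    "Mix: 50 Haiku + 25 Opus": [(50, 0.33), (25, 1.0)],
}

for name, mix in patterns.items():
    monthly = sum(count * mult for count, mult in mix) * DAYS
    verdict = "fits the budget" if monthly <= PRO_PLUS_BUDGET else "over budget"
    print(f"{name}: ~{monthly:,.0f} requests/month ({verdict})")
```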
How to Monitor Your Usage¶
Three ways to keep an eye on your premium request balance:
| Method | How to Access | What You See |
|---|---|---|
| GitHub Billing Dashboard | github.com/settings/billing | Full analytics: usage by product, model, cost breakdown, trends |
| In IDE (VS Code etc.) | Click Copilot icon in status bar | Used / remaining / reset date |
| In Copilot CLI | Type `/usage` | Remaining premium requests and reset date |
Check your billing dashboard weekly
The GitHub Billing page has a Premium Request Analytics dashboard (GA since late 2025) that shows detailed breakdowns by model, product (Chat, CLI, Agent, etc.), and time period. This is the most accurate way to track your real consumption.
Budget Tips — Stretching Your Premium Requests¶
Here are practical strategies to make your monthly budget last:
| # | Strategy | Impact | How |
|---|---|---|---|
| 1 | Use Haiku for simple tasks | Save ~67% per message | Switch to Haiku via /model for quick questions, lookups, simple edits |
| 2 | Batch your questions | Fewer turns = fewer requests | Ask everything in one message instead of 5 separate questions |
| 3 | Enable pay-per-use as safety net | Never lose access unexpectedly | Set a small budget ($5–10) at github.com/settings/billing |
| 4 | Monitor weekly | Catch overspending early | Check /usage or the billing dashboard every Monday |
| 5 | Avoid GPT-4.5 | Save 50× per message | Use Opus 4.6 (1×) instead — similar quality, 50× cheaper |
| 6 | Use `/compact` regularly | Indirect savings | Shorter context = less token processing per message |
The Sweet Spot for Pro+
With 1,500 requests on Pro+ and a mix of Haiku (simple tasks) + Opus 4.6 (complex tasks), you can comfortably get through ~50–75 messages/day for a full month. The key is not using expensive models for everything.
Summary¶
| Concept | One-Sentence Explanation |
|---|---|
| Model | The AI "brain" — different models = different strengths, speeds, and costs |
| Token | The unit of measurement — roughly ¾ of a word (~4 characters) |
| Context window | The shared whiteboard — everything must fit on it |
| Premium requests | Your monthly AI budget — different plans give different amounts (50 to 1,500) |
| Model multiplier | How many premium requests a single message costs (0.33× to 50×) |
| Hard limit + fallback | When you run out, Copilot drops to GPT-4.1 (free, less capable) — doesn't stop working |
| Pay-per-use | Optional overage billing (~$0.04/request × multiplier) so you never get downgraded |
| Command | What It Does |
|---|---|
| `/model` | Switch which brain you're using |
| `/context` | Check how full your whiteboard is |
| `/compact` | Summarise conversation to free up whiteboard space |
| `/usage` | Check your remaining monthly budget |
| Ctrl+T | Toggle the AI's visible thinking process |
Next: Useful Commands → for a complete reference of every command mentioned here.