Agentic Coding

Claude 1M Context Era: Optimize Cost & Performance with Sonnet 4.6 + opusplan

11 min read
claude · context-window · sonnet-4-6 · opusplan · claude-code

With Claude Sonnet 4.6 released and 1M context windows now available, this post covers the new context management strategies, and how opusplan uses Opus for planning and Sonnet for execution to keep costs down.

The intuitive response to a 1M context window is "load more." But the more interesting question is whether models can actually use that space effectively — and the answer turns out to be model-dependent in a way that directly shapes your strategy.
The MRCR v2 benchmark makes this concrete: Sonnet 4.5 retrieves relevant information from 1M tokens at 18.5% accuracy, while Opus 4.6 hits 76%. Same context window, completely different utilization. That gap is what drives everything else in this post.
In February 2026, Anthropic shipped Opus 4.6 (February 5) and Sonnet 4.6 (February 17) less than two weeks apart. Here's how to use the combination effectively.

1. Sonnet 4.6 + 1M Context: Current Landscape

1.1 How the Two Models Are Positioned

Opus 4.6 (released 2026-02-05):
  • First Opus model with 1M context window (beta)
  • Pricing: $5/$25 per MTok (up to 200K), $10/$37.50 (beyond 200K)
  • Adaptive reasoning level controls (low/medium/high)
Sonnet 4.6 (released 2026-02-17):
  • Now the default model for Free/Pro plans
  • Pricing: $3/$15 per MTok (up to 200K), $6/$22.50 (beyond 200K)
  • Max output of 64K tokens
  • 70% preference over Sonnet 4.5 in Claude Code usage
Both models support 1M context in beta. It's activated via the anthropic-beta: context-1m-2025-08-07 header, available to API usage tier 4 and above.

1.2 MRCR v2 Benchmark: Real Differences at 1M

One benchmark makes the performance gap between these two models at 1M context very clear: the MRCR v2 8-needle variant at 1M, part of Google DeepMind's Michelangelo long-context evaluation framework.
| Model | MRCR v2 (1M) | Note |
| --- | --- | --- |
| Claude Opus 4.6 | 76% | 4.1x vs. Sonnet 4.5 |
| Claude Sonnet 4.5 | 18.5% | Baseline |
In practical terms: loading your entire codebase into Sonnet at 1M and hoping it finds what it needs is a poor strategy. At 18.5% retrieval accuracy, you're paying for the context but not getting the benefit. Opus at 76% is a different story. This single benchmark should determine your entire loading strategy.
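One way to internalize that tradeoff is a rough "cost per effective MTok" figure: the long-context input price divided by retrieval accuracy. This metric is my own simplification, not an official one; the prices and accuracies come from the tables in this post (note that Sonnet 4.6's pricing is paired with Sonnet 4.5's benchmark score, since that's the figure available).

```python
# Cost per "effective" MTok: long-context input price divided by MRCR v2
# retrieval accuracy. An illustrative simplification, not an official metric.
# Prices: 200K+ input rates from this post; accuracy: MRCR v2 8-needle at 1M.

def cost_per_effective_mtok(price_per_mtok: float, accuracy: float) -> float:
    return price_per_mtok / accuracy

sonnet = cost_per_effective_mtok(6.0, 0.185)  # Sonnet: $6/MTok at 18.5%
opus = cost_per_effective_mtok(10.0, 0.76)    # Opus: $10/MTok at 76%

print(f"Sonnet: ${sonnet:.2f}/effective MTok")  # → $32.43
print(f"Opus:   ${opus:.2f}/effective MTok")    # → $13.16
```

By this crude measure, Opus at 1M is not only more accurate but cheaper per successfully retrieved fact, despite the higher sticker price.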
[Diagram] MRCR v2 1M Benchmark - Retrieval accuracy comparison at 1M context

2. Core Strategy 1: opusplan

2.1 What Is opusplan?

Claude Code has a model alias called opusplan. It's become my default setup, and here's how it works:
  • Plan mode: Uses Opus 4.6
  • Execute mode: Automatically switches to Sonnet
The plan phase often determines the entire quality of a task. What to build, which files to touch, in what order. opusplan dedicates Opus to that critical planning step while letting Sonnet handle the actual code generation.

2.2 How to Set It Up

Pick whichever approach fits your workflow:
Bash
# 1. Flag at session start
claude --model opusplan
# 2. Environment variable (add to shell profile)
export ANTHROPIC_MODEL=opusplan
# 3. Switch during a session
/model opusplan
JSON
// 4. settings.json (project default)
{
  "model": "opusplan"
}

2.3 What's the Cost Difference?

I ran the numbers for a typical feature development session.
The planning phase consumes roughly 10-15% of total tokens — most tokens go to code generation and file edits. Using Opus for the full session costs 1.67x more per token on both input and output compared to Sonnet.
With opusplan, you get Opus quality for planning and Sonnet efficiency for execution. That's a meaningful cost reduction without sacrificing the part that matters most.
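A back-of-envelope sketch of that saving. The 15/85 planning/execution token split and the one-MTok session size are my assumptions, not measurements; the rates are the sub-200K output prices from section 3.1.

```python
# Back-of-envelope: full-Opus session vs. opusplan (Opus plans, Sonnet executes).
# Assumptions (mine, not measured): 15% of tokens spent planning, 85% executing;
# rates are the sub-200K output prices from this post ($25 Opus, $15 Sonnet /MTok).

PLAN_SHARE, EXEC_SHARE = 0.15, 0.85
OPUS_RATE, SONNET_RATE = 25.0, 15.0  # $/MTok output, below 200K

def session_cost(mtok: float, plan_rate: float, exec_rate: float) -> float:
    """Cost of a session split between a planning model and an execution model."""
    return mtok * (PLAN_SHARE * plan_rate + EXEC_SHARE * exec_rate)

full_opus = session_cost(1.0, OPUS_RATE, OPUS_RATE)    # $25.00
opusplan = session_cost(1.0, OPUS_RATE, SONNET_RATE)   # $16.50
print(f"savings: {100 * (1 - opusplan / full_opus):.0f}%")  # → savings: 34%
```

Under these assumptions, opusplan cuts per-session cost by roughly a third while keeping Opus on the step where quality matters most.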
[Diagram] opusplan Flow - Opus for planning, Sonnet for execution, automatically routed

3. Core Strategy 2: When to Actually Use 1M

3.1 Understand the Cost Structure First

You don't want 1M context left on by default. The pricing changes the moment you cross 200K.
| Range | Opus 4.6 | Sonnet 4.6 |
| --- | --- | --- |
| 0-200K (input) | $5/MTok | $3/MTok |
| 200K+ (input) | $10/MTok | $6/MTok |
| 0-200K (output) | $25/MTok | $15/MTok |
| 200K+ (output) | $37.50/MTok | $22.50/MTok |
Input doubles and output goes up 1.5x once you cross 200K. Loading your entire codebase out of habit will get expensive fast.
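A sketch of how sharp that cliff is, assuming (as with Anthropic's long-context beta pricing to date) that the premium rate applies to the entire request once input exceeds 200K tokens, not just the overflow portion:

```python
# Sonnet input cost for a single request, assuming the premium rate applies to
# the ENTIRE request once input exceeds 200K tokens (not just the overflow).
# Rates are the Sonnet 4.6 input prices from the table above.

def sonnet_input_cost(input_tokens: int) -> float:
    rate = 6.0 if input_tokens > 200_000 else 3.0  # $/MTok
    return input_tokens / 1_000_000 * rate

print(f"199K tokens: ${sonnet_input_cost(199_000):.3f}")  # → $0.597
print(f"201K tokens: ${sonnet_input_cost(201_000):.3f}")  # → $1.206
```

Two thousand extra tokens double the input bill for the request, which is why keeping everyday sessions under 200K is worth real attention.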

3.2 Task-Based Strategy

The MRCR v2 numbers make the task-based breakdown fairly clear.
Day-to-day feature development → Standard 200K + on-demand exploration
Given Sonnet's 18.5% retrieval accuracy, loading your full codebase into Sonnet is actively counterproductive. Claude Code's on-demand approach — searching with Grep and Glob to load only what's needed — is more accurate and substantially cheaper.
Bash
# Do this
claude --model opusplan # Default Sonnet 200K
# Avoid this (wasteful)
claude --model opus[1m] # Loading entire codebase into 1M
Domain model refactoring, module boundary redesign → opus[1m] or sonnet[1m]
These tasks require seeing cross-file dependencies at a glance. You need the whole picture.
Bash
# Architecture analysis session
claude --model opus[1m]
# Switch during a Claude Code session
/model sonnet[1m]
Code review / security audit → opus[1m]
Loading an entire module and analyzing cross-cutting concerns is exactly where Opus's 76% retrieval accuracy pays off.
Model aliases in Claude Code:
Bash
# Available aliases
sonnet # Latest Sonnet (currently Sonnet 4.6) - daily coding
sonnet[1m] # 1M context Sonnet - long sessions
opus # Latest Opus
opus[1m] # 1M context Opus - architecture analysis
opusplan # Plan: Opus / Execute: Sonnet (recommended)

4. The Evolution of Context Management

4.1 Rethinking Compaction Strategy

Think of context management as workspace organization. In the 200K era, compaction was forced tidying — when the desk filled up, you had to clear it. Running /compact at 70-75% was mandatory, not optional. There was no room for nuance.
At 1M, that urgency disappears. The desk is much larger. But that's exactly the trap: a bigger space makes it tempting to let things pile up. Tool call results, failed attempts, intermediate steps accumulate over a long session, and even with ample context, the model has a harder time finding what actually matters. It's not a space problem anymore — it's a signal-to-noise problem.
What needs to change at 1M isn't compaction timing, it's compaction scope.
| | 200K Approach | 1M Approach |
| --- | --- | --- |
| When | At 70-75% usage | When noise accumulates |
| Target | Entire context | Old tool call results only |
| Purpose | Free up space | Improve signal-to-noise ratio |
This is why partial compaction becomes genuinely useful at 1M:
  • Trim only old tool call results selectively
  • Use Esc+Esc → "Summarize from here" for partial compaction
  • Preserve core context (SPECs, architecture decisions), discard intermediate noise
[Diagram] Compaction Strategy Evolution - 200K vs 1M context management approach

4.2 Full Load vs. On-Demand Exploration

The MRCR v2 numbers clarify this decision. At 18.5% retrieval accuracy, loading everything into Sonnet may actually be less accurate than having it search for what it needs via Grep and Glob. On-demand exploration can outperform full loading when the model can't reliably find things in a large context. That's why Anthropic's official recommendation for Claude Code is on-demand file exploration rather than bulk loading.
Full loading only makes sense when paired with Opus's 76% retrieval accuracy.
Full load is better when (use Opus):
  • Large-scale refactoring — need to see multi-file relationships at once
  • Architecture review — understanding the full dependency graph
  • Security audits — cross-cutting concern analysis
On-demand exploration is still better when:
  • Single-file bug fixes
  • Adding a feature with narrow impact
  • Most everyday Sonnet-based development
On cost: 50 files × 200 lines is about 10K lines of code, roughly 200K tokens at a typical ~20 tokens per line. Loading that every session adds up to several dollars a day. Full loading is worth it when you're using Opus and the work genuinely requires a full-codebase view, not as a default.
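The arithmetic behind that estimate, with my own assumptions for tokens per line and sessions per day:

```python
# Rough daily cost of full-loading a codebase every session.
# Assumptions (mine): ~20 tokens per line of source, 5 sessions per day,
# billed at Sonnet's 200K+ input rate of $6/MTok from the pricing table above.

FILES, LINES_PER_FILE, TOKENS_PER_LINE = 50, 200, 20
SESSIONS_PER_DAY = 5
RATE = 6.0  # $/MTok

tokens = FILES * LINES_PER_FILE * TOKENS_PER_LINE  # tokens per full load
daily = tokens / 1_000_000 * RATE * SESSIONS_PER_DAY
print(f"{tokens:,} tokens/load -> ${daily:.2f}/day")  # → 200,000 tokens/load -> $6.00/day
```

Tweak the constants for your own codebase; the point is that habitual full loading has a recurring cost, while grep-driven on-demand loading only pays for what the task touches.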

4.3 The Role of CLAUDE.md and External State Files

CLAUDE.md remains essential even in the 1M era — it loads at every session start. But clearly separating responsibilities across files makes it far more effective.
Role separation principle:
Plain Text
CLAUDE.md → Pointer (what to read and when)
.claude/rules/ → Detailed rules (topic/path-specific, auto-loaded)
.moai/specs/ → Work state (SPEC docs, preserved across sessions)
.moai/projects/ → Project metadata (config, history)
Leveraging .claude/rules/moai/:
Instead of cramming all rules into CLAUDE.md, splitting them into .claude/rules/moai/ by topic is far more scalable. Claude Code automatically loads these files as project instructions, and the paths frontmatter means rules only activate for matching file types.
YAML
---
# .claude/rules/moai/workflow/spec-workflow.md
paths:
  - ".moai/specs/**"
  - "src/**/*.ts"
---
# SPEC workflow rules (only loaded when working with TypeScript or SPEC files)
In a 1M environment, you can maintain these rule files in greater detail. In the 200K era, you had to compress rules to keep CLAUDE.md short. Now you can manage detailed domain-specific rules in separate files and only load them when needed.
Leveraging .moai/specs/:
Plan Mode combined with SPEC-driven development is the core pattern of the 1M era. SPEC documents generated by /moai plan are saved to .moai/specs/SPEC-XXX/spec.md, serving as external memory that preserves work state across sessions.
Bash
# Step 1: Plan session (Opus generates the SPEC)
claude --model opusplan
> /moai plan "Add user authentication middleware"
# → Creates .moai/specs/SPEC-001/spec.md
> /clear # Release context
# Step 2: Run session (SPEC-driven execution)
claude --model opusplan
> /moai run SPEC-001
# Opus reads SPEC and plans → Sonnet implements
SPEC documents survive compaction. Even when a session gets long enough to need /clear, referencing SPEC-001 in the next session fully restores context.
Leveraging .moai/projects/:
Maintaining project config and metadata here gives you a structured starting point when loading the full project context in a 1M session. The directory structure looks like this:
Plain Text
.moai/
├── projects/    # Project config and metadata
├── specs/       # SPEC documents (state preserved across sessions)
│   ├── SPEC-001/
│   │   └── spec.md
│   └── SPEC-002/
│       └── spec.md
└── config/      # Model, language, quality settings
With this structure, CLAUDE.md becomes a true pointer file — nothing more.
Markdown
# CLAUDE.md (pointer role only)
## Detailed Rules
- Workflow rules: see .claude/rules/moai/workflow/
- Development standards: see .claude/rules/moai/development/
## Current Work
- Active SPECs: see .moai/specs/
- Project config: see .moai/projects/

5. Practical Workflows

Here's how the model and context strategies combine by task type.

Workflow 1: Day-to-Day Feature Development

The default pattern. Start here unless there's a specific reason not to.
Bash
claude --model opusplan
# Plan mode (Opus): scope analysis, file discovery, implementation ordering
# Execute mode (Sonnet): actual code generation, file edits, automated
# When noise accumulates, use partial compaction
# Esc+Esc → "Summarize from here"

Workflow 2: Architecture Analysis → Implementation

Splitting analysis and execution into separate sessions is the key move. Use Opus 1M for analysis only, then start a clean session driven by a SPEC.
Bash
# Step 1: Analysis session (Opus uses the full context effectively)
claude --model opus[1m]
> Load full domain code + analyze dependencies
> /moai plan "refactoring goal"
# → Creates .moai/specs/SPEC-001/spec.md
> /clear
# Step 2: Implementation session (SPEC-driven)
claude --model opusplan
> /moai run SPEC-001
# Opus reads SPEC and plans → Sonnet implements

Workflow 3: Security Audit / Code Review

Work that genuinely requires seeing everything at once. This is where Opus's 76% retrieval accuracy earns its cost.
Bash
claude --model opus[1m]
> Load full module under review
> Analyze cross-cutting concerns
> Save issue report as SPEC in .moai/specs/
> /clear
[Diagram] Practical Workflow - Model selection strategy by task type

Conclusion

The 1M context window isn't really about loading more — it's about making better decisions about what to load, which model to pair it with, and when to clean up.
The MRCR v2 numbers provide the decision framework. Sonnet's 18.5% retrieval accuracy at 1M means on-demand exploration often outperforms full loading for Sonnet-based tasks. Opus's 76% is what makes full-codebase loading actually worthwhile. Match the loading strategy to the model, not the other way around.
Three principles that follow from this:
  1. Default to opusplan: Opus for planning, Sonnet for execution. The most practical starting point.
  2. Pair 1M with Opus: Activate for architecture analysis and security audits where a full view genuinely matters — not as a default.
  3. Compaction targets noise, not everything: Partial compaction over /compact as a blanket operation.
With Sonnet 4.6 as the default model, opusplan is the right baseline. Expand to opus[1m] when the work actually warrants it.
