The Context Rot Guide: Stopping Your Claude Code from Drifting

Introduction
"The first 10 steps are genius, but once the context window gets saturated, the agent just... drifts." This observation from a Reddit user perfectly captures what Claude Code practitioners call Context Rot — the phenomenon where AI coding agents progressively lose their ability to recall information and make coherent decisions during long sessions. [Link]
The community has colorfully named this the "goldfish syndrome" — your agent remembers brilliantly for the first few exchanges, then starts forgetting file paths, importing from non-existent modules, and reversing decisions it made minutes earlier. This isn't a bug in Claude Code; it's a fundamental architectural constraint of Large Language Models(LLMs).
As of December 2025, there is no silver bullet solution. What exists instead is a growing ecosystem of engineering approaches — from Anthropic's official Context Compaction and Subagent architectures to community-developed tools like Beads and Memory MCP servers. Experienced engineers are finding their own answers through trial and error, while the industry converges on a new discipline: Context Engineering.
The Anatomy of Context Rot
What Exactly Is Context Rot?
Context Rot refers to the progressive degradation of an LLM's performance as its input token count increases. [Link] The term was first coined on Hacker News in June 2025 and was academically established by Chroma Research in their July 2025 technical report.
The phenomenon manifests in several related symptoms:
| Term | Definition |
| Context Rot | Performance degradation as input tokens increase |
| Context Drift | Agent deviating from original goals over extended sessions |
| Lost in the Middle | Failure to retrieve information located in the middle of context |
| Goldfish Syndrome | Community metaphor: "forgetting what happened 3 seconds ago" |
The Mathematical Reality: O(n²) Attention Complexity
The root cause lies in the Transformer architecture itself. [Link] Self-attention requires computing pairwise relationships between all tokens, resulting in O(n²) computational complexity where n equals the number of tokens.
For a 200K token context window, this means processing 40 billion pairwise relationships. [Link] Anthropic's engineering documentation explicitly acknowledges this constraint:
"LLMs have an 'attention budget' that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount." — [Link] Anthropic Engineering Blog (September 2025)
Chroma Research: The Empirical Evidence
- Chroma Research's July 2025 study tested 18 major LLMs including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3. [Link] Their findings were sobering:
| Finding | Implication |
| Non-uniform performance degradation | All models degrade as input length increases |
| Needle-Question semantic distance | Performance drops faster when questions differ semantically from answers |
| Distractor impact | Irrelevant information causes non-linear performance decay |
| Haystack structure matters | Logically structured text performs differently than shuffled text |
- Crucially, the research revealed that traditional Needle-in-a-Haystack (NIAH) benchmarks overestimate real-world performance because they only test simple lexical matching, not complex reasoning tasks.
The "Lost in the Middle" Problem
- Stanford researchers first documented this phenomenon in 2023. [Link] LLMs exhibit a U-shaped attention pattern: they recall information well from the beginning and end of their context window, but struggle with content in the middle.
┌─────────────────────────────────────────────────────────┐
│ Beginning │ Middle │ End │
│ (High Recall) │ (Low Recall) │ (High Recall) │
└─────────────────────────────────────────────────────────┘
- This means that in a long Claude Code session, the instructions you gave early on (stored in CLAUDE.md) and your most recent requests are processed well, but everything in between becomes progressively harder for the model to access.
How Context Rot Manifests in Claude Code
- Reddit users have documented specific failure patterns that occur after extended sessions:
| Symptom | User Description |
| Circular editing | "Optimized with Redis, then switched to Memcached next session, then back to Redis" [Link] |
| Path amnesia | "Forgets file paths generated 5 minutes ago, imports from non-existent modules" [Link] |
| Config flip-flopping | "Port 3000 → 3001 → 3000 in consecutive changes" |
| Instruction drift | "Completely ignores CLAUDE.md directives late in context" |
| Premature completion | "Declares 'project complete' when only halfway done" |
- One user's observation went viral in the community: "Claude Code has the memory of a goldfish and the confidence of a 10x engineer." [Link]
Anthropic's Official Solutions
1. Context Compaction
Claude Code implements automatic context compaction when approaching context limits. [Link] The system summarizes conversation history, preserving:
- Architectural decisions
- Unresolved bugs
- Implementation details
- Recently accessed files (typically the last 5)
Users can trigger manual compaction with
/compact [instructions]to control what gets preserved. The limitation: aggressive compaction can lose subtle but important context.
2. Context Editing (September 2025)
- Anthropic introduced programmatic context editing in their API. [Link] Developers can configure automatic cleanup rules:
{
"context_management": {
"edits": [{
"type": "clear_tool_uses_20250919",
"trigger": { "type": "input_tokens", "value": 30000 },
"keep": { "type": "tool_uses", "value": 3 }
}]
}
}
- This allows clearing old tool call results while maintaining conversation flow — a surgical approach compared to full compaction.
3. Subagent Architecture
- Anthropic's recommended pattern for complex tasks involves delegating work to specialized subagents. [Link] Each subagent operates in its own context window and returns only summarized results to the main orchestrator.
┌─────────────────────────────────────────────────────┐
│ Main Orchestrator │
│ (High-level planning + coordination) │
└───────────┬─────────────┬─────────────┬─────────────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Search │ │ Implement│ │ Test │
│ Agent │ │ Agent │ │ Agent │
└──────────┘ └──────────┘ └──────────┘
↓ ↓ ↓
Summary Summary Summary
(1-2K tokens) (1-2K tokens) (1-2K tokens)
- The key insight: a subagent might consume 30,000 tokens exploring a codebase, but only 1,500 tokens of distilled results return to the main agent.
4. Long-Running Agent Harness (November 2025)
- Anthropic's research on long-running agents identified four major failure modes and corresponding solutions. [Link]
| Failure Mode | Solution |
| One-shotting (attempting everything at once) | Feature List file (JSON format with passes: true/false) |
| Undocumented state on context exhaustion | Git commits + Progress file mandatory |
| No end-to-end testing | Browser automation for E2E verification |
| Time wasted figuring out how to run app | Auto-generated init.sh script |
- Their Two-Agent Harness pattern separates concerns:
- Initializer Agent: Sets up environment (feature list, git repo, progress file)
- Coding Agent: Implements one feature per session, commits progress
Community-Developed Solutions
1. AST-Based Project Map Injection
- The most technically elegant community solution involves injecting Abstract Syntax Tree (AST) maps at every turn. [Link]
"I built a local tool that scans the AST and generates a compressed skeleton of the repo (just signatures and imports), and I force that into the system prompt." — u/Necessary-Ring-6060
- This approach offers several advantages over RAG (Retrieval-Augmented Generation):
- Deterministic: No vector search uncertainty
- Structural accuracy: Preserves code hierarchy that semantic search loses
- Hallucination prevention: Agent sees the actual map, doesn't need to remember it
2. Beads: Agent-First Issue Tracker
- Steve Yegge's Beads has emerged as a popular solution for multi-session context preservation. [Link] Unlike GitHub Issues, Beads is designed specifically for implementation notes — decisions, blockers, and progress that agents need to reconstruct context.
bd init # Initialize in project
bd create "Implement auth" # Create task
bd update auth-001 --notes "COMPLETED: JWT. NEXT: Rate limiting"
- A three-week trial report from Reddit: [Link]
"The amnesia is gone. I'd spend considerable time re-explaining context after every compaction. Now Claude reconstructs full context automatically by reading bead notes." — u/lakshminp
3. Two-Tab Claude System
- Some practitioners maintain separate Claude instances for different concerns:
| Window 1 (Research/QA) | Window 2 (Developer) |
| Bug analysis | Implementation |
| File/line identification | Code writing |
| Uses 80-90% of context | Focused execution |
- Results from Window 1 feed Window 2 as distilled, actionable instructions.
4. /clear + Plan File Strategy
The most accessible strategy requires no additional tooling:
Create
PLAN.mdwith checklist before starting- Check off completed items as work progresses
- Run
/clearto reset context - Resume with "Continue with PLAN.md"
"You have to give it step by step instructions of exactly what to do, and check the result at each step. Then /clear after each task is completed and tested to be working." — u/TotalBeginnerLol [Link]
5. Memory MCP Servers
- The Model Context Protocol (MCP) ecosystem has spawned several memory-focused servers:
| Tool | Key Feature |
| Serena MCP | Semantic code search + language server integration [Link] |
| Basic Memory MCP | Local markdown-based persistent memory |
| Heimdall MCP | "Remember context about X" command interface |
| a24z-Memory | File anchor-based note system |
6. Superpowers Plugin: The Comprehensive Solution
- Jesse Vincent's (obra) Superpowers plugin bundles multiple context management techniques into a unified workflow system. [Link] Unlike piecemeal solutions, it provides a complete lifecycle from initial brainstorming to merged PR.
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace
Core context management features:
- Subagent-driven development: Each task runs in isolated context, returning only summarized results
- Plan-file architecture: Auto-generated
docs/plans/YYYY-MM-DD-<feature>.mdfor session-independent continuity - Automatic context handoff: New sessions resume by reading plan files—no manual context reconstruction
- TDD enforcement: The RED-GREEN-REFACTOR cycle becomes mandatory, not optional
The session-independent workflow is particularly noteworthy:
# Session 1: Plan and save
> /superpowers:brainstorm Implement rate limiting
# Design saved to docs/plans/2025-12-26-rate-limiting.md
# Session 2 (any time later): Resume
> Read docs/plans and continue
# Superpowers auto-invokes executing-plans skill
- Simon Willison, Django co-creator, endorsed this approach:
"Jesse is one of the most creative users of coding agents that I know. It's very much worth the investment of time to explore what he's shared." [Link]
- The token efficiency is significant—core bootstrap loads under 2,000 tokens, with heavy work delegated to subagents that don't pollute the main context. [Link]
Token Economics: The Cost of Fighting Context Rot
- Anthropic's own data reveals significant token overhead for agent patterns: [Link]
| Interaction Type | Token Multiplier |
| Standard chatbot | 1x (baseline) |
| Single agent | ~4x |
| Multi-agent system | ~15x |
- This means multi-agent architectures — while effective against Context Rot — consume roughly 15 times more tokens than simple chat. For Claude Pro/Max subscribers, this can rapidly exhaust usage limits.
Practical Recommendations
Choose Your Strategy Based on Task Scope
| Scenario | Recommended Approach |
| Simple feature (1-2 hours) | Frequent /clear usage |
| Multi-session project | Beads + Progress files |
| Large-scale refactoring | Subagent architecture |
| Complex debugging | Two-tab system |
| Repetitive workflows | CLAUDE.md + Hooks |
Anti-Patterns to Avoid
| Avoid | Do Instead |
| Single long session for all work | /clear after each completed unit |
| Pasting large text blocks | Use file reading tools |
| Vague instructions ("fix this") | Specify file, line, and exact problem |
| Relying solely on auto-compaction | Manually run /compact [instructions] |
| Overloading CLAUDE.md | Keep only universal, minimal guidelines |
The Simple Is Best Approach: Let Superpowers Handle It
For practitioners who prefer minimal tooling overhead, the instinct is to manually create PLAN.md files with checklists and status tracking. But there's a more elegant solution:
Superpowersalready implements this pattern with battle-tested workflows.Instead of managing plan files manually, Superpowers provides the complete infrastructure: [Link]
| Manual Approach | Superpowers Equivalent |
Create PLAN.md manually | /superpowers:write-plan auto-generates docs/plans/YYYY-MM-DD-<feature>.md |
| Write checklist items yourself | Agent asks clarifying questions, then produces 2-5 minute tasks with exact file paths |
| Update status as work progresses | executing-plans skill tracks completion automatically |
Remember to run /clear | Subagent architecture handles context isolation inherently |
| Resume with "Continue with PLAN.md" | New session: "Read docs/plans and continue" → auto-resumes |
- The workflow becomes remarkably simple:
# Session 1: Design and plan
> /superpowers:brainstorm Add user authentication to my app
# Answer questions one at a time → design saved to docs/plans/ → auto-commit
# Session 2 (hours or days later): Resume
> Read docs/plans and continue
# Superpowers auto-loads executing-plans → picks up exactly where you stopped
This isn't just convenience—it's the same session-independent development pattern that Anthropic's research team identified as essential for long-running agents, implemented as a plugin. [Link]
The key insight: you don't need to reinvent the plan-file pattern. Superpowers has already refined it through adversarial testing and real-world usage by Claude Code practitioners.
Conclusion: Context Engineering as the New Frontier
Context Rot represents a fascinating inflection point in AI coding tools. The problem isn't solvable through raw compute or larger context windows — Anthropic themselves acknowledge that "context windows of all sizes will be subject to context pollution and information relevance concerns." [Link] The O(n²) attention complexity is architectural, not incidental.
What we're witnessing is the emergence of Context Engineering as a distinct discipline. Where Prompt Engineering focused on crafting the right words, Context Engineering asks: "What is the minimal, highest-signal set of tokens that maximizes desired outcomes?" This requires thinking about information lifecycle, session boundaries, and external state persistence.
The irony is rich: to make AI agents work on complex, long-running tasks, we're essentially building the same infrastructure that human engineering teams have developed over decades — issue trackers, progress files, documentation practices, and handoff protocols. The "goldfish" learns not by getting a better memory, but by writing things down.
There is no single correct answer today. The field is actively evolving, with Anthropic shipping new capabilities quarterly and the community iterating on novel approaches. What works best depends on project complexity, personal workflow preferences, and tolerance for tooling overhead. For those seeking comprehensive solutions with minimal configuration, Superpowers stands out—it implements the plan-file pattern, subagent architecture, and session-independent continuity that Anthropic's own research recommends, packaged as a single plugin. You don't need to manually create
PLAN.mdfiles or reinvent context management patterns; the infrastructure already exists. [Link]The engineers who thrive with AI coding agents will be those who internalize this reality: the context window is not infinite memory — it's expensive, degrading working memory. Managing it deliberately isn't a workaround; it's the core skill.
References
- Anthropic Engineering
- https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
- Chroma Research
- https://research.trychroma.com/context-rot
- Academic Research
- https://arxiv.org/abs/2307.03172 (Stanford "Lost in the Middle")
- https://arxiv.org/abs/2209.04881 (Self-Attention Complexity)
- Claude Documentation
- https://platform.claude.com/docs/en/build-with-claude/context-editing
- https://platform.claude.com/docs/en/agent-sdk/subagents
- Community Tools
- https://github.com/steveyegge/beads (Beads issue tracker)
- https://github.com/obra/superpowers (Superpowers plugin)
- https://github.com/oraios/serena (Serena MCP)
- Superpowers Expert Analysis
- https://simonwillison.net/2025/Oct/10/superpowers/ (Simon Willison endorsement)
- Community Discussions (Reddit)
- https://www.reddit.com/r/ClaudeCode/comments/1pv7ls3/ (Original "goldfish" discussion)
- https://www.reddit.com/r/ClaudeCode/comments/1ov1z94/ (Beads 3-week review)




