The Context Rot Guide: Stopping Your Claude Code from Drifting

Introduction

"The first 10 steps are genius, but once the context window gets saturated, the agent just... drifts." This observation from a Reddit user perfectly captures what Claude Code practitioners call Context Rot — the phenomenon where AI coding agents progressively lose their ability to recall information and make coherent decisions during long sessions. [Link]
The community has colorfully named this the "goldfish syndrome": your agent remembers brilliantly for the first few exchanges, then starts forgetting file paths, importing from non-existent modules, and reversing decisions it made minutes earlier. This isn't a bug in Claude Code; it's a fundamental architectural constraint of Large Language Models (LLMs).
As of December 2025, there is no silver bullet solution. What exists instead is a growing ecosystem of engineering approaches — from Anthropic's official Context Compaction and Subagent architectures to community-developed tools like Beads and Memory MCP servers. Experienced engineers are finding their own answers through trial and error, while the industry converges on a new discipline: Context Engineering.

The Anatomy of Context Rot

What Exactly Is Context Rot?

Context Rot refers to the progressive degradation of an LLM's performance as its input token count increases. [Link] The term was coined on Hacker News in June 2025 and was studied systematically by Chroma Research in their July 2025 technical report.
The phenomenon manifests in several related symptoms:

Term	Definition
Context Rot	Performance degradation as input tokens increase
Context Drift	Agent deviating from original goals over extended sessions
Lost in the Middle	Failure to retrieve information located in the middle of context
Goldfish Syndrome	Community metaphor: "forgetting what happened 3 seconds ago"

The Mathematical Reality: O(n²) Attention Complexity

The root cause lies in the Transformer architecture itself. [Link] Self-attention requires computing pairwise relationships between all tokens, resulting in O(n²) computational complexity where n equals the number of tokens.
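To make the quadratic growth concrete, here is a minimal sketch that counts token pairs for a few context sizes. It assumes a single full attention pass with no causal masking or other optimizations, and the function name is illustrative rather than taken from any library.

```python
def pairwise_attention_scores(num_tokens: int) -> int:
    """Count query-key pairs in one full (unmasked) self-attention pass."""
    return num_tokens * num_tokens


# Growing the context 20x (10K -> 200K) grows the pair count 400x.
for n in (10_000, 50_000, 200_000):
    print(f"{n:>7,} tokens -> {pairwise_attention_scores(n):>14,} pairs")
```

The point of the sketch is the ratio, not the absolute numbers: doubling the context quadruples the number of pairs the model has to weigh.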
For a 200K token context window, this means roughly 200,000² = 40 billion token pairs in a full attention pass. [Link] Anthropic's engineering documentation explicitly acknowledges this constraint:

"LLMs have an 'attention budget' that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount." — [Link] Anthropic Engineering Blog (September 2025)

Chroma Research: The Empirical Evidence

Chroma Research's July 2025 study tested 18 major LLMs including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3. [Link] Their findings were sobering:

Finding	Implication
Non-uniform performance degradation	All models degrade as input length increases
Needle-Question semantic distance	Performance drops faster when questions differ semantically from answers
Distractor impact	Irrelevant information causes non-linear performance decay
Haystack structure matters	Logically structured text performs differently than shuffled text

Crucially, the research revealed that traditional Needle-in-a-Haystack (NIAH) benchmarks overestimate real-world performance because they only test simple lexical matching, not complex reasoning tasks.

The "Lost in the Middle" Problem

Stanford researchers first documented this phenomenon in 2023. [Link] LLMs exhibit a U-shaped performance curve: they recall information well from the beginning and end of their context window, but struggle with content in the middle.

┌─────────────────────────────────────────────────────────┐
│  Beginning      │     Middle        │      End          │
│  (High Recall)  │   (Low Recall)    │  (High Recall)    │
└─────────────────────────────────────────────────────────┘

This means that in a long Claude Code session, the instructions you gave early on (stored in CLAUDE.md) and your most recent requests are processed well, but everything in between becomes progressively harder for the model to access.

How Context Rot Manifests in Claude Code

Reddit users have documented specific failure patterns that occur after extended sessions:

Symptom	User Description
Circular editing	"Optimized with Redis, then switched to Memcached next session, then back to Redis" [Link]
Path amnesia	"Forgets file paths generated 5 minutes ago, imports from non-existent modules" [Link]
Config flip-flopping	"Port 3000 → 3001 → 3000 in consecutive changes"
Instruction drift	"Completely ignores CLAUDE.md directives late in context"
Premature completion	"Declares 'project complete' when only halfway done"

One user's observation went viral in the community: "Claude Code has the memory of a goldfish and the confidence of a 10x engineer." [Link]

Anthropic's Official Solutions

1. Context Compaction

Claude Code implements automatic context compaction when approaching context limits. [Link] The system summarizes conversation history, preserving:
- Architectural decisions
- Unresolved bugs
- Implementation details
- Recently accessed files (typically the last 5)
Users can trigger manual compaction with /compact [instructions] to control what gets preserved. The limitation: aggressive compaction can lose subtle but important context.

2. Context Editing (September 2025)

Anthropic introduced programmatic context editing in their API. [Link] Developers can configure automatic cleanup rules:

{
  "context_management": {
    "edits": [{
      "type": "clear_tool_uses_20250919",
      "trigger": { "type": "input_tokens", "value": 30000 },
      "keep": { "type": "tool_uses", "value": 3 }
    }]
  }
}

This allows clearing old tool call results while maintaining conversation flow — a surgical approach compared to full compaction.
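As a rough local illustration of what a rule like the one above does, the sketch below replaces all but the most recent tool results with a short placeholder once a token threshold is crossed. This is not Anthropic's implementation: the `Turn` type, the `clear_old_tool_results` function, the placeholder text, and the one-token placeholder cost are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    kind: str      # "text" or "tool_result" (hypothetical labels)
    content: str
    tokens: int    # token count, assumed to be known up front


PLACEHOLDER = "[tool result cleared]"


def clear_old_tool_results(turns, trigger_tokens=30_000, keep=3):
    """Return a copy of `turns` with old tool results replaced by a placeholder.

    Nothing changes until the total token count exceeds `trigger_tokens`.
    After that, only the last `keep` tool results survive intact.
    """
    if sum(t.tokens for t in turns) <= trigger_tokens:
        return list(turns)

    tool_indexes = [i for i, t in enumerate(turns) if t.kind == "tool_result"]
    # Guard keep == 0: a [-0:] slice would select every index.
    kept = set(tool_indexes[-keep:]) if keep > 0 else set()

    cleaned = []
    for i, turn in enumerate(turns):
        if turn.kind == "tool_result" and i not in kept:
            cleaned.append(Turn("tool_result", PLACEHOLDER, 1))
        else:
            cleaned.append(turn)
    return cleaned
```

Text turns are never touched in this sketch, which mirrors the idea of keeping the conversation flow while dropping bulky tool output.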

3. Subagent Architecture

Anthropic's recommended pattern for complex tasks involves delegating work to specialized subagents. [Link] Each subagent operates in its own context window and returns only summarized results to the main orchestrator.

┌─────────────────────────────────────────────────────┐
│                 Main Orchestrator                    │
│            (High-level planning + coordination)      │
└───────────┬─────────────┬─────────────┬─────────────┘
            │             │             │
            ▼             ▼             ▼
      ┌──────────┐  ┌──────────┐  ┌──────────┐
      │ Search   │  │ Implement│  │ Test     │
      │ Agent    │  │ Agent    │  │ Agent    │
      └──────────┘  └──────────┘  └──────────┘
           ↓             ↓             ↓
      Summary        Summary        Summary
      (1-2K tokens)  (1-2K tokens)  (1-2K tokens)

The key insight: a subagent might consume 30,000 tokens exploring a codebase, but only 1,500 tokens of distilled results return to the main agent.
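The arithmetic behind that claim can be sketched as follows. The figures are the illustrative ones quoted above (30,000 tokens explored, 1,500 returned), and the function name is hypothetical.

```python
def main_context_growth(exploration_tokens: int, summary_tokens: int,
                        num_tasks: int, delegate: bool) -> int:
    """Tokens added to the orchestrator's context after `num_tasks` tasks.

    With delegation, only each subagent's summary reaches the main context.
    Without it, the full exploration cost lands there.
    """
    per_task = summary_tokens if delegate else exploration_tokens
    return per_task * num_tasks


inline = main_context_growth(30_000, 1_500, num_tasks=3, delegate=False)
delegated = main_context_growth(30_000, 1_500, num_tasks=3, delegate=True)
print(f"inline: {inline:,} tokens, delegated: {delegated:,} tokens")
```

Note that delegation does not reduce total token spend, since the subagents still consume their own context. It only keeps that spend out of the orchestrator's window.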

4. Long-Running Agent Harness (November 2025)

Anthropic's research on long-running agents identified four major failure modes and corresponding solutions. [Link]

Failure Mode	Solution
One-shotting (attempting everything at once)	Feature List file (JSON format with `passes: true/false`)
Undocumented state on context exhaustion	Git commits + Progress file mandatory
No end-to-end testing	Browser automation for E2E verification
Time wasted figuring out how to run app	Auto-generated `init.sh` script

Their Two-Agent Harness pattern separates concerns:
1. Initializer Agent: Sets up environment (feature list, git repo, progress file)
2. Coding Agent: Implements one feature per session, commits progress
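A minimal sketch of how a coding agent's harness might pick its one feature per session from such a feature list. The source only specifies a JSON file with a `passes` flag, so the list-of-objects layout, the `description` field, and the function names here are assumptions.

```python
import json

# Hypothetical feature list; only the "passes" flag comes from the pattern above.
SAMPLE_FEATURES = """[
  {"description": "User can sign up", "passes": true},
  {"description": "User can log in", "passes": false},
  {"description": "User can reset password", "passes": false}
]"""


def next_failing_feature(feature_list_json: str):
    """Return the first feature whose `passes` flag is not true, or None."""
    for feature in json.loads(feature_list_json):
        if not feature.get("passes", False):
            return feature
    return None


def progress(feature_list_json: str):
    """Return (passing, total) so a fresh session can see overall state."""
    features = json.loads(feature_list_json)
    passing = sum(1 for f in features if f.get("passes", False))
    return passing, len(features)
```

Because the file lives on disk, a new session with an empty context can call something like `next_failing_feature` and know exactly where to resume.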

Community-Developed Solutions

1. AST-Based Project Map Injection

The most technically elegant community solution involves injecting Abstract Syntax Tree (AST) maps at every turn. [Link]

"I built a local tool that scans the AST and generates a compressed skeleton of the repo (just signatures and imports), and I force that into the system prompt." — u/Necessary-Ring-6060

This approach offers several advantages over RAG (Retrieval-Augmented Generation):
- Deterministic: No vector search uncertainty
- Structural accuracy: Preserves code hierarchy that semantic search loses
- Hallucination prevention: Agent sees the actual map, doesn't need to remember it
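The quoted tool is not public in the source, so the following is only a sketch of the general idea for Python files, using the standard library `ast` module (Python 3.9+ for `ast.unparse`). The function names are hypothetical, and a real tool would also need to walk the repository and handle other languages.

```python
import ast


def _signature(fn) -> str:
    """Render a function definition as a one-line stub without its body."""
    prefix = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    returns = f" -> {ast.unparse(fn.returns)}" if fn.returns else ""
    return f"{prefix} {fn.name}({ast.unparse(fn.args)}){returns}: ..."


def skeleton(source: str) -> list[str]:
    """Return imports plus class and function signatures for one module."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            lines.append(ast.unparse(node))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            # Only direct methods are listed; nested classes are skipped.
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append("    " + _signature(item))
    return lines


SAMPLE_MODULE = '''
import os
from pathlib import Path

class Cache:
    def get(self, key: str) -> bytes:
        return b""

def load(path: Path) -> str:
    return path.read_text()
'''

print("\n".join(skeleton(SAMPLE_MODULE)))
```

The output keeps names, parameters, and imports while discarding function bodies, which is what makes the map small enough to re-inject on every turn.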

2. Beads: Agent-First Issue Tracker

Steve Yegge's Beads has emerged as a popular solution for multi-session context preservation. [Link] Unlike GitHub Issues, Beads is designed specifically for implementation notes — decisions, blockers, and progress that agents need to reconstruct context.

bd init                    # Initialize in project
bd create "Implement auth" # Create task
bd update auth-001 --notes "COMPLETED: JWT. NEXT: Rate limiting"

A three-week trial report from Reddit: [Link]

"The amnesia is gone. I'd spend considerable time re-explaining context after every compaction. Now Claude reconstructs full context automatically by reading bead notes." — u/lakshminp

3. Two-Tab Claude System

Some practitioners maintain separate Claude instances for different concerns:

Window 1 (Research/QA)	Window 2 (Developer)
Bug analysis	Implementation
File/line identification	Code writing
Uses 80-90% of context	Focused execution

Results from Window 1 feed Window 2 as distilled, actionable instructions.

4. /clear + Plan File Strategy

The most accessible strategy requires no additional tooling:
1. Create PLAN.md with checklist before starting
2. Check off completed items as work progresses
3. Run /clear to reset context
4. Resume with "Continue with PLAN.md"
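The reason this works is that the plan file, not the conversation, holds the state. A minimal sketch of reading that state back, assuming the checklist uses GitHub-style task syntax (`- [ ]` and `- [x]`); the source does not specify a format, and the function names are hypothetical.

```python
import re

# Matches "- [ ] task" or "- [x] task" (also "*" bullets and uppercase X).
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def parse_plan(text: str):
    """Return a list of (done, title) pairs for every checklist line."""
    items = []
    for line in text.splitlines():
        match = CHECKBOX.match(line)
        if match:
            items.append((match.group(1).lower() == "x", match.group(2).strip()))
    return items


def next_task(text: str):
    """Return the first unchecked task title, or None when all are done."""
    for done, title in parse_plan(text):
        if not done:
            return title
    return None


SAMPLE_PLAN = """# Plan
- [x] Add schema
- [ ] Write migration
- [ ] Add tests
"""

print(next_task(SAMPLE_PLAN))
```

After a /clear, the agent only needs to read the file to find the next unchecked item, so nothing depends on what the previous context contained.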

"You have to give it step by step instructions of exactly what to do, and check the result at each step. Then /clear after each task is completed and tested to be working." — u/TotalBeginnerLol [Link]

5. Memory MCP Servers

The Model Context Protocol (MCP) ecosystem has spawned several memory-focused servers:

Tool	Key Feature
Serena MCP	Semantic code search + language server integration [Link]
Basic Memory MCP	Local markdown-based persistent memory
Heimdall MCP	"Remember context about X" command interface
a24z-Memory	File anchor-based note system

6. Superpowers Plugin: The Comprehensive Solution

Jesse Vincent's (obra) Superpowers plugin bundles multiple context management techniques into a unified workflow system. [Link] Unlike piecemeal solutions, it provides a complete lifecycle from initial brainstorming to merged PR.

/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace

Core context management features:
- Subagent-driven development: Each task runs in isolated context, returning only summarized results
- Plan-file architecture: Auto-generated docs/plans/YYYY-MM-DD-<feature>.md for session-independent continuity
- Automatic context handoff: New sessions resume by reading plan files—no manual context reconstruction
- TDD enforcement: The RED-GREEN-REFACTOR cycle becomes mandatory, not optional
The session-independent workflow is particularly noteworthy:

# Session 1: Plan and save
> /superpowers:brainstorm Implement rate limiting
# Design saved to docs/plans/2025-12-26-rate-limiting.md

# Session 2 (any time later): Resume
> Read docs/plans and continue
# Superpowers auto-invokes executing-plans skill

Simon Willison, Django co-creator, endorsed this approach:

"Jesse is one of the most creative users of coding agents that I know. It's very much worth the investment of time to explore what he's shared." [Link]

The token efficiency is significant—core bootstrap loads under 2,000 tokens, with heavy work delegated to subagents that don't pollute the main context. [Link]

Token Economics: The Cost of Fighting Context Rot

Anthropic's own data reveals significant token overhead for agent patterns: [Link]

Interaction Type	Token Multiplier
Standard chatbot	1x (baseline)
Single agent	~4x
Multi-agent system	~15x

This means multi-agent architectures — while effective against Context Rot — consume roughly 15 times more tokens than simple chat. For Claude Pro/Max subscribers, this can rapidly exhaust usage limits.
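A back-of-the-envelope estimator using the approximate multipliers from the table above. The budget and baseline figures in the usage lines are made-up inputs, not real plan limits, and the function names are hypothetical.

```python
# Approximate multipliers from the table above (chat = baseline).
TOKEN_MULTIPLIER = {"chat": 1, "single_agent": 4, "multi_agent": 15}


def estimated_tokens(chat_baseline_tokens: int, mode: str) -> int:
    """Scale a chat-sized token cost by the multiplier for `mode`."""
    return chat_baseline_tokens * TOKEN_MULTIPLIER[mode]


def tasks_within_budget(budget_tokens: int, chat_baseline_tokens: int, mode: str) -> int:
    """How many tasks of this size fit into a fixed token budget."""
    return budget_tokens // estimated_tokens(chat_baseline_tokens, mode)


for mode in TOKEN_MULTIPLIER:
    print(mode, tasks_within_budget(1_000_000, 10_000, mode))
```

Treat the result as an order-of-magnitude guide only, since real usage varies widely with task and tooling.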

Practical Recommendations

Choose Your Strategy Based on Task Scope

Scenario	Recommended Approach
Simple feature (1-2 hours)	Frequent `/clear` usage
Multi-session project	Beads + Progress files
Large-scale refactoring	Subagent architecture
Complex debugging	Two-tab system
Repetitive workflows	CLAUDE.md + Hooks

Anti-Patterns to Avoid

Avoid	Do Instead
Single long session for all work	`/clear` after each completed unit
Pasting large text blocks	Use file reading tools
Vague instructions ("fix this")	Specify file, line, and exact problem
Relying solely on auto-compaction	Manually run `/compact [instructions]`
Overloading CLAUDE.md	Keep only universal, minimal guidelines

The Simple Is Best Approach: Let Superpowers Handle It

For practitioners who prefer minimal tooling overhead, the instinct is to manually create PLAN.md files with checklists and status tracking. But there's a more elegant solution: Superpowers already implements this pattern with battle-tested workflows.
Instead of managing plan files manually, Superpowers provides the complete infrastructure: [Link]

Manual Approach	Superpowers Equivalent
Create `PLAN.md` manually	`/superpowers:write-plan` auto-generates `docs/plans/YYYY-MM-DD-<feature>.md`
Write checklist items yourself	Agent asks clarifying questions, then produces 2-5 minute tasks with exact file paths
Update status as work progresses	`executing-plans` skill tracks completion automatically
Remember to run `/clear`	Subagent architecture handles context isolation inherently
Resume with "Continue with PLAN.md"	New session: "Read docs/plans and continue" → auto-resumes

The workflow becomes remarkably simple:

# Session 1: Design and plan
> /superpowers:brainstorm Add user authentication to my app
# Answer questions one at a time → design saved to docs/plans/ → auto-commit

# Session 2 (hours or days later): Resume
> Read docs/plans and continue
# Superpowers auto-loads executing-plans → picks up exactly where you stopped

This isn't just convenience—it's the same session-independent development pattern that Anthropic's research team identified as essential for long-running agents, implemented as a plugin. [Link]
The key insight: you don't need to reinvent the plan-file pattern. Superpowers has already refined it through adversarial testing and real-world usage by Claude Code practitioners.

Conclusion: Context Engineering as the New Frontier

Context Rot represents a fascinating inflection point in AI coding tools. The problem isn't solvable through raw compute or larger context windows — Anthropic themselves acknowledge that "context windows of all sizes will be subject to context pollution and information relevance concerns." [Link] The O(n²) attention complexity is architectural, not incidental.
What we're witnessing is the emergence of Context Engineering as a distinct discipline. Where Prompt Engineering focused on crafting the right words, Context Engineering asks: "What is the minimal, highest-signal set of tokens that maximizes desired outcomes?" This requires thinking about information lifecycle, session boundaries, and external state persistence.
The irony is rich: to make AI agents work on complex, long-running tasks, we're essentially building the same infrastructure that human engineering teams have developed over decades — issue trackers, progress files, documentation practices, and handoff protocols. The "goldfish" learns not by getting a better memory, but by writing things down.
There is no single correct answer today. The field is actively evolving, with Anthropic shipping new capabilities quarterly and the community iterating on novel approaches. What works best depends on project complexity, personal workflow preferences, and tolerance for tooling overhead. For those seeking comprehensive solutions with minimal configuration, Superpowers stands out—it implements the plan-file pattern, subagent architecture, and session-independent continuity that Anthropic's own research recommends, packaged as a single plugin. You don't need to manually create PLAN.md files or reinvent context management patterns; the infrastructure already exists. [Link]
The engineers who thrive with AI coding agents will be those who internalize this reality: the context window is not infinite memory — it's expensive, degrading working memory. Managing it deliberately isn't a workaround; it's the core skill.

References

Anthropic Engineering
- https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
Chroma Research
- https://research.trychroma.com/context-rot
Academic Research
- https://arxiv.org/abs/2307.03172 (Stanford "Lost in the Middle")
- https://arxiv.org/abs/2209.04881 (Self-Attention Complexity)
Claude Documentation
- https://platform.claude.com/docs/en/build-with-claude/context-editing
- https://platform.claude.com/docs/en/agent-sdk/subagents
Community Tools
- https://github.com/steveyegge/beads (Beads issue tracker)
- https://github.com/obra/superpowers (Superpowers plugin)
- https://github.com/oraios/serena (Serena MCP)
Superpowers Expert Analysis
- https://simonwillison.net/2025/Oct/10/superpowers/ (Simon Willison endorsement)
Community Discussions (Reddit)
- https://www.reddit.com/r/ClaudeCode/comments/1pv7ls3/ (Original "goldfish" discussion)
- https://www.reddit.com/r/ClaudeCode/comments/1ov1z94/ (Beads 3-week review)
