The Context Rot Guide: Stopping Your Claude Code from Drifting


Introduction

  • "The first 10 steps are genius, but once the context window gets saturated, the agent just... drifts." This observation from a Reddit user perfectly captures what Claude Code practitioners call Context Rot — the phenomenon where AI coding agents progressively lose their ability to recall information and make coherent decisions during long sessions. [Link]

  • The community has colorfully named this the "goldfish syndrome": your agent remembers brilliantly for the first few exchanges, then starts forgetting file paths, importing from non-existent modules, and reversing decisions it made minutes earlier. This isn't a bug in Claude Code; it's a fundamental architectural constraint of Large Language Models (LLMs).

  • As of December 2025, there is no silver bullet solution. What exists instead is a growing ecosystem of engineering approaches — from Anthropic's official Context Compaction and Subagent architectures to community-developed tools like Beads and Memory MCP servers. Experienced engineers are finding their own answers through trial and error, while the industry converges on a new discipline: Context Engineering.

The Anatomy of Context Rot

What Exactly Is Context Rot?

  • Context Rot refers to the progressive degradation of an LLM's performance as its input token count increases. [Link] The term was first coined on Hacker News in June 2025 and was academically established by Chroma Research in their July 2025 technical report.

  • The phenomenon manifests in several related symptoms:

| Term | Definition |
| --- | --- |
| Context Rot | Performance degradation as input tokens increase |
| Context Drift | Agent deviating from original goals over extended sessions |
| Lost in the Middle | Failure to retrieve information located in the middle of context |
| Goldfish Syndrome | Community metaphor: "forgetting what happened 3 seconds ago" |

The Mathematical Reality: O(n²) Attention Complexity

  • The root cause lies in the Transformer architecture itself. [Link] Self-attention requires computing pairwise relationships between all tokens, resulting in O(n²) computational complexity where n equals the number of tokens.

  • For a 200K token context window, this means processing 40 billion pairwise relationships. [Link] Anthropic's engineering documentation explicitly acknowledges this constraint:

"LLMs have an 'attention budget' that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount." — [Link] Anthropic Engineering Blog (September 2025)
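  • To make the quadratic growth concrete, here is a small sketch of the arithmetic. The function name `pairwise_interactions` is made up for this illustration, and the count covers one attention head in one layer without causal masking (masking roughly halves the number, but growth stays quadratic).

```python
def pairwise_interactions(num_tokens: int) -> int:
    """Token pairs that full self-attention scores (one head, one layer, no mask)."""
    return num_tokens * num_tokens


# A 10x longer context costs 100x more pairwise comparisons.
for n in (20_000, 200_000):
    print(f"{n:>7,} tokens -> {pairwise_interactions(n):>14,} pairs")
```

  • The 200K row reproduces the 40 billion figure quoted above; the 20K row shows that a context one tenth the size needs only one hundredth of the comparisons.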

Chroma Research: The Empirical Evidence

  • Chroma Research's July 2025 study tested 18 major LLMs including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3. [Link] Their findings were sobering:
| Finding | Implication |
| --- | --- |
| Non-uniform performance degradation | All models degrade as input length increases |
| Needle-Question semantic distance | Performance drops faster when questions differ semantically from answers |
| Distractor impact | Irrelevant information causes non-linear performance decay |
| Haystack structure matters | Logically structured text performs differently than shuffled text |
  • Crucially, the research revealed that traditional Needle-in-a-Haystack (NIAH) benchmarks overestimate real-world performance because they only test simple lexical matching, not complex reasoning tasks.

The "Lost in the Middle" Problem

  • Stanford researchers first documented this phenomenon in 2023. [Link] LLMs exhibit a U-shaped performance curve: they recall information well from the beginning and end of their context window, but struggle with content in the middle.
┌─────────────────────────────────────────────────────────┐
│  Beginning      │     Middle        │      End          │
│  (High Recall)  │   (Low Recall)    │  (High Recall)    │
└─────────────────────────────────────────────────────────┘
  • This means that in a long Claude Code session, the instructions you gave early on (stored in CLAUDE.md) and your most recent requests are processed well, but everything in between becomes progressively harder for the model to access.
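  • One way to work with the U-shaped curve when you assemble context yourself is to put the most important material at the edges. The sketch below is an illustration only; `place_at_edges` is a hypothetical helper, and it assumes you have already ranked your chunks by importance.

```python
def place_at_edges(chunks_by_priority: list[str]) -> list[str]:
    """Order chunks so the most important sit at the start and end.

    Input must be sorted most-important first. Chunks are dealt alternately
    to the front and the back, so the least important end up in the middle.
    """
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_by_priority):
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-most-important chunk is last.
    return front + back[::-1]


print(place_at_edges(["A", "B", "C", "D", "E"]))
```

  • With chunks ranked A (most important) to E (least), A lands first, B lands last, and E sits in the low-recall middle.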

How Context Rot Manifests in Claude Code

  • Reddit users have documented specific failure patterns that occur after extended sessions:
| Symptom | User Description |
| --- | --- |
| Circular editing | "Optimized with Redis, then switched to Memcached next session, then back to Redis" [Link] |
| Path amnesia | "Forgets file paths generated 5 minutes ago, imports from non-existent modules" [Link] |
| Config flip-flopping | "Port 3000 → 3001 → 3000 in consecutive changes" |
| Instruction drift | "Completely ignores CLAUDE.md directives late in context" |
| Premature completion | "Declares 'project complete' when only halfway done" |
  • One user's observation went viral in the community: "Claude Code has the memory of a goldfish and the confidence of a 10x engineer." [Link]

Anthropic's Official Solutions

1. Context Compaction

  • Claude Code implements automatic context compaction when approaching context limits. [Link] The system summarizes conversation history, preserving:

    • Architectural decisions
    • Unresolved bugs
    • Implementation details
    • Recently accessed files (typically the last 5)
  • Users can trigger manual compaction with /compact [instructions] to control what gets preserved. The limitation: aggressive compaction can lose subtle but important context.
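  • The general shape of compaction can be sketched in a few lines. This is not how Claude Code implements it; `compact_history` and the toy `keyword_summary` summarizer (which keeps only turns tagged `DECISION:` or `BUG:`) are assumptions made for illustration. A real system would use the model itself to write the summary.

```python
from typing import Callable, List


def compact_history(
    history: List[str],
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace all but the last `keep_recent` turns with one summary turn."""
    split = max(0, len(history) - keep_recent)
    older, recent = history[:split], history[split:]
    if not older:
        return list(history)  # nothing old enough to compact
    return [summarize(older)] + recent


def keyword_summary(turns: List[str]) -> str:
    """Toy summarizer (assumption): keep only turns tagged as decisions or bugs."""
    kept = [t for t in turns if t.startswith(("DECISION:", "BUG:"))]
    return "SUMMARY: " + " | ".join(kept)


history = ["DECISION: use Redis", "ran ls", "BUG: flaky test", "edit a.py", "edit b.py"]
print(compact_history(history, keep_recent=2, summarize=keyword_summary))
```

  • The toy summarizer also shows the limitation: anything it does not recognize as important ("ran ls" here) is gone for good after compaction.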

2. Context Editing (September 2025)

  • Anthropic introduced programmatic context editing in their API. [Link] Developers can configure automatic cleanup rules:
{
  "context_management": {
    "edits": [{
      "type": "clear_tool_uses_20250919",
      "trigger": { "type": "input_tokens", "value": 30000 },
      "keep": { "type": "tool_uses", "value": 3 }
    }]
  }
}
  • This allows clearing old tool call results while maintaining conversation flow — a surgical approach compared to full compaction.
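  • To see what such a rule does to a conversation, here is a local simulation of the idea. It does not call the Anthropic API; the `Message` type, the placeholder text, and the "exceeds the trigger" comparison are assumptions for illustration. The defaults mirror the configuration above (trigger at 30,000 input tokens, keep the 3 most recent tool results).

```python
from dataclasses import dataclass, replace
from typing import List

PLACEHOLDER = "[tool result cleared]"  # hypothetical stand-in text


@dataclass(frozen=True)
class Message:
    role: str      # "user", "assistant", or "tool_result"
    content: str
    tokens: int    # assumed to be counted ahead of time


def clear_old_tool_results(
    messages: List[Message], trigger_tokens: int = 30_000, keep: int = 3
) -> List[Message]:
    """Blank out all but the `keep` newest tool results once over the trigger."""
    if sum(m.tokens for m in messages) <= trigger_tokens:
        return list(messages)
    tool_positions = [i for i, m in enumerate(messages) if m.role == "tool_result"]
    stale = set(tool_positions[:-keep]) if keep > 0 else set(tool_positions)
    return [
        replace(m, content=PLACEHOLDER, tokens=1) if i in stale else m
        for i, m in enumerate(messages)
    ]


msgs = [Message("user", "go", 100)] + [
    Message("tool_result", f"output {i}", 10_000) for i in range(5)
]
print([m.content for m in clear_old_tool_results(msgs)])
```

  • User and assistant turns are untouched, which is why the conversation still reads coherently after the cleanup.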

3. Subagent Architecture

  • Anthropic's recommended pattern for complex tasks involves delegating work to specialized subagents. [Link] Each subagent operates in its own context window and returns only summarized results to the main orchestrator.
┌─────────────────────────────────────────────────────┐
│                 Main Orchestrator                    │
│            (High-level planning + coordination)      │
└───────────┬─────────────┬─────────────┬─────────────┘
            │             │             │
            ▼             ▼             ▼
      ┌──────────┐  ┌──────────┐  ┌──────────┐
      │ Search   │  │ Implement│  │ Test     │
      │ Agent    │  │ Agent    │  │ Agent    │
      └──────────┘  └──────────┘  └──────────┘
           ↓             ↓             ↓
      Summary        Summary        Summary
      (1-2K tokens)  (1-2K tokens)  (1-2K tokens)
  • The key insight: a subagent might consume 30,000 tokens exploring a codebase, but only 1,500 tokens of distilled results return to the main agent.
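  • The token accounting behind that insight can be sketched as follows. Everything here is a stand-in: `explore_codebase` fakes the exploration with generated text, `estimate_tokens` assumes roughly 4 characters per token, and no model is called. The point is only that the orchestrator's context grows by the summary, not by the subagent's scratch work.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude assumption: about 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class SubagentResult:
    summary: str
    tokens_consumed: int  # spent inside the subagent's own context


def explore_codebase(task: str) -> SubagentResult:
    """Fake subagent: 'reads' 50 files, returns one sentence."""
    scratch = [f"# file_{i}.py\n" + "x = 1\n" * 100 for i in range(50)]
    consumed = sum(estimate_tokens(s) for s in scratch)
    summary = f"Scanned {len(scratch)} files for '{task}'; entry point is file_0.py."
    return SubagentResult(summary, consumed)


@dataclass
class Orchestrator:
    context: List[str] = field(default_factory=list)

    def delegate(self, task: str, subagent: Callable[[str], SubagentResult]) -> int:
        result = subagent(task)
        self.context.append(result.summary)  # only the summary is kept
        return result.tokens_consumed

    def context_tokens(self) -> int:
        return sum(estimate_tokens(c) for c in self.context)


main = Orchestrator()
spent = main.delegate("find auth code", explore_codebase)
print(f"subagent spent ~{spent} tokens; main context grew to ~{main.context_tokens()}")
```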

4. Long-Running Agent Harness (November 2025)

  • Anthropic's research on long-running agents identified four major failure modes and corresponding solutions. [Link]
| Failure Mode | Solution |
| --- | --- |
| One-shotting (attempting everything at once) | Feature List file (JSON format with passes: true/false) |
| Undocumented state on context exhaustion | Git commits + Progress file mandatory |
| No end-to-end testing | Browser automation for E2E verification |
| Time wasted figuring out how to run app | Auto-generated init.sh script |
  • Their Two-Agent Harness pattern separates concerns:
    1. Initializer Agent: Sets up environment (feature list, git repo, progress file)
    2. Coding Agent: Implements one feature per session, commits progress
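  • A feature list with a `passes` flag might be consumed like this. Only the `passes: true/false` field comes from the description above; the `id` and `description` fields and both helper functions are hypothetical, chosen to show how a coding agent could pick exactly one unfinished feature per session.

```python
import json
from typing import List, Optional

FEATURES_JSON = """
[
  {"id": "login", "description": "User can log in", "passes": true},
  {"id": "rate-limit", "description": "API enforces rate limits", "passes": false},
  {"id": "logout", "description": "User can log out", "passes": false}
]
"""


def next_feature(features: List[dict]) -> Optional[dict]:
    """Return the first feature that does not pass yet, or None when done."""
    for feature in features:
        if not feature["passes"]:
            return feature
    return None


def mark_passed(features: List[dict], feature_id: str) -> List[dict]:
    """Return a new list with one feature flipped to passes=True."""
    return [
        {**f, "passes": True} if f["id"] == feature_id else f for f in features
    ]


features = json.loads(FEATURES_JSON)
print(next_feature(features)["id"])
```

  • Because the state lives in a file rather than in the conversation, a fresh session can find its next task without any memory of the previous one.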

Community-Developed Solutions

1. AST-Based Project Map Injection

  • The most technically elegant community solution involves injecting Abstract Syntax Tree (AST) maps at every turn. [Link]

"I built a local tool that scans the AST and generates a compressed skeleton of the repo (just signatures and imports), and I force that into the system prompt." — u/Necessary-Ring-6060

  • This approach offers several advantages over RAG (Retrieval-Augmented Generation):
    • Deterministic: No vector search uncertainty
    • Structural accuracy: Preserves code hierarchy that semantic search loses
    • Hallucination prevention: Agent sees the actual map, doesn't need to remember it
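  • The quoted tool is not public in this post, so the following is only a sketch of the same idea for Python sources, using the standard-library `ast` module. The `skeleton` function and its output format are assumptions; it keeps top-level imports plus function and method signatures and drops every body.

```python
import ast


def _signature(fn) -> str:
    """Render a def line without its body."""
    prefix = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    returns = f" -> {ast.unparse(fn.returns)}" if fn.returns else ""
    return f"{prefix} {fn.name}({ast.unparse(fn.args)}){returns}: ..."


def skeleton(source: str) -> str:
    """Compress a module to its imports, class names, and signatures."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            lines.append(ast.unparse(node))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append("    " + _signature(item))
    return "\n".join(lines)


SAMPLE = '''
import os
from pathlib import Path

class Repo:
    def scan(self, root: Path) -> list:
        return [p for p in root.iterdir()]

def main(argv=None):
    print(os.getcwd())
'''
print(skeleton(SAMPLE))
```

  • Running this over every file in a repo and concatenating the results gives a map that is far smaller than the source, yet lists every real module and function name.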

2. Beads: Agent-First Issue Tracker

  • Steve Yegge's Beads has emerged as a popular solution for multi-session context preservation. [Link] Unlike GitHub Issues, Beads is designed specifically for implementation notes — decisions, blockers, and progress that agents need to reconstruct context.
bd init                    # Initialize in project
bd create "Implement auth" # Create task
bd update auth-001 --notes "COMPLETED: JWT. NEXT: Rate limiting"
  • A three-week trial report from Reddit: [Link]

"The amnesia is gone. I'd spend considerable time re-explaining context after every compaction. Now Claude reconstructs full context automatically by reading bead notes." — u/lakshminp

3. Two-Tab Claude System

  • Some practitioners maintain separate Claude instances for different concerns:
| Window 1 (Research/QA) | Window 2 (Developer) |
| --- | --- |
| Bug analysis | Implementation |
| File/line identification | Code writing |
| Uses 80-90% of context | Focused execution |
  • Results from Window 1 feed Window 2 as distilled, actionable instructions.

4. /clear + Plan File Strategy

  • The most accessible strategy requires no additional tooling:
    1. Create PLAN.md with a checklist before starting
    2. Check off completed items as work progresses
    3. Run /clear to reset context
    4. Resume with "Continue with PLAN.md"

"You have to give it step by step instructions of exactly what to do, and check the result at each step. Then /clear after each task is completed and tested to be working." — u/TotalBeginnerLol [Link]
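  • If you want to script around the plan file, finding the next open item takes only a few lines. This sketch assumes PLAN.md uses GitHub-style task-list checkboxes (`- [ ]` and `- [x]`), which the strategy above does not prescribe; `parse_plan` and `next_task` are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple

# Matches "- [ ] task" or "* [x] task", optionally indented.
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def parse_plan(text: str) -> List[Tuple[bool, str]]:
    """Return (done, title) for every checklist line in the plan."""
    items = []
    for line in text.splitlines():
        match = CHECKBOX.match(line)
        if match:
            items.append((match.group(1).lower() == "x", match.group(2).strip()))
    return items


def next_task(text: str) -> Optional[str]:
    """Title of the first unchecked item, or None when the plan is complete."""
    for done, title in parse_plan(text):
        if not done:
            return title
    return None


PLAN = """# Plan: rate limiting
- [x] Add Redis client
- [ ] Implement token bucket
- [ ] Write integration test
"""
print(next_task(PLAN))
```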

5. Memory MCP Servers

  • The Model Context Protocol (MCP) ecosystem has spawned several memory-focused servers:
| Tool | Key Feature |
| --- | --- |
| Serena MCP | Semantic code search + language server integration [Link] |
| Basic Memory MCP | Local markdown-based persistent memory |
| Heimdall MCP | "Remember context about X" command interface |
| a24z-Memory | File anchor-based note system |

6. Superpowers Plugin: The Comprehensive Solution

  • Jesse Vincent's (obra) Superpowers plugin bundles multiple context management techniques into a unified workflow system. [Link] Unlike piecemeal solutions, it provides a complete lifecycle from initial brainstorming to merged PR.
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace
  • Core context management features:

    • Subagent-driven development: Each task runs in isolated context, returning only summarized results
    • Plan-file architecture: Auto-generated docs/plans/YYYY-MM-DD-<feature>.md for session-independent continuity
    • Automatic context handoff: New sessions resume by reading plan files—no manual context reconstruction
    • TDD enforcement: The RED-GREEN-REFACTOR cycle becomes mandatory, not optional
  • The session-independent workflow is particularly noteworthy:

# Session 1: Plan and save
> /superpowers:brainstorm Implement rate limiting
# Design saved to docs/plans/2025-12-26-rate-limiting.md

# Session 2 (any time later): Resume
> Read docs/plans and continue
# Superpowers auto-invokes executing-plans skill
  • Simon Willison, Django co-creator, endorsed this approach:

"Jesse is one of the most creative users of coding agents that I know. It's very much worth the investment of time to explore what he's shared." [Link]

  • The token efficiency is significant—core bootstrap loads under 2,000 tokens, with heavy work delegated to subagents that don't pollute the main context. [Link]

Token Economics: The Cost of Fighting Context Rot

  • Anthropic's own data reveals significant token overhead for agent patterns: [Link]
| Interaction Type | Token Multiplier |
| --- | --- |
| Standard chatbot | 1x (baseline) |
| Single agent | ~4x |
| Multi-agent system | ~15x |
  • This means multi-agent architectures — while effective against Context Rot — consume roughly 15 times more tokens than simple chat. For Claude Pro/Max subscribers, this can rapidly exhaust usage limits.
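  • For rough budgeting, the multipliers can be applied to whatever a task would cost as a plain chat. The numbers are the approximate figures from the table above; the function name and the 10,000-token baseline are made up for the example, and real usage will vary by task.

```python
# Approximate multipliers relative to a standard chat interaction.
MULTIPLIERS = {"chat": 1, "single_agent": 4, "multi_agent": 15}


def estimated_tokens(baseline_chat_tokens: int, pattern: str) -> int:
    """Scale a chat-sized token estimate to a given interaction pattern."""
    return baseline_chat_tokens * MULTIPLIERS[pattern]


for pattern in MULTIPLIERS:
    print(f"{pattern:>12}: ~{estimated_tokens(10_000, pattern):,} tokens")
```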

Practical Recommendations

Choose Your Strategy Based on Task Scope

| Scenario | Recommended Approach |
| --- | --- |
| Simple feature (1-2 hours) | Frequent /clear usage |
| Multi-session project | Beads + Progress files |
| Large-scale refactoring | Subagent architecture |
| Complex debugging | Two-tab system |
| Repetitive workflows | CLAUDE.md + Hooks |

Anti-Patterns to Avoid

| Avoid | Do Instead |
| --- | --- |
| Single long session for all work | /clear after each completed unit |
| Pasting large text blocks | Use file reading tools |
| Vague instructions ("fix this") | Specify file, line, and exact problem |
| Relying solely on auto-compaction | Manually run /compact [instructions] |
| Overloading CLAUDE.md | Keep only universal, minimal guidelines |

The Simple Is Best Approach: Let Superpowers Handle It

  • For practitioners who prefer minimal tooling overhead, the instinct is to manually create PLAN.md files with checklists and status tracking. But there's a more elegant solution: Superpowers already implements this pattern with battle-tested workflows.

  • Instead of managing plan files manually, Superpowers provides the complete infrastructure: [Link]

| Manual Approach | Superpowers Equivalent |
| --- | --- |
| Create PLAN.md manually | /superpowers:write-plan auto-generates docs/plans/YYYY-MM-DD-<feature>.md |
| Write checklist items yourself | Agent asks clarifying questions, then produces 2-5 minute tasks with exact file paths |
| Update status as work progresses | executing-plans skill tracks completion automatically |
| Remember to run /clear | Subagent architecture handles context isolation inherently |
| Resume with "Continue with PLAN.md" | New session: "Read docs/plans and continue" → auto-resumes |
  • The workflow becomes remarkably simple:
# Session 1: Design and plan
> /superpowers:brainstorm Add user authentication to my app
# Answer questions one at a time → design saved to docs/plans/ → auto-commit

# Session 2 (hours or days later): Resume
> Read docs/plans and continue
# Superpowers auto-loads executing-plans → picks up exactly where you stopped
  • This isn't just convenience—it's the same session-independent development pattern that Anthropic's research team identified as essential for long-running agents, implemented as a plugin. [Link]

  • The key insight: you don't need to reinvent the plan-file pattern. Superpowers has already refined it through adversarial testing and real-world usage by Claude Code practitioners.

Conclusion: Context Engineering as the New Frontier

  • Context Rot represents a fascinating inflection point in AI coding tools. The problem isn't solvable through raw compute or larger context windows — Anthropic themselves acknowledge that "context windows of all sizes will be subject to context pollution and information relevance concerns." [Link] The O(n²) attention complexity is architectural, not incidental.

  • What we're witnessing is the emergence of Context Engineering as a distinct discipline. Where Prompt Engineering focused on crafting the right words, Context Engineering asks: "What is the minimal, highest-signal set of tokens that maximizes desired outcomes?" This requires thinking about information lifecycle, session boundaries, and external state persistence.

  • The irony is rich: to make AI agents work on complex, long-running tasks, we're essentially building the same infrastructure that human engineering teams have developed over decades — issue trackers, progress files, documentation practices, and handoff protocols. The "goldfish" learns not by getting a better memory, but by writing things down.

  • There is no single correct answer today. The field is actively evolving, with Anthropic shipping new capabilities quarterly and the community iterating on novel approaches. What works best depends on project complexity, personal workflow preferences, and tolerance for tooling overhead. For those seeking comprehensive solutions with minimal configuration, Superpowers stands out—it implements the plan-file pattern, subagent architecture, and session-independent continuity that Anthropic's own research recommends, packaged as a single plugin. You don't need to manually create PLAN.md files or reinvent context management patterns; the infrastructure already exists. [Link]

  • The engineers who thrive with AI coding agents will be those who internalize this reality: the context window is not infinite memory — it's expensive, degrading working memory. Managing it deliberately isn't a workaround; it's the core skill.

References

  • Anthropic Engineering
    • https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
    • https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
  • Chroma Research
    • https://research.trychroma.com/context-rot
  • Academic Research
    • https://arxiv.org/abs/2307.03172 (Stanford "Lost in the Middle")
    • https://arxiv.org/abs/2209.04881 (Self-Attention Complexity)
  • Claude Documentation
    • https://platform.claude.com/docs/en/build-with-claude/context-editing
    • https://platform.claude.com/docs/en/agent-sdk/subagents
  • Community Tools
    • https://github.com/steveyegge/beads (Beads issue tracker)
    • https://github.com/obra/superpowers (Superpowers plugin)
    • https://github.com/oraios/serena (Serena MCP)
  • Superpowers Expert Analysis
    • https://simonwillison.net/2025/Oct/10/superpowers/ (Simon Willison endorsement)
  • Community Discussions (Reddit)
    • https://www.reddit.com/r/ClaudeCode/comments/1pv7ls3/ (Original "goldfish" discussion)
    • https://www.reddit.com/r/ClaudeCode/comments/1ov1z94/ (Beads 3-week review)

Taehyeong Lee | Software Engineer

I am a Software Engineer with 15 years of experience, working at Gentle Monster. I specialize in developing high-load, large-scale processing APIs using Kotlin and Spring Boot. I live in Seoul, Korea.