Usable Context vs Advertised Context for AI Coding Agents

The promise of long-context models is seductive. If a model says it supports 200K, 1M, or more tokens, it is tempting to throw in the whole repo, a pile of logs, and a backlog ticket and assume the agent will sort it out.

In practice, that usually degrades before it helps. The model pays less attention to the right files, latency climbs, and the fix quality quietly drops. What matters in coding workflows is not advertised window size. It is usable context, the portion of prompt space that stays relevant, reviewable, and fresh enough to guide the next edit.

This post walks through the patterns I trust more: bounded context packets, pinned files, rolling summaries, and refresh loops that let coding agents work with large systems without turning every run into expensive prompt fog.

Why this matters

Coding agents fail in very specific ways when context gets sloppy. They anchor on stale files, miss the one schema or config invariant that matters, waste tokens restating irrelevant repo background, and become harder for humans to review because the packet has no obvious shape.

A bigger context window helps only if you preserve attention and structure. Production workflows need context that is small enough to inspect, refresh, and replace.

Architecture or workflow overview

flowchart LR
    A[Task brief] --> B[Repo map + file candidates]
    B --> C[Retriever and ranker]
    C --> D[Bounded context packet]
    D --> E[Model edit step]
    E --> F[Verification output]
    F --> G[Rolling summary]
    G --> H[Packet refresh]
    H --> D

Task brief with explicit success criteria
Pinned files that must stay in view
Ranked supporting files, capped by token budget
Short rolling summary of what changed and what was verified
Fresh retrieval after meaningful edits or failed checks

Implementation details

1) Build a token budget before retrieval

The mistake is starting with files. Start with a budget.

from dataclasses import dataclass

@dataclass
class ContextBudget:
    model_window: int
    reserved_for_output: int = 12000
    reserved_for_system: int = 6000
    safety_margin: int = 8000

    @property
    def usable_tokens(self) -> int:
        return max(
            0,
            self.model_window
            - self.reserved_for_output
            - self.reserved_for_system
            - self.safety_margin,
        )

budget = ContextBudget(model_window=200000)
print(budget.usable_tokens)  # 174000

This is still optimistic. In real coding loops, I often spend only a fraction of the theoretical remaining space on source files because I want room for tool results, diffs, retries, and short-term reasoning.

2) Pin the files that carry invariants

Not every relevant file is equally important. A small migration, policy file, or interface definition can matter more than ten implementation files.

pinned_files:
  - path: db/schema.sql
    reason: source of truth for column names and constraints
  - path: apps/api/src/routes/users.ts
    reason: write path touched by this task
  - path: package.json
    reason: scripts and runtime assumptions
retrieval:
  max_supporting_files: 8
  max_tokens: 32000
  refresh_after:
    - failed_test
    - file_set_changed
    - task_phase_change

Pinned files stop the common failure mode where retrieval keeps swapping out the one document the model should never forget.

3) Summarize the run, not the whole world

Long sessions need compression, but generic summaries are almost useless. The summary has to preserve constraints, touched files, and unresolved risk.

type RollingSummary = {
  task: string;
  touchedFiles: string[];
  decisions: string[];
  verifiedBy: string[];
  openRisks: string[];
};

export function updateSummary(prev: RollingSummary, patch: Partial<RollingSummary>) {
  return {
    ...prev,
    ...patch,
    touchedFiles: [...new Set([...(prev.touchedFiles || []), ...(patch.touchedFiles || [])])],
    decisions: [...new Set([...(prev.decisions || []), ...(patch.decisions || [])])],
    verifiedBy: [...new Set([...(prev.verifiedBy || []), ...(patch.verifiedBy || [])])],
    openRisks: patch.openRisks ?? prev.openRisks ?? [],
  };
}

$ python tools/context_packet.py --task "fix user sync timeout" --model-window 200000
usable_tokens: 174000
pinned_files: 3
supporting_files_selected: 7
supporting_tokens: 28140
summary_tokens: 612
refresh_policy: failed_test,file_set_changed,task_phase_change
status: packet-ready

What went wrong and the tradeoffs

Strategy	Main upside	Main downside	When I use it
Stuff the repo into the prompt	Low setup effort	Attention collapse, higher latency, weak reviewability	Almost never
Pure semantic retrieval	Good first-pass recall	Misses exact invariants and file relationships	Small doc-heavy repos
Pinned files plus ranked support	Better accuracy and easier review	Needs repo-aware setup	Most coding agents
Rolling summary only	Token efficient for long sessions	Can hide missing source details	Late-stage iterations

Pitfalls: stale summary drift, hidden invariant loss, false confidence from giant windows, and cost creep from repeatedly stuffing tool output back into the packet.

Large packets also widen prompt injection exposure when external docs, issue text, or logs are mixed directly into the prompt. Treat retrieved content as untrusted input, especially when the same run can later execute commands or write code.

Best practices: prefer packet refresh over packet growth, pin invariant files explicitly, reserve output space before budgeting source files, and rebuild the packet when the task changes phase.

Practical checklist

Define a usable token budget, not just the model maximum
Mark pinned files before retrieval runs
Cap supporting files and supporting tokens
Store a rolling summary with decisions, checks, and open risks
Refresh retrieval after failed tests or subsystem changes
Review the packet itself when an agent starts making weird edits

What I would not do

I would not market a coding workflow as solved just because a model supports a giant context number. Bigger windows are helpful, but they do not replace ranking, summaries, and disciplined refresh boundaries.

Conclusion

Useful context is shaped, not dumped. The best coding agents I have seen do not win by remembering everything. They win by keeping the right things in view at the right time.