The promise of long-context models is seductive. If a model says it supports 200K, 1M, or more tokens, it is tempting to throw in the whole repo, a pile of logs, and a backlog ticket and assume the agent will sort it out.
In practice, that usually degrades before it helps. The model pays less attention to the right files, latency climbs, and the fix quality quietly drops. What matters in coding workflows is not advertised window size. It is usable context, the portion of prompt space that stays relevant, reviewable, and fresh enough to guide the next edit.
This post walks through the patterns I trust more: bounded context packets, pinned files, rolling summaries, and refresh loops that let coding agents work with large systems without turning every run into expensive prompt fog.
Why this matters
Coding agents fail in very specific ways when context gets sloppy. They anchor on stale files, miss the one schema or config invariant that matters, waste tokens restating irrelevant repo background, and become harder for humans to review because the packet has no obvious shape.
A bigger context window helps only if you preserve attention and structure. Production workflows need context that is small enough to inspect, refresh, and replace.
Architecture or workflow overview
flowchart LR
A[Task brief] --> B[Repo map + file candidates]
B --> C[Retriever and ranker]
C --> D[Bounded context packet]
D --> E[Model edit step]
E --> F[Verification output]
F --> G[Rolling summary]
G --> H[Packet refresh]
H --> D- Task brief with explicit success criteria
- Pinned files that must stay in view
- Ranked supporting files, capped by token budget
- Short rolling summary of what changed and what was verified
- Fresh retrieval after meaningful edits or failed checks
Implementation details
1) Build a token budget before retrieval
The mistake is starting with files. Start with a budget.
from dataclasses import dataclass
@dataclass
class ContextBudget:
model_window: int
reserved_for_output: int = 12000
reserved_for_system: int = 6000
safety_margin: int = 8000
@property
def usable_tokens(self) -> int:
return max(
0,
self.model_window
- self.reserved_for_output
- self.reserved_for_system
- self.safety_margin,
)
budget = ContextBudget(model_window=200000)
print(budget.usable_tokens) # 174000This is still optimistic. In real coding loops, I often spend only a fraction of the theoretical remaining space on source files because I want room for tool results, diffs, retries, and short-term reasoning.
2) Pin the files that carry invariants
Not every relevant file is equally important. A small migration, policy file, or interface definition can matter more than ten implementation files.
pinned_files:
- path: db/schema.sql
reason: source of truth for column names and constraints
- path: apps/api/src/routes/users.ts
reason: write path touched by this task
- path: package.json
reason: scripts and runtime assumptions
retrieval:
max_supporting_files: 8
max_tokens: 32000
refresh_after:
- failed_test
- file_set_changed
- task_phase_changePinned files stop the common failure mode where retrieval keeps swapping out the one document the model should never forget.
3) Summarize the run, not the whole world
Long sessions need compression, but generic summaries are almost useless. The summary has to preserve constraints, touched files, and unresolved risk.
type RollingSummary = {
task: string;
touchedFiles: string[];
decisions: string[];
verifiedBy: string[];
openRisks: string[];
};
export function updateSummary(prev: RollingSummary, patch: Partial<RollingSummary>) {
return {
...prev,
...patch,
touchedFiles: [...new Set([...(prev.touchedFiles || []), ...(patch.touchedFiles || [])])],
decisions: [...new Set([...(prev.decisions || []), ...(patch.decisions || [])])],
verifiedBy: [...new Set([...(prev.verifiedBy || []), ...(patch.verifiedBy || [])])],
openRisks: patch.openRisks ?? prev.openRisks ?? [],
};
}$ python tools/context_packet.py --task "fix user sync timeout" --model-window 200000
usable_tokens: 174000
pinned_files: 3
supporting_files_selected: 7
supporting_tokens: 28140
summary_tokens: 612
refresh_policy: failed_test,file_set_changed,task_phase_change
status: packet-readyWhat went wrong and the tradeoffs
| Strategy | Main upside | Main downside | When I use it |
|---|---|---|---|
| Stuff the repo into the prompt | Low setup effort | Attention collapse, higher latency, weak reviewability | Almost never |
| Pure semantic retrieval | Good first-pass recall | Misses exact invariants and file relationships | Small doc-heavy repos |
| Pinned files plus ranked support | Better accuracy and easier review | Needs repo-aware setup | Most coding agents |
| Rolling summary only | Token efficient for long sessions | Can hide missing source details | Late-stage iterations |
Pitfalls: stale summary drift, hidden invariant loss, false confidence from giant windows, and cost creep from repeatedly stuffing tool output back into the packet.
Large packets also widen prompt injection exposure when external docs, issue text, or logs are mixed directly into the prompt. Treat retrieved content as untrusted input, especially when the same run can later execute commands or write code.
Best practices: prefer packet refresh over packet growth, pin invariant files explicitly, reserve output space before budgeting source files, and rebuild the packet when the task changes phase.
Practical checklist
- Define a usable token budget, not just the model maximum
- Mark pinned files before retrieval runs
- Cap supporting files and supporting tokens
- Store a rolling summary with decisions, checks, and open risks
- Refresh retrieval after failed tests or subsystem changes
- Review the packet itself when an agent starts making weird edits
What I would not do
I would not market a coding workflow as solved just because a model supports a giant context number. Bigger windows are helpful, but they do not replace ranking, summaries, and disciplined refresh boundaries.
Conclusion
Useful context is shaped, not dumped. The best coding agents I have seen do not win by remembering everything. They win by keeping the right things in view at the right time.