Most AI coding mistakes are retrieval mistakes wearing a generation costume. The model edits the wrong file, misses the helper that already exists, or rewrites an API because the only thing it saw was a vague prompt plus a pile of repo text.
The fix is usually not a bigger context window. It is better retrieval. In this post I will show the hybrid setup I would use for a real coding agent: embeddings for semantic recall, tree-sitter for code structure, ripgrep for exact matches, and a lightweight reranker before anything reaches the prompt.
This matters because smaller, more relevant context tends to improve both accuracy and reviewability. It also makes failures easier to debug when the model still gets it wrong.
Why this matters
If your agent only has one retrieval move, it will fail in predictable ways. Embeddings miss exact symbol names, keyword search misses semantically related helpers, and whole-file stuffing buries the useful lines under noise.
For code, retrieval has to do three jobs at once: find the right files, find the right symbols inside those files, and deliver a compact packet the model can actually use. That is why a hybrid stack works better than any single index.
- tree-sitter for syntax-aware parsing and symbol extraction
- ripgrep for exact symbol and literal search
- pgvector if you want a practical embeddings store in Postgres
- OpenTelemetry if you want to trace retrieval latency and failure modes
Implementation details
Build a symbol-aware index, not just file chunks
A flat embedding index over random 800-token code chunks is better than nothing, but it loses too much structure. I prefer indexing symbols and selected surrounding spans.
```typescript
import Parser from 'tree-sitter';
import TypeScript from 'tree-sitter-typescript';

const parser = new Parser();
parser.setLanguage(TypeScript.typescript);

interface SymbolEntry {
  path: string;
  name: string;
  kind: string;
  start: number; // byte offset where the declaration begins
  end: number;   // byte offset one past the declaration's end
}

const DECLARATION_KINDS = new Set([
  'function_declaration',
  'class_declaration',
  'method_definition',
  'interface_declaration',
  'type_alias_declaration',
]);

export function extractSymbols(source: string, path: string): SymbolEntry[] {
  const tree = parser.parse(source);
  const symbols: SymbolEntry[] = [];

  function walk(node: Parser.SyntaxNode) {
    if (DECLARATION_KINDS.has(node.type)) {
      // Most declaration nodes expose their identifier via the `name` field.
      const nameNode = node.childForFieldName('name');
      symbols.push({
        path,
        name: nameNode?.text ?? 'anonymous',
        kind: node.type,
        start: node.startIndex,
        end: node.endIndex,
      });
    }
    for (const child of node.children) walk(child);
  }

  walk(tree.rootNode);
  return symbols;
}
```

The useful trick is that the chunk boundary follows code structure. Later, when the agent asks about `buildContextPacket`, you can fetch the function body, its imports, and a nearby helper instead of two pages of unrelated code.
Keep keyword search as a first-class retrieval path
Exact search is still the best tool for some jobs. If a model mentions `X-Request-Id`, `FEATURE_FLAG_AGENT_MODE`, or a stack-trace symbol, I do not want to hope embeddings figure it out.
```sh
rg -n --hidden --glob '!node_modules' --glob '!dist' \
  'buildContextPacket|ContextBudget|FEATURE_FLAG_AGENT_MODE' .
```

I usually treat ripgrep hits as high-confidence candidates when the query includes exact symbols, references a stack trace, or targets config keys that semantic search often underweights.
Merge candidates, then rerank before prompt assembly
Once you have candidates from different channels, score them together. Embeddings alone will overvalue semantically similar but operationally irrelevant files.
```yaml
retrieval:
  maxCandidates: 40
  finalContextFiles: 8
  strategies:
    - name: embeddings
      weight: 0.45
    - name: tree_sitter_symbol_match
      weight: 0.35
    - name: ripgrep_exact_match
      weight: 0.20
  rerank:
    enabled: true
    model: cross-encoder-mini
    keepTopK: 12
```

My rule of thumb is simple: retrieve broadly, rerank aggressively, prompt narrowly.
Build context packets that are reviewable by humans too
A good context packet is not just model food. It should be readable enough that a human reviewer can inspect the evidence later.
```json
{
  "task": "Add retry jitter to the GitHub webhook worker",
  "files": [
    { "path": "src/workers/githubWebhook.ts", "reason": "Primary retry loop lives here" },
    { "path": "src/lib/backoff.ts", "reason": "Existing delay helpers already used by adjacent workers" }
  ],
  "symbols": ["processWebhookEvent", "computeBackoffMs", "RetryPolicy"],
  "tests": ["tests/githubWebhook.test.ts"]
}
```

That shape tends to produce better edits because the agent receives a compact map instead of an undifferentiated text dump.
What went wrong / tradeoffs
Tree-sitter is great, but only where you actually have grammars and clean parse paths. In mixed repos you will still hit YAML, SQL, shell, generated SDKs, and templates, so your retrieval layer needs graceful degradation instead of pretending the structural index is complete.
Retrieval recall and prompt usefulness are also not the same thing. More files can reduce answer quality by making the task ambiguous.
| Choice | Benefit | Cost | When I would use it |
|---|---|---|---|
| Large top-k recall | Better coverage | More noise and latency | Early exploration and offline evals |
| Aggressive reranking | Smaller prompt, better focus | Can hide edge-case files | Normal coding tasks |
| Whole-file inclusion | More surrounding context | Token bloat | Tiny files or config files |
| Symbol-level excerpts | High precision | Needs indexing work | Most application code |
Two things are easy to miss. First, prompt injection can ride in through retrieved docs or copied issue text, so external content should be treated as tainted. Second, retrieval is part of your correctness boundary. If the wrong file makes the packet, the model can be perfectly obedient and still ship the wrong patch.
Practical checklist
- [ ] embeddings are built from symbol-aware chunks, not random megachunks
- [ ] exact search exists and is not hidden behind a failure path
- [ ] every packet records why each file was included
- [ ] retrieval is tied to a repo SHA or change counter
- [ ] final context has a hard token budget and hard file-count budget
- [ ] tests and adjacent config files are eligible retrieval targets
- [ ] you can inspect retrieval latency and false-positive rates in traces
Conclusion
If I were building this today, I would start embarrassingly simple: ripgrep, tree-sitter symbol extraction, embeddings in a small local store, one reranker, and a packet builder with explicit budgets.
I would not start with a giant vector pipeline and five clever heuristics. Most teams do not have a generation problem first. They have a retrieval discipline problem.