Most AI coding mistakes are retrieval mistakes wearing a generation costume. The model edits the wrong file, misses the helper that already exists, or rewrites an API because the only thing it saw was a vague prompt plus a pile of repo text.
The fix is usually not a bigger context window. It is better retrieval. In this post I will show the hybrid setup I would use for a real coding agent: embeddings for semantic recall, tree-sitter for code structure, ripgrep for exact matches, and a lightweight reranker before anything reaches the prompt.
This matters because smaller, more relevant context tends to improve both accuracy and reviewability. It also makes failures easier to debug when the model still gets it wrong.
Why this matters
If your agent only has one retrieval move, it will fail in predictable ways. Embeddings miss exact symbol names, keyword search misses semantically related helpers, and whole-file stuffing buries the useful lines under noise.
For code, retrieval has to do three jobs at once: find the right files, find the right symbols inside those files, and deliver a compact packet the model can actually use. That is why a hybrid stack works better than any single index.
- tree-sitter for syntax-aware parsing and symbol extraction
- ripgrep for exact symbol and literal search
- pgvector if you want a practical embeddings store in Postgres
- OpenTelemetry if you want to trace retrieval latency and failure modes
Implementation details
Build a symbol-aware index, not just file chunks
A flat embedding index over random 800-token code chunks is better than nothing, but it loses too much structure. I prefer indexing symbols and selected surrounding spans.
```typescript
import Parser from 'tree-sitter';
import TypeScript from 'tree-sitter-typescript';

const parser = new Parser();
parser.setLanguage(TypeScript.typescript);

interface SymbolEntry {
  path: string;
  name: string;
  kind: string;
  start: number; // byte offset where the declaration begins
  end: number;   // byte offset one past the declaration's end
}

const DECLARATION_KINDS = new Set([
  'function_declaration',
  'class_declaration',
  'method_definition',
  'interface_declaration',
  'type_alias_declaration',
]);

export function extractSymbols(source: string, path: string): SymbolEntry[] {
  const tree = parser.parse(source);
  const symbols: SymbolEntry[] = [];

  function walk(node: Parser.SyntaxNode) {
    if (DECLARATION_KINDS.has(node.type)) {
      // Most declaration nodes expose their identifier via the `name` field.
      const nameNode = node.childForFieldName('name');
      symbols.push({
        path,
        name: nameNode?.text ?? 'anonymous',
        kind: node.type,
        start: node.startIndex,
        end: node.endIndex,
      });
    }
    for (const child of node.children) walk(child);
  }

  walk(tree.rootNode);
  return symbols;
}
```

The useful trick is that the chunk boundary follows code structure. Later, when the agent asks about `buildContextPacket`, you can fetch the function body, its imports, and a nearby helper instead of two pages of unrelated code.
Keep keyword search as a first-class retrieval path
Exact search is still the best tool for some jobs. If a model mentions `X-Request-Id`, `FEATURE_FLAG_AGENT_MODE`, or a stack-trace symbol, I do not want to hope embeddings figure it out.
```sh
rg -n --hidden --glob '!node_modules' --glob '!dist' \
  'buildContextPacket|ContextBudget|FEATURE_FLAG_AGENT_MODE' .
```

I usually treat ripgrep hits as high-confidence candidates when the query includes exact symbols, references a stack trace, or targets config keys that semantic search often underweights.
Merge candidates, then rerank before prompt assembly
Once you have candidates from different channels, score them together. Embeddings alone will overvalue semantically similar but operationally irrelevant files.
```yaml
retrieval:
  maxCandidates: 40
  finalContextFiles: 8
  strategies:
    - name: embeddings
      weight: 0.45
    - name: tree_sitter_symbol_match
      weight: 0.35
    - name: ripgrep_exact_match
      weight: 0.20
  rerank:
    enabled: true
    model: cross-encoder-mini
    keepTopK: 12
```

My rule of thumb is simple: retrieve broadly, rerank aggressively, prompt narrowly.
Build context packets that are reviewable by humans too
A good context packet is not just model food. It should be readable enough that a human reviewer can inspect the evidence later.
```json
{
  "task": "Add retry jitter to the GitHub webhook worker",
  "files": [
    { "path": "src/workers/githubWebhook.ts", "reason": "Primary retry loop lives here" },
    { "path": "src/lib/backoff.ts", "reason": "Existing delay helpers already used by adjacent workers" }
  ],
  "symbols": ["processWebhookEvent", "computeBackoffMs", "RetryPolicy"],
  "tests": ["tests/githubWebhook.test.ts"]
}
```

That shape tends to produce better edits because the agent receives a compact map instead of an undifferentiated text dump.
What went wrong / tradeoffs
Tree-sitter is great, but only where you actually have grammars and clean parse paths. In mixed repos you will still hit YAML, SQL, shell, generated SDKs, and templates, so your retrieval layer needs graceful degradation instead of pretending the structural index is complete.
Retrieval recall and prompt usefulness are also not the same thing. More files can reduce answer quality by making the task ambiguous.
| Choice | Benefit | Cost | When I would use it |
|---|---|---|---|
| Large top-k recall | Better coverage | More noise and latency | Early exploration and offline evals |
| Aggressive reranking | Smaller prompt, better focus | Can hide edge-case files | Normal coding tasks |
| Whole-file inclusion | More surrounding context | Token bloat | Tiny files or config files |
| Symbol-level excerpts | High precision | Needs indexing work | Most application code |
Two things are easy to miss. First, prompt injection can ride in through retrieved docs or copied issue text, so external content should be treated as tainted. Second, retrieval is part of your correctness boundary. If the wrong file makes the packet, the model can be perfectly obedient and still ship the wrong patch.
Practical checklist
- [ ] embeddings are built from symbol-aware chunks, not random megachunks
- [ ] exact search exists and is not hidden behind a failure path
- [ ] every packet records why each file was included
- [ ] retrieval is tied to a repo SHA or change counter
- [ ] final context has a hard token budget and hard file-count budget
- [ ] tests and adjacent config files are eligible retrieval targets
- [ ] you can inspect retrieval latency and false-positive rates in traces
Conclusion
If I were building this today, I would start embarrassingly simple: ripgrep, tree-sitter symbol extraction, embeddings in a small local store, one reranker, and a packet builder with explicit budgets.
I would not start with a giant vector pipeline and five clever heuristics. Most teams do not have a generation problem first. They have a retrieval discipline problem.