AI agents fail differently than normal request handlers. A flaky model endpoint does not just fail one call. It can trigger retries, replan loops, duplicate tool invocations, and confused fallback behavior that burns budget while doing less work.
I have become a lot more skeptical of simple retry logic in agent systems. When an LLM is orchestrating other systems, a retry policy without a circuit breaker is often an outage amplifier.
This post shows how to wrap model calls and tool invocations in circuit breakers, what signals to trip on, how to recover safely, and where teams usually get the thresholds wrong.
Why this matters
In a production AI workflow, one broken dependency can spread across the whole run. A model API starts timing out, the planner retries and creates more requests, the executor repeats a side-effecting tool call, and the user gets a vague status update while costs keep climbing.
The practical goal is simple. When a dependency gets unhealthy, the agent should do less, explain more, and preserve the option to recover cleanly.
Architecture overview
```mermaid
flowchart LR
  U[User task] --> P[Planner]
  P --> G[Guard layer]
  G --> M[Model call]
  G --> T[Tool call]
  M -->|timeout/error| B[Circuit breaker]
  T -->|timeout/error| B
  B --> C[Cooldown window]
  C --> F[Fallback or safe stop]
```
Implementation details
1) Put a breaker in front of every unstable edge
The first mistake is having one global breaker for the entire agent. That hides the real failure domain. Breakers should usually live per model endpoint, per tool class, or per tenant-sensitive integration.
```ts
// breaker.ts
export type BreakerState = 'closed' | 'open' | 'half-open';

export class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5,
    private readonly halfOpenSuccesses = 2,
    private readonly cooldownMs = 30_000,
  ) {}

  canExecute(now = Date.now()) {
    // While open and still cooling down, refuse every call outright.
    if (this.state === 'open' && now - this.openedAt < this.cooldownMs) return false;
    // Cooldown has elapsed: move to half-open and allow probe traffic.
    if (this.state === 'open') {
      this.state = 'half-open';
      this.successes = 0;
    }
    return true;
  }

  recordSuccess() {
    // In half-open, close only after enough consecutive probe successes.
    if (this.state === 'half-open') {
      this.successes += 1;
      if (this.successes >= this.halfOpenSuccesses) {
        this.state = 'closed';
        this.failures = 0;
      }
      return;
    }
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    // A half-open failure lands here too and reopens immediately, because
    // the failure count is still at or above the threshold.
    if (this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = Date.now();
    }
  }
}
```
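To keep breakers scoped per dependency without sprinkling constructor calls everywhere, a small registry helps. This is a minimal sketch; the `breakerFor` helper and the `model:`/`tool:` key scheme are my own naming, not an established convention.

```ts
// breaker-registry.ts (illustrative sketch)
import { CircuitBreaker } from './breaker';

const breakers = new Map<string, CircuitBreaker>();

// One breaker per dependency key, e.g. 'model:planner' or 'tool:github_write'.
export function breakerFor(key: string, factory = () => new CircuitBreaker()): CircuitBreaker {
  let breaker = breakers.get(key);
  if (!breaker) {
    breaker = factory();
    breakers.set(key, breaker);
  }
  return breaker;
}
```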
2) Wrap model calls with timeout and token budget guards
Model outages are not always hard 500s. Often the first symptom is latency drift or cost blowups from repeated re-asks. The wrapper should account for both.
```ts
// guarded-model-call.ts
import pTimeout from 'p-timeout';
import { CircuitBreaker } from './breaker';

const plannerBreaker = new CircuitBreaker(4, 2, 45_000);

export async function guardedPlannerCall(client: any, payload: any) {
  // Refuse outright while the breaker is open; the caller falls back.
  if (!plannerBreaker.canExecute()) {
    throw new Error('planner breaker open: skip call and use fallback summary');
  }
  try {
    // Latency drift counts as a failure: cap the call with a hard timeout.
    const result = await pTimeout(
      client.responses.create(payload),
      { milliseconds: 12_000, message: 'planner timed out' }
    );
    // Cost blowups count too: a token budget breach is a health signal.
    if (result.usage?.total_tokens > 40_000) {
      throw new Error('planner exceeded token budget');
    }
    plannerBreaker.recordSuccess();
    return result;
  } catch (error) {
    plannerBreaker.recordFailure();
    throw error;
  }
}
```
3) Keep policy in config
Prompt tweaks are too soft for operational safety. Breaker policy should live in config that is reviewed like infrastructure.
```yaml
# breaker-policy.yaml
models:
  planner:
    failure_threshold: 4
    cooldown_ms: 45000
    half_open_successes: 2
    timeout_ms: 12000
    max_total_tokens: 40000
  executor:
    failure_threshold: 3
    cooldown_ms: 60000
    half_open_successes: 1
    timeout_ms: 15000
tools:
  github_write:
    failure_threshold: 2
    cooldown_ms: 180000
    half_open_successes: 1
    side_effecting: true
  web_fetch:
    failure_threshold: 5
    cooldown_ms: 20000
    half_open_successes: 2
    side_effecting: false
```
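Turning that file into live breakers can stay boring. The sketch below assumes the `js-yaml` package and the illustrative `breakerFor` registry from earlier; only the three breaker fields are read here.

```ts
// load-policy.ts — sketch; assumes js-yaml and the breakerFor registry above
import fs from 'node:fs';
import yaml from 'js-yaml';
import { CircuitBreaker } from './breaker';
import { breakerFor } from './breaker-registry';

type Policy = {
  failure_threshold: number;
  half_open_successes: number;
  cooldown_ms: number;
};

export function loadBreakers(path = 'breaker-policy.yaml') {
  const doc = yaml.load(fs.readFileSync(path, 'utf8')) as {
    models: Record<string, Policy>;
    tools: Record<string, Policy>;
  };
  // Namespace keys by section so a model and a tool never share a breaker.
  for (const [name, p] of Object.entries(doc.models)) {
    breakerFor(`model:${name}`, () =>
      new CircuitBreaker(p.failure_threshold, p.half_open_successes, p.cooldown_ms));
  }
  for (const [name, p] of Object.entries(doc.tools)) {
    breakerFor(`tool:${name}`, () =>
      new CircuitBreaker(p.failure_threshold, p.half_open_successes, p.cooldown_ms));
  }
}
```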
4) Log why the breaker opened
A breaker that opens silently creates a second debugging problem. Structured open, half-open, and close events make incidents much easier to reconstruct.
```
2026-05-05T11:42:09Z breaker.open dependency=planner-model reason=timeout_window window_failures=4 timeout_ms=12000 fallback=summary_only_request
2026-05-05T11:42:54Z breaker.half_open dependency=planner-model probe=1
2026-05-05T11:43:01Z breaker.close dependency=planner-model stable_successes=2
```
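One lightweight way to produce those events is a single helper invoked on every state transition. The hook point and field names below are illustrative, not part of the `CircuitBreaker` class above:

```ts
// breaker-events.ts — illustrative sketch of structured transition logging
import type { BreakerState } from './breaker';

export function logBreakerTransition(
  dependency: string,
  from: BreakerState,
  to: BreakerState,
  detail: Record<string, unknown> = {},
) {
  // One JSON line per transition keeps events grep-able and machine-parseable.
  console.log(JSON.stringify({
    ts: new Date().toISOString(),
    event: `breaker.${to.replace('-', '_')}`,
    dependency,
    from,
    ...detail,
  }));
}
```

Calling it from `recordSuccess` and `recordFailure` keeps the breaker itself the single source of truth for transitions.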
What went wrong, and the tradeoffs
Without a breaker, retries stack on top of replanning and tool loops. The system looks busy while quality collapses. If a write tool fails after the remote system has already accepted the request, an agent may try again and duplicate the side effect. Breakers reduce the blast radius, but idempotency keys are still mandatory for write paths.
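A minimal sketch of what that looks like on a write path; the `WriteClient` shape and the `idempotency_key` field are assumptions about the remote API, not any specific provider's contract.

```ts
// idempotent-write.ts — illustrative sketch
import { randomUUID } from 'node:crypto';

type WriteClient = { write(args: Record<string, unknown>): Promise<unknown> };

export async function writeWithKey(
  client: WriteClient,
  payload: Record<string, unknown>,
  attempts = 2,
) {
  // Generate the key once per logical write, not once per attempt, so the
  // remote side can deduplicate a request it already accepted.
  const key = randomUUID();
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await client.write({ ...payload, idempotency_key: key });
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```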
The opposite problem is false opens. If thresholds are too strict, one short regional wobble opens the circuit for everyone. Sliding windows and tenant-aware scopes help, and half-open probes should stay scarce so recovery testing does not become a thundering herd.
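A sliding window is a small change: count failures by timestamp instead of with a raw counter. Here is one minimal shape for it.

```ts
// sliding-window.ts — illustrative sketch of time-windowed failure counting
export class SlidingWindowCounter {
  private timestamps: number[] = [];

  constructor(private readonly windowMs = 30_000) {}

  record(now = Date.now()) {
    this.timestamps.push(now);
  }

  count(now = Date.now()) {
    // Age out entries older than the window before counting, so a wobble
    // from an hour ago cannot contribute to opening the circuit now.
    this.timestamps = this.timestamps.filter((t) => now - t <= this.windowMs);
    return this.timestamps.length;
  }
}
```

Replacing the raw `failures` counter with `window.count() >= failureThreshold` makes opens reflect recent health rather than lifetime totals.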
| Choice | Upside | Downside | Best fit |
|---|---|---|---|
| Per-model breaker | Clean failure isolation | More configs to tune | Planner and executor use different providers or budgets |
| Global agent breaker | Easy to add | Hides root cause, over-blocks healthy paths | Tiny prototypes only |
| Fast open thresholds | Stops cost leaks quickly | Can degrade availability during transient blips | Side-effecting tools or expensive models |
| Slow open thresholds | Fewer false positives | More wasted retries and user latency | Cheap read-only tools |
When the breaker opens, do not pretend the agent is still fully capable. Downgrade the plan visibly, for example “search is temporarily degraded, returning cached context only,” so users know the system chose safety on purpose.
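Concretely, the degraded path can be an ordinary return value rather than an exception. Everything named below, the `searchTool` function, the cache, the `tool:web_search` key, is illustrative:

```ts
// degraded-search.ts — illustrative sketch of a visible fallback
import { breakerFor } from './breaker-registry';

export async function searchWithFallback(
  query: string,
  searchTool: (q: string) => Promise<string[]>,
  cache: Map<string, string[]>,
) {
  const breaker = breakerFor('tool:web_search');
  const degraded = {
    results: cache.get(query) ?? [],
    notice: 'search is temporarily degraded, returning cached context only',
  };
  if (!breaker.canExecute()) return degraded;
  try {
    const results = await searchTool(query);
    breaker.recordSuccess();
    return { results, notice: null };
  } catch {
    breaker.recordFailure();
    return degraded; // fail visibly, not silently
  }
}
```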
Teams often keep retries in the HTTP client, the tool wrapper, and the planner at the same time. That triple stack makes outages look random. Pick one retry owner and let the breaker coordinate the rest.
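If the single retry owner is one helper, everything else can stay retry-free. A sketch of that shape, with the breaker allowed to cut attempts short:

```ts
// retry-owner.ts — illustrative sketch: one retry loop, coordinated by the breaker
import { CircuitBreaker } from './breaker';

export async function withRetries<T>(
  breaker: CircuitBreaker,
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // The breaker can end the loop early, regardless of attempts remaining.
    if (!breaker.canExecute()) break;
    try {
      const result = await fn();
      breaker.recordSuccess();
      return result;
    } catch (error) {
      lastError = error;
      breaker.recordFailure();
    }
  }
  throw lastError ?? new Error('breaker open: no attempts made');
}
```

HTTP clients and tool wrappers then call `fn` with their own retry policies disabled.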
Practical checklist
- define breakers per dependency, not one giant global switch
- separate read-only tools from side-effecting tools
- count timeouts and token budget blowups as health signals
- persist breaker state where parallel workers can see it (see the sketch after this list)
- cap half-open probes so recovery testing stays gentle
- pair write tools with idempotency keys
- emit structured open, half-open, and close events
- expose a user-facing fallback message instead of silent spinning
- review thresholds after real incidents, not only in staging
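For the shared-state item above, a key with a TTL is often enough. This sketch assumes the `ioredis` client and models only the open/closed view rather than replicating full breaker state:

```ts
// shared-breaker-state.ts — illustrative sketch using ioredis; a key with a
// TTL marks the breaker open, so the cooldown expires automatically
import Redis from 'ioredis';

const redis = new Redis();

export async function isOpen(dependency: string): Promise<boolean> {
  return (await redis.exists(`breaker:open:${dependency}`)) === 1;
}

export async function markOpen(dependency: string, cooldownMs: number): Promise<void> {
  // PX sets a millisecond TTL; every worker sees the same open window.
  await redis.set(`breaker:open:${dependency}`, '1', 'PX', cooldownMs);
}
```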
Conclusion
Agent reliability gets much better when an unhealthy dependency makes the agent do less, not thrash louder. Circuit breakers are not glamorous, but they are one of the cleanest ways to stop model hiccups and flaky tools from turning into cascading incidents.
If I were adding only three things tomorrow, I would start with per-dependency breakers, token-aware health thresholds, and visible fallback messages.