AI agents fail differently than normal request handlers. A flaky model endpoint does not just fail one call. It can trigger retries, replan loops, duplicate tool invocations, and confused fallback behavior that burns budget while doing less work.
I have become a lot more skeptical of simple retry logic in agent systems. When an LLM is orchestrating other systems, a retry policy without a circuit breaker is often an outage amplifier.
This post shows how to wrap model calls and tool invocations in circuit breakers, what signals to trip on, how to recover safely, and where teams usually get the thresholds wrong.
Why this matters
In a production AI workflow, one broken dependency can spread across the whole run. A model API starts timing out, the planner retries and creates more requests, the executor repeats a side-effecting tool call, and the user gets a vague status update while costs keep climbing.
The practical goal is simple. When a dependency gets unhealthy, the agent should do less, explain more, and preserve the option to recover cleanly.
Architecture overview
```mermaid
flowchart LR
  U[User task] --> P[Planner]
  P --> G[Guard layer]
  G --> M[Model call]
  G --> T[Tool call]
  M -->|timeout/error| B[Circuit breaker]
  T -->|timeout/error| B
  B --> C[Cooldown window]
  C --> F[Fallback or safe stop]
```
Implementation details
1) Put a breaker in front of every unstable edge
The first mistake is having one global breaker for the entire agent. That hides the real failure domain. Breakers should usually live per model endpoint, per tool class, or per tenant-sensitive integration.
```ts
// breaker.ts
export type BreakerState = 'closed' | 'open' | 'half-open';

export class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5,
    private readonly halfOpenSuccesses = 2,
    private readonly cooldownMs = 30_000,
  ) {}

  canExecute(now = Date.now()) {
    // While open and still cooling down, refuse every call outright.
    if (this.state === 'open' && now - this.openedAt < this.cooldownMs) return false;
    // Cooldown has elapsed: move to half-open and allow probe traffic.
    if (this.state === 'open') {
      this.state = 'half-open';
      this.successes = 0;
    }
    return true;
  }

  recordSuccess() {
    // In half-open, close only after enough consecutive probe successes.
    if (this.state === 'half-open') {
      this.successes += 1;
      if (this.successes >= this.halfOpenSuccesses) {
        this.state = 'closed';
        this.failures = 0;
      }
      return;
    }
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    // A half-open failure lands here too and reopens immediately, because
    // the failure count is still at or above the threshold.
    if (this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = Date.now();
    }
  }
}
```
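To keep breakers scoped per dependency without sprinkling constructor calls everywhere, a small registry helps. This is a minimal sketch; the `breakerFor` helper and the `model:`/`tool:` key scheme are my own naming, not an established convention.

```ts
// breaker-registry.ts (illustrative sketch)
import { CircuitBreaker } from './breaker';

const breakers = new Map<string, CircuitBreaker>();

// One breaker per dependency key, e.g. 'model:planner' or 'tool:github_write'.
export function breakerFor(key: string, factory = () => new CircuitBreaker()): CircuitBreaker {
  let breaker = breakers.get(key);
  if (!breaker) {
    breaker = factory();
    breakers.set(key, breaker);
  }
  return breaker;
}
```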
2) Wrap model calls with timeout and token budget guards
Model outages are not always hard 500s. Often the first symptom is latency drift or cost blowups from repeated re-asks. The wrapper should account for both.
```ts
// guarded-model-call.ts
import pTimeout from 'p-timeout';
import { CircuitBreaker } from './breaker';

const plannerBreaker = new CircuitBreaker(4, 2, 45_000);

export async function guardedPlannerCall(client: any, payload: any) {
  // Refuse outright while the breaker is open; the caller falls back.
  if (!plannerBreaker.canExecute()) {
    throw new Error('planner breaker open: skip call and use fallback summary');
  }
  try {
    // Latency drift counts as a failure: cap the call with a hard timeout.
    const result = await pTimeout(
      client.responses.create(payload),
      { milliseconds: 12_000, message: 'planner timed out' }
    );
    // Cost blowups count too: a token budget breach is a health signal.
    if (result.usage?.total_tokens > 40_000) {
      throw new Error('planner exceeded token budget');
    }
    plannerBreaker.recordSuccess();
    return result;
  } catch (error) {
    plannerBreaker.recordFailure();
    throw error;
  }
}
```
3) Keep policy in config
Prompt tweaks are too soft for operational safety. Breaker policy should live in config that is reviewed like infrastructure.
```yaml
# breaker-policy.yaml
models:
  planner:
    failure_threshold: 4
    cooldown_ms: 45000
    half_open_successes: 2
    timeout_ms: 12000
    max_total_tokens: 40000
  executor:
    failure_threshold: 3
    cooldown_ms: 60000
    half_open_successes: 1
    timeout_ms: 15000
tools:
  github_write:
    failure_threshold: 2
    cooldown_ms: 180000
    half_open_successes: 1
    side_effecting: true
  web_fetch:
    failure_threshold: 5
    cooldown_ms: 20000
    half_open_successes: 2
    side_effecting: false
```
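Turning that file into live breakers can stay boring. The sketch below assumes the `js-yaml` package and the illustrative `breakerFor` registry from earlier; only the three breaker fields are read here.

```ts
// load-policy.ts — sketch; assumes js-yaml and the breakerFor registry above
import fs from 'node:fs';
import yaml from 'js-yaml';
import { CircuitBreaker } from './breaker';
import { breakerFor } from './breaker-registry';

type Policy = {
  failure_threshold: number;
  half_open_successes: number;
  cooldown_ms: number;
};

export function loadBreakers(path = 'breaker-policy.yaml') {
  const doc = yaml.load(fs.readFileSync(path, 'utf8')) as {
    models: Record<string, Policy>;
    tools: Record<string, Policy>;
  };
  // Namespace keys by section so a model and a tool never share a breaker.
  for (const [name, p] of Object.entries(doc.models)) {
    breakerFor(`model:${name}`, () =>
      new CircuitBreaker(p.failure_threshold, p.half_open_successes, p.cooldown_ms));
  }
  for (const [name, p] of Object.entries(doc.tools)) {
    breakerFor(`tool:${name}`, () =>
      new CircuitBreaker(p.failure_threshold, p.half_open_successes, p.cooldown_ms));
  }
}
```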
4) Log why the breaker opened
A breaker that opens silently creates a second debugging problem. Structured open, half-open, and close events make incidents much easier to reconstruct.
```
2026-05-05T11:42:09Z breaker.open dependency=planner-model reason=timeout_window window_failures=4 timeout_ms=12000 fallback=summary_only_request
2026-05-05T11:42:54Z breaker.half_open dependency=planner-model probe=1
2026-05-05T11:43:01Z breaker.close dependency=planner-model stable_successes=2
```
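One lightweight way to produce those events is a single helper invoked on every state transition. The hook point and field names below are illustrative, not part of the `CircuitBreaker` class above:

```ts
// breaker-events.ts — illustrative sketch of structured transition logging
import type { BreakerState } from './breaker';

export function logBreakerTransition(
  dependency: string,
  from: BreakerState,
  to: BreakerState,
  detail: Record<string, unknown> = {},
) {
  // One JSON line per transition keeps events grep-able and machine-parseable.
  console.log(JSON.stringify({
    ts: new Date().toISOString(),
    event: `breaker.${to.replace('-', '_')}`,
    dependency,
    from,
    ...detail,
  }));
}
```

Calling it from `recordSuccess` and `recordFailure` keeps the breaker itself the single source of truth for transitions.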
What went wrong, and the tradeoffs
Without a breaker, retries stack on top of replanning and tool loops. The system looks busy while quality collapses. If a write tool fails after the remote system has already accepted the request, an agent may try again and duplicate the side effect. Breakers reduce the blast radius, but idempotency keys are still mandatory for write paths.
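A minimal sketch of what that looks like on a write path; the `WriteClient` shape and the `idempotency_key` field are assumptions about the remote API, not any specific provider's contract.

```ts
// idempotent-write.ts — illustrative sketch
import { randomUUID } from 'node:crypto';

type WriteClient = { write(args: Record<string, unknown>): Promise<unknown> };

export async function writeWithKey(
  client: WriteClient,
  payload: Record<string, unknown>,
  attempts = 2,
) {
  // Generate the key once per logical write, not once per attempt, so the
  // remote side can deduplicate a request it already accepted.
  const key = randomUUID();
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await client.write({ ...payload, idempotency_key: key });
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```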
The opposite problem is false opens. If thresholds are too strict, one short regional wobble opens the circuit for everyone. Sliding windows and tenant-aware scopes help, and half-open probes should stay scarce so recovery testing does not become a thundering herd.
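A sliding window is a small change: count failures by timestamp instead of with a raw counter. Here is one minimal shape for it.

```ts
// sliding-window.ts — illustrative sketch of time-windowed failure counting
export class SlidingWindowCounter {
  private timestamps: number[] = [];

  constructor(private readonly windowMs = 30_000) {}

  record(now = Date.now()) {
    this.timestamps.push(now);
  }

  count(now = Date.now()) {
    // Age out entries older than the window before counting, so a wobble
    // from an hour ago cannot contribute to opening the circuit now.
    this.timestamps = this.timestamps.filter((t) => now - t <= this.windowMs);
    return this.timestamps.length;
  }
}
```

Replacing the raw `failures` counter with `window.count() >= failureThreshold` makes opens reflect recent health rather than lifetime totals.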
| Choice | Upside | Downside | Best fit |
|---|---|---|---|
| Per-model breaker | Clean failure isolation | More configs to tune | Planner and executor use different providers or budgets |
| Global agent breaker | Easy to add | Hides root cause, over-blocks healthy paths | Tiny prototypes only |
| Fast open thresholds | Stops cost leaks quickly | Can degrade availability during transient blips | Side-effecting tools or expensive models |
| Slow open thresholds | Fewer false positives | More wasted retries and user latency | Cheap read-only tools |
When the breaker opens, do not pretend the agent is still fully capable. Downgrade the plan visibly, for example “search is temporarily degraded, returning cached context only,” so users know the system chose safety on purpose.
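Concretely, the degraded path can be an ordinary return value rather than an exception. Everything named below, the `searchTool` function, the cache, the `tool:web_search` key, is illustrative:

```ts
// degraded-search.ts — illustrative sketch of a visible fallback
import { breakerFor } from './breaker-registry';

export async function searchWithFallback(
  query: string,
  searchTool: (q: string) => Promise<string[]>,
  cache: Map<string, string[]>,
) {
  const breaker = breakerFor('tool:web_search');
  const degraded = {
    results: cache.get(query) ?? [],
    notice: 'search is temporarily degraded, returning cached context only',
  };
  if (!breaker.canExecute()) return degraded;
  try {
    const results = await searchTool(query);
    breaker.recordSuccess();
    return { results, notice: null };
  } catch {
    breaker.recordFailure();
    return degraded; // fail visibly, not silently
  }
}
```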
Teams often keep retries in the HTTP client, the tool wrapper, and the planner at the same time. That triple stack makes outages look random. Pick one retry owner and let the breaker coordinate the rest.
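If the single retry owner is one helper, everything else can stay retry-free. A sketch of that shape, with the breaker allowed to cut attempts short:

```ts
// retry-owner.ts — illustrative sketch: one retry loop, coordinated by the breaker
import { CircuitBreaker } from './breaker';

export async function withRetries<T>(
  breaker: CircuitBreaker,
  fn: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // The breaker can end the loop early, regardless of attempts remaining.
    if (!breaker.canExecute()) break;
    try {
      const result = await fn();
      breaker.recordSuccess();
      return result;
    } catch (error) {
      lastError = error;
      breaker.recordFailure();
    }
  }
  throw lastError ?? new Error('breaker open: no attempts made');
}
```

HTTP clients and tool wrappers then call `fn` with their own retry policies disabled.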
Practical checklist
- define breakers per dependency, not one giant global switch
- separate read-only tools from side-effecting tools
- count timeouts and token budget blowups as health signals
- persist breaker state where parallel workers can see it (see the sketch after this list)
- cap half-open probes so recovery testing stays gentle
- pair write tools with idempotency keys
- emit structured open, half-open, and close events
- expose a user-facing fallback message instead of silent spinning
- review thresholds after real incidents, not only in staging
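For the shared-state item above, a key with a TTL is often enough. This sketch assumes the `ioredis` client and models only the open/closed view rather than replicating full breaker state:

```ts
// shared-breaker-state.ts — illustrative sketch using ioredis; a key with a
// TTL marks the breaker open, so the cooldown expires automatically
import Redis from 'ioredis';

const redis = new Redis();

export async function isOpen(dependency: string): Promise<boolean> {
  return (await redis.exists(`breaker:open:${dependency}`)) === 1;
}

export async function markOpen(dependency: string, cooldownMs: number): Promise<void> {
  // PX sets a millisecond TTL; every worker sees the same open window.
  await redis.set(`breaker:open:${dependency}`, '1', 'PX', cooldownMs);
}
```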
Conclusion
Agent reliability gets much better when an unhealthy dependency makes the agent do less, not thrash louder. Circuit breakers are not glamorous, but they are one of the cleanest ways to stop model hiccups and flaky tools from turning into cascading incidents.
If I were adding only three things tomorrow, I would start with per-dependency breakers, token-aware health thresholds, and visible fallback messages.