Webhook Inboxes for AI Agents That Do Not Duplicate Side Effects

Most webhook bugs are boring until they become expensive. A provider retries because your endpoint took too long, your AI worker processes both deliveries, and suddenly the same callback opens two tickets, sends two emails, or triggers the same repo job twice.

This gets worse in agent systems because the callback usually leads to side effects, not just a database write. A duplicate payment event might cause a human-visible message, a tool run, or a state transition that is annoyingly hard to roll back.

The fix is not "retry less." The fix is to treat webhook intake like a durable inbox. Verify the sender, store the raw event, assign an idempotency key, and let workers process a ledgered event exactly once.

Why this matters

If your agent stack touches GitHub, Stripe, Slack, CI, or internal async jobs, you already depend on callback delivery that is at-least-once, occasionally delayed, and sometimes out of order. Building direct side effects into the HTTP handler is the fastest path to subtle production damage.

a fast intake path that only authenticates and persists
a durable event ledger with dedupe state
a worker path that can retry safely
observability that tells you whether a replay was malicious, legitimate, or self-inflicted

Architecture or workflow overview

1. Provider webhook

2. Signature verifier

3. Inbox ledger

4. Safe worker

flowchart LR
  A[Provider webhook] --> B[Signature verifier]
  B --> C[Inbox table or event log]
  C --> D[Dedupe plus state ledger]
  D --> E[Worker queue]
  E --> F[Agent action executor]
  F --> G[External side effect]
  F --> H[Execution result plus audit trail]

Accept the webhook and capture the raw body exactly as delivered.
Verify signature, timestamp, and expected source.
Derive a stable dedupe key from provider event ID or signed headers.
Write the event to an inbox ledger before doing agent work.
Ack the provider quickly.
Let a worker claim the event, run policy checks, and execute the agent side effect.
Record completion, failure, or dead-letter state with enough evidence to replay safely.

Implementation details

1) Verify first, store second, execute later

A good handler does very little. It checks authenticity, writes one durable record, and returns 202 or 200 quickly.

import crypto from "node:crypto";
import express from "express";
import { db } from "./db";

const app = express();
app.use(express.raw({ type: "application/json" }));

function verifySignature(rawBody: Buffer, sigHeader: string, secret: string) {
  const expected = crypto
    .createHmac("sha256", secret)
    .update(rawBody)
    .digest("hex");

  return crypto.timingSafeEqual(
    Buffer.from(expected),
    Buffer.from(sigHeader.replace("sha256=", ""))
  );
}

app.post("/webhooks/agent-events", async (req, res) => {
  const rawBody = req.body as Buffer;
  const signature = req.header("x-signature") ?? "";
  const providerEventId = req.header("x-event-id") ?? "missing";

  if (!verifySignature(rawBody, signature, process.env.WEBHOOK_SECRET!)) {
    return res.status(401).send("invalid signature");
  }

  await db.webhookInbox.insert({
    dedupeKey: providerEventId,
    provider: "example-provider",
    rawBody: rawBody.toString("utf8"),
    receivedAt: new Date(),
    status: "received"
  }).onConflict("dedupe_key").ignore();

  return res.status(202).send("accepted");
});

What I like about this pattern is that it keeps the handler boring. That is a compliment. The HTTP edge should not open PRs, call models, or send chat messages.

2) Claim work through a ledger, not a boolean flag

A single processed=true column sounds fine until retries, worker crashes, and manual replays show up. Use explicit states and lease-style claiming instead.

create table webhook_inbox (
  id bigserial primary key,
  provider text not null,
  dedupe_key text not null,
  status text not null check (status in ('received', 'processing', 'done', 'failed', 'dead_letter')),
  raw_body jsonb not null,
  attempt_count integer not null default 0,
  claimed_by text,
  claimed_until timestamptz,
  received_at timestamptz not null default now(),
  processed_at timestamptz,
  last_error text,
  unique (provider, dedupe_key)
);

update webhook_inbox
set
  status = 'processing',
  claimed_by = $1,
  claimed_until = now() + interval '2 minutes',
  attempt_count = attempt_count + 1
where id = (
  select id
  from webhook_inbox
  where status in ('received', 'failed')
     or (status = 'processing' and claimed_until < now())
  order by received_at asc
  for update skip locked
  limit 1
)
returning *;

3) Make side effects idempotent too

Inbox dedupe is necessary, but it is not sufficient. If the worker opens a GitHub issue or sends a Slack message, that downstream operation should also carry a stable idempotency key.

from dataclasses import dataclass

@dataclass
class ActionContext:
    dedupe_key: str
    run_id: str

async def send_agent_notification(client, channel_id: str, text: str, ctx: ActionContext):
    return await client.post(
        "/messages",
        json={
            "channel": channel_id,
            "text": text,
            "idempotency_key": f"notify:{ctx.dedupe_key}"
        },
        timeout=10,
    )

Terminal output during an incident

$ webhookctl inbox inspect evt_01JX9M9M7S7
provider       example-provider
status         failed
attempt_count  3
dedupe_key     evt_01JX9M9M7S7
claimed_until  expired 41s ago
last_error     GitHub API 502 during issue creation
next_action    retry-safe, side effect not committed

What went wrong, and the tradeoffs

Pitfall: verifying a re-serialized JSON body instead of the raw bytes breaks legitimate signatures and creates intermittent failures that are miserable to debug.

Best practice: acknowledge after durable intake, not after business completion. Let workers own retries, backoff, and dead-letter policy.

Redis can be fine as a cache or coordination layer, but using an expiring key as your only source of truth is fragile for anything that triggers meaningful side effects. You lose auditability, replay context, and confidence during incident response.

Some providers give strong event IDs but weak ordering guarantees. If your agent action depends on sequence, the inbox has to model that explicitly. A job.completed callback arriving before job.started should not corrupt state just because the signature is valid.

A valid signature can also be replayed by an attacker or a broken intermediary if you do not enforce timestamp windows, nonce tracking, or provider event uniqueness. Signature verification proves origin, not freshness.

Tradeoff table

Pattern	Good at	Weak at	When I would use it
Direct handler side effects	Lowest latency	Duplicates, poor recovery, weak audit trail	Almost never for agent workflows
Inbox table + worker queue	Reliability, replay safety, ops visibility	Slightly more system complexity	Default choice
Kafka or log-based intake	High scale, fan-out, retention	More infra and sharper ops edges	Multi-team platforms or very high throughput
Redis dedupe only	Cheap temporary suppression	Weak evidence, TTL footguns	Only as a secondary optimization

Practical checklist

[ ] Verify signatures against the raw request body
[ ] Enforce a timestamp skew limit or replay window
[ ] Persist each event before side effects begin
[ ] Use a unique provider plus event ID dedupe key
[ ] Lease work to workers instead of toggling a boolean flag
[ ] Carry idempotency keys into downstream side effects
[ ] Record terminal states: done, failed, dead-letter
[ ] Expose operator-friendly inspect and replay tooling
[ ] Keep the original payload for audit and debugging
[ ] Alert on repeated failures, not just first failure

Conclusion

Webhook reliability for AI agents is mostly about refusing to do too much at the edge. Build a small authenticated intake path, persist everything that matters, and let workers own retries and side effects. It is a little more plumbing up front, but it is dramatically cheaper than explaining duplicate agent actions after the fact.

Direct references: Stripe webhook docs, GitHub webhook validation, Hookdeck on webhook idempotency.

AI AgentsWebhooksReliabilitySecurityPlatform Engineering