Most agent failures I see are not model failures. They are contract failures.
The model picks the right tool in spirit, then sends one field with the wrong shape, one enum value with the wrong casing, or one half-parsed date that slips into a real side effect. If your executor is permissive, the bug moves downstream and becomes much harder to debug.
The fix is boring in a good way: make tool use schema-first. Define the tool contract once, validate at runtime, adapt provider-specific payloads at the edge, and fail closed when the request does not match the contract.
This matters because the minute an agent can create tickets, modify infra, touch customer data, or even just burn expensive API calls, loose contracts stop being a convenience problem and become a reliability problem.
## Why this matters
Prompt-only tool descriptions are okay for demos, but production agents need stable boundaries. Reviewers need to know what a tool can accept. Operators need predictable failures. Security teams need confidence that a malformed request will not quietly mutate into a side effect.
Schema-first contracts solve a lot of that. They make tool use inspectable, testable, and easier to migrate when providers or downstream services change.
## Architecture or workflow overview
```mermaid
flowchart LR
    A[Model planner] --> B[Tool registry]
    B --> C[JSON Schema or typed contract]
    C --> D[Runtime validator]
    D -->|valid| E[Adapter layer]
    D -->|invalid| F[Structured error]
    E --> G[Executor]
    G --> H[Audit log and metrics]
    F --> H
```

- Planner decides which tool to call.
- Registry describes allowed tools and their exact input shape.
- Validator rejects malformed arguments before execution.
- Adapter converts schema-safe arguments into provider-specific calls.
- Executor performs the side effect and records what happened.
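The registry step above can be sketched as a plain mapping from tool name to contract plus executor. This is a minimal illustration, not a prescribed implementation; the `ToolSpec` and `lookup` names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolSpec:
    name: str
    schema: dict                       # JSON Schema describing the exact input shape
    execute: Callable[[dict], dict]    # side effect, run only after validation

REGISTRY: dict[str, ToolSpec] = {}

def register(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec

def lookup(name: str) -> ToolSpec:
    # Fail closed: an unknown tool name is a hard error, not a best-effort guess
    if name not in REGISTRY:
        raise KeyError(f"unsupported tool: {name}")
    return REGISTRY[name]
```

Keeping the schema next to the executor in one record means reviewers see the contract and the side effect in the same place.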
## Implementation details

### Define the contract in one place
A minimal tool contract should be explicit about required fields, enums, bounds, and whether additional properties are allowed.
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CreateIncidentTicket",
  "type": "object",
  "additionalProperties": false,
  "required": ["service", "severity", "summary"],
  "properties": {
    "service": { "type": "string", "minLength": 2 },
    "severity": { "type": "string", "enum": ["sev1", "sev2", "sev3", "sev4"] },
    "summary": { "type": "string", "minLength": 10, "maxLength": 240 },
    "runbook_url": { "type": "string", "format": "uri" },
    "notify_slack": { "type": "boolean", "default": true }
  }
}
```

`additionalProperties: false` stops hallucinated fields from leaking into execution. Enums force the agent onto your operating vocabulary instead of its own.
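To make the effect of those two rules concrete, here is a dependency-free sketch of what they reject. This is not the Ajv path used below, just the same two rules spelled out by hand:

```python
# Hand-rolled illustration of two schema rules; field names mirror the schema above
ALLOWED_FIELDS = {"service", "severity", "summary", "runbook_url", "notify_slack"}
SEVERITIES = {"sev1", "sev2", "sev3", "sev4"}

def check_strict(args: dict) -> list[str]:
    """Return a list of violations; empty means these two rules pass."""
    errors = []
    # additionalProperties: false -> any hallucinated field is a hard error
    for key in args:
        if key not in ALLOWED_FIELDS:
            errors.append(f"/{key}: unexpected field")
    # enum -> the model must use our operating vocabulary, not its own
    if args.get("severity") not in SEVERITIES:
        errors.append("/severity: must be one of sev1..sev4")
    return errors
```

A model that invents an `urgency` field or writes `SEV-1` gets a structured rejection instead of a silent pass-through.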
### Validate before any side effect
```javascript
import Ajv from "ajv";
import addFormats from "ajv-formats";
import schema from "./schemas/create-incident-ticket.json";

const ajv = new Ajv({ allErrors: true, useDefaults: true, removeAdditional: false });
addFormats(ajv);
const validateCreateIncident = ajv.compile(schema);

export function parseToolCall(call) {
  if (call.name !== "create_incident_ticket") {
    throw new Error(`unsupported tool: ${call.name}`);
  }
  if (!validateCreateIncident(call.arguments)) {
    return { ok: false, error: "validation_failed", details: validateCreateIncident.errors ?? [] };
  }
  return { ok: true, args: call.arguments };
}
```

If validation fails, the executor never runs. That one rule removes a lot of bizarre downstream debugging.
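Structured rejections also enable a bounded repair loop: hand the validation errors back to the model and let it retry a fixed number of times. A sketch, where `call_model`, `validate`, and `execute` are hypothetical stand-ins for your model client, schema validator, and executor:

```python
def run_with_repair(call_model, validate, execute, max_attempts: int = 3):
    """Retry a tool call until it validates, feeding errors back to the model."""
    feedback = None
    for _ in range(max_attempts):
        call = call_model(feedback)          # feedback is None on the first attempt
        result = validate(call)
        if result["ok"]:
            return execute(result["args"])   # side effect only after validation
        feedback = result                    # structured errors go back to the model
    raise RuntimeError("tool call failed validation after repeated repair attempts")
```

The cap matters: an unbounded loop turns one bad call into an expensive argument with the model.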
### Adapt clean arguments to the downstream API
```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CreateIncidentTicket:
    service: str
    severity: str
    summary: str
    runbook_url: Optional[str] = None
    notify_slack: bool = True

SEVERITY_MAP = {
    "sev1": "critical",
    "sev2": "high",
    "sev3": "medium",
    "sev4": "low",
}

def to_ticket_payload(cmd: CreateIncidentTicket) -> dict:
    return {
        "service_name": cmd.service,
        "priority": SEVERITY_MAP[cmd.severity],
        "title": cmd.summary,
        "references": [cmd.runbook_url] if cmd.runbook_url else [],
        "notify": {"slack": cmd.notify_slack},
    }
```

The adapter is where downstream API drift should live. The model-facing contract should stay stable.
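A quick check of that mapping, restating the adapter inline (slightly flattened) so the snippet runs on its own; the `"checkout"` service and sample summary are invented values:

```python
# Same field mapping as the dataclass adapter above, flattened for brevity
SEVERITY_MAP = {"sev1": "critical", "sev2": "high", "sev3": "medium", "sev4": "low"}

def to_ticket_payload(service, severity, summary, runbook_url=None, notify_slack=True):
    return {
        "service_name": service,
        "priority": SEVERITY_MAP[severity],
        "title": summary,
        "references": [runbook_url] if runbook_url else [],
        "notify": {"slack": notify_slack},
    }

payload = to_ticket_payload("checkout", "sev2", "Checkout latency is elevated")
# The model-facing vocabulary ("sev2") never reaches the downstream API
assert payload["priority"] == "high"
assert payload["references"] == []
```

If the downstream service renames `priority` next quarter, only this function changes; the schema the model sees does not.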
### Return machine-readable errors back to the model

```
$ tool-exec create_incident_ticket
validation_failed
- /severity must be equal to one of: sev1, sev2, sev3, sev4
- /summary must NOT have fewer than 10 characters
```

```json
{
  "ok": false,
  "error": "validation_failed",
  "tool": "create_incident_ticket",
  "details": [
    { "path": "/severity", "message": "must be equal to one of the allowed values" },
    { "path": "/summary", "message": "must NOT have fewer than 10 characters" }
  ]
}
```

## Tradeoffs
| Approach | What feels nice at first | What breaks later | Better default |
|---|---|---|---|
| Free-form JSON arguments | Fast prototyping | Silent field drift, weak reviewability, awkward retries | Only for throwaway prototypes |
| Prompt-only tool instructions | Low setup cost | Models invent values and formats | Use only as extra guidance |
| Schema-first contracts | Slightly more upfront work | Some boilerplate | Best default for side effects |
| Schema + adapter + audit log | Highest discipline | More code to maintain | Best for production or shared agents |
## What went wrong

### Trusting the provider SDK to validate enough
A lot of SDKs validate transport shape, not business rules. A request can be syntactically valid and still operationally wrong.

### Using one schema for both planning and execution
Your model wants a stable interface. Your downstream service wants whatever odd field names it currently uses. Keep those separate.

### Accepting best-effort coercion
Auto-coercing `SEV-1` into `sev1` looks helpful until the same layer starts normalizing values you should have rejected.
Security note: Schema validation is not authorization. A perfectly valid tool call can still be unauthorized for the current user, environment, or workflow step.
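Validation and authorization compose naturally as two separate gates in front of the executor. A sketch, where the per-principal `ALLOWED_TOOLS` table and the `guarded_execute` helper are hypothetical:

```python
# Hypothetical authorization table: which principal may call which tool
ALLOWED_TOOLS = {
    "oncall-bot": {"create_incident_ticket"},
    "readonly-bot": set(),
}

def authorize(principal: str, tool: str) -> bool:
    # A schema-valid call can still be unauthorized for this principal
    return tool in ALLOWED_TOOLS.get(principal, set())

def guarded_execute(principal, tool, args, validate, execute):
    if not validate(args):
        return {"ok": False, "error": "validation_failed"}
    if not authorize(principal, tool):
        return {"ok": False, "error": "unauthorized"}
    return {"ok": True, "result": execute(args)}
```

Ordering the gates this way means an unauthorized caller still gets a clean, loggable error rather than a half-executed side effect.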
## Practical checklist
- Define tool inputs once in JSON Schema, Zod, Pydantic, or a similarly typed format.
- Reject unknown fields unless you have a very strong migration reason not to.
- Keep the model-facing contract separate from the downstream API payload.
- Return structured validation errors so the agent can repair bad calls.
- Log validated arguments, execution result, and error category for every run.
- Add auth checks after validation and before side effects.
- Add replay-safe identifiers or idempotency keys for non-read tools.
- Review every enum and default value like it is part of your public API.
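The idempotency-key item in the checklist can be sketched as a deterministic key derived from the tool name plus validated arguments, so a retried call maps to the same downstream request instead of a duplicate. The cache here is an in-memory stand-in for whatever store you actually use:

```python
import hashlib
import json

def idempotency_key(tool: str, args: dict) -> str:
    # Canonical JSON (sorted keys) so the same logical call always hashes the same
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:32]

_seen: dict[str, dict] = {}  # stand-in for a durable idempotency store

def execute_once(tool: str, args: dict, execute) -> dict:
    key = idempotency_key(tool, args)
    if key in _seen:
        return _seen[key]  # replay-safe: a retried call returns the cached result
    result = execute(args)
    _seen[key] = result
    return result
```

Deriving the key from validated arguments, not the raw model output, matters: two syntactic variants of the same bad call should never mint two different keys.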
## References
- OpenAI, structured outputs
- Anthropic, tool use
- JSON Schema specification
- Ajv JSON Schema validator
- Pydantic
## Conclusion
If an agent can touch real systems, tool calling should look more like API design and less like wishful prompting.
Schema-first contracts add a little ceremony, but they buy calmer debugging, safer retries, tighter reviews, and fewer bizarre side effects. That trade is worth it.