Most agent failures I see are not model failures. They are contract failures.
The model picks the right tool in spirit, then sends one field with the wrong shape, one enum value with the wrong casing, or one half-parsed date that slips into a real side effect. If your executor is permissive, the bug moves downstream and becomes much harder to debug.
The fix is boring in a good way: make tool use schema-first. Define the tool contract once, validate at runtime, adapt provider-specific payloads at the edge, and fail closed when the request does not match the contract.
This matters because the minute an agent can create tickets, modify infra, touch customer data, or even just burn expensive API calls, loose contracts stop being a convenience problem and become a reliability problem.
## Why this matters
Prompt-only tool descriptions are okay for demos, but production agents need stable boundaries. Reviewers need to know what a tool can accept. Operators need predictable failures. Security teams need confidence that a malformed request will not quietly mutate into a side effect.
Schema-first contracts solve a lot of that. They make tool use inspectable, testable, and easier to migrate when providers or downstream services change.
## Architecture or workflow overview
```mermaid
flowchart LR
    A[Model planner] --> B[Tool registry]
    B --> C[JSON Schema or typed contract]
    C --> D[Runtime validator]
    D -->|valid| E[Adapter layer]
    D -->|invalid| F[Structured error]
    E --> G[Executor]
    G --> H[Audit log and metrics]
    F --> H
```

- Planner decides which tool to call.
- Registry describes allowed tools and their exact input shape.
- Validator rejects malformed arguments before execution.
- Adapter converts schema-safe arguments into provider-specific calls.
- Executor performs the side effect and records what happened.
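The registry step above can be sketched as a plain mapping from tool name to contract plus executor. This is a minimal illustration, not a prescribed implementation; the `ToolSpec` and `lookup` names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolSpec:
    name: str
    schema: dict                       # JSON Schema describing the exact input shape
    execute: Callable[[dict], dict]    # side effect, run only after validation

REGISTRY: dict[str, ToolSpec] = {}

def register(spec: ToolSpec) -> None:
    REGISTRY[spec.name] = spec

def lookup(name: str) -> ToolSpec:
    # Fail closed: an unknown tool name is a hard error, not a best-effort guess
    if name not in REGISTRY:
        raise KeyError(f"unsupported tool: {name}")
    return REGISTRY[name]
```

Keeping the schema next to the executor in one record means reviewers see the contract and the side effect in the same place.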
## Implementation details

### Define the contract in one place
A minimal tool contract should be explicit about required fields, enums, bounds, and whether additional properties are allowed.
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CreateIncidentTicket",
  "type": "object",
  "additionalProperties": false,
  "required": ["service", "severity", "summary"],
  "properties": {
    "service": { "type": "string", "minLength": 2 },
    "severity": { "type": "string", "enum": ["sev1", "sev2", "sev3", "sev4"] },
    "summary": { "type": "string", "minLength": 10, "maxLength": 240 },
    "runbook_url": { "type": "string", "format": "uri" },
    "notify_slack": { "type": "boolean", "default": true }
  }
}
```

`additionalProperties: false` stops hallucinated fields from leaking into execution. Enums force the agent onto your operating vocabulary instead of its own.
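To make the effect of those two rules concrete, here is a dependency-free sketch of what they reject. This is not the Ajv path used below, just the same two rules spelled out by hand:

```python
# Hand-rolled illustration of two schema rules; field names mirror the schema above
ALLOWED_FIELDS = {"service", "severity", "summary", "runbook_url", "notify_slack"}
SEVERITIES = {"sev1", "sev2", "sev3", "sev4"}

def check_strict(args: dict) -> list[str]:
    """Return a list of violations; empty means these two rules pass."""
    errors = []
    # additionalProperties: false -> any hallucinated field is a hard error
    for key in args:
        if key not in ALLOWED_FIELDS:
            errors.append(f"/{key}: unexpected field")
    # enum -> the model must use our operating vocabulary, not its own
    if args.get("severity") not in SEVERITIES:
        errors.append("/severity: must be one of sev1..sev4")
    return errors
```

A model that invents an `urgency` field or writes `SEV-1` gets a structured rejection instead of a silent pass-through.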
### Validate before any side effect
```javascript
import Ajv from "ajv";
import addFormats from "ajv-formats";
import schema from "./schemas/create-incident-ticket.json";

const ajv = new Ajv({ allErrors: true, useDefaults: true, removeAdditional: false });
addFormats(ajv);
const validateCreateIncident = ajv.compile(schema);

export function parseToolCall(call) {
  if (call.name !== "create_incident_ticket") {
    throw new Error(`unsupported tool: ${call.name}`);
  }
  if (!validateCreateIncident(call.arguments)) {
    return { ok: false, error: "validation_failed", details: validateCreateIncident.errors ?? [] };
  }
  return { ok: true, args: call.arguments };
}
```

If validation fails, the executor never runs. That one rule removes a lot of bizarre downstream debugging.
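Structured rejections also enable a bounded repair loop: hand the validation errors back to the model and let it retry a fixed number of times. A sketch, where `call_model`, `validate`, and `execute` are hypothetical stand-ins for your model client, schema validator, and executor:

```python
def run_with_repair(call_model, validate, execute, max_attempts: int = 3):
    """Retry a tool call until it validates, feeding errors back to the model."""
    feedback = None
    for _ in range(max_attempts):
        call = call_model(feedback)          # feedback is None on the first attempt
        result = validate(call)
        if result["ok"]:
            return execute(result["args"])   # side effect only after validation
        feedback = result                    # structured errors go back to the model
    raise RuntimeError("tool call failed validation after repeated repair attempts")
```

The cap matters: an unbounded loop turns one bad call into an expensive argument with the model.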
### Adapt clean arguments to the downstream API
```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CreateIncidentTicket:
    service: str
    severity: str
    summary: str
    runbook_url: Optional[str] = None
    notify_slack: bool = True

SEVERITY_MAP = {
    "sev1": "critical",
    "sev2": "high",
    "sev3": "medium",
    "sev4": "low",
}

def to_ticket_payload(cmd: CreateIncidentTicket) -> dict:
    return {
        "service_name": cmd.service,
        "priority": SEVERITY_MAP[cmd.severity],
        "title": cmd.summary,
        "references": [cmd.runbook_url] if cmd.runbook_url else [],
        "notify": {"slack": cmd.notify_slack},
    }
```

The adapter is where downstream API drift should live. The model-facing contract should stay stable.
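A quick check of that mapping, restating the adapter inline (slightly flattened) so the snippet runs on its own; the `"checkout"` service and sample summary are invented values:

```python
# Same field mapping as the dataclass adapter above, flattened for brevity
SEVERITY_MAP = {"sev1": "critical", "sev2": "high", "sev3": "medium", "sev4": "low"}

def to_ticket_payload(service, severity, summary, runbook_url=None, notify_slack=True):
    return {
        "service_name": service,
        "priority": SEVERITY_MAP[severity],
        "title": summary,
        "references": [runbook_url] if runbook_url else [],
        "notify": {"slack": notify_slack},
    }

payload = to_ticket_payload("checkout", "sev2", "Checkout latency is elevated")
# The model-facing vocabulary ("sev2") never reaches the downstream API
assert payload["priority"] == "high"
assert payload["references"] == []
```

If the downstream service renames `priority` next quarter, only this function changes; the schema the model sees does not.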
### Return machine-readable errors back to the model

```
$ tool-exec create_incident_ticket
validation_failed
- /severity must be equal to one of: sev1, sev2, sev3, sev4
- /summary must NOT have fewer than 10 characters
```

```json
{
  "ok": false,
  "error": "validation_failed",
  "tool": "create_incident_ticket",
  "details": [
    { "path": "/severity", "message": "must be equal to one of the allowed values" },
    { "path": "/summary", "message": "must NOT have fewer than 10 characters" }
  ]
}
```

## Tradeoffs
| Approach | What feels nice at first | What breaks later | Better default |
|---|---|---|---|
| Free-form JSON arguments | Fast prototyping | Silent field drift, weak reviewability, awkward retries | Only for throwaway prototypes |
| Prompt-only tool instructions | Low setup cost | Models invent values and formats | Use only as extra guidance |
| Schema-first contracts | Slightly more upfront work | Some boilerplate | Best default for side effects |
| Schema + adapter + audit log | Highest discipline | More code to maintain | Best for production or shared agents |
## What went wrong

### Trusting the provider SDK to validate enough
A lot of SDKs validate transport shape, not business rules. A request can be syntactically valid and still operationally wrong.

### Using one schema for both planning and execution
Your model wants a stable interface. Your downstream service wants whatever odd field names it currently uses. Keep those separate.

### Accepting best-effort coercion
Auto-coercing `SEV-1` into `sev1` looks helpful until the same layer starts normalizing values you should have rejected.
Security note: Schema validation is not authorization. A perfectly valid tool call can still be unauthorized for the current user, environment, or workflow step.
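Validation and authorization compose naturally as two separate gates in front of the executor. A sketch, where the per-principal `ALLOWED_TOOLS` table and the `guarded_execute` helper are hypothetical:

```python
# Hypothetical authorization table: which principal may call which tool
ALLOWED_TOOLS = {
    "oncall-bot": {"create_incident_ticket"},
    "readonly-bot": set(),
}

def authorize(principal: str, tool: str) -> bool:
    # A schema-valid call can still be unauthorized for this principal
    return tool in ALLOWED_TOOLS.get(principal, set())

def guarded_execute(principal, tool, args, validate, execute):
    if not validate(args):
        return {"ok": False, "error": "validation_failed"}
    if not authorize(principal, tool):
        return {"ok": False, "error": "unauthorized"}
    return {"ok": True, "result": execute(args)}
```

Ordering the gates this way means an unauthorized caller still gets a clean, loggable error rather than a half-executed side effect.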
## Practical checklist
- Define tool inputs once in JSON Schema, Zod, Pydantic, or a similarly typed format.
- Reject unknown fields unless you have a very strong migration reason not to.
- Keep the model-facing contract separate from the downstream API payload.
- Return structured validation errors so the agent can repair bad calls.
- Log validated arguments, execution result, and error category for every run.
- Add auth checks after validation and before side effects.
- Add replay-safe identifiers or idempotency keys for non-read tools.
- Review every enum and default value like it is part of your public API.
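The idempotency-key item in the checklist can be sketched as a deterministic key derived from the tool name plus validated arguments, so a retried call maps to the same downstream request instead of a duplicate. The cache here is an in-memory stand-in for whatever store you actually use:

```python
import hashlib
import json

def idempotency_key(tool: str, args: dict) -> str:
    # Canonical JSON (sorted keys) so the same logical call always hashes the same
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:32]

_seen: dict[str, dict] = {}  # stand-in for a durable idempotency store

def execute_once(tool: str, args: dict, execute) -> dict:
    key = idempotency_key(tool, args)
    if key in _seen:
        return _seen[key]  # replay-safe: a retried call returns the cached result
    result = execute(args)
    _seen[key] = result
    return result
```

Deriving the key from validated arguments, not the raw model output, matters: two syntactic variants of the same bad call should never mint two different keys.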
## References
- OpenAI, structured outputs
- Anthropic, tool use
- JSON Schema specification
- Ajv JSON Schema validator
- Pydantic
## Conclusion
If an agent can touch real systems, tool calling should look more like API design and less like wishful prompting.
Schema-first contracts add a little ceremony, but they buy calmer debugging, safer retries, tighter reviews, and fewer bizarre side effects. That trade is worth it.