Agents get weird when every tool looks equally safe.
That is the real failure mode behind a lot of bad tool decisions. A planner sees two options, both described with a name, a schema, and a happy-path description, then reaches for the tool that looks most directly useful.
What it usually cannot see is that one call is read-only and cheap while the other spends money, mutates state, pages production, or leaves a messy rollback path. This post is about fixing that missing layer with capability manifests.
Why this matters
Schema validation tells you whether a tool call is well formed. It does not tell you whether the tool call is wise.
That difference becomes expensive the moment an agent can touch GitHub, cloud APIs, incident tooling, or customer-facing systems. If the planner has no explicit capability metadata, it will overvalue the shortest path to an answer. The symptoms are familiar:
- expensive tools get used for cheap questions
- write tools get used where read tools would have been enough
- rollback paths are discovered after the fact
- reviewers cannot tell whether the tool catalog is actually safe
Workflow overview
```mermaid
flowchart TD
    A[User task] --> B[Planner]
    B --> C[Capability manifest registry]
    C --> D{{Score candidates}}
    D -->|low risk, cheap, read only| E[Call tool directly]
    D -->|write or high blast radius| F[Ask for approval or pick safer alternative]
    E --> G[Trace result with manifest metadata]
    F --> G
    G --> H[Memory, audit, retry policy]
```

A practical manifest registry usually sits next to the tool adapter layer, not inside the prompt alone. That lets the same metadata drive planning, approval gating, tracing, and incident review.
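To make the routing step concrete, here is a minimal sketch of a registry that lives in code next to the adapters. `ManifestRegistry`, `route`, and the three route names are hypothetical, and treating `github.diff_branch` as a read tool is my assumption. The rule itself (writes and approval-gated tools never go straight through, unknown tools fail closed) is one reasonable reading of the flowchart, not the only one.

```typescript
// Hypothetical shapes for illustration; none of these names come from a real library.
interface Manifest {
  name: string;
  mode: 'read' | 'write';
  humanApprovalRequired: boolean;
}

type Route = 'call_directly' | 'needs_approval' | 'unknown_tool';

class ManifestRegistry {
  private readonly byName: Record<string, Manifest> = {};

  register(manifest: Manifest): void {
    this.byName[manifest.name] = manifest;
  }

  get(name: string): Manifest | undefined {
    return Object.prototype.hasOwnProperty.call(this.byName, name)
      ? this.byName[name]
      : undefined;
  }
}

// Route a proposed call from registry data only, never from prompt text.
function route(registry: ManifestRegistry, toolName: string): Route {
  const manifest = registry.get(toolName);
  if (manifest === undefined) return 'unknown_tool'; // fail closed
  if (manifest.mode === 'write' || manifest.humanApprovalRequired) {
    return 'needs_approval';
  }
  return 'call_directly';
}

const registry = new ManifestRegistry();
registry.register({ name: 'github.diff_branch', mode: 'read', humanApprovalRequired: false });
registry.register({ name: 'github.create_pull_request', mode: 'write', humanApprovalRequired: true });
```

Because the approval gate and the planner both call `route`, they cannot disagree about which tools are dangerous.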
Implementation details
1) Define manifests with decision-grade metadata
I keep tool input schemas separate from operational metadata. The planner needs both, but they answer different questions.
```yaml
name: github.create_pull_request
summary: Open a pull request in GitHub for an already-pushed branch
inputSchemaRef: ./schemas/github.create_pull_request.json
capabilities:
  mode: write
  sideEffects:
    external: true
    reversible: partial
    rollbackHint: close_pr_and_revert_branch
  auth:
    scope: repo
    humanApprovalRequired: true
  cost:
    lane: low
    estimatedDollars: 0.00
  latency:
    p50Ms: 900
    p95Ms: 2600
  reliability:
    idempotent: false
    retryClass: guarded
  observability:
    emitTraceAttrs:
      - repo
      - baseBranch
      - headBranch
  alternatives:
    - github.diff_branch
    - github.draft_pr_summary
```

The exact field names matter less than the discipline. Side effects, approval needs, reversibility, and retry class should not live as tribal knowledge.
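If the manifest is loaded into typed code, the planner does not need the whole document, only the handful of fields it scores on. The sketch below assumes the operational fields nest under `capabilities`; the `CapabilityManifest` and `PlannerView` types and the `toPlannerView` helper are hypothetical names of mine, and the interface covers only a subset of the manifest fields.

```typescript
// Assumed TypeScript mirror of a manifest. Field names follow the example manifest;
// the nesting under `capabilities` is an assumption of this sketch.
type Reversibility = 'full' | 'partial' | 'none';

interface CapabilityManifest {
  name: string;
  inputSchemaRef: string; // the input schema stays in its own file
  capabilities: {
    mode: 'read' | 'write';
    sideEffects: { external: boolean; reversible: Reversibility; rollbackHint?: string };
    auth: { scope: string; humanApprovalRequired: boolean };
    cost: { lane: string; estimatedDollars: number };
    latency: { p50Ms: number; p95Ms: number };
    reliability: { idempotent: boolean; retryClass: string };
  };
}

// Flattened view that a scorer can consume without walking the full manifest.
interface PlannerView {
  mode: 'read' | 'write';
  approvalRequired: boolean;
  estimatedDollars: number;
  p95Ms: number;
  reversible: Reversibility;
}

function toPlannerView(manifest: CapabilityManifest): PlannerView {
  const c = manifest.capabilities;
  return {
    mode: c.mode,
    approvalRequired: c.auth.humanApprovalRequired,
    estimatedDollars: c.cost.estimatedDollars,
    p95Ms: c.latency.p95Ms,
    reversible: c.sideEffects.reversible,
  };
}

const createPullRequest: CapabilityManifest = {
  name: 'github.create_pull_request',
  inputSchemaRef: './schemas/github.create_pull_request.json',
  capabilities: {
    mode: 'write',
    sideEffects: { external: true, reversible: 'partial', rollbackHint: 'close_pr_and_revert_branch' },
    auth: { scope: 'repo', humanApprovalRequired: true },
    cost: { lane: 'low', estimatedDollars: 0.0 },
    latency: { p50Ms: 900, p95Ms: 2600 },
    reliability: { idempotent: false, retryClass: 'guarded' },
  },
};

const view = toPlannerView(createPullRequest);
```

Keeping the flattening in one function means a renamed manifest field breaks the build in exactly one place.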
2) Score candidates before the model commits
A small deterministic layer helps more than endlessly rewriting the planner prompt.
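```typescript
// Planner-facing view of a tool: semantic fit plus the manifest fields used for scoring.
interface ToolCandidate {
  name: string;
  semanticFit: number; // assumed to be in the range 0..1
  capability: {
    mode: 'read' | 'write';
    approvalRequired: boolean;
    estimatedDollars: number;
    p95Ms: number;
    reversible: 'full' | 'partial' | 'none';
  };
}

// Deterministic pre-filter: start from semantic fit, then subtract operational penalties.
export function scoreTool(candidate: ToolCandidate, taskRisk: number): number {
  let score = candidate.semanticFit * 100;
  if (candidate.capability.mode === 'write') score -= 18; // mutates state
  if (candidate.capability.approvalRequired) score -= 10; // needs a human in the loop
  if (candidate.capability.estimatedDollars > 0.10) score -= 8; // costs real money
  if (candidate.capability.p95Ms > 3000) score -= 6; // slow tail latency
  if (candidate.capability.reversible === 'none') score -= 12; // no rollback path
  // Uniform shift for every candidate: it does not change the ranking,
  // it only matters when scores are compared against an absolute cutoff.
  score -= taskRisk * 5;
  return score;
}
```

This does not replace the model. It narrows the menu so the model stops picking tools that are technically valid but operationally dumb.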
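A quick usage sketch shows what narrowing the menu looks like in practice. I repeat the scoring function so the snippet runs on its own; the `rankTools` helper and every number in the two candidates are made up for illustration.

```typescript
interface ToolCandidate {
  name: string;
  semanticFit: number; // assumed 0..1
  capability: {
    mode: 'read' | 'write';
    approvalRequired: boolean;
    estimatedDollars: number;
    p95Ms: number;
    reversible: 'full' | 'partial' | 'none';
  };
}

function scoreTool(candidate: ToolCandidate, taskRisk: number): number {
  let score = candidate.semanticFit * 100;
  if (candidate.capability.mode === 'write') score -= 18;
  if (candidate.capability.approvalRequired) score -= 10;
  if (candidate.capability.estimatedDollars > 0.10) score -= 8;
  if (candidate.capability.p95Ms > 3000) score -= 6;
  if (candidate.capability.reversible === 'none') score -= 12;
  score -= taskRisk * 5;
  return score;
}

// Hypothetical helper: highest score first, without mutating the input array.
function rankTools(candidates: ToolCandidate[], taskRisk: number): ToolCandidate[] {
  return candidates
    .slice()
    .sort((a, b) => scoreTool(b, taskRisk) - scoreTool(a, taskRisk));
}

// Illustrative candidates; the fit scores and latencies are invented.
const candidates: ToolCandidate[] = [
  {
    name: 'github.create_pull_request',
    semanticFit: 0.9,
    capability: { mode: 'write', approvalRequired: true, estimatedDollars: 0, p95Ms: 2600, reversible: 'partial' },
  },
  {
    name: 'github.diff_branch',
    semanticFit: 0.8,
    capability: { mode: 'read', approvalRequired: false, estimatedDollars: 0, p95Ms: 1200, reversible: 'full' },
  },
];

const ranked = rankTools(candidates, 1);
```

With these inputs the read tool ranks first even though its semantic fit is lower, because the write and approval penalties outweigh the ten-point fit advantage.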
3) Push manifest metadata into tracing and review
```json
{
  "trace_id": "3c8df2f9f2f64b5a",
  "tool": "github.create_pull_request",
  "tool_mode": "write",
  "approval_required": true,
  "retry_class": "guarded",
  "estimated_dollars": 0.0,
  "rollback_hint": "close_pr_and_revert_branch",
  "selected_over": [
    "github.diff_branch",
    "github.draft_pr_summary"
  ]
}
```

This is the part teams usually skip, and it is why later postmortems turn into guesswork. Good manifest data should survive past planning and stay visible in traces and approvals.
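One way to keep the trace honest is to build it from the manifest rather than from whatever the planner remembers. This is a sketch under assumptions: `ManifestForTracing`, `ToolTraceRecord`, and `buildTraceRecord` are hypothetical names, and the output keys simply mirror the trace example above.

```typescript
// Hypothetical flattened manifest fields that the tracer needs.
interface ManifestForTracing {
  name: string;
  mode: 'read' | 'write';
  humanApprovalRequired: boolean;
  retryClass: string;
  estimatedDollars: number;
  rollbackHint?: string;
}

interface ToolTraceRecord {
  trace_id: string;
  tool: string;
  tool_mode: 'read' | 'write';
  approval_required: boolean;
  retry_class: string;
  estimated_dollars: number;
  rollback_hint: string | null;
  selected_over: string[];
}

// Copy manifest fields into the trace so reviewers see what the planner saw.
function buildTraceRecord(
  traceId: string,
  manifest: ManifestForTracing,
  selectedOver: string[]
): ToolTraceRecord {
  return {
    trace_id: traceId,
    tool: manifest.name,
    tool_mode: manifest.mode,
    approval_required: manifest.humanApprovalRequired,
    retry_class: manifest.retryClass,
    estimated_dollars: manifest.estimatedDollars,
    rollback_hint: manifest.rollbackHint === undefined ? null : manifest.rollbackHint,
    selected_over: selectedOver.slice(), // defensive copy
  };
}

const record = buildTraceRecord(
  '3c8df2f9f2f64b5a',
  {
    name: 'github.create_pull_request',
    mode: 'write',
    humanApprovalRequired: true,
    retryClass: 'guarded',
    estimatedDollars: 0,
    rollbackHint: 'close_pr_and_revert_branch',
  },
  ['github.diff_branch', 'github.draft_pr_summary']
);
```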
What went wrong and the tradeoffs
Failure mode 1, the manifest turns into a stale wiki
If manifests live only in docs, people stop updating them. Nothing breaks until a planner makes the wrong call in production.
Best practice: fail CI when a tool changes retry policy, auth scope, or side-effect lane without a corresponding manifest update.
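The core of such a check can be a plain field comparison. In this sketch, `OperationalFacts` and `findManifestDrift` are hypothetical, and how you extract the facts from adapter code (annotations, a generated file, a registration call) is left open. A CI script would fail the build whenever the returned list is non-empty.

```typescript
// Hypothetical: the operational facts that both the adapter and the manifest declare.
interface OperationalFacts {
  retryClass: string;
  authScope: string;
  sideEffectLane: string;
}

// Return one human-readable line per field where adapter and manifest disagree.
function findManifestDrift(
  fromAdapter: OperationalFacts,
  fromManifest: OperationalFacts
): string[] {
  const fields: Array<keyof OperationalFacts> = ['retryClass', 'authScope', 'sideEffectLane'];
  const drift: string[] = [];
  for (const field of fields) {
    if (fromAdapter[field] !== fromManifest[field]) {
      drift.push(`${field}: adapter=${fromAdapter[field]} manifest=${fromManifest[field]}`);
    }
  }
  return drift;
}

// Illustrative values: the adapter widened its auth scope, the manifest did not follow.
const drift = findManifestDrift(
  { retryClass: 'guarded', authScope: 'repo:write', sideEffectLane: 'external' },
  { retryClass: 'guarded', authScope: 'repo', sideEffectLane: 'external' }
);
```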
Failure mode 2, everything gets labeled high risk
That keeps you safe for a week, then destroys tool selection quality because the planner cannot see meaningful gradients. The main approaches trade off like this:
| Approach | Benefit | Cost | When I would use it |
|---|---|---|---|
| Coarse risk labels only | Fast to launch | Too much ambiguity | Tiny catalogs with few write tools |
| Full capability manifests | Best planner quality and auditability | More maintenance | Shared agent platforms |
| Human-written per-task overrides | Precise for sensitive flows | Hard to scale | Deployments, finance, paging |
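A cheap way to notice when every tool has been labeled high risk is to measure how concentrated the labels are. The helper below is a hypothetical lint of my own, and where you set the alarm threshold is a judgment call rather than a known-good number.

```typescript
// Hypothetical lint: report the most common risk label and the share of tools that carry it.
function dominantLabelShare(labels: string[]): { label: string; share: number } | undefined {
  if (labels.length === 0) return undefined;
  const counts: Record<string, number> = {};
  for (const label of labels) {
    counts[label] = (counts[label] || 0) + 1;
  }
  let best = labels[0];
  for (const label of Object.keys(counts)) {
    if (counts[label] > counts[best]) best = label;
  }
  return { label: best, share: counts[best] / labels.length };
}

// Illustrative catalog: four of five tools carry the same label.
const summary = dominantLabelShare(['high', 'high', 'high', 'high', 'low']);
```

If one label covers nearly the whole catalog, the label has stopped carrying information and the planner is effectively back to choosing on semantic fit alone.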
Failure mode 3, side effects are hidden behind read tools
This is the sneaky one. A tool might look informational but still mutate state through caching, analytics writes, or implicit server-side sessions.
Pitfall: read-only should mean no user-visible or system-visible mutation worth auditing. If the backend writes durable state, label it honestly.
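If tool owners also declare the durable state a backend touches, mislabeled read tools can be flagged mechanically. This is a sketch: the `durableWrites` field, the `mislabeledReadTools` helper, and both tool names are hypothetical.

```typescript
// Hypothetical declaration: the label plus the durable state the backend actually writes.
interface DeclaredTool {
  name: string;
  mode: 'read' | 'write';
  durableWrites: string[]; // e.g. 'analytics_event', 'server_session'
}

// A tool labeled read that still writes durable state is mislabeled.
function mislabeledReadTools(tools: DeclaredTool[]): string[] {
  return tools
    .filter((tool) => tool.mode === 'read' && tool.durableWrites.length > 0)
    .map((tool) => tool.name);
}

const flagged = mislabeledReadTools([
  { name: 'search.lookup', mode: 'read', durableWrites: [] },
  { name: 'reports.preview', mode: 'read', durableWrites: ['server_session'] },
  { name: 'tickets.update', mode: 'write', durableWrites: ['ticket_row'] },
]);
```

The check is only as honest as the `durableWrites` declarations, so it works best when the backend owner fills them in, not the person writing the tool description.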
Security and reliability concern
Capability metadata is part of your trust boundary. If the model can rewrite the manifest, or if tool servers can self-report unchecked metadata at runtime, an attacker can downgrade a dangerous tool into a harmless-looking one. To keep that boundary intact:
- store manifests in reviewed code or signed registry data
- validate them server-side before exposure to the planner
- separate tool description text from enforcement policy
- bind approval UI to the same manifest source the runtime enforces
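To illustrate the downgrade risk, here is a sketch of an enforcement step that never lets self-reported metadata relax the reviewed policy. `Policy`, `ResolvedPolicy`, and `resolvePolicy` are hypothetical names, and taking the stricter of the two sources is one possible rule. Rejecting the tool outright on any mismatch would be equally defensible.

```typescript
// Hypothetical enforcement-relevant subset of a manifest.
interface Policy {
  mode: 'read' | 'write';
  approvalRequired: boolean;
}

interface ResolvedPolicy extends Policy {
  downgradeAttempted: boolean; // worth logging and alerting on
}

// The reviewed manifest is the floor; a tool server's self-report can only tighten it.
function resolvePolicy(reviewed: Policy, selfReported: Policy | undefined): ResolvedPolicy {
  if (selfReported === undefined) {
    return { mode: reviewed.mode, approvalRequired: reviewed.approvalRequired, downgradeAttempted: false };
  }
  const downgradeAttempted =
    (reviewed.mode === 'write' && selfReported.mode === 'read') ||
    (reviewed.approvalRequired && !selfReported.approvalRequired);
  return {
    mode: reviewed.mode === 'write' || selfReported.mode === 'write' ? 'write' : 'read',
    approvalRequired: reviewed.approvalRequired || selfReported.approvalRequired,
    downgradeAttempted,
  };
}

// Illustrative case: the server claims a reviewed write tool is a harmless read.
const resolved = resolvePolicy(
  { mode: 'write', approvalRequired: true },
  { mode: 'read', approvalRequired: false }
);
```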
Practical checklist
- [ ] classify every tool as read, write, or mixed
- [ ] document irreversible or partially reversible side effects
- [ ] record auth scope and whether human approval is required
- [ ] add rough latency and cost lanes, even if hand-estimated
- [ ] define retry class, especially for non-idempotent tools
- [ ] expose safer alternatives in the manifest for planner comparison
- [ ] emit manifest fields into traces and approval screens
- [ ] add CI checks so manifest drift is visible
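The manifest-level items on this list can be turned into a small lint that reports what a draft is still missing. `ManifestDraft`, its flattened field names, and `checklistGaps` are hypothetical. The tracing and CI items are process checks and are not covered here.

```typescript
// Hypothetical draft shape: every checklist field is optional until filled in.
interface ManifestDraft {
  name: string;
  mode?: 'read' | 'write' | 'mixed';
  reversible?: 'full' | 'partial' | 'none';
  authScope?: string;
  humanApprovalRequired?: boolean;
  costLane?: string;
  p95Ms?: number;
  retryClass?: string;
  alternatives?: string[]; // an empty list is a valid answer
}

// Return the checklist items that the draft has not answered yet.
function checklistGaps(draft: ManifestDraft): string[] {
  const gaps: string[] = [];
  if (draft.mode === undefined) gaps.push('mode');
  if (draft.reversible === undefined) gaps.push('reversible');
  if (draft.authScope === undefined || draft.humanApprovalRequired === undefined) gaps.push('auth');
  if (draft.costLane === undefined || draft.p95Ms === undefined) gaps.push('cost/latency');
  if (draft.retryClass === undefined) gaps.push('retryClass');
  if (draft.alternatives === undefined) gaps.push('alternatives');
  return gaps;
}

// Illustrative draft for a made-up tool with only mode and auth filled in.
const gaps = checklistGaps({
  name: 'billing.refund',
  mode: 'write',
  authScope: 'payments',
  humanApprovalRequired: true,
});
```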
Conclusion
Tool schemas tell the agent how to call something. Capability manifests tell it whether calling that thing is a good idea.
If you want agents to make saner choices under cost, risk, and approval constraints, this is one of the highest-leverage layers you can add.