Tool Capability Manifests for AI Agents That Need to Pick the Right Tool

Agents get weird when every tool looks equally safe.

That is the real failure mode behind a lot of bad tool decisions. A planner sees two options, both described with a name, a schema, and a happy-path description, then reaches for the tool that looks most directly useful.

What it usually cannot see is that one call is read-only and cheap while the other spends money, mutates state, pages the on-call engineer, or leaves a messy rollback path. This post is about filling in that missing layer with capability manifests.

Why this matters

Schema validation tells you whether a tool call is well formed. It does not tell you whether the tool call is wise.

That difference becomes expensive the moment an agent can touch GitHub, cloud APIs, incident tooling, or customer-facing systems. If the planner has no explicit capability metadata, it will overvalue the shortest path to an answer.

The results are predictable:

expensive tools get used for cheap questions
write tools get used where read tools would have been enough
rollback paths are discovered after the fact
reviewers cannot tell whether the tool catalog is actually safe

Workflow overview

flowchart TD
    A[User task] --> B[Planner]
    B --> C[Capability manifest registry]
    C --> D{{Score candidates}}
    D -->|low risk, cheap, read only| E[Call tool directly]
    D -->|write or high blast radius| F[Ask for approval or pick safer alternative]
    E --> G[Trace result with manifest metadata]
    F --> G
    G --> H[Memory, audit, retry policy]

A practical manifest registry usually sits next to the tool adapter layer, not inside the prompt alone. That lets the same metadata drive planning, approval gating, tracing, and incident review.
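
To make the flow concrete, here is a minimal sketch of a registry lookup plus the routing decision from the diagram. Everything in it is an assumption for illustration: the `ManifestEntry` shape, the `blastRadius` field, and the `register`, `lookup`, and `route` helpers are hypothetical names, and the cost check from the diagram is left out to keep it short.

```typescript
// Hypothetical, trimmed-down manifest entry. `blastRadius` is an assumed field
// for this sketch; it is not part of the manifest example in this post.
type ManifestEntry = {
  name: string;
  mode: 'read' | 'write';
  blastRadius: 'low' | 'high';
  alternatives: string[];
};

// Either call the tool now, or stop at the gate (approval or a safer tool).
type Route =
  | { kind: 'call'; tool: string }
  | { kind: 'gate'; tool: string; saferAlternatives: string[] };

// The registry sits beside the tool adapters, keyed by tool name.
const registry: Record<string, ManifestEntry> = {};

function register(entry: ManifestEntry): void {
  registry[entry.name] = entry;
}

function lookup(name: string): ManifestEntry | undefined {
  return Object.prototype.hasOwnProperty.call(registry, name) ? registry[name] : undefined;
}

function isLowRiskRead(entry: ManifestEntry): boolean {
  return entry.mode === 'read' && entry.blastRadius === 'low';
}

function route(toolName: string): Route {
  const entry = lookup(toolName);
  // A tool with no manifest is gated rather than silently allowed.
  if (!entry) return { kind: 'gate', tool: toolName, saferAlternatives: [] };
  if (isLowRiskRead(entry)) return { kind: 'call', tool: entry.name };

  // Only surface alternatives that are registered and are low-risk reads.
  const saferAlternatives = entry.alternatives.filter((name) => {
    const alt = lookup(name);
    return alt !== undefined && isLowRiskRead(alt);
  });
  return { kind: 'gate', tool: entry.name, saferAlternatives };
}
```

The gate result deliberately does not choose for the planner. It hands back the safer options so the approval flow or the model can decide whether one of them is enough for the task.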

Implementation details

1) Define manifests with decision-grade metadata

I keep tool input schemas separate from operational metadata. The planner needs both, but they answer different questions.

name: github.create_pull_request
summary: Open a pull request in GitHub for an already-pushed branch
inputSchemaRef: ./schemas/github.create_pull_request.json
capabilities:
  mode: write
  sideEffects:
    external: true
    reversible: partial
    rollbackHint: close_pr_and_revert_branch
  auth:
    scope: repo
    humanApprovalRequired: true
  cost:
    lane: low
    estimatedDollars: 0.00
  latency:
    p50Ms: 900
    p95Ms: 2600
  reliability:
    idempotent: false
    retryClass: guarded
  observability:
    emitTraceAttrs:
      - repo
      - baseBranch
      - headBranch
  alternatives:
    - github.diff_branch
    - github.draft_pr_summary

The exact field names matter less than the discipline. Side effects, approval needs, reversibility, and retry class should not live as tribal knowledge.
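
One way to keep that discipline honest is a small lint pass over each manifest. The sketch below is an assumption, not a fixed spec: the `CapabilityManifest` interface mirrors only part of the YAML above, `lintManifest` is a hypothetical helper, and the `auto` retry class is a made-up value standing in for "retry without checks".

```typescript
// Assumed TypeScript mirror of part of the YAML manifest; only the fields linted here.
interface CapabilityManifest {
  name: string;
  capabilities: {
    mode: 'read' | 'write';
    sideEffects: {
      external: boolean;
      reversible: 'full' | 'partial' | 'none';
      rollbackHint?: string;
    };
    reliability: { idempotent: boolean; retryClass: string };
  };
  alternatives: string[];
}

// Returns human-readable problems; an empty array means the manifest passes.
function lintManifest(m: CapabilityManifest): string[] {
  const problems: string[] = [];
  const { mode, sideEffects, reliability } = m.capabilities;

  if (mode === 'write' && sideEffects.reversible !== 'full' && !sideEffects.rollbackHint) {
    problems.push('write tool that is not fully reversible needs a rollbackHint');
  }
  // 'auto' is a made-up retry class meaning "retry without checks".
  if (!reliability.idempotent && reliability.retryClass === 'auto') {
    problems.push('non-idempotent tool must not use retryClass "auto"');
  }
  if (m.alternatives.indexOf(m.name) !== -1) {
    problems.push('tool lists itself as an alternative');
  }
  return problems;
}
```

The specific rules matter less than having them in code, where a missing rollback hint becomes a failed check instead of a surprise.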

2) Score candidates before the model commits

A small deterministic layer helps more than endlessly rewriting the planner prompt.

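// Planner-facing view of a tool: semantic fit plus the manifest fields used for scoring.
interface ToolCandidate {
  name: string;
  semanticFit: number; // multiplied by 100 below, so presumably in the range 0 to 1
  capability: {
    mode: 'read' | 'write';
    approvalRequired: boolean;
    estimatedDollars: number;
    p95Ms: number;
    reversible: 'full' | 'partial' | 'none';
  };
}

// Start from semantic fit, then subtract fixed penalties for operational risk.
export function scoreTool(candidate: ToolCandidate, taskRisk: number): number {
  let score = candidate.semanticFit * 100;

  if (candidate.capability.mode === 'write') score -= 18; // mutates state
  if (candidate.capability.approvalRequired) score -= 10; // needs a human in the loop
  if (candidate.capability.estimatedDollars > 0.10) score -= 8; // costs real money
  if (candidate.capability.p95Ms > 3000) score -= 6; // slow tail latency
  if (candidate.capability.reversible === 'none') score -= 12; // no rollback path

  // Same offset for every candidate on a given task: it does not change the
  // ranking and only matters when scores are compared against a fixed cutoff.
  score -= taskRisk * 5;
  return score;
}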

This does not replace the model. It narrows the menu so the model stops picking tools that are technically valid but operationally dumb.
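
Narrowing the menu can be as simple as scoring, filtering, and keeping the top few. The helper below is a sketch under assumptions: `shortlist` is a hypothetical name, and both the cutoff `minScore` and the size `k` are knobs I made up for illustration. It takes the scoring function as a parameter, so something like `(c) => scoreTool(c, taskRisk)` can be passed in.

```typescript
// Score every candidate, drop those under the floor, keep the k best.
function shortlist<T>(
  candidates: T[],
  score: (candidate: T) => number,
  k: number,
  minScore: number,
): T[] {
  return candidates
    .map((candidate) => ({ candidate, value: score(candidate) }))
    .filter((scored) => scored.value >= minScore)
    .sort((a, b) => b.value - a.value) // highest score first
    .slice(0, k)
    .map((scored) => scored.candidate);
}
```

The model then chooses from the shortlist only. If the list comes back empty, that is a signal to ask a human rather than to lower the floor.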

3) Push manifest metadata into tracing and review

{
  "trace_id": "3c8df2f9f2f64b5a",
  "tool": "github.create_pull_request",
  "tool_mode": "write",
  "approval_required": true,
  "retry_class": "guarded",
  "estimated_dollars": 0.0,
  "rollback_hint": "close_pr_and_revert_branch",
  "selected_over": [
    "github.diff_branch",
    "github.draft_pr_summary"
  ]
}

This is the part teams usually skip, and it is why later postmortems turn into guesswork. Good manifest data should survive past planning and stay visible in traces and approvals.
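
A sketch of how a trace record like the one above could be built straight from manifest data, so nobody hand-copies fields. The `TraceableManifest` and `ToolTraceRecord` shapes and the `toTraceRecord` helper are assumptions for this example, not the API of any tracing library.

```typescript
// Flattened manifest fields needed for tracing; the shape is an assumption for this sketch.
interface TraceableManifest {
  name: string;
  mode: 'read' | 'write';
  humanApprovalRequired: boolean;
  retryClass: string;
  estimatedDollars: number;
  rollbackHint: string;
}

interface ToolTraceRecord {
  trace_id: string;
  tool: string;
  tool_mode: 'read' | 'write';
  approval_required: boolean;
  retry_class: string;
  estimated_dollars: number;
  rollback_hint: string;
  selected_over: string[];
}

// Copies manifest fields into the trace so reviewers see what the planner saw.
function toTraceRecord(
  traceId: string,
  chosen: TraceableManifest,
  consideredTools: string[],
): ToolTraceRecord {
  return {
    trace_id: traceId,
    tool: chosen.name,
    tool_mode: chosen.mode,
    approval_required: chosen.humanApprovalRequired,
    retry_class: chosen.retryClass,
    estimated_dollars: chosen.estimatedDollars,
    rollback_hint: chosen.rollbackHint,
    // Everything that was on the menu except the tool that was picked.
    selected_over: consideredTools.filter((name) => name !== chosen.name),
  };
}
```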

What went wrong and the tradeoffs

Failure mode 1, the manifest turns into a stale wiki

If manifests live only in docs, people stop updating them. Nothing breaks until a planner makes the wrong call in production.

Best practice: fail CI when a tool changes retry policy, auth scope, or side-effect lane without a corresponding manifest update.
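
A drift check does not need to be clever. The sketch below assumes CI can extract the same handful of facts from the tool adapter that the manifest declares; how those facts get extracted is left open, and `OperationalFacts` and `findDrift` are hypothetical names.

```typescript
// The facts compared in CI. Assumed to be obtainable from both the manifest and the adapter.
type OperationalFacts = {
  retryClass: string;
  authScope: string;
  mode: 'read' | 'write';
};

// Returns the names of fields where the manifest and the adapter disagree.
function findDrift(manifest: OperationalFacts, adapter: OperationalFacts): string[] {
  const fields: (keyof OperationalFacts)[] = ['retryClass', 'authScope', 'mode'];
  return fields.filter((field) => manifest[field] !== adapter[field]);
}
```

A CI step would fail the build whenever the returned list is non-empty and print the field names, which is usually enough for the author to see what needs updating.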

Failure mode 2, everything gets labeled high risk

That keeps you safe for a week, then destroys tool selection quality because the planner cannot see meaningful gradients. The realistic options trade off like this:

Approach	Benefit	Cost	When I would use it
Coarse risk labels only	Fast to launch	Too much ambiguity	Tiny catalogs with few write tools
Full capability manifests	Best planner quality and auditability	More maintenance	Shared agent platforms
Human-written per-task overrides	Precise for sensitive flows	Hard to scale	Deployments, finance, paging

Failure mode 3, side effects are hidden behind read tools

This is the sneaky one. A tool might look informational but still mutate state through caching, analytics writes, or implicit server-side sessions.

Pitfall: read-only should mean no user-visible or system-visible mutation worth auditing. If the backend writes durable state, label it honestly.
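
If each tool declares its side effects, even the boring ones, mislabeled read tools can be caught mechanically. This is a sketch under that assumption; the `SideEffect` and `LabeledTool` shapes and the `mislabeledReadTools` helper are hypothetical, and the tool names in the usage note are invented.

```typescript
// A declared side effect; `durable` means the backend persists something.
type SideEffect = { description: string; durable: boolean };

type LabeledTool = {
  name: string;
  mode: 'read' | 'write';
  sideEffects: SideEffect[];
};

// Names of tools labeled read that still declare at least one durable write.
function mislabeledReadTools(tools: LabeledTool[]): string[] {
  return tools
    .filter((tool) => tool.mode === 'read' && tool.sideEffects.some((effect) => effect.durable))
    .map((tool) => tool.name);
}
```

For example, a `search.query` tool labeled read that declares a durable analytics write would be reported, while one that only touches an in-memory cache would not.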

Security and reliability concern

Capability metadata is part of your trust boundary. If the model can rewrite the manifest, or if tool servers can self-report unchecked metadata at runtime, an attacker can downgrade a dangerous tool into a harmless-looking one.

To keep that boundary intact:

store manifests in reviewed code or signed registry data
validate them server-side before exposure to the planner
separate tool description text from enforcement policy
bind approval UI to the same manifest source the runtime enforces
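
One possible server-side policy, sketched under assumptions: treat the reviewed manifest as a floor, and let metadata self-reported by a tool server make a tool look more dangerous but never less. The `RiskFields` shape and the `enforceFloor` helper are hypothetical names for this example.

```typescript
// The risk-relevant fields being reconciled; an assumed subset of the manifest.
type RiskFields = {
  mode: 'read' | 'write';
  humanApprovalRequired: boolean;
  reversible: 'full' | 'partial' | 'none';
};

// Higher rank means harder to undo.
const REVERSIBILITY_RANK = { full: 0, partial: 1, none: 2 };

// Reviewed data is the floor: self-reported data may escalate risk, never downgrade it.
function enforceFloor(reviewed: RiskFields, selfReported: RiskFields): RiskFields {
  const stricterReversibility =
    REVERSIBILITY_RANK[selfReported.reversible] > REVERSIBILITY_RANK[reviewed.reversible]
      ? selfReported.reversible
      : reviewed.reversible;
  return {
    mode: reviewed.mode === 'write' || selfReported.mode === 'write' ? 'write' : 'read',
    humanApprovalRequired: reviewed.humanApprovalRequired || selfReported.humanApprovalRequired,
    reversible: stricterReversibility,
  };
}
```

With this in place, a compromised tool server that claims to be a harmless read tool still reaches the planner and the approval UI with its reviewed write label intact.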

Practical checklist

[ ] classify every tool as read, write, or mixed
[ ] document irreversible or partially reversible side effects
[ ] record auth scope and whether human approval is required
[ ] add rough latency and cost lanes, even if hand-estimated
[ ] define retry class, especially for non-idempotent tools
[ ] expose safer alternatives in the manifest for planner comparison
[ ] emit manifest fields into traces and approval screens
[ ] add CI checks so manifest drift is visible

Conclusion

Tool schemas tell the agent how to call something. Capability manifests tell it whether calling that thing is a good idea.

If you want agents to make saner choices under cost, risk, and approval constraints, this is one of the highest-leverage layers you can add.