Most teams do not need another lecture about AI safety. They need a workflow that lets an agent do useful work without quietly turning a repo, inbox, or production system into a trust exercise.
That is the real approval problem. If approvals are too loose, the agent gets broad write power and mistakes compound fast. If approvals are too heavy, the agent becomes a glorified autocomplete with extra latency.
The sweet spot is a staged workflow: preview first, show the exact effect, require approval only for actions with real blast radius, then keep a clean audit trail. This is the pattern I would use for code-writing agents, repo workers, and tool-calling assistants.
Why this matters
Approval design becomes important the moment an agent can do more than read. Opening a pull request, modifying files, sending a message, merging a branch, or rotating a token all have different risk profiles. Treating them as one generic confirmation step usually fails both security and usability. A workable design has to answer four questions:
- what exactly is the agent asking to do
- can a human inspect the real effect before it happens
- does approval cover only this action, or a whole session
- what evidence exists after the fact
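As a sketch, those four questions map naturally onto the fields of an approval record. The names here are illustrative, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApprovalRecord:
    """Illustrative record answering the four questions above."""
    tool_name: str     # what exactly is the agent asking to do
    args_summary: str  # the inspectable effect, shown before it happens
    fingerprint: str   # approval covers only this exact action, not a session
    approver_id: str   # evidence after the fact
    approved_at: str   # evidence after the fact

record = ApprovalRecord(
    tool_name="create_pull_request",
    args_summary="open draft PR against master",
    fingerprint="fp_8f6f0b",
    approver_id="user_123",
    approved_at="2026-04-15T12:02:00Z",
)
```

If a proposed design cannot fill in one of these fields, that is usually the first sign the approval step is theater.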
Architecture and workflow overview
A good approval path separates planning from execution and binds approval to a specific proposed action, not a vague intention.
The approval model I would actually use
Lane 1: auto-execute for read-only and reversible low-risk actions
Examples include listing issues, reading files inside an allowed workspace, inspecting CI logs, and rendering previews in temp directories. These actions should not ask humans to click a button every time.
Lane 2: preview-bound approval for bounded writes
Examples include editing files in an allowed repo, pushing a branch, opening a draft PR, or updating a scheduled task inside a limited namespace. These are the best candidates for human-in-the-loop approval because the real delta can be shown first.
Lane 3: high-friction approval for destructive or external actions
Examples include merging to a protected branch, deleting data, sending email or public messages, changing infra, or running elevated commands. These should require a more explicit step than a generic approve button.
Implementation detail 1: classify risk before asking for approval
The approval step should not decide risk at the UI layer. Risk classification belongs in policy code.
```ts
import { createHash } from 'node:crypto';

const policyTable = {
  read_file: { lane: 'auto', effect: 'read' },
  list_pull_requests: { lane: 'auto', effect: 'read' },
  edit_workspace_file: { lane: 'preview', effect: 'write' },
  create_pull_request: { lane: 'preview', effect: 'write' },
  merge_pull_request: { lane: 'high', effect: 'write' },
  send_email: { lane: 'high', effect: 'external' },
} as const;

// Deterministic hash of the proposed action. JSON key order must be
// stable upstream for this to work as a reliable fingerprint.
function stableHash(value: unknown): string {
  return createHash('sha256').update(JSON.stringify(value)).digest('hex');
}

function classifyAction(toolName: string, args: unknown) {
  const policy = policyTable[toolName as keyof typeof policyTable];
  if (!policy) throw new Error(`Tool ${toolName} is not approved`);
  return {
    lane: policy.lane,
    effect: policy.effect,
    fingerprint: stableHash({ toolName, args }),
  };
}
```

Implementation detail 2: approve the exact artifact, not a broad session
One bad pattern is session-wide approval like “allow the agent to keep making changes.” It feels convenient, but it quietly converts a narrow review into an open-ended permission grant.
```ts
// Assumed helper: any signed, verifiable token format works here.
declare function signJwt(claims: Record<string, unknown>): string;

type ApprovalRequest = {
  toolName: string;
  args: unknown;
  summary: string;
  fingerprint: string;
  expiresAt: string;
};

function mintApproval(request: ApprovalRequest, approverId: string): string {
  return signJwt({
    sub: approverId,
    toolName: request.toolName,
    fingerprint: request.fingerprint,
    expiresAt: request.expiresAt,
  });
}
```

If the diff changes, the command changes, or the recipient changes, the fingerprint changes and the approval should be invalid.
Implementation detail 3: previews need to be inspectable, not poetic
A lot of agent systems generate approval summaries that sound nice but hide the real action. That is risky.
```text
Approval requested: create_pull_request
Repo: negiadventures/negiadventures.github.io
Branch: ai-blog/2026-04-15-ai-agent-approval-workflows
Base: master
Files changed:
  - blog/ai-agent-approval-workflows.html
  - blog/ai-agent-approval-workflows.md
  - blog/index.html
  - blog/.ai-topic-history.json
  - sitemap.xml
Diff summary: add one new blog post, prepend blog card, add sitemap entry, record topic history
Network access: required
Elevated access: no
Approval expires in: 15 minutes
```
Implementation detail 4: make execution fail closed after approval
An approval system should re-check policy at execution time. Do not assume that a once-approved request remains valid forever.
```python
def execute_with_policy(request):
    # classify_action, run_tool, and assert_approval_matches are the
    # Python counterparts of the helpers shown earlier.
    classification = classify_action(request.tool_name, request.args)

    # Auto lane: read-only or reversible, no human in the loop.
    if classification["lane"] == "auto":
        return run_tool(request.tool_name, request.args)

    # Every other lane fails closed without a token.
    if request.approval_token is None:
        raise PermissionError("Approval required before execution")

    # Re-check that the token still matches this exact action.
    assert_approval_matches(
        request.approval_token,
        request.fingerprint,
        request.tool_name,
    )
    return run_tool(request.tool_name, request.args)
```

Tradeoffs: the three common approval designs
| Design | Good at | Breaks when | My take |
|---|---|---|---|
| Blanket session approval | fast iteration | agent scope drifts, humans stop reading | only acceptable in tightly sandboxed personal workflows |
| Per-tool approval | simple mental model | frequent interruptions, approval fatigue | decent baseline, but noisy |
| Preview-bound approval | inspectable writes, tighter audit trail | preview generation is weak or fingerprints are unstable | best default for repo and ops workflows |
What goes wrong in practice
Failure mode 1: approvals become meaningless because they are too frequent
If every harmless read or reversible action asks for confirmation, people stop reading and start clicking. That is not human oversight; it is ritual.
Failure mode 2: the preview is not the real action
If the preview says two files but the execution can still modify five, approval is mostly theater.
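One way to close that gap is to check, at execution time, that the files the executor is about to touch are exactly the set the preview showed. A minimal sketch (names are illustrative):

```python
def assert_preview_matches_execution(
    approved_files: set[str], actual_files: set[str]
) -> None:
    # Fail closed: any file outside the approved preview aborts execution.
    unexpected = actual_files - approved_files
    if unexpected:
        raise PermissionError(
            f"Execution touches unapproved files: {sorted(unexpected)}"
        )

approved = {"blog/index.html", "sitemap.xml"}
assert_preview_matches_execution(approved, {"blog/index.html"})  # ok, subset
```

Touching fewer files than approved is fine; touching more should abort.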
Failure mode 3: approval accidentally covers follow-up actions
The agent edits files, then pushes, then opens a PR, then posts to chat because all of those feel related. They are related, but they are not the same action.
Failure mode 4: policy lives only in prompts
If the rule is “never message external recipients without approval,” that rule needs to exist in executable policy, not just in an instruction file.
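For the external-recipient rule, the executable version can be very small. A sketch, assuming an internal domain allowlist (the domain here is a placeholder):

```python
# Assumption: the organization's own domains; anything else is external.
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}

def requires_approval_for_message(recipient: str) -> bool:
    # Executable form of "never message external recipients without approval".
    domain = recipient.rsplit("@", 1)[-1].lower()
    return domain not in ALLOWED_RECIPIENT_DOMAINS
```

The point is not the check itself but where it lives: in the tool runtime, where the model cannot talk its way around it.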
Security and reliability concerns that matter
Replay resistance
Approval tokens should expire quickly and bind to a single action fingerprint. Otherwise a captured token can be reused for a modified request.
Workspace and tenant boundaries
For code-writing agents, approval alone is not enough. The runtime should still enforce allowed repos, writable paths, branch prefixes, and network restrictions.
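The writable-path check in particular is easy to get wrong with naive string prefixes. A sketch, assuming a single workspace root (the path is a placeholder):

```python
from pathlib import Path

# Assumption: one workspace root the agent may write under.
WORKSPACE = Path("/srv/agent/workspace").resolve()

def assert_within_workspace(relative_path: str) -> Path:
    # resolve() collapses "..", so traversal out of the workspace is caught
    # even when the path looks harmless before normalization.
    target = (WORKSPACE / relative_path).resolve()
    if not target.is_relative_to(WORKSPACE):
        raise PermissionError(f"{relative_path} escapes the allowed workspace")
    return target
```

This runs regardless of what the approval token says, so a bad approval still cannot write outside the boundary.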
Auditability
You want a durable event for request, approval, execution, and result. That matters for debugging just as much as compliance.
```json
{
  "timestamp": "2026-04-15T12:02:00Z",
  "session_id": "sess_blog_412",
  "tool": "create_pull_request",
  "lane": "preview",
  "fingerprint": "fp_8f6f0b",
  "approver": "user_123",
  "status": "executed",
  "files_changed": 5,
  "external_destination": "github.com",
  "duration_ms": 912
}
```

Idempotency
Approved writes should be safe to retry when possible. If a network call times out after approval, the system needs a way to avoid creating duplicate PRs or duplicate notifications.
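The usual mechanism is an idempotency key derived from the action fingerprint. A minimal in-memory sketch; a real system would persist the ledger so retries survive a process restart:

```python
# Assumption: in-memory ledger keyed by idempotency key.
_executed: dict[str, object] = {}

def run_once(idempotency_key: str, action):
    # Retrying after a timeout returns the recorded result instead of
    # executing the approved write a second time.
    if idempotency_key in _executed:
        return _executed[idempotency_key]
    result = action()
    _executed[idempotency_key] = result
    return result
```

Using the action fingerprint as the key means a retry of the same approved action is deduplicated, while a genuinely new action gets a new key.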
What I would not do
- I would not give a coding agent permanent approval to run arbitrary commands in a repo just because it usually behaves.
- I would not let approve-once silently upgrade into approve-all-follow-up-actions.
- I would not build external message sending and local file editing behind the same generic confirm button.
Practical checklist
- classify tools into auto, preview, and high-risk lanes
- generate a concrete preview artifact before every bounded write
- bind approval to a stable fingerprint of the exact action
- expire approvals quickly
- re-check policy at execution time
- keep follow-up actions separate unless they are explicitly approved too
- log request, approval, execution, and result as distinct events
- enforce branch, path, repo, and destination constraints outside the prompt
- make read-only actions silent enough that people keep paying attention when approvals do appear
- fail closed when preview, token, or policy state no longer matches
Conclusion
Approval workflows are not there to make agents feel supervised. They are there to preserve velocity without losing inspectability.
The best pattern I know is boring in a good way: classify risk, preview the exact effect, approve only what can be inspected, execute only what was approved, and leave a clean trail behind.