AI-generated infrastructure diffs are often convincing in exactly the wrong way. The Terraform looks tidy, the resources have reasonable names, and the PR summary sounds confident, but one small change can widen public access, remove encryption, or multiply cost before anyone notices.
That problem gets worse when reviewers only see a pretty diff. Infra failures are rarely about syntax. They are about side effects, defaults, blast radius, and the one missing guardrail that an agent did not know mattered in your environment.
What helps is a policy gate that evaluates the plan, not just the code. In this post I’ll show a simple workflow that turns AI-generated infra changes into evidence-backed review, with Terraform plan output, Conftest policies, exception handling, and a reviewer summary that surfaces what actually changed.
Why this matters
AI tools are very good at producing syntactically valid infrastructure code. They are much less reliable at preserving organization-specific intent such as “never expose Redis publicly,” “all buckets must have lifecycle rules,” or “production IAM changes require a human exception ticket.”
The resulting failures tend to look like this:
- a security group quietly adds `0.0.0.0/0`
- an S3 bucket loses encryption or public-access blocking
- an RDS or cache node tier changes and doubles spend
- a module upgrade forces replacement on a stateful resource
- an IAM policy widens action scope because the generated least-privilege set was guessed wrong
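To make one of these concrete, here is a minimal Python sketch of how the tier-change case could be spotted in plan JSON. It assumes the plan has already been exported with `terraform show -json`; the `SIZE_ATTRIBUTES` mapping, the function name, and the sample data are illustrative assumptions, not part of any existing tool.

```python
# Hypothetical mapping of resource type -> attribute that drives its size/cost.
# Extend it for the resource types you actually run.
SIZE_ATTRIBUTES = {
    "aws_db_instance": "instance_class",
    "aws_elasticache_cluster": "node_type",
}


def find_tier_changes(plan):
    """Return (address, old, new) for watched resources whose size attribute changes."""
    findings = []
    for rc in plan.get("resource_changes", []):
        attr = SIZE_ATTRIBUTES.get(rc.get("type"))
        if attr is None:
            continue
        change = rc.get("change", {})
        # `before` is null on create and `after` is null on delete.
        before = (change.get("before") or {}).get(attr)
        after = (change.get("after") or {}).get(attr)
        if before is not None and after is not None and before != after:
            findings.append((rc["address"], before, after))
    return findings


# Trimmed, made-up plan fragment for illustration only.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_db_instance.primary",
            "type": "aws_db_instance",
            "change": {
                "actions": ["update"],
                "before": {"instance_class": "db.t3.medium"},
                "after": {"instance_class": "db.r5.2xlarge"},
            },
        }
    ]
}

for address, old, new in find_tier_changes(sample_plan):
    print(f"{address}: {old} -> {new}")
```

This only reports that a tier changed. Whether the change "doubles spend" still needs pricing data or a human who knows the workload.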
Workflow overview
```mermaid
flowchart LR
    A[AI-generated Terraform PR] --> B[terraform fmt and validate]
    B --> C[terraform plan to JSON]
    C --> D[policy engine checks plan evidence]
    D --> E[exceptions and ownership rules]
    E --> F[review summary with risk labels]
    F --> G[human reviewer approves or blocks]
    G --> H[merge and apply in controlled lane]
```

The important shift is that the policy engine reads the evaluated plan, where computed values, replacements, and drift are visible, instead of trusting the source diff alone.
| Review layer | What it checks | Why AI-generated infra needs it |
|---|---|---|
| Syntax lane | fmt, validate, provider init | Catches broken output, not risky intent |
| Plan lane | adds, deletes, replacements, field values | Shows the real side effects of the change |
| Policy lane | org rules over plan JSON | Blocks unsafe but plausible diffs |
| Reviewer lane | ownership, exceptions, business context | Handles the cases policy cannot fully encode |
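As a rough illustration of what the plan lane can see that a source diff cannot, the Python sketch below pulls drift, replacements, and not-yet-known values out of a plan document. The `resource_drift`, `resource_changes`, and `after_unknown` keys follow my understanding of Terraform's JSON plan format; the function name and sample data are assumptions made for the example.

```python
def plan_evidence(plan):
    """Collect drift, replacements, and unknown-until-apply attributes from plan JSON."""
    evidence = {"drift": [], "replacements": [], "unknown_after_apply": {}}

    # Changes Terraform detected outside of this configuration.
    for rc in plan.get("resource_drift", []):
        evidence["drift"].append(rc["address"])

    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        # A replacement carries both actions, in either order.
        if "delete" in actions and "create" in actions:
            evidence["replacements"].append(rc["address"])
        # Only top-level attributes that are wholly unknown are listed here;
        # nested unknowns would need a recursive walk.
        unknown = sorted(
            key for key, value in (change.get("after_unknown") or {}).items()
            if value is True
        )
        if unknown:
            evidence["unknown_after_apply"][rc["address"]] = unknown
    return evidence


# Made-up plan fragment for illustration only.
sample_plan = {
    "resource_drift": [
        {"address": "aws_security_group.cache", "change": {"actions": ["update"]}}
    ],
    "resource_changes": [
        {
            "address": "aws_db_instance.primary",
            "change": {
                "actions": ["delete", "create"],
                "after_unknown": {"endpoint": True, "arn": True, "tags": {}},
            },
        },
        {
            "address": "aws_s3_bucket.logs",
            "change": {"actions": ["no-op"], "after_unknown": {}},
        },
    ],
}

print(plan_evidence(sample_plan))
```

None of these three signals appear in the HCL diff itself, which is why the later lanes work from the plan.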
Implementation details
1) Convert the plan into evidence your policies can read
The source diff is not enough. Policy becomes much more reliable once it evaluates Terraform’s JSON plan.
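```yaml
# .github/workflows/infra-policy.yml
name: infra-policy
on:
  pull_request:
    paths:
      - 'infra/**'
jobs:
  policy-check:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/prod
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          # Disable the wrapper so `terraform show -json` writes clean JSON to stdout
          terraform_wrapper: false
      - name: Terraform init
        run: terraform init -input=false
      - name: Terraform plan
        run: terraform plan -out=tfplan -input=false
      - name: Export plan JSON
        run: terraform show -json tfplan > tfplan.json
      # Assumes conftest is already installed on the runner
      - name: Conftest policy gate
        # --all-namespaces is needed because the policies below live in
        # package terraform.security, not Conftest's default `main` namespace
        run: conftest test tfplan.json --policy ../../policy --all-namespaces
```

2) Write policies around dangerous outcomes, not stylistic preferences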
I prefer a small set of hard-fail policies for security and cost, then softer reviewer warnings for everything else.
```rego
package terraform.security

# Written in pre-1.0 Rego syntax; on OPA 1.x the rule heads become
# `deny contains msg if { ... }`.

# Block Redis (port 6379) ingress that is open to the internet.
deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_security_group_rule"
    resource.change.actions[_] == "create"
    resource.change.after.type == "ingress"
    resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
    # Match any port range that includes 6379, not only an exact from_port
    resource.change.after.from_port <= 6379
    resource.change.after.to_port >= 6379
    msg := sprintf("redis ingress exposed to the internet in %s", [resource.address])
}

# Block removal of a bucket's server-side encryption configuration.
deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket_server_side_encryption_configuration"
    resource.change.actions[_] == "delete"
    msg := sprintf("bucket encryption removed for %s", [resource.address])
}
```

Two things matter here. First, the rules target outcomes that are expensive to miss. Second, they stay readable enough that reviewers and platform engineers can audit them without turning the policy folder into a black box.
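The same outcome-first approach extends to the IAM widening case mentioned earlier. Below is a minimal Python sketch of the check, useful for prototyping the logic before encoding it as policy. It assumes `aws_iam_policy` resources whose `policy` attribute is a JSON string known at plan time, and it treats any Allow action ending in `*` as a wildcard. Both the function names and that wildcard heuristic are my assumptions.

```python
import json


def wildcard_actions(policy_json):
    """Return wildcard actions found in the Allow statements of an IAM policy document."""
    if not policy_json:
        return set()
    doc = json.loads(policy_json)
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    found = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        found.update(a for a in actions if a.endswith("*"))
    return found


def iam_widening(plan):
    """Return (address, added_wildcards) for policies that gain wildcard actions."""
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_iam_policy":
            continue
        change = rc.get("change", {})
        # If the policy is unknown until apply, `after` has no value and nothing is flagged.
        before = wildcard_actions((change.get("before") or {}).get("policy"))
        after = wildcard_actions((change.get("after") or {}).get("policy"))
        added = sorted(after - before)
        if added:
            findings.append((rc["address"], added))
    return findings


# Made-up plan fragment for illustration only.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_iam_policy.deployer",
            "type": "aws_iam_policy",
            "change": {
                "actions": ["update"],
                "before": {"policy": json.dumps({"Statement": [
                    {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"}]})},
                "after": {"policy": json.dumps({"Statement": [
                    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:*"], "Resource": "*"}]})},
            },
        }
    ]
}

print(iam_widening(sample_plan))
```

Comparing before and after, rather than flagging every wildcard, keeps the check focused on what this change widened and avoids re-litigating existing policies on every PR.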
3) Summarize the risky changes for the human who has to approve them
A blocked check is useful. A blocked check plus a good summary is much better.
```python
import json
from collections import Counter

# Load the plan exported by `terraform show -json tfplan > tfplan.json`
with open("tfplan.json") as f:
    plan = json.load(f)

counts = Counter()
replacements = []
for rc in plan.get("resource_changes", []):
    actions = tuple(rc.get("change", {}).get("actions", []))
    # A replacement is delete+create, or create+delete when
    # create_before_destroy is set, so match both orderings
    if set(actions) == {"delete", "create"}:
        replacements.append(rc["address"])
        counts["replace"] += 1
    else:
        counts[actions] += 1

print("Terraform plan summary")
print(f"create: {counts[('create',)]}")
print(f"update: {counts[('update',)]}")
print(f"delete: {counts[('delete',)]}")
print(f"replace: {counts['replace']}")
if replacements:
    print("replacement targets:")
    for addr in replacements:
        print(f"- {addr}")
```

```text
$ python3 scripts/plan_summary.py
Terraform plan summary
create: 3
update: 2
delete: 0
replace: 1
replacement targets:
- aws_db_parameter_group.primary
risk labels: replacement, network, encryption-reviewed
```

What went wrong and the tradeoffs
My least favorite failure mode here is false confidence. Teams add a policy step, see green checks, and assume the workflow is now safe. It is not safe unless the rules cover the outcomes that actually hurt you.
Plan trust, bypass sprawl, over-gating, and sensitive output leakage are the four ways good-looking policy workflows quietly stop protecting you.
| Choice | Upside | Downside | When I would use it |
|---|---|---|---|
| Conftest over Terraform plan JSON | Simple, inspectable, Git-friendly | You maintain the rules yourself | Best default for small to medium teams |
| Sentinel or hosted policy platform | Rich ecosystem and governance features | More tooling lock-in | Large Terraform-heavy organizations |
| Hard-fail on every policy warning | Strong safety posture | Slows delivery and breeds bypasses | Only for narrow high-risk controls |
| Warning plus owner review on some checks | Better developer experience | Requires disciplined reviewers | Good for cost and replacement signals |
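The split between hard-fail and warning-plus-review in the last two rows can be sketched as a small gate function. This is a Python illustration under my own assumptions: the `Finding` type and its severity values are hypothetical. Conftest offers a similar split natively through `deny` and `warn` rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    address: str
    severity: str  # "deny" blocks the PR; "warn" is routed to an owner for review


def gate(findings):
    """Return (exit_code, blocking, warnings); only "deny" findings fail the check."""
    blocking = [f for f in findings if f.severity == "deny"]
    warnings = [f for f in findings if f.severity == "warn"]
    exit_code = 1 if blocking else 0
    return exit_code, blocking, warnings


# Made-up findings for illustration only.
findings = [
    Finding("redis-public-ingress", "aws_security_group_rule.cache", "deny"),
    Finding("stateful-replacement", "aws_db_parameter_group.primary", "warn"),
]

code, blocking, warnings = gate(findings)
print(f"exit code: {code}")
for f in blocking:
    print(f"BLOCK {f.rule}: {f.address}")
for f in warnings:
    print(f"REVIEW {f.rule}: {f.address}")
```

Keeping the deny list short is what makes the warning tier credible: reviewers still read warnings when they are not buried under routine failures.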
What I would not do is gate on file-level regex checks alone. A diff that removes encryption or replaces a stateful resource can still look harmless in raw HCL if the semantic effect is buried in module behavior or defaults.
There is also a security angle that gets missed. If your AI assistant can read plan output, make sure the plan does not leak secrets, endpoint details, or sensitive tags into logs, PR comments, or public CI artifacts. Redaction and artifact retention rules matter here.
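One way to approach redaction is to lean on the sensitivity markers Terraform already emits. The Python sketch below assumes the plan JSON carries `before_sensitive` and `after_sensitive` objects that mirror the shape of `before` and `after`, with `true` at sensitive leaves, which is my understanding of the format. The function names and the placeholder string are assumptions.

```python
REDACTED = "(sensitive)"


def redact(value, sensitive):
    """Replace values marked sensitive; `sensitive` mirrors the shape of `value`."""
    if sensitive is True:
        return REDACTED
    if isinstance(value, dict) and isinstance(sensitive, dict):
        return {k: redact(v, sensitive.get(k, False)) for k, v in value.items()}
    if isinstance(value, list) and isinstance(sensitive, list):
        return [
            redact(v, sensitive[i] if i < len(sensitive) else False)
            for i, v in enumerate(value)
        ]
    return value


def redact_change(change):
    """Return a copy of a resource change that is safer to post in logs or PR comments."""
    return {
        "actions": change.get("actions", []),
        "before": redact(change.get("before"), change.get("before_sensitive", False)),
        "after": redact(change.get("after"), change.get("after_sensitive", False)),
    }


# Made-up change object for illustration only.
sample_change = {
    "actions": ["update"],
    "before": {"username": "app", "password": "old-secret"},
    "after": {"username": "app", "password": "new-secret"},
    "before_sensitive": {"password": True},
    "after_sensitive": {"password": True},
}

print(redact_change(sample_change))
```

This only covers values that Terraform or the provider marks as sensitive. Endpoint names and tags that are sensitive in your environment but unmarked in the schema need their own denylist.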
Practical checklist
Fail hard on a short list of truly dangerous outcomes, summarize the rest, and make every exception explicit, owned, and time-bounded.
- [ ] Run `terraform plan` in CI with pinned provider versions
- [ ] Export plan JSON and evaluate policy against the plan, not just source files
- [ ] Hard-fail on exposure, encryption loss, destructive replacements, and forbidden IAM widening
- [ ] Add reviewer-facing summaries for replacements, cost shifts, and stateful resources
- [ ] Require named exceptions with ticket links for policy bypasses
- [ ] Keep policy rules readable enough for normal code review
- [ ] Redact or avoid sensitive values in CI logs and PR comments
- [ ] Apply only from a controlled post-merge lane, not from arbitrary PR automation
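For the exception items, "explicit, owned, and time-bounded" can be enforced mechanically. Here is a minimal Python sketch, assuming exceptions are stored as records alongside the policies; the `PolicyException` fields, function names, and sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyException:
    rule: str       # policy rule being waived
    address: str    # Terraform resource address the waiver applies to
    owner: str      # named person or team accountable for the waiver
    ticket: str     # link or ID of the approving ticket
    expires: date   # last day the waiver is valid


def exception_problems(exc, today):
    """Return the reasons an exception record should not be honored."""
    problems = []
    if not exc.owner.strip():
        problems.append("missing owner")
    if not exc.ticket.strip():
        problems.append("missing ticket")
    if exc.expires < today:
        problems.append("expired")
    return problems


def is_waived(rule, address, exceptions, today):
    """True if a valid, unexpired exception covers this rule and resource."""
    return any(
        e.rule == rule and e.address == address and not exception_problems(e, today)
        for e in exceptions
    )


# Made-up records for illustration only.
exceptions = [
    PolicyException("stateful-replacement", "aws_db_parameter_group.primary",
                    "platform-team", "TICKET-123", date(2030, 1, 31)),
    PolicyException("redis-public-ingress", "aws_security_group_rule.cache",
                    "", "TICKET-456", date(2030, 1, 31)),
]

today = date(2030, 1, 1)
print(is_waived("stateful-replacement", "aws_db_parameter_group.primary", exceptions, today))
print(is_waived("redis-public-ingress", "aws_security_group_rule.cache", exceptions, today))
```

Scoping each waiver to one rule and one resource address keeps a single exception from silently turning into a blanket bypass.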
Conclusion
AI can help write infrastructure code quickly, but infrastructure review still needs semantic evidence. Once you gate on Terraform plan output, encode a few high-value policies, and hand reviewers a crisp summary, AI-generated infra changes become much less of a trust fall.