Mastering Terraform State: Real Incidents, Lessons, and Best Practices
Introduction: What is Terraform State?
Terraform doesn’t just apply your infrastructure code and forget about it. It keeps track of what it has created: every EC2 instance, every S3 bucket, every RDS database, every security group, and more. By default, that data is stored in a file called terraform.tfstate. This state file is how Terraform knows:
- Which real-world resources it manages
- What their real-world values are (e.g., IP addresses, instance IDs)
- What needs to change on the next apply, by comparing the recorded state with your configuration
Without this file, Terraform is blind, and so are we.
Why It’s So Important
Let’s say a team member deletes an S3 bucket directly from the AWS Console instead of through Terraform. The state file still lists the bucket, so state and reality have drifted apart. On the next terraform plan or terraform apply, Terraform refreshes the state, notices the bucket is missing, and proposes to recreate it. In the meantime, anything that depended on that bucket may already be broken.
Or let’s say two people run terraform apply at the same time. Without state locking, both runs write to the same state file, and you can end up with overlapping or duplicate infrastructure, or a corrupted state. This happens more often than you’d think. We’ve seen it.
What Can Go Wrong (From Real Experience)
1. Local State in Teams
- We once had a project where everyone was using local state on their laptops.
- Resources got overwritten, and the infrastructure was out of sync.
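- Fix: We moved everything to an S3-backed remote state with DynamoDB locking. Peace restored.
Native S3 State Locking: No More DynamoDB
Since Terraform 1.10, the S3 backend can hold the lock itself. Setting use_lockfile = true makes Terraform write a lock file next to the state object in the bucket, so a separate DynamoDB table is no longer required.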
Before (Using DynamoDB for Locking)
terraform {
  backend "s3" {
    bucket         = "test-bucket"
    key            = "auth/shared/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "test-table" # lock is held in this DynamoDB table
  }
}
Now (Using S3-native Locking)
terraform {
  backend "s3" {
    bucket       = "test-bucket"
    key          = "auth/shared/terraform.tfstate"
    region       = "us-west-2"
    use_lockfile = true # lock is held as a lock file in the bucket itself
  }
}
2. No Locking
- During a tight deadline, two people ran terraform apply against the same state at the same time.
- Resources clashed. One team’s RDS instance got deleted mid-apply.
- After that, we never skipped state locks again.
3. Secrets in State
- Terraform state can hold secrets such as database passwords in plain text.
- We audited our state files and found sensitive values being output without sensitive = true.
- Fix: Be mindful of what you expose. Marking a value sensitive = true only hides it from CLI output; it is still stored in plain text in the state file. Treat state files like secrets.
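As a minimal sketch of the audit fix above, this is roughly what a properly marked variable and output look like. The names db_password and db_connection_string, and the hostname, are made up for illustration and are not from our actual configuration.

```hcl
# Hypothetical names, for illustration only.
variable "db_password" {
  description = "Master password for the database"
  type        = string
  sensitive   = true # redacted in plan and apply output
}

output "db_connection_string" {
  # Derived from a sensitive value, so the output must be marked sensitive too.
  value     = "postgres://app:${var.db_password}@db.example.internal:5432/app"
  sensitive = true # hidden in CLI output, but still written to state in plain text
}
```

Even with both flags set, anyone who can read the state file can read the password, which is why access to the state bucket matters as much as the flags.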
4. Don’t Git Commit Your State (Ever)
- It’s tempting: you see terraform.tfstate lying around and think, “Let me just version control this.”
- Please don’t. Not only does it bloat your repo, but if secrets are in there (and they usually are), you’ve just leaked passwords to your whole team.
- Fix: Add your state files to .gitignore. Use .terraformignore for modules too.
5. State Is Not Magic – Learn to Debug It
When things go wrong, the state file is your friend. If Terraform says “this resource doesn’t exist,” but you can see it in the AWS Console, chances are your state is outdated or broken.
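If the resource exists but is missing from state, you can bring it in with terraform import. If state holds an entry for something that is gone or should no longer be managed, you can drop it with terraform state rm. Don’t be afraid of these commands. Just take a backup of the state first! Want to dive deeper? Check out these detailed guides: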
👉 No More Manual Terraform Imports — Learn the New Way
👉 Modern Terraform Practices: Removing Resources Safely from State
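Newer Terraform versions also let you express both operations in configuration rather than on the command line. The sketch below assumes Terraform 1.5+ for the import block and 1.7+ for the removed block. The resource addresses and the bucket name are hypothetical.

```hcl
# Adopt an existing bucket into state on the next apply (Terraform 1.5+).
import {
  to = aws_s3_bucket.logs
  id = "example-logs-bucket" # for S3 buckets, the import ID is the bucket name
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket"
}

# Stop tracking a resource without destroying it (Terraform 1.7+).
# The matching resource block for aws_instance.legacy must be deleted
# from the configuration when this block is added.
removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false # forget it from state, leave the real instance running
  }
}
```

Because these blocks go through the normal plan and apply cycle, the change shows up in the plan for review before the state is touched.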
6. Watch Out for Large State Files
- As infra grows, your state files can balloon — especially if you’re storing hundreds of resources in a single file.
- This slows down operations and increases the blast radius.
- Fix: Break state into smaller logical units: per microservice, per team, or per app.
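One way to split state, sketched here under assumptions, is to give each stack its own key in the bucket and let a stack read another stack's outputs through the terraform_remote_state data source. The keys, the private_subnet_id output, and the AMI ID are placeholders rather than values from our setup.

```hcl
# In the app stack: its own state key, separate from the network stack.
terraform {
  backend "s3" {
    bucket       = "test-bucket"
    key          = "app/terraform.tfstate"
    region       = "us-west-2"
    use_lockfile = true
  }
}

# Read-only view of the outputs published by the network stack.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "test-bucket"
    key    = "network/terraform.tfstate"
    region = "us-west-2"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-12345678" # placeholder AMI ID
  instance_type = "t3.micro"

  # Assumes the network stack declares an output named private_subnet_id.
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_id
}
```

With this layout, an apply in the app stack can only change app resources, so a bad run cannot take the network down with it.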
Best Practices That Help
Here’s what we now follow religiously:
- Remote State in S3: Don’t store state on your laptop. Use an S3 bucket, and turn on versioning.
- Enable Locking with Native S3 State Locking: Prevents two people from updating the state at once. Simple, powerful.
- Separate State per Environment: Use workspaces or a folder-based structure to isolate dev, qa, integration, and prod.
- Encrypt and Audit: Enable encryption at rest in S3. Enable access logs. Monitor who touched the state.
- Backups Matter: If something breaks, S3 versioning lets us roll back to a previous state file.
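The bucket-side practices above can be sketched roughly as follows, assuming AWS provider v4 or later, where versioning and encryption are separate resources. The resource name tf_state is made up, and the state bucket is usually created from a small separate bootstrap configuration, since a configuration cannot store its state in a bucket that does not exist yet.

```hcl
resource "aws_s3_bucket" "tf_state" {
  bucket = "test-bucket"
}

# Keep every previous state file so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest with the AWS-managed KMS key for S3.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State often contains secrets, so block every form of public access.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

Access logging is left out here because it needs a second bucket to receive the logs.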
A Small Incident That Taught Us a Big Lesson
We were once debugging a Jenkins EC2 setup that kept getting re-created. Turned out, someone had applied Terraform in a stale local branch with an old state file. It deleted a few resources before we stopped it.
That one incident made us rewrite our team’s Terraform README and enforce remote state.
Final Thoughts
You can have the best Terraform code in the world. But if your state file is a mess, things will break — silently and painfully.
So treat Terraform state like gold:
- Protect it
- Back it up
- Don’t share it carelessly
- Lock it before touching it
It’s not just a file. It’s your infrastructure’s spine.
Want help setting this up for your team? At TO THE NEW, we’ve helped multiple teams migrate to proper Terraform state management and avoid disaster in production. Happy to share templates or war stories.