How to Test Terraform Infrastructure Code

Infrastructure as code has become the norm, but infrastructure scripts are often written and run only once. That works for simple infrastructure requirements (e.g. k8s deployments). But when the infrastructure is more varied or needs greater resiliency, testing the infrastructure code becomes a requirement. This blog post introduces a current project that has found tools and patterns to deal with this problem.

Introduction

I’ve been working on a project which consists of a large amount and a wide variety of infrastructure. Because of the size and rapid development, there is always a failing component. This has led to a complete lack of confidence in the code. We need tests.

The key problem is that infrastructure code is almost untestable. The only test we could perform was to manually spin up the entire cluster (or use a CI tool) and then check if there were any errors. The full cluster would take hours to set up due to some pretty nasty bootstrapping we needed for a legacy system. Even if there were no errors, there’s still no guarantee that it works. Thankfully there are now tools that can help.

Terratest: A Terraform Test Harness

For several months we searched GitHub for a Terraform testing framework. Around April 2018 we got our first hit: our friends at Gruntwork had developed a new tool called Terratest. The goal is to make Terraform code testable by using a unit testing framework. They chose Go’s unit testing framework (they could have picked anything), then developed Go libraries that perform common tasks like SSH-ing into a machine.

How to Use Terratest

To run a test you need to provide a Terraform file and a Go test. Then you simply call go test and Terratest takes care of initialising and applying your infrastructure. It runs your checks against the live infrastructure, then destroys it. Of course, this isn’t magic: the process is encoded in the Go test file. But it serves as a neat, repeatable pattern for testing infrastructure.
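To make the init, apply, check, destroy cycle concrete, here is a minimal sketch of it in plain Go. This is not Terratest’s implementation: the runner type, terraformIn, applyCheckDestroy and errUnhealthy are names made up for illustration, and the demo in main uses a fake runner so nothing real is created.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// runner executes one terraform subcommand. It is a hypothetical seam (not a
// Terratest type) so the lifecycle can be exercised without real infrastructure.
type runner func(args ...string) error

// errUnhealthy is a made-up failure used by the demo check in main.
var errUnhealthy = errors.New("instance unhealthy")

// terraformIn returns a runner that shells out to the terraform binary in dir.
func terraformIn(dir string) runner {
	return func(args ...string) error {
		cmd := exec.Command("terraform", args...)
		cmd.Dir = dir
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		return cmd.Run()
	}
}

// applyCheckDestroy runs init and apply, then the caller's check. Once init
// has succeeded, destroy is always attempted, even if apply or the check fails.
func applyCheckDestroy(run runner, check func() error) (err error) {
	if err = run("init", "-input=false"); err != nil {
		return fmt.Errorf("init: %w", err)
	}
	defer func() {
		// Report a destroy failure only if nothing else went wrong first.
		if derr := run("destroy", "-auto-approve"); derr != nil && err == nil {
			err = fmt.Errorf("destroy: %w", derr)
		}
	}()
	if err = run("apply", "-auto-approve", "-input=false"); err != nil {
		return fmt.Errorf("apply: %w", err)
	}
	if err = check(); err != nil {
		return fmt.Errorf("check: %w", err)
	}
	return nil
}

func main() {
	// Fake runner that records subcommands instead of touching real infrastructure.
	var calls []string
	fake := func(args ...string) error {
		calls = append(calls, args[0])
		return nil
	}
	err := applyCheckDestroy(fake, func() error { return errUnhealthy })
	fmt.Println(calls, err) // [init apply destroy] check: instance unhealthy
}
```

Swapping the fake for terraformIn("path/to/code") would drive the real CLI. The point of the sketch is the ordering: the destroy step is registered before apply runs, so a failed check still cleans up.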

Let’s take a look at a condensed example. More examples can be found in the examples directory of Terratest.

An Example of Testing Terraform Infrastructure

First we need to define our infrastructure in Terraform code. This contains all the usual stuff and the examples typically use AWS resources, but any cloud provider should work.

# Terraform 0.12+ syntax: tags is an attribute and references need no "${...}" wrapper
resource "aws_instance" "example" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t2.micro"

  tags = {
    Name = var.instance_name
  }
}

...

Next we use the new Terratest framework to develop our test in Go.

package test

import (
	"testing"
	"time"

	http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
	"github.com/gruntwork-io/terratest/modules/terraform"
)

func TestTerraformHttpExample(t *testing.T) {
	// Placeholder inputs for this condensed example
	awsRegion := "eu-west-1"
	instanceName := "terratest-http-example"
	instanceText := "Hello, World!"

	terraformOptions := &terraform.Options{
		// The path to where our Terraform code is located
		TerraformDir: "../examples/terraform-http-example",

		// Variables to pass to our Terraform code using -var options
		Vars: map[string]interface{}{
			"aws_region":    awsRegion,
			"instance_name": instanceName,
			"instance_text": instanceText,
		},
	}

	// At the end of the test, run `terraform destroy` to clean up any resources that were created
	defer terraform.Destroy(t, terraformOptions)

	// This will run `terraform init` and `terraform apply` and fail the test if there are any errors
	terraform.InitAndApply(t, terraformOptions)

	// Run `terraform output` to get the value of an output variable
	instanceURL := terraform.Output(t, terraformOptions, "instance_url")

	// It can take a minute or so for the Instance to boot up, so retry a few times
	maxRetries := 30
	timeBetweenRetries := 5 * time.Second

	// Verify that we get back a 200 OK with the expected instanceText
	// (newer Terratest releases take an extra *tls.Config argument after the URL)
	http_helper.HttpGetWithRetry(t, instanceURL, 200, instanceText, maxRetries, timeBetweenRetries)
}

This test initialises some options that are passed through to Terraform. Next we use Go’s defer to postpone the destroy call until the surrounding test function returns, so the clean-up runs even if a later step fails the test. Then the apply is performed. Finally, helper code from a Terratest library repeatedly sends an HTTP GET to the web server; if the expected response does not arrive within the allowed number of retries, the test fails.
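The retry step is worth seeing on its own. Below is a minimal sketch of such a polling loop using only the Go standard library. It is not Terratest’s code: getWithRetry, fetch and newBootingServer are hypothetical names, and the "instance" is a local test server that answers 503 a couple of times before it is ready.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"time"
)

// fetch performs one GET and returns the status code and the trimmed body.
func fetch(url string) (int, string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, "", err
	}
	return resp.StatusCode, strings.TrimSpace(string(body)), nil
}

// getWithRetry polls url until it answers with wantStatus and wantBody, or
// until maxRetries attempts have been made. It returns the attempts used.
func getWithRetry(url string, wantStatus int, wantBody string, maxRetries int, sleep time.Duration) (int, error) {
	lastErr := fmt.Errorf("no attempts made")
	for attempt := 1; attempt <= maxRetries; attempt++ {
		status, body, err := fetch(url)
		switch {
		case err != nil:
			lastErr = err
		case status != wantStatus || body != wantBody:
			lastErr = fmt.Errorf("got %d %q, want %d %q", status, body, wantStatus, wantBody)
		default:
			return attempt, nil
		}
		if attempt < maxRetries {
			time.Sleep(sleep) // give the instance time to finish booting
		}
	}
	return maxRetries, fmt.Errorf("gave up after %d attempts: %w", maxRetries, lastErr)
}

// newBootingServer simulates an instance that returns 503 for the first
// `failures` requests and then serves text.
func newBootingServer(failures int32, text string) *httptest.Server {
	var seen int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&seen, 1) <= failures {
			http.Error(w, "still booting", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprint(w, text)
	}))
}

func main() {
	srv := newBootingServer(2, "Hello, World!")
	defer srv.Close()

	attempts, err := getWithRetry(srv.URL, 200, "Hello, World!", 30, 10*time.Millisecond)
	fmt.Println(attempts, err) // 3 <nil>
}
```

The retry count and sleep interval together set the upper bound on how long a test will wait, which is why the example above uses 30 retries at 5 seconds for a freshly booted instance.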

Terratest provides a whole range of Go packages to help with testing: for example, to SSH into a machine and run a command, to copy files, to check that files exist, or to submit HTTP POST requests. There is still scope for further helper functions, however. Our first PR added the ability to SSH into a machine using ssh-agent rather than a key.

How To Unit Test Terraform Modules

In larger projects, it is still painful to test. The main problem with developing infrastructure code is that the feedback cycle is so long. It can take hours to perform one iteration.

The most common way to solve a large problem is to split it into smaller problems that are solvable. In this case we use Terraform modules, which are intended to encapsulate smaller building blocks. The problem I have with Terraform modules is that they always end up looking like giant pass-through functions because you need the flexibility; i.e. most of the low-level configuration is still done by the parent code, which limits the power of encapsulation.

But ignoring that for now, we can still write Terratests for the individual modules. These form our “unit tests” for Terraform code. We write a very simple main.tf that instantiates our module and write a Go test file to test that module. An example directory layout might look like this:

.
├── main.tf
├── modules
│   ├── bastion
│   │   └── main.tf
│   └── webserver
│       └── main.tf
└── tests
    └── unit
        ├── bastion
        │   ├── bastion_test.go
        │   └── main.tf
        └── webserver
            ├── main.tf
            └── webserver_test.go

Here we have a simplified setup where the main.tf in the root directory is the parent of the modules; that file instantiates the bastion and webserver modules. The tests are located in the tests directory. For each module we have a unit test that consists of a main.tf and a Go test file. The Go file name must end in _test.go, otherwise go test will not pick it up. The main.tf contains everything required to instantiate that module and only that module. The Go test file then contains the code to apply, test and destroy the module.
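A layout like this only helps if every module actually has a test directory. As an illustration, the sketch below lists modules that have no matching directory under tests/unit. The function names untestedModules and makeLayout are made up for this post, and main builds a throwaway directory tree so the example runs anywhere.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// untestedModules returns the names of directories under root/modules that
// have no matching directory under root/tests/unit.
func untestedModules(root string) ([]string, error) {
	entries, err := os.ReadDir(filepath.Join(root, "modules"))
	if err != nil {
		return nil, err
	}
	var missing []string
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		info, err := os.Stat(filepath.Join(root, "tests", "unit", e.Name()))
		switch {
		case os.IsNotExist(err):
			missing = append(missing, e.Name())
		case err != nil:
			return nil, err
		case !info.IsDir():
			missing = append(missing, e.Name())
		}
	}
	return missing, nil
}

// makeLayout creates a temporary tree with the given module directories and
// the given unit test directories. It exists only to drive the demo.
func makeLayout(modules, tested []string) (string, error) {
	root, err := os.MkdirTemp("", "layout")
	if err != nil {
		return "", err
	}
	for _, m := range modules {
		if err := os.MkdirAll(filepath.Join(root, "modules", m), 0o755); err != nil {
			return "", err
		}
	}
	for _, m := range tested {
		if err := os.MkdirAll(filepath.Join(root, "tests", "unit", m), 0o755); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	root, err := makeLayout([]string{"bastion", "webserver"}, []string{"bastion"})
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	missing, err := untestedModules(root)
	fmt.Println(missing, err) // [webserver] <nil>
}
```

Pointed at the repository root instead of a temporary directory, a check like this could run as an early CI step before any infrastructure is created.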

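Finally we can run all this code in a CI pipeline with cd tests/unit ; go test -v ./... and we will obtain a global pass/fail. Note that go test kills a test binary after 10 minutes by default, so slow infrastructure tests usually need a longer limit, e.g. -timeout 60m.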

Conclusion

This setup isolates smaller parts of the infrastructure, and if you run the tests in parallel (Go’s testing framework supports this) the feedback cycle gets shorter. Of course the suite is only as quick as its slowest test, so there is some impetus to speed up deployment times by baking static elements into images.
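To illustrate why the slowest test sets the pace, here is a small simulation. It uses goroutines and sleeps rather than real Terratest tests, and runSuite is a hypothetical helper. In a real test file each test function would opt in by calling t.Parallel() as its first statement.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runSuite pretends each entry in millis is a test that takes that many
// milliseconds, runs them sequentially or concurrently, and returns the
// wall-clock time for the whole suite.
func runSuite(millis []int, parallel bool) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for _, ms := range millis {
		d := time.Duration(ms) * time.Millisecond
		if !parallel {
			time.Sleep(d)
			continue
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(d)
		}()
	}
	wg.Wait() // returns immediately in the sequential case
	return time.Since(start)
}

func main() {
	tests := []int{20, 40, 60} // stand-ins for module deployment times

	// Sequential time is roughly the sum, parallel time roughly the maximum.
	fmt.Println("sequential:", runSuite(tests, false).Round(10*time.Millisecond))
	fmt.Println("parallel:  ", runSuite(tests, true).Round(10*time.Millisecond))
}
```

Scaled up from milliseconds to minutes, the same arithmetic applies to module tests: running them side by side shrinks the total towards the longest single deployment, so that deployment is the one worth optimising.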