Terraform modules for fun and profit

I've been using Terraform for a while on a smaller scale, but a recent project at work had me putting it through its paces and working out the issues I'd hit. A few of my issues in the past have revolved around:

Scope

One of the mistakes I made when first using terraform was trying to “do too much” with it. Like many people, I wanted to see how much I could replace the “fragility” of existing CM-focused stacks (i.e. knife ec2 commands in a NOTES.md file) with something “better”.

As I used Terraform, I tried to put more and more logic into it. I ended up fighting to get things like data about the current launch into userdata. Once I scaled back the scope of what I tried to do with Terraform, it became much easier to use it "the right way". Combined with Packer to build images with just a BIT more baked in, things got much easier.

If you can't bake immutable AMIs for your stack (and I think that most non-trivial cases can't yet), don't try to fit application lifecycle into Terraform. I would argue that doing so is too much, for a few reasons.

Safety

Terraform does its job really well. Too well sometimes. Imagine the following scenario:

You stand up your infrastructure with Terraform using the steps above. All is well. Now you need to release a new version of your application, so you change the AMI in your Terraform code and run terraform apply.

Boom: you've just updated yourself into an outage. A change like a new AMI causes Terraform to destroy that resource and recreate it. Now, Terraform does have the following:

lifecycle {
  create_before_destroy = true
}

but it does have limitations, and you're relying on everyone remembering to add it. Honestly, that doesn't instill much confidence that I won't automate myself in the foot.
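For reference, here's roughly where that block has to live (the resource and variable names are made up for illustration); every resource you want protected needs its own copy, which is exactly the thing that gets forgotten:

resource "aws_instance" "app" {
  ami           = "${var.app_ami}"
  instance_type = "m3.medium"

  # without this, a changed AMI means destroy-then-create (and an outage)
  lifecycle {
    create_before_destroy = true
  }
}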

I realized the trick to Terraform safety is to minimize the number of tf files you might have to modify as part of normal operation. By limiting its scope appropriately and through reuse/composability, you can do this. About that reuse thing…

Composability/Reuse

And now we get to the part that will level up your Terraform game - using modules.

The problem is that Terraform modules are currently the least documented part of the Terraform ecosystem, in both the official documentation and the community, precisely because they are a bit difficult to grok. At the end I'm going to point you to some community resources that helped me figure it all out.

Outputs

The biggest thing that helped me understand modules was finally grokking that outputs are NOT just for displaying at the end of a terraform apply or for use as variables in subshells with terraform output.

When using modules, outputs are essentially part of the public API to your module. Once I framed it this way, I realized that Terraform modules are libraries, except there are no methods to call.

Modules as Libraries

A module defines its contract in two ways: the variables it takes as inputs and the outputs it exposes.

Once you start moving some of your Terraform code into modules, you will be forced to create this contract for yourself. Let’s take an example from AWS that is highly reusable, VPC creation:

variables.tf

First we define our inputs in variables.tf:

variable "mymod_region" {}
variable "mymod_az" {}
variable "mymod_vpc_cidr" {}
variable "mymod_private_subnet" {}
variable "mymod_public_subnet" {}

We're defining 5 inputs for this VPC, all prefixed with mymod (more on that in a moment). This module is for creating a VPC that has two subnets - one for instances that get public IPs (like a NAT instance) and one for private instances. We could add others here (and our internal vpc module actually provides for a subnet in each AZ of the region used). You can certainly provide sane defaults for these if the module is intended for specific internal use, but I would say NOT to do that, in the interest of forcing yourself to make it properly reusable.

outputs.tf

Now we define some outputs:

output "vpc_id" { value = "${aws_vpc.default.id}" }
output "private_subnet_id" { value = "${aws_subnet.private.id}" }
output "public_subnet_id" { value = "${aws_subnet.public.id}" }

This is where we return things in the format that most people are used to seeing in Terraform (resource.name.attribute). At this point we have a module that takes information in, does something, and spits some data back out for us.

Using the module

Now let’s say we need to launch an instance into the public subnet and we want to use this VPC module. So we create a new terraform project/repo with the following files:

variables.tf

We'll start here. You'll notice we're going to mix and match the variables for the instance with the ones needed by the VPC module. In the interest of length, I'm not going to include everything I might make a variable:

variable "aws_region" { default = "us-west-2" }
variable "az" { default = "us-west-2a" }
variable "ami" {}
variable "vpc_cidr" {}
variable "private_subnet" {}

main.tf

provider "aws" { region = "${var.aws_region}" }
module "vpc" {
  source = "git::ssh://git@gitserver/org/my-tf-modules//vpc" // double slash intended. See terraform documentation
  mymod_region = "${var.aws_region}"
  mymod_az = "${var.az}"
  mymod_vpc_cidr = "${var.vpc_cidr}"
  mymod_private_subnet = "${var.private_subnet}"
  mymod_public_subnet = "${var.public_subnet}"
}

The reason I prefixed the inputs in the VPC module was to help with clarity. It makes it easier (for me) to see the flow of inputs and outputs.

instance.tf

I’m going to truncate this code a bit only to show where we reuse the various variables:

resource "aws_instance" "node1" {
  ami = "${var.ami}"
  subnet_id = "${module.vpc.private_subnet_id}"
}

As you can see, we're using a mix of our local variables (the ami) and the outputs from the VPC module. Obviously there are other things required here to actually launch that resource, but you get the idea.

Some gotchas with modules

These aren't hard and fast rules, but they will help your sanity.

Don’t nest modules

Looking above, you might think that you could convert that into another module (an instance module which uses the VPC module). Don't. Module variable/output visibility exists at the ROOT module level only. With nested modules, the root module is the top-level code calling your outer module, not the module you're writing.

If you were to try to create a module that does both of these things, you would end up needing to declare inputs on each outer module to pass down to any modules IT calls, as well as outputs to bubble the information all the way back up to the code where you're using the module.

i.e., something like this (a hypothetical wrapper module, trimmed down to a couple of inputs and one output):
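# Hypothetical "instance" module that wraps the vpc module internally
variable "vpc_cidr" {}
variable "private_subnet" {}

module "vpc" {
  source               = "git::ssh://git@gitserver/org/my-tf-modules//vpc"
  mymod_vpc_cidr       = "${var.vpc_cidr}"
  mymod_private_subnet = "${var.private_subnet}"
  # ...and so on for every single input the inner module needs
}

# ...plus an output re-exported for every value the caller wants back
output "private_subnet_id" { value = "${module.vpc.private_subnet_id}" }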

That becomes VERY cumbersome. Be very shallow with your module usage. Use a single wrapper project to pull in all the modules you need.

You can duplicate modules under different names

This is very handy (though cumbersome in terms of duplicated code). The following is from one of our internal "wrapper" projects that launches a 3-node version of our stack for testing:

resource "aws_key_pair" "deployer" {
  key_name = "${var.orgname}-deployer-key"
  public_key = "${file("${var.orgname}-deployer-key.pub")}"
}

module "vpc" {
  source = "git::ssh://git@repo/org/tf-modules.git//vpc"
  spath_orgname = "${var.orgname}"
  spath_nat_ami = "${lookup(var.amis, var.aws_region)}"
  spath_nat_a_private_ip = "${var.nat_a_private_ip}"
  spath_nat_b_private_ip = "${var.nat_b_private_ip}"
  spath_nat_c_private_ip = "${var.nat_c_private_ip}"
  spath_public_subnet_a_cidr = "${var.public_subnet_a_cidr}"
  spath_public_subnet_b_cidr = "${var.public_subnet_b_cidr}"
  spath_public_subnet_c_cidr = "${var.public_subnet_c_cidr}"
  spath_private_subnet_a_cidr = "${var.private_subnet_a_cidr}"
  spath_private_subnet_b_cidr = "${var.private_subnet_b_cidr}"
  spath_private_subnet_c_cidr = "${var.private_subnet_c_cidr}"
  spath_vpc_cidr = "${var.vpc_cidr}"
  spath_keyname = "${var.orgname}-deployer-key"
  aws_region = "${var.aws_region}"
}

module "rds" {
  source = "git::ssh://git@repo/org/tf-modules.git//rds"
  spath_orgname = "${var.orgname}"
  iam_rds_user = "${var.iam_rds_user}"
  iam_rds_password = "${var.iam_rds_password}"
  spath_rds_sg_id = "${module.vpc.default_sg_id}"
  spath_rds_private_subnet_a_id = "${module.vpc.private_subnet_a_id}"
  spath_rds_private_subnet_b_id = "${module.vpc.private_subnet_b_id}"
  spath_rds_private_subnet_c_id = "${module.vpc.private_subnet_c_id}"
  aws_region = "${var.aws_region}"
}
module "dockernode_subnet_a" {
  source = "git::ssh://git@repo/org/tf-modules.git//dockerhost"
  spath_orgname = "${var.orgname}"
  dockernode_ami = "${lookup(var.amis, var.aws_region)}"
  dockernode_keypair_name = "${var.orgname}-deployer-key"
  dockernode_subnet_id = "${module.vpc.private_subnet_a_id}"
  dockernode_az = "${module.vpc.private_subnet_a_az}"
  dockernode_sg_id = "${module.vpc.default_sg_id}"
  aws_region = "${var.aws_region}"
}

module "dockernode_subnet_b" {
  source = "git::ssh://git@repo/org/tf-modules.git//dockerhost"
  spath_orgname = "${var.orgname}"
  dockernode_ami = "${lookup(var.amis, var.aws_region)}"
  dockernode_keypair_name = "${var.orgname}-deployer-key"
  dockernode_subnet_id = "${module.vpc.private_subnet_b_id}"
  dockernode_az = "${module.vpc.private_subnet_b_az}"
  dockernode_sg_id = "${module.vpc.default_sg_id}"
  aws_region = "${var.aws_region}"
}

module "dockernode_subnet_c" {
  source = "git::ssh://git@repo/org/tf-modules.git//dockerhost"
  spath_orgname = "${var.orgname}"
  dockernode_ami = "${lookup(var.amis, var.aws_region)}"
  dockernode_keypair_name = "${var.orgname}-deployer-key"
  dockernode_subnet_id = "${module.vpc.private_subnet_c_id}"
  dockernode_az = "${module.vpc.private_subnet_c_az}"
  dockernode_sg_id = "${module.vpc.default_sg_id}"
  aws_region = "${var.aws_region}"
}

You can see we're reusing the "dockerhost" module three times under different names. This is because we want to launch a host in each AZ of the region. Terraform currently lacks logic to make that sort of balancing easy, so we do it this way. Also note that we're referring to one module's outputs when we pass information into another module.

This is okay as this is our root module and it has that visibility.

Let the graph resolver work for you

If you can avoid it, don't try to define dependencies yourself (i.e. using depends_on, which you can't use in module blocks anyway). Instead, use references that give Terraform the information it needs to build the dependency graph itself. Because dockernode_subnet_c uses ${module.vpc.private_subnet_c_id} as its subnet_id, Terraform can infer the dependency and knows that it needs to run the VPC module FIRST before touching this one.

None of this addresses tfstate

Nothing I've shown so far addresses reusing the STATE from a Terraform run. These modules are great if you want to provide reusable code that people can use for standing up VPCs, but what if you want to share the VPC itself?

Terraform recently added support for "remote state". This allows you to pull in the outputs of another Terraform run so they can be used elsewhere.

If you recall, earlier I mentioned that minimizing the need to touch Terraform files after the initial run helps cut down on the risk of inadvertent destruction of resources. By using the remote state functionality you can provision your main VPC and refer to its IDs in other Terraform code without having to touch the base VPC code.
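A rough sketch of what that looks like, assuming state stored in S3 (the bucket and key names are made up, and the exact syntax has moved around between Terraform versions, so check the docs for the release you're on):

# In the consuming project: pull in the outputs of the VPC run
resource "terraform_remote_state" "vpc" {
  backend = "s3"
  config {
    bucket = "my-tfstate-bucket"   # hypothetical bucket
    key    = "vpc/terraform.tfstate"
    region = "us-west-2"
  }
}

# ...then reference its outputs without ever touching the VPC code
resource "aws_instance" "node1" {
  ami       = "${var.ami}"
  subnet_id = "${terraform_remote_state.vpc.output.public_subnet_id}"
}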

Mind you, this does mean that whoever modifies the VPC code could still accidentally wipe infrastructure, but this is a step in the right direction.

Don’t forget about null_resource

While it doesn’t participate in the graph fully, the null_resource can come in really handy. It’s currently undocumented but you can see an example of usage here.

The null_resource allows you to run provisioners outside of a formal resource (such as an instance). You'll probably have to be explicit with dependencies when using the null resource, but its limited scope should make that okay.
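A minimal sketch of the kind of thing I mean, reusing the node1 instance from the earlier example purely for illustration (the trigger and command here are hypothetical):

resource "null_resource" "post_provision" {
  # re-run this when the instance gets replaced
  triggers {
    instance_id = "${aws_instance.node1.id}"
  }

  provisioner "local-exec" {
    command = "echo ${aws_instance.node1.private_ip} >> generated_hosts.txt"
  }
}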

paths, paths and more paths

You may notice that our vpc module above makes reference to NAT nodes. Our model for VPC usage is one NAT instance per AZ. The NAT instance creation, however, is scoped inside the VPC module. It also happens to use a remote-exec provisioner (which requires the SSH key that we create as part of the wrapping Terraform scripts).

resource "aws_instance" "nat_a" {
  // elided
  provisioner "remote-exec" {
    inline = [ "printf '${template_file.iptables_config.rendered}' > /tmp/iptables.sav" ]
    connection {
        type = "ssh"
        user = "admin"
        key_file = "${path.cwd}/${var.spath_keyname}"
        agent = false
    }
  }
}

Here we define the path to the key file as being in path.cwd. This means we can create the SSH key in a wrapper Terraform script, dump it to the current directory, and the NAT instance creation will know how to find it. There are other path variables available (from the TF documentation):

To reference path information, the syntax is path.TYPE. TYPE can be cwd, module, or root. cwd will interpolate the cwd. module will interpolate the path to the current module. root will interpolate the path of the root module. In general, you probably want the path.module variable.

Modules are vendored

When you use modules, the first thing you’ll have to do is terraform get. This pulls modules into the .terraform directory. Once you do that, unless you do another terraform get -update=true, you’ve essentially vendored those modules. This is nice as you know an upstream change won’t immediately invalidate and destroy your infra. I would suggest checking those into your wrapper repo.

Wrap up

I hope this has helped you with using terraform a bit. One of the best resources I found on module usage that really helped things click was the following awesome blogpost from Tom Doran (@bobtfish) on the subject. He’s made some really useful github repos demonstrating a bunch of stuff. Additionally the work done by Brandon Burton here can give good examples as well.