One easy pick (so I thought)

At work, most (if not all) infrastructure is deployed with Terraform, and clicky-clicky actions are not at all encouraged. I got asked about the possibility of generating a new TLS certificate so that we could forward users from a given domain to the main one.

Well, this will be an easy ride! The certificate is already created and validated with Terraform. A few tweaks in the module and I will quickly be done with it.

Created a new variable so that we can provide the values that we need on the environment-specific code:

variable "secondary_domains" {
  type        = list(string)
  description = "Secondary domains to create the certificate for"
  default     = []
}

Updated the existing local value that generates the environment-specific alternative names to include it:

env_alternative_names = var.primary_env == "true" ? flatten(["*.${var.domain}", var.secondary_domains]) : ["*.${var.environment}.${var.domain}"]

Also squeezed in a fix to suppress a deprecation warning, merged to master, and got a new release, v1.3.0.

  • Applied to dev … Cool, working fine
  • Applied to Staging … all roses and rainbows!
  • Applied to Prod … Certificate validation failed! Crap!! I completely missed this one, should have seen it coming!

The good thing about having versioned modules and good life-cycle policies on the resources?

Rolled back to v1.2.0, all was back as it was before, with no disruptions, and got back to the drawing board!

And … yes! This was in production!!! Cool, isn’t it?

So, now I have a problem! How do I get this to work?

The problem:

Now I have multiple domains to include in a single certificate, and the DNS validation records have to be created across multiple AWS Route 53 hosted zones. And since I am at it, the solution should be dynamic enough to add as many domains as we need in the future.
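To make the problem concrete, here is a minimal sketch of the single-zone pattern that stops being enough. It is an assumption about what such a module typically looks like, not the actual v1.3.0 code: var.zone_id and the resource name are hypothetical, and aws_acm_certificate.cert is the certificate resource shown later in this post.

```hcl
# Hypothetical single-zone validation, for illustration only.
variable "zone_id" {
  type        = string
  description = "ID of the one hosted zone the module knows about (assumed name)"
}

resource "aws_route53_record" "single_zone_validation" {
  for_each = {
    for dvo in aws_acm_certificate.cert.domain_validation_options : dvo.domain_name => dvo
  }

  # Every record is sent to the same hosted zone, including the records
  # for domains that are actually served by a different zone.
  zone_id         = var.zone_id
  allow_overwrite = true
  name            = each.value.resource_record_name
  type            = each.value.resource_record_type
  records         = [each.value.resource_record_value]
  ttl             = 60
}
```

As soon as the certificate lists a domain from another hosted zone, the zone_id has to be chosen per domain, which is what the rest of this post builds up to.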

This is quite a common practice, so I am sure someone else has figured it out!

It is time to dress in the Kimono and practice all the lessons from the ancient masters in the art of google-fu!

It didn’t take long to find a module that does exactly what I wanted: ringanta/terraform-aws-acm-multiple-hosted-zone

However, it does not entirely fit our design, so I cannot just grab this module and use it. But the good thing about open source? It is open!

That being said, I still spent a good amount of time looking into other approaches and ultimately ended up deep diving into the one above!

I always aim for simplicity. But after some back and forth with variables, I ended up with the most complex solution I have implemented in Terraform yet, though (I believe) still clean and elegant enough to be understood!

The Solution

Some things might not make sense as I go through them, but as in any good movie, everything unfolds and fits nicely together, so please bear with me for a sec!

1 - Generate the environment-specific alternative name (we were doing this before).

env_alternative_name = var.primary_env == "true" ? "*.${var.domain}" : "*.${var.environment}.${var.domain}"

2 - Update the variable (now called alternative_domains) so that module users can provide a list with all the domain => zone mappings.

variable "alternative_domains" {
  type        = list(map(string))
  description = "Secondary domains to create the certificate for"
  default     = []
}
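A stricter type constraint is one possible refinement here. This is a sketch of an alternative, not what the module uses: with list(object(...)), an entry that is missing domain or zone is rejected when the variable is evaluated, instead of failing later inside a for expression.

```hcl
# Alternative declaration (assumption: not the module's actual code).
variable "alternative_domains" {
  type = list(object({
    domain = string # name to add to the certificate
    zone   = string # name of the Route 53 hosted zone that serves it
  }))
  description = "Secondary domains to create the certificate for, with their hosted zones"
  default     = []
}
```

The list(map(string)) version is more permissive, which can be handy if extra keys might be added later; the object version trades that flexibility for earlier errors.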

3 - Create a list with all the alternative names for the certificate.

For this, I wrapped local.env_alternative_name in a single-item list, used a for expression over var.alternative_domains to build a second list with the domains provided through the module input, and finally concatenated both into a single list!

all_alternative_names = concat(
    [local.env_alternative_name],
    [for an in var.alternative_domains : an.domain]
  )
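To see what this evaluates to, the expression can be tried in a scratch configuration. The sketch below uses literal stand-ins for the module inputs (the values are examples, and local.alternative_domains replaces var.alternative_domains so that it runs without any variables or providers).

```hcl
# Scratch configuration with literal stand-ins for the module inputs.
locals {
  env_alternative_name = "*.domain.com" # what a primary environment would produce

  alternative_domains = [
    { domain = "alternative.com", zone = "alternative.com" },
    { domain = "www.alternative.com", zone = "alternative.com" },
  ]

  all_alternative_names = concat(
    [local.env_alternative_name],
    [for an in local.alternative_domains : an.domain]
  )
}

output "all_alternative_names" {
  # ["*.domain.com", "alternative.com", "www.alternative.com"]
  value = local.all_alternative_names
}
```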

4 - Create a list of all the domains in the correct order. This one follows the same logic as before, but this time the first element needs to be the primary domain of the environment, then the environment-specific alternative name, and then all the provided alternative domains.

all_domains = concat(
    [local.env_domain],
    [local.env_alternative_name],
    [for d in var.alternative_domains : d.domain]
  )

5 - Create a list of all the hosted zones in the same order: first the hosted zone for the primary domain, then the hosted zone for the environment-specific alternative name (the same zone, which is why var.zone_name appears twice), and then the hosted zone for each alternative domain provided through the alternative_domains variable.

all_zones = concat(
    [var.zone_name],
    [var.zone_name],
    [for z in var.alternative_domains : z.zone]
  )

6 - Create a list of all unique zones using the distinct() function.

distinct_zones = distinct(local.all_zones)

7 - Create a new local value that maps each distinct DNS hosted zone to its ID. These IDs are unique per hosted zone and will be needed later to create the validation DNS records.

zone_name_to_id_map = zipmap(local.distinct_zones, data.aws_route53_zone.acm_zones[*].zone_id)

Each zone ID is grabbed with a data source:

data "aws_route53_zone" "acm_zones" {
  count        = length(local.distinct_zones)
  name         = local.distinct_zones[count.index]
  private_zone = false
}

8 - Create a local value that maps each domain to its respective DNS hosted zone.

domain_to_zone_map = zipmap(local.all_domains, local.all_zones)

What kind of sorcery is going on here? Remember all those lists we created before?

Let’s pick up a couple of them and understand how it works.

The list all_domains will include the following example values:

["domain.com", "*.domain.com", "alternative.com", "www.alternative.com"]

The list all_zones will contain the following example values:

["domain.com", "domain.com", "alternative.com", "alternative.com"]

Now, from the Terraform documentation:

zipmap constructs a map from a list of keys and a corresponding list of values.

Remember what I said before about needing them in order? This is the reason why!

With domain_to_zone_map = zipmap(local.all_domains, local.all_zones), we are creating a map called domain_to_zone_map that will contain the following example values:

{
  "domain.com"          = "domain.com",
  "*.domain.com"        = "domain.com",
  "alternative.com"     = "alternative.com",
  "www.alternative.com" = "alternative.com"
}
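The pairing is easy to check in isolation. Here is a small scratch sketch using the example lists from above as literals; zone_to_domains is a hypothetical extra local, added only to show the same data grouped the other way round (which domains end up in which hosted zone).

```hcl
locals {
  all_domains = ["domain.com", "*.domain.com", "alternative.com", "www.alternative.com"]
  all_zones   = ["domain.com", "domain.com", "alternative.com", "alternative.com"]

  # Pairs all_domains[i] with all_zones[i]; zipmap fails if the two
  # lists do not have the same length.
  domain_to_zone_map = zipmap(local.all_domains, local.all_zones)

  # Inverse view (hypothetical helper): each zone with the list of
  # domains whose validation records belong in it. The trailing "..."
  # turns on grouping, so repeated zone keys collect into a list.
  zone_to_domains = {
    for domain, zone in local.domain_to_zone_map : zone => domain...
  }
}

output "domain_to_zone_map" {
  value = local.domain_to_zone_map
}

output "zone_to_domains" {
  value = local.zone_to_domains
}
```

If the order of either list drifts, the map still builds without complaint but points domains at the wrong zones, so the ordering really is the part to be careful with.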

At this point I am done with manipulating variables and sorting all the pieces of the puzzle. Here is how it all looks together:

variables.tf

locals {

  # ... other local values ...

  env_alternative_name = var.primary_env == "true" ? "*.${var.domain}" : "*.${var.environment}.${var.domain}"

  all_alternative_names = concat(
    [local.env_alternative_name],
    [for an in var.alternative_domains : an.domain]
  )

  all_domains = concat(
    [local.env_domain],
    [local.env_alternative_name],
    [for d in var.alternative_domains : d.domain]
  )

  all_zones = concat(
    [var.zone_name],
    [var.zone_name],
    [for z in var.alternative_domains : z.zone]
  )

  distinct_zones      = distinct(local.all_zones)
  zone_name_to_id_map = zipmap(local.distinct_zones, data.aws_route53_zone.acm_zones[*].zone_id)
  domain_to_zone_map  = zipmap(local.all_domains, local.all_zones)

}
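As an aside, the zone lookup could also be written with for_each instead of count. This is a sketch of an alternative, not the code that went into the module: because for_each instances are keyed by name, the name-to-ID map falls out of a for expression and neither distinct() nor the second zipmap is needed.

```hcl
# Alternative to the count-based data source plus zipmap (not the module's code).
data "aws_route53_zone" "acm_zones" {
  for_each     = toset(local.all_zones) # toset() also removes duplicates
  name         = each.value
  private_zone = false
}

locals {
  # Keys are the zone names, because for_each instances are keyed by each.key.
  zone_name_to_id_map = {
    for zone_name, zone in data.aws_route53_zone.acm_zones : zone_name => zone.zone_id
  }
}
```

With count, the map stays correct because both zipmap arguments are derived from the same distinct_zones list; with for_each there is no ordering to keep aligned in the first place.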

Making it work!

Let’s put it all into action and take it one bit at a time again.

1 - Create the TLS certificate, providing the environment domain name and all the alternative domain names

# certificate creation
resource "aws_acm_certificate" "cert" {
  domain_name               = local.env_domain
  subject_alternative_names = local.all_alternative_names
  validation_method         = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

2 - Create the necessary DNS records to validate the certificate generated above.

  • Loop through each domain validation option and expose its fields as the for_each values.
  • Create the records with that data, using the mappings generated before to know exactly which hosted zone ID to use.

# certificate validation
resource "aws_route53_record" "cert_validation" {
  depends_on = [aws_acm_certificate.cert]

  for_each = {
    for dvo in aws_acm_certificate.cert.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
      domain = dvo.domain_name
    }
  }

  allow_overwrite = true
  name            = each.value.name
  type            = each.value.type
  records         = [each.value.record]
  zone_id         = lookup(local.zone_name_to_id_map, lookup(local.domain_to_zone_map, each.value.domain))
  ttl             = 60
}
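The nested lookup() calls work, but the Terraform documentation describes calling lookup() without a default as deprecated in favour of the index syntax, map[key]. One possible tidy-up, sketched here with a hypothetical local named domain_to_zone_id_map, is to resolve domain to zone name to zone ID once and index into the result.

```hcl
locals {
  # Resolve domain -> zone name -> zone ID once, up front.
  # domain_to_zone_id_map is a hypothetical name, not part of the module.
  domain_to_zone_id_map = {
    for domain, zone_name in local.domain_to_zone_map :
    domain => local.zone_name_to_id_map[zone_name]
  }
}

# Inside aws_route53_record.cert_validation the attribute would then become:
#   zone_id = local.domain_to_zone_id_map[each.value.domain]
```

A domain or zone missing from either map then fails with an invalid index error at that expression, which points straight at the broken mapping.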

3 - And finally, we collect the FQDNs of all those records and validate the certificate:

resource "aws_acm_certificate_validation" "cert" {
  depends_on              = [aws_acm_certificate.cert, aws_route53_record.cert_validation]
  certificate_arn         = aws_acm_certificate.cert.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

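Here is all of it pieced together (including the data source mentioned above):

# hosted zone lookup
data "aws_route53_zone" "acm_zones" {
  count        = length(local.distinct_zones)
  name         = local.distinct_zones[count.index]
  private_zone = false
}

# certificate creation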
resource "aws_acm_certificate" "cert" {
  domain_name               = local.env_domain
  subject_alternative_names = local.all_alternative_names
  validation_method         = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# certificate validation
resource "aws_route53_record" "cert_validation" {
  depends_on = [aws_acm_certificate.cert]

  for_each = {
    for dvo in aws_acm_certificate.cert.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
      domain = dvo.domain_name
    }
  }

  allow_overwrite = true
  name            = each.value.name
  type            = each.value.type
  records         = [each.value.record]
  zone_id         = lookup(local.zone_name_to_id_map, lookup(local.domain_to_zone_map, each.value.domain))
  ttl             = 60
}

resource "aws_acm_certificate_validation" "cert" {
  depends_on              = [aws_acm_certificate.cert, aws_route53_record.cert_validation]
  certificate_arn         = aws_acm_certificate.cert.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

4 - Last but not least, I added a new Load Balancer rule to redirect all the domains provided in var.alternative_domains to the main one.

resource "aws_alb_listener_rule" "redirect_alternative_domains" {
  count        = length(var.alternative_domains) > 0 ? 1 : 0
  listener_arn = aws_alb_listener.listener_443.arn
  priority     = 97
  action {
    type             = "redirect"
    target_group_arn = aws_alb_target_group.dashboard2_target_group.id
    redirect {
      host        = local.env_domain
      status_code = "HTTP_302"
    }
  }
  condition {
    host_header {
      values = [for ad in var.alternative_domains : ad.domain]
    }
  }
}

Extra points if you know what’s wrong with the code above! Don’t know? Read more about it here

After all of this, I committed everything and ran a couple of plans on other environments to make sure there were no surprises and that it would not change any environment that did not require it. Then I ran a plan against Prod to check that the changes were the ones I was expecting.

Pushed everything to my feature branch, double-checked the pipeline for anything that I might have missed, and shared it with my team. After a quick review and discussion we got it merged to master, the pipelines made their magic, and v1.3.1 was released!

Applying this to production was smooth sailing! No errors, and now we can have any alternative domain redirecting to the main one by providing a list that maps each alternative domain to its hosted zone name:

alternative_domains = [
  {
    domain = "alternative.com"
    zone   = "alternative.com"
  },
  {
    domain = "www.alternative.com"
    zone   = "alternative.com"
  },
  {
    domain = "alternative2.com"
    zone   = "alternative2.com"
  }
]

I have seen a couple of different approaches to this around the internet, but none of them made much sense to me. Are there better ones? Probably! But despite its complexity, this solution seemed quite elegant and easy enough to understand, without risking any breaking changes in the module!

Do you have any other ideas? How would you have solved this? Feel free to reach out and let me know.