Dynamic Route53 records for AWS Auto Scaling Groups with Terraform

AWS Auto Scaling Groups may seem outdated in a world dominated by Serverless and Kubernetes, but they still have their place in Meltwater’s AWS infrastructure.

One thing we felt was missing from Auto Scaling Groups is unique instance names. Every EC2 instance launched by an ASG receives the same Name tag and gets no internal Route53 DNS entry. We addressed this with an easy-to-use Terraform module called asg-dns-handler.

Why do we need unique ASG instance names?

As mentioned, AWS Auto Scaling Groups (ASGs) may seem outdated in a world dominated by Serverless and Kubernetes. However, ASGs are still important to us for services that are not easy to containerize, or where container orchestration is not desired.

We need our ASG instances to have unique names for a few reasons:

  • Access: When troubleshooting, we save time by not having to look up an instance’s internal IP address before connecting over SSH.
  • Logging: All of our system logs are forwarded to an internal ELK service. While Instance IDs can be used to uniquely identify which logs came from a certain system, we find it more helpful to filter logs by host/instance name.
  • Metrics: Instances can run tools like Node Exporter and have their metrics automatically scraped by Prometheus. We can then leverage off-the-shelf Grafana dashboards that expect unique hostnames (like this one) without modification.

We found several existing approaches to this problem, but none of them used Terraform, our preferred infrastructure-as-code solution.

Our implementation addresses the lack of unique instance names in ASGs with a combination of Lambda, SNS, and ASG Lifecycle Hooks, all packaged in an easy-to-use Terraform module called asg-dns-handler. Because it is a Terraform module, we can reuse this capability in all of our Auto Scaling Groups without duplicating code, and share it with other teams.

Because we figured others would find this capability useful too, we have released the module as open source.

How to use our asg-dns-handler

If you are already using Terraform to manage your Auto Scaling Group and internal Route53 zone, incorporating this module will be easy.

Let’s look at an example Terraform configuration:

resource "aws_autoscaling_group" "my_asg" {
  name = "myASG"

  vpc_zone_identifier = var.aws_subnets

  min_size         = var.asg_min_count
  max_size         = var.asg_max_count
  desired_capacity = var.asg_desired_count

  launch_configuration = aws_launch_configuration.my_launch_config.name

  initial_lifecycle_hook {
    name                    = "lifecycle-launching"
    default_result          = "CONTINUE"
    heartbeat_timeout       = 60
    lifecycle_transition    = "autoscaling:EC2_INSTANCE_LAUNCHING"
    notification_target_arn = module.autoscale_dns.autoscale_handling_sns_topic_arn
    role_arn                = module.autoscale_dns.agent_lifecycle_iam_role_arn
  }

  initial_lifecycle_hook {
    name                    = "lifecycle-terminating"
    default_result          = "CONTINUE"
    heartbeat_timeout       = 60
    lifecycle_transition    = "autoscaling:EC2_INSTANCE_TERMINATING"
    notification_target_arn = module.autoscale_dns.autoscale_handling_sns_topic_arn
    role_arn                = module.autoscale_dns.agent_lifecycle_iam_role_arn
  }

  tag {
    key                 = "asg:hostname_pattern"
    value               = "${var.hostname_prefix}-#instanceid.${var.vpc_name}.testing@${var.internal_zone_id}"
    propagate_at_launch = true
  }
}

module "autoscale_dns" {
  source  = "meltwater/asg-dns-handler/aws"
  version = "x.y.z"

  autoscale_update_name     = "my_asg_handler"
  autoscale_route53zone_arn = var.internal_zone_id
  vpc_name                  = var.vpc_name
}

The initial_lifecycle_hook blocks are where the magic happens. As the ASG launches and terminates instances, it sends notifications via SNS to a Lambda function, which adds and removes the Route53 DNS entries. The Lambda function reads the asg:hostname_pattern tag to determine the desired DNS entry. Pay attention to default_result, which can be ABANDON or CONTINUE: it is the action the ASG takes when the heartbeat timeout elapses without the hook being completed, for example because the Lambda function failed. For the launching hook, ABANDON terminates the new instance, while CONTINUE lets it go into service even if no DNS entry was created.
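
To make the tag format concrete, here is a minimal shell sketch of how a value like the asg:hostname_pattern shown above could be expanded for one instance. It assumes, based only on the example tag, that #instanceid is a placeholder for the EC2 instance ID and that the text after @ is the Route53 zone ID. The function name resolve_hostname_pattern and the sample values are hypothetical; the module's Lambda function does this work itself and may differ in detail.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: expand an asg:hostname_pattern
# value for a single instance and print "<record name> <zone id>".
resolve_hostname_pattern() {
  local pattern instance_id name zone_id
  pattern=$1
  instance_id=$2
  name=${pattern%@*}      # everything before the last '@'
  zone_id=${pattern##*@}  # everything after the last '@'
  # Substitute the assumed #instanceid placeholder with the real instance ID.
  name=$(printf '%s' "$name" | sed "s/#instanceid/$instance_id/g")
  printf '%s %s\n' "$name" "$zone_id"
}

# Sample values are made up.
resolve_hostname_pattern "i-am-unique-#instanceid.myvpc.testing@Z123EXAMPLE" "i-0abc123"
```

With these sample values the function prints the record name i-am-unique-i-0abc123.myvpc.testing followed by the zone ID Z123EXAMPLE.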

You may wonder how we set the hostname of the instance itself. For that, we need to add a bit of scripting to user_data in the launch configuration:

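#!/usr/bin/env bash
set -e
# Query the EC2 instance metadata service. Assign first and export afterwards,
# so that a failing curl is not masked by export's exit status under set -e.
AVAILABILITY_ZONE=$(curl -sLf http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_ID=$(curl -sLf http://169.254.169.254/latest/meta-data/instance-id)
# ${hostname_prefix} is assumed to be substituted by Terraform when it renders user_data.
NEW_HOSTNAME="${hostname_prefix}-$INSTANCE_ID"
export AVAILABILITY_ZONE INSTANCE_ID NEW_HOSTNAME
hostname "$NEW_HOSTNAME"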

The above script will work for instances running the Amazon Linux 2 operating system where the instance role allows the ec2:DescribeTags action.
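
The script above uses plain GET requests against the instance metadata service. If your instances are configured to require IMDSv2 (an assumption, since this post does not say), those requests are rejected and a session token has to be fetched first. The following is a sketch of that variant, not part of the module; imds_get is a hypothetical helper name and the 60 second token lifetime is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: read instance metadata using the IMDSv2 session-token flow.
IMDS_URL="http://169.254.169.254/latest"

# imds_get PATH: print the metadata value at PATH, e.g. "instance-id".
imds_get() {
  local token
  # Request a short-lived session token first...
  token=$(curl -sLf -X PUT "$IMDS_URL/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  # ...then present it on the actual metadata request.
  curl -sLf "$IMDS_URL/meta-data/$1" -H "X-aws-ec2-metadata-token: $token"
}
```

On an instance, the lookup in the script above would then become INSTANCE_ID=$(imds_get instance-id).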

Here is what you will see in the AWS console when setting hostname_prefix to i-am-unique for an ASG with three instances:

Auto Scaling Group:

EC2 Instances:

Route53 Zone:

Conclusions

Although we do appreciate the “Pets vs Cattle” analogy, we still see value in unique hostnames and DNS names. This module has helped us with naming Elasticsearch nodes, build agents for continuous integration, ECS clusters, and more.

We invite you to try our asg-dns-handler module, and we welcome your feedback and contributions to our open source project at github.com/meltwater/terraform-aws-asg-dns-handler.

Image Credits:

Hello_my_name_is_sticker.svg