Building an HAQM ECS Anywhere home lab with HAQM VPC network connectivity
Since 2014, HAQM Elastic Container Service (HAQM ECS) has helped AWS customers orchestrate containerized application deployments across a wide range of compute environments. Initially, HAQM ECS could only be used with AWS managed compute capacity, such as HAQM Elastic Compute Cloud (HAQM EC2) instances, AWS Fargate, AWS Wavelength, and AWS Outposts. With the general availability of HAQM ECS Anywhere (ECS Anywhere), it is now possible to use your own compute hardware as capacity for an HAQM ECS cluster.
This post covers the process of building a home lab for running HAQM ECS tasks. The home lab allows you to use the HAQM ECS API to launch tasks on your own compute hardware. Additionally, by using AWS Site-to-Site VPN, you can access a remote HAQM Virtual Private Cloud (HAQM VPC) from your local network, or access your local network from the remote HAQM VPC. The Site-to-Site VPN allows tasks running in a local ECS Anywhere cluster to talk to HAQM Relational Database Service (HAQM RDS), HAQM ElastiCache, or other fully managed AWS services inside the HAQM VPC. Furthermore, the local cluster can receive inbound connections from HAQM VPC-hosted services such as an Application Load Balancer (ALB) or Network Load Balancer (NLB).
The architecture
To understand how ECS Anywhere works, we need to look at the components that make it function. Each piece of hardware or virtual machine that you want to use for ECS Anywhere requires a few components to function as part of the HAQM ECS cluster.
The first component is an agent that is connected to AWS Systems Manager. When you install the AWS Systems Manager Agent (SSM Agent), you supply a secret activation code that allows the agent to register itself with AWS Systems Manager. The agent uses the activation code to register the hardware device as a managed instance and download a secret key for that managed instance. From that point on, the managed instance can be assigned an AWS Identity and Access Management (IAM) role, and it automatically receives IAM credentials for that role. This role is essential because it provides the credentials the instance needs to communicate with other AWS services, such as HAQM ECS.
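If you want to perform this activation step by hand with the AWS CLI (the ECS Anywhere install script described later can do it for you), it looks roughly like the following. The role name, Region, and placeholder values here are examples.
# Create an activation that allows up to four devices to register (role name is an example)
aws ssm create-activation \
    --default-instance-name home-lab-pi \
    --iam-role ecsAnywhereRole \
    --registration-limit 4 \
    --region us-east-1
# On each device, register the SSM Agent using the returned code and ID
# (the agent binary's path depends on how it was installed)
sudo amazon-ssm-agent -register \
    -code "<activation-code>" \
    -id "<activation-id>" \
    -region "us-east-1"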
The next essential component is Docker, which will launch containers on the managed host. One of the containers that Docker launches will be the HAQM ECS agent. This agent uses the managed instance’s IAM role to connect to the HAQM ECS control plane in an AWS Region. Once connected, it can receive instructions from the HAQM ECS control plane on what tasks and containers to launch. The agent can also submit task telemetry to the control plane about the lifecycle of those containers and their health.
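On each host, the HAQM ECS agent learns which cluster to join from a small configuration file. As a minimal sketch, assuming the default file location, /etc/ecs/ecs.config might contain something like this; the cluster name is an example, and the installation script described later writes the file for you.
# /etc/ecs/ecs.config (example values)
ECS_CLUSTER=home-lab
ECS_EXTERNAL=true
ECS_LOGLEVEL=info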
The next piece to understand is how the networking operates between an HAQM VPC in an AWS Region and the local network that is running the ECS Anywhere cluster.
On the HAQM VPC side, we can use AWS Site-to-Site VPN to provide a fully managed VPN gateway. The gateway is configured to add a route in the route table for the HAQM VPC. The route directs all traffic that is addressed to the on-premises network CIDR range out via the VPN gateway. There is a corresponding self-managed VPN gateway on-premises, as well as self-managed routes so that any traffic addressed to the HAQM VPC CIDR range is directed to the on-premises end of the VPN gateway.
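As a sketch, assuming the virtual private gateway from the Site-to-Site VPN is already attached to the VPC and the home network uses 192.168.1.0/24, the AWS-side route could be added with the AWS CLI like this (all IDs are placeholders):
# Send traffic destined for the home network CIDR out via the virtual private gateway
aws ec2 create-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 192.168.1.0/24 \
    --gateway-id vgw-0123456789abcdef0
# Or let the gateway propagate its routes into the route table automatically
aws ec2 enable-vgw-route-propagation \
    --route-table-id rtb-0123456789abcdef0 \
    --gateway-id vgw-0123456789abcdef0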
With this configuration, any resources in the on-premises network can talk to resources in the HAQM VPC using the private IP addresses of the HAQM VPC hosted resources. For instance, an on-premises Raspberry Pi can send traffic to an HAQM RDS instance running in the HAQM VPC. Additionally, HAQM VPC-hosted resources can talk to resources on-premises using their private IP addresses. In the previous diagram, an HAQM VPC-hosted NLB communicates with a Raspberry Pi using the private IP address of the Raspberry Pi.
It is important to remember that as long as on-premises devices have internet connectivity, they can communicate with many AWS services directly over the internet. This includes HAQM ECS, HAQM DynamoDB, HAQM Simple Storage Service (HAQM S3), and many other AWS services that are globally accessible via public service endpoints. Extra networking configuration is only required for AWS services that are tied to a specific HAQM VPC.
Building your home lab hardware
ECS Anywhere is designed to function on a wide range of different devices and operating systems, so there are many different hardware options you can choose from to build your home lab. You may already have some hardware that you wish to use for your HAQM ECS cluster, but perhaps you were looking for a reason to buy some new stuff! This section contains a parts list to help you build an ECS Anywhere home lab using Raspberry Pi devices. These parts can be substituted with other alternatives for your home lab, but you may find this list to be a good starting point for your build.
Raspberry Pi has a few key benefits. The devices are fairly cheap, so you can build a larger cluster on a lower budget. Additionally, they are ARM-based devices, which can make them perfect for testing ARM builds locally at home if your other development devices are all Intel-based. Finally, Raspberry Pis are low-power devices that can run with passive cooling, so they are ideal if you don’t want a lot of noisy fans in your office.
For compute, you might use the following components:
- 4 x Raspberry Pi 4 Model B (8 GB RAM, Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5 GHz). This provides a total of 16 cores and 32 GB of memory for running HAQM ECS tasks.
- 4 x Raspberry Pi Power over Ethernet HAT. This is an add-on circuit board that sits on top of the Raspberry Pi and gives it the ability to be powered over the Ethernet cable. This part is optional, but ideal if you don’t want to deal with power cables in addition to network cables.
- 4 x 128 GB SD card. This serves as persistence for the Raspberry Pi to store the operating system and everything you install and run on the Raspberry Pi, including Docker images.
- 4 x Raspberry Pi low-profile heatsinks, for enhanced passive cooling. The heatsink must be small enough to fit between the Raspberry Pi and the PoE HAT.
To get these individual devices running as a neat, self-contained cluster, consider the following components:
- 4 x Cat8 Ethernet patch cable, 1-foot length, to connect the Raspberry Pis to a switch.
- 1 x TP-Link 8 Port Power over Ethernet Switch. This supplies all of the Raspberry Pis with power and a wired internet connection so that you don’t overload your Wi-Fi network. This switch also fits almost perfectly in the case.
- 1 x cluster case for Raspberry Pi from C4 Labs. The clear case lets you easily see the devices inside of it. It has eight detachable bays and room for a switch on the bottom. The case comes with room for cooling fans, which you can install if you plan to fill all eight slots with devices and therefore need some mechanical help to pull more airflow through the case.
Setting up the software
After assembling all the hardware, you will need to do a bit of software setup. Specifically for Raspberry Pi, you can use the Raspberry Pi Imager to install an operating system on the SD cards. For this example cluster, we will use Ubuntu 20.04. Add your public key as an authorized key for SSH access, then use SSH to run commands on the Raspberry Pi itself.
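For example, assuming a device is reachable on your network as raspberrypi-1.local with the default ubuntu user, you could copy your key over and connect like this:
# Copy your public key to the device, then open a shell on it
ssh-copy-id ubuntu@raspberrypi-1.local
ssh ubuntu@raspberrypi-1.local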
You will need to make a few adjustments to the firmware configuration. First, in order for your devices to run Docker and HAQM ECS tasks, you must enable memory cgroups in the boot config. This may not be enabled by default, but it is necessary for Docker to function properly when you set hard or soft memory limits in your HAQM ECS task definition. You can do this by adding cgroup_enable=memory to the file /boot/firmware/cmdline.txt.
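One way to do that is to append the flag to the existing single-line file and reboot; it is worth backing the file up first, since a typo on this line can stop the device from booting.
# cmdline.txt is a single line of space-separated kernel parameters
sudo cp /boot/firmware/cmdline.txt /boot/firmware/cmdline.txt.bak
sudo sed -i '$ s/$/ cgroup_enable=memory/' /boot/firmware/cmdline.txt
sudo reboot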
Additionally, for Raspberry Pi, you may want to reduce the noise from the cluster. The stock Power over Ethernet HATs have cooling fans that try to keep the temperature much lower than strictly necessary. If you have installed heatsinks, they will likely keep the device passively cooled well below its maximum operating temperature under typical conditions. The following configuration in /boot/firmware/usercfg.txt can keep the fans from turning on until the device reaches 68°C.
dtoverlay=rpi-poe
dtparam=poe_fan_temp0=68000
dtparam=poe_fan_temp1=72000
dtparam=poe_fan_temp2=76000
dtparam=poe_fan_temp3=80000
By running cat /sys/class/thermal/thermal_zone0/temp, you can monitor the Raspberry Pi temperature and ensure that it remains reasonable under load. As long as the ambient temperature is not too high, the Raspberry Pi can be passively cooled by the heatsink until it’s been under a heavy load for an extended period of time. Once the temperature exceeds 68°C, the fan comes on for active cooling.
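The value in that file is reported in millidegrees Celsius, so a quick way to print a human-readable temperature is:
# Convert the millidegree reading into degrees Celsius
awk '{ printf "%.1f C\n", $1 / 1000 }' /sys/class/thermal/thermal_zone0/temp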
With these initial tweaks out of the way, you can use the AWS Management Console to get an activation command to run on each of your devices. This command will turn the device into capacity for your HAQM ECS cluster.
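The console-generated command follows roughly this shape; the Region, cluster name, and activation values below are placeholders, so use the exact command from your own console.
curl --proto "https" -o "/tmp/ecs-anywhere-install.sh" \
    "https://amazon-ecs-agent.s3.amazonaws.com/ecs-anywhere-install-latest.sh"
sudo bash /tmp/ecs-anywhere-install.sh \
    --region us-east-1 \
    --cluster home-lab \
    --activation-id "<activation-id>" \
    --activation-code "<activation-code>"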
The script automatically installs and configures the SSM Agent, Docker, and the HAQM ECS agent without any further input necessary. Once the script has finished running, you can see the devices show up under AWS Systems Manager Fleet Manager. You’ll also see a few details, such as their local IP address within your home network.
One of the useful features of Fleet Manager is the ability to connect to a managed instance. This even works for devices that are behind Network Address Translation with only a private IP address, because the SSM Agent on the host opens a control channel back to AWS Systems Manager. This channel can be used both to monitor the managed instance and to open an AWS Systems Manager Session Manager session to it. When you select “Start session,” it opens a shell right there in the browser. By launching htop, you can see the process tree, with the SSM Agent spawning a session worker that runs the shell.
Setting up the AWS Site-to-Site VPN
There are a few different networking approaches that you can use for your HAQM ECS cluster. The simplest approach for inbound traffic would be to configure port forwarding on your home router. This lets you send traffic to your home IP address and have it forwarded to one of the devices on your network.
But what if you want to connect back to resources inside an HAQM VPC? For large on-premises environments, you could use AWS Direct Connect to get a direct connection to AWS. However, for a home lab, this is not ideal. As a reduced-cost alternative, you may use an AWS Site-to-Site VPN. One of the Raspberry Pis can serve as a dedicated VPN gateway that runs strongSwan as an IPsec VPN. By going to the HAQM VPC console, you can create the AWS-side VPN gateway and download the instructions for configuring the on-premises VPN gateway.
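If you prefer to script the AWS side instead of using the console, the pieces can be created with the AWS CLI along these lines; your home network's public IP and the resource IDs below are placeholders, and static routing is assumed rather than BGP.
# Represent your home VPN device (replace the public IP with your own)
aws ec2 create-customer-gateway --type ipsec.1 --public-ip 203.0.113.10 --bgp-asn 65000
# Create a virtual private gateway and attach it to the VPC
aws ec2 create-vpn-gateway --type ipsec.1
aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0
# Create the Site-to-Site VPN connection with static routing and add the home network CIDR
aws ec2 create-vpn-connection --type ipsec.1 \
    --customer-gateway-id cgw-0123456789abcdef0 \
    --vpn-gateway-id vgw-0123456789abcdef0 \
    --options StaticRoutesOnly=true
aws ec2 create-vpn-connection-route \
    --vpn-connection-id vpn-0123456789abcdef0 \
    --destination-cidr-block 192.168.1.0/24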
By following the downloaded instructions, you can set up an IPsec VPN tunnel between your home network and your HAQM VPC. Run ipsec status to verify that the tunnels are up. In this case, you can see the output for a VPN connection that has been configured between a home network at 192.168.1.0/24 and an HAQM VPC at 10.0.0.0/16.
Next, you need to configure a VPN gateway route on the other Raspberry Pi devices. This route will tell them that any traffic addressed to the IP range of the HAQM VPC should use the local IP address of the VPN Raspberry Pi as a gateway to reach the HAQM VPC. In the following example, the VPN Raspberry Pi has a local IP address of 192.168.1.196.
sudo route add -net 10.0.0.0/16 gw 192.168.1.196
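Note that a route added this way does not survive a reboot. On Ubuntu, you could make it persistent with a netplan snippet along these lines; the interface name and file path are examples, and you would apply it with sudo netplan apply.
# /etc/netplan/99-vpn-route.yaml (example)
network:
  version: 2
  ethernets:
    eth0:
      routes:
        - to: 10.0.0.0/16
          via: 192.168.1.196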
We can verify network connectivity by using Fleet Manager again. Open a session to one of the Raspberry Pis, then ping an HAQM EC2 instance running inside the HAQM VPC.
In the previous screenshot, you can see the results of running a ping command on a Raspberry Pi sitting on my desk in my home network in New York City. The address being pinged is an HAQM EC2 instance running inside a VPC in US East (N. Virginia). The ping makes the round trip from New York City to US East (N. Virginia) and back in less than 11 ms. Your results may vary based on your internet connection and your distance from the AWS Region where you provisioned your HAQM VPC.
Launching a load balanced workload in the home cluster
With all the hardware and software setup prerequisites out of the way, you can launch a test workload in the cluster and verify that this all works. You can launch an HAQM ECS service into your home lab cluster using the new EXTERNAL launch type. Both your task definition and your HAQM ECS service must be created with the EXTERNAL launch type.
The following example shows a service called redis. Redis is a stateful service that relies on persisting information to disk. Stateful services are tricky because they have to run in the same place where their data is stored. With ECS Anywhere, you can solve this using task placement constraints, which can be used to pin workloads to a specific device. In this case, the redis task has been pinned to a specific Raspberry Pi using an HAQM ECS instance attribute (redis=true) and a memberOf task placement constraint that matches that attribute.
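As a rough sketch of how that might look with the AWS CLI (the cluster name, attribute, and ARNs are placeholders):
# Tag the chosen Raspberry Pi's container instance with a custom attribute
aws ecs put-attributes --cluster home-lab \
    --attributes name=redis,value=true,targetId=<container-instance-arn>
# Create the service with the EXTERNAL launch type, pinned to that attribute
aws ecs create-service --cluster home-lab \
    --service-name redis \
    --task-definition redis \
    --desired-count 1 \
    --launch-type EXTERNAL \
    --placement-constraints type=memberOf,expression="attribute:redis == true"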
Once your tasks launch, you can get metrics and logs for them as if they were running on an HAQM EC2 instance inside your VPC.
If you are hosting a service that needs to receive incoming traffic from the internet, then you will likely want a load balancer. One benefit of hosting that load balancer in an AWS Region is that you don’t have to configure DNS to point at the address of your home network. Instead, the load balancer can serve your traffic using its own IP address, and your home network stays protected behind the VPN connection.
If you choose this configuration, then you must launch the NLB or ALB into the same VPC that you configured in your AWS Site-to-Site VPN. If the load balancer is inside that VPC, it can send traffic to the private IP addresses of your devices inside your home network via the VPN gateway. Currently, you need to manually add the private IP address and port combinations of your home lab devices as load balancer targets. Fortunately, devices on a home network will likely have static IP addresses, so this configuration should be stable.
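A rough sketch of that registration with the AWS CLI follows; the target group settings, VPC ID, and device address are examples, and AvailabilityZone=all is needed because the device's IP address sits outside the VPC CIDR.
# Create an IP target group inside the VPC that the VPN is attached to
aws elbv2 create-target-group --name home-lab-web \
    --protocol TCP --port 80 \
    --vpc-id vpc-0123456789abcdef0 \
    --target-type ip
# Register a home lab device by its private IP address
aws elbv2 register-targets \
    --target-group-arn <target-group-arn> \
    --targets Id=192.168.1.101,Port=80,AvailabilityZone=all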
Once the load balancer has registered each target as healthy, you can send traffic to the load balancer’s DNS name. In this case, the response is a simple HTML page served by a small Node.js app, with a hit counter that is persisted in Redis.
Conclusion
With ECS Anywhere, you can orchestrate container experiments in your own home lab from the cloud. You don’t have to run the control plane on your own devices. Instead, you can use your devices purely as capacity for your applications. With ECS Anywhere, you can define the desired state of the software on your devices and leave the distribution of tasks to hosts to be handled by the HAQM ECS control plane. HAQM ECS monitors your tasks and restarts them if necessary. Additionally, with Fleet Manager, you get the added ability to connect to and control your managed devices from anywhere that you have an internet connection, even if your devices are behind NAT, inside your private network.
There’s a lot more to ECS Anywhere, and we encourage you to check out the documentation and the launch blog. You can also join our live stream on Containers from the Couch in June, where we will walk through ECS Anywhere and take your questions.