AWS HPC Blog

Adding functionality to your applications using multiple containers in AWS Batch

Packaging an application and its dependencies into a container has multiple benefits, including portability and deployment consistency. Sometimes, however, you need multiple applications working together to complete a single task, which raises the question of whether to package the separate applications into a single container, or to follow development best practices and encapsulate each application in its own container. While a single container is simpler to use, it can lead to large container images and increased management overhead when one application changes but the others do not. Using multiple containers means that you need to define how they share resources, such as data or service endpoints, with each other.

The answer to this depends on where you are in your development lifecycle. Both approaches are valid given the proper context. I personally package applications into a single container when I’m in early stages of developing and testing a method, then switch to packaging applications as separate containers once I begin the process of creating a production deployment.

This post is about the latter approach: how to coordinate applications in separate containers within a single job definition in AWS Batch. In the multi-container feature announcement blog post, we gave several motivating examples from robotics where separate containers worked together to perform complex simulations.

Today, I’m going to show you a simpler example: how an application in one container can access data or another application that resides in a separate container. Specifically, my application will call the AWS CLI that lives in the other container. Let’s get started.

A primer on dependencies between multiple containers

Before I can show you how to leverage multi-container job definitions, I first need to cover some basics about how they define the relationship and role of each container relative to the others in the same job. This post targets jobs that run on HAQM Elastic Container Service (HAQM ECS) with the HAQM Elastic Compute Cloud (HAQM EC2) launch type. I’ll also briefly cover HAQM ECS with Fargate and HAQM Elastic Kubernetes Service (HAQM EKS) towards the end of the post.

ECS container dependency properties overview

There are three properties of an AWS Batch ECS job definition that determine the roles and relationships between the containers in a job: whether a container is “essential”, the dependencies it defines on other containers, and shared storage volumes. I’ll describe each of these task container properties next.

Essential containers

The first property to consider is whether a container is essential – meaning the container is required to run the full length of the job. If an essential container exits for any reason, all other containers that are part of the job are stopped and the job exits. The overall status of the Batch job is determined by this container’s exit code. By default, a container is essential.

If the container is non-essential ("essential": false), other containers are not affected when this container stops running.

This is worth repeating: the default value of essential is true, meaning any container in the task is considered essential unless you specifically mark it non-essential! If one of your containers is responsible for setting up some aspect of the job and then stops running, be sure to set it as non-essential, or the important part of your job will never run to completion.
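
To make this concrete, here is a minimal sketch of how the essential flag might look inside the containers list of an ecsProperties job definition; the container names, images, and commands are placeholders rather than part of the example later in this post.

"containers": [
    {
        "name": "setup",
        "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
        "command": ["sh", "-c", "echo 'stage input data here'"],
        "essential": false
    },
    {
        "name": "main-app",
        "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
        "command": ["sh", "-c", "echo 'run the real workload here'"],
        "essential": true
    }
]

With this layout the setup container can exit after doing its work without stopping the job, while the main-app container still determines the job’s final status.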

Container dependencies

You define dependencies between containers using the dependsOn property.

A dependency is composed of two items:

  1. containerName – the name of the container that this container depends upon
  2. condition – the state that the referenced container needs to reach before this container can start.

The possible values for condition are:

  1. START – the referenced container is started (in the RUNNING state) before this one can start
  2. COMPLETE – the referenced container ran to completion and exited before this container can start
  3. SUCCESS – similar to COMPLETE, but the referenced container must exit with an exit code status of zero (0).

You cannot set the dependency condition to COMPLETE or SUCCESS on an essential container since, by definition, when an essential container exits, all other running containers in the job are shut down and the job exits. If that were allowed, your dependent container might never get a chance to start.
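
For example, a dependsOn entry for an application container that should wait for a hypothetical container named data-loader to finish successfully might look like this sketch:

"dependsOn": [
    {
        "containerName": "data-loader",
        "condition": "SUCCESS"
    }
]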

Shared storage volumes

Storage volumes are used to share data between containers, and can either be managed completely within the container runtime (i.e. Docker) or be mounted onto a host-level directory. In AWS Batch, we define shared volumes at the job (task) level, while each container defines a local mount point to use the shared volume(s).
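
As a sketch, a job-level volume and a container-level mount point could look like the following; the volume name and container path are placeholders. The volumes block sits at the task level of the job definition, while each container that needs the data declares its own mountPoints entry that references the volume by name.

"volumes": [
    { "name": "shared-scratch" }
]

"mountPoints": [
    {
        "sourceVolume": "shared-scratch",
        "containerPath": "/scratch",
        "readOnly": false
    }
]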

With that background information, let’s finally look at implementing our example.

Accessing an application from another container

Now that we have the basics of multi-container job definitions using ecsProperties, let’s look at how to put these concepts together so that one containerized application can access another within the same job definition. In particular, I want my main application to make system calls to the AWS CLI, but I don’t want to package the AWS CLI into the same container.

To accomplish this, we need two containers: one to provide the AWS CLI binaries, and the other to make calls to the AWS CLI. For the example, I’ll list my HAQM Simple Storage Service (HAQM S3) buckets to the container’s log stream.

I’ll use the official AWS CLI Docker image from the HAQM ECR Public Gallery, which (as we know from its Dockerfile) stores the libraries and executables in the /usr/local/aws-cli/ directory. That means we just need to cross-mount that directory into our application container. Specific to the AWS Batch job definition, we need to:

  1. Define a task volume for sharing data across containers.
  2. Mark the AWS CLI container as non-essential since it will not be running any processes for the duration of the job.
  3. Define a dependency from the application container to the AWS CLI container. Since the AWS CLI container will not be running processes, use the COMPLETE dependency condition.
  4. For both containers, define a container mount point at /usr/local/aws-cli (a sketch of the resulting ecsProperties follows this list).
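
Here is a minimal sketch of what the ecsProperties portion of such a job definition could look like. The container names, application image, and resource values are illustrative assumptions, and required details such as role ARNs, log configuration, and the exact volume configuration are omitted or simplified for brevity; the AWS CLI image and the /usr/local/aws-cli path come from the discussion above.

"ecsProperties": {
    "taskProperties": [
        {
            "containers": [
                {
                    "name": "aws-cli",
                    "image": "public.ecr.aws/aws-cli/aws-cli:latest",
                    "essential": false,
                    "resourceRequirements": [
                        { "type": "VCPU", "value": "1" },
                        { "type": "MEMORY", "value": "1024" }
                    ],
                    "mountPoints": [
                        {
                            "sourceVolume": "aws-cli-volume",
                            "containerPath": "/usr/local/aws-cli"
                        }
                    ]
                },
                {
                    "name": "application",
                    "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
                    "essential": true,
                    "command": ["/usr/local/aws-cli/v2/current/bin/aws", "s3", "ls"],
                    "dependsOn": [
                        { "containerName": "aws-cli", "condition": "COMPLETE" }
                    ],
                    "resourceRequirements": [
                        { "type": "VCPU", "value": "1" },
                        { "type": "MEMORY", "value": "2048" }
                    ],
                    "mountPoints": [
                        {
                            "sourceVolume": "aws-cli-volume",
                            "containerPath": "/usr/local/aws-cli"
                        }
                    ]
                }
            ],
            "volumes": [
                { "name": "aws-cli-volume" }
            ]
        }
    ]
}

Because the aws-cli container supplies no command, its entrypoint runs the CLI with no arguments and exits almost immediately with a non-zero code, which is why the application container uses the COMPLETE dependency condition rather than SUCCESS.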

Figure 1 illustrates these properties using a job definition diagram of the volume mounts, container dependency, and command to list out the HAQM S3 buckets.

Figure 1: A diagram representation of an AWS Batch job definition showing the structure of the ecsProperties request parameter. The diagram shows a task-level job volume and two containers that reference the volume to share data between them.

When you run this job, you will see two log streams, one for each container (Figure 2). The AWS CLI container exits with return code 252 to reflect that it was not provided any arguments, and its log stream will contain the default usage dialog. The log stream for the application container should list the HAQM S3 buckets in your account, provided the role your job ran with has that permission.

Figure 2: The AWS Batch console view of the completed job, showing links for the HAQM CloudWatch log streams for each of the two containers.

Sharing data across multiple containers within Fargate job definitions

There’s a really important caveat about this example: Docker volumes are only supported in the EC2 launch type. This mechanism won’t work for ECS Fargate job definitions. If you want to share data or an executable from one container to another in AWS Batch with Fargate, you will need to export the data from the source container using the VOLUME directive in the Dockerfile.

For our example, this means that we must create a new container image that explicitly exposes the /usr/local/aws-cli/v2/current directory as a VOLUME, and then use this custom image within the Fargate job definition. Here’s an example Dockerfile that pulls the latest AWS CLI container image and adds the volume definition:

FROM public.ecr.aws/aws-cli/aws-cli:latest
VOLUME /usr/local/aws-cli/v2/current

For more examples and information, refer to the ECS documentation for considerations when using bind mounts.

What about Kubernetes?

Kubernetes Pods do not have rich container dependencies like those in ECS. A Pod can specify init containers, which run before all application containers and must exit successfully before the application containers can start. This works for AWS Batch jobs running on HAQM EKS that need to share data or binaries with other containers, as in the AWS CLI example, since all we need is a shared volume. If your application container instead depends on a service running in another container of the Pod, like a sidecar for monitoring, you will need to verify that the service is active from within your application container.
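
To sketch what that could look like, here is a hedged example of an AWS Batch EKS job definition fragment that uses an init container to copy the CLI into an emptyDir volume. It assumes init container support in eksProperties; the container names, mount paths, and application image are placeholders, and resource requests and other required fields are omitted for brevity.

"eksProperties": {
    "podProperties": {
        "initContainers": [
            {
                "name": "aws-cli",
                "image": "public.ecr.aws/aws-cli/aws-cli:latest",
                "command": ["cp", "-a", "/usr/local/aws-cli/.", "/staging/"],
                "volumeMounts": [
                    { "name": "shared-tools", "mountPath": "/staging" }
                ]
            }
        ],
        "containers": [
            {
                "name": "application",
                "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
                "command": ["/usr/local/aws-cli/v2/current/bin/aws", "s3", "ls"],
                "volumeMounts": [
                    { "name": "shared-tools", "mountPath": "/usr/local/aws-cli" }
                ]
            }
        ],
        "volumes": [
            { "name": "shared-tools", "emptyDir": {} }
        ]
    }
}

Mounting the volume back at /usr/local/aws-cli in the application container keeps the CLI’s internal symlinks valid whether they are relative or absolute.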

Conclusion

In this blog post we took a deep dive into the parameters needed to define AWS Batch jobs with multiple containers on ECS using the EC2 launch type. I showed how you can leverage multi-container job definitions to access executables or data from one container in another container of the same job definition. In the example, we accessed the AWS CLI from a separate container to list the S3 buckets in the account. You can extend this example to add other functionality or data to your applications using multiple containers.

To get started, read the documentation on Creating job definitions using EcsProperties. Let us know what you think by sending a message to ask-hpc@haqm.com.

Angel Pizarro

Angel is a Principal Developer Advocate for HPC and scientific computing. His background is in bioinformatics application development and building system architectures for scalable computing in genomics and other high throughput life science domains.