Simplifying Amazon ECS monitoring setup with AWS Distro for OpenTelemetry

In this blog post, we explain how to set up AWS Distro for OpenTelemetry for Amazon Elastic Container Service (Amazon ECS) with the 1-click console integration. Collecting OpenTelemetry metrics and traces on Amazon ECS mainly consists of instrumenting your application and running the AWS Distro for OpenTelemetry Collector container as a sidecar. The collector image is available on the Amazon Elastic Container Registry (Amazon ECR) public gallery and supports multiple configuration options.

To make monitoring metrics and traces for Amazon ECS applications easier, we now provide guided AWS Distro for OpenTelemetry configurations through the Amazon ECS console. This allows you to get started quickly with traces and metrics (preview) visibility for your applications without needing extensive knowledge of AWS Distro for OpenTelemetry and Amazon ECS configurations. Advanced configurations are still possible by modifying Amazon ECS task definitions and AWS Distro for OpenTelemetry configurations.

Walkthrough

Using the new console experience, we will configure an application that exposes Prometheus metrics and OpenTelemetry trace data. The application will run in an Amazon ECS cluster with AWS Fargate. The metrics will be stored in an Amazon Managed Service for Prometheus workspace and the traces in AWS X-Ray. We will then visualize both metrics and traces with Amazon Managed Grafana.

architecture

Prerequisites

As mentioned previously, we will use a sample application that uses the OpenTelemetry SDK for instrumentation. The application’s container image has already been built and is available on the Amazon ECR public gallery. The latest version of the source code is available on GitHub.

Before running the application, we must create the following resources:

An Amazon Managed Service for Prometheus workspace to store metrics from the AWS Distro for OpenTelemetry collector
An Application Load Balancer and target group to distribute traffic to the application
An AWS Identity and Access Management (IAM) task role to provide Amazon ECS tasks with write permissions to AWS X-Ray and Amazon Managed Service for Prometheus. You can attach the following managed IAM policies to the role: AWSXrayWriteOnlyAccess, AmazonECSTaskExecutionRolePolicy, and AmazonPrometheusRemoteWriteAccess.

permissions
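As a minimal sketch of the role described above, the following Python snippet builds the trust policy document and lists the managed policy ARNs to attach. The post names only the managed policies. The trust policy, which lets the `ecs-tasks.amazonaws.com` service principal assume the role, is the standard shape for ECS task roles and is included here as an assumption about how the role was created. The function and constant names are illustrative.

```python
import json

# ARNs of the managed policies named in the prerequisites.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess",
    "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
    "arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess",
]


def ecs_task_trust_policy() -> dict:
    """Return a trust policy that lets Amazon ECS tasks assume the role.

    Assumption: the role is used only by ECS tasks, so the only trusted
    principal is the ECS tasks service.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the documents so they can be reviewed or pasted into the IAM console.
    print(json.dumps(ecs_task_trust_policy(), indent=2))
    for arn in MANAGED_POLICY_ARNS:
        print(arn)
```

The printed trust policy can be supplied when creating the role, and each ARN attached afterward, through the IAM console or the tooling of your choice.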

Amazon ECS task definition setup (new experience)

Amazon ECS task definitions are essential for running containers in Amazon ECS. A task definition is a text file in JSON format that describes one or more containers that form your application. With the new console experience, we provide a guided experience to configure an application and its monitoring options. On the new Amazon ECS task definition console, we perform the following steps to enable metrics and traces collection with OpenTelemetry:

Select Create a new task definition and set the Task definition family name: sampleapp.
Configure the main application in Container – 1.
1. Set the container name: sampleapp.
2. Set the image URI: public.ecr.aws/one-observability-workshop/demo-sampleapp:latest.
3. Set the container port to 8080 and leave protocol on TCP.
Select Next to configure the environment.
1. Select AWS Fargate (serverless) for the environment.
2. Select task CPU .5 vCPU and memory 1 GB.
3. Select the IAM role previously created.
Configure monitoring and logging.
1. Choose Log collection, select Amazon CloudWatch, and leave the log group values at their defaults. Alternatively, via AWS FireLens you can export logs to Amazon Kinesis Data Firehose, Amazon Kinesis Data Streams, Amazon OpenSearch Service, or Amazon Simple Storage Service (Amazon S3).
2. Choose Use trace collection. We will configure an AWS Distro for OpenTelemetry container to route traces from your application to AWS X-Ray.
3. Choose Use metric collection and select Amazon Managed Service for Prometheus (OpenTelemetry instrumentation) as the destination. Alternatively, you can select Amazon Managed Service for Prometheus (Prometheus libraries instrumentation) or Amazon CloudWatch Container Insights. After selecting Amazon Managed Service for Prometheus as the destination, we will configure an AWS Distro for OpenTelemetry sidecar to route metrics to an Amazon Managed Service for Prometheus workspace. In this example, we will have only one sidecar for both metrics and traces.
4. Enter the Amazon Managed Service for Prometheus workspace remote write endpoint, which you can retrieve from the Amazon Managed Service for Prometheus console.

screenshot of monitoring and logging - optional
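The remote write endpoint entered in the last step follows a predictable shape, so it can be assembled or sanity-checked before pasting it into the console. The sketch below is illustrative only: the helper names are hypothetical, the URL layout is taken from the endpoint shown in this post, and the `https` check reflects the assumption that the workspace is reached over TLS.

```python
from typing import List
from urllib.parse import urlparse


def remote_write_url(region: str, workspace_id: str) -> str:
    """Build the remote write endpoint for a workspace in the given Region."""
    return (
        f"https://aps-workspaces.{region}.amazonaws.com"
        f"/workspaces/{workspace_id}/api/v1/remote_write"
    )


def check_remote_write_url(url: str) -> List[str]:
    """Return a list of problems found in the URL (empty if none)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme should be https")
    host = parsed.hostname or ""
    if not (host.startswith("aps-workspaces.") and host.endswith(".amazonaws.com")):
        problems.append("host is not an aps-workspaces endpoint")
    if not parsed.path.endswith("/api/v1/remote_write"):
        problems.append("path should end with /api/v1/remote_write")
    return problems


if __name__ == "__main__":
    # WORKSPACE_ID is a placeholder, as in the task definition shown in this post.
    url = remote_write_url("eu-central-1", "WORKSPACE_ID")
    print(url)
    print(check_remote_write_url(url) or "looks fine")
```

A check like this catches the most common copy-and-paste mistakes, such as using the workspace query URL instead of the remote write URL.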

Select Next to review, and optionally edit, the settings and create the task definition. After its creation, two containers are shown in the task definition: sampleapp and aws-otel-collector. You can also view, download, and reuse the rendered JSON version of the task definition. For this example, you should get output similar to the following:

{
    "taskDefinitionArn": "arn:aws:ecs:eu-central-1:12345678910:task-definition/sampleapp:2",
    "containerDefinitions": [
        {
            "name": "sampleapp",
            "image": "public.ecr.aws/one-observability-workshop/demo-sampleapp:latest",
            "cpu": 0,
            "links": [],
            "portMappings": [
                {
                    "containerPort": 8080,
                    "hostPort": 8080,
                    "protocol": "tcp"
                }
            ],
            "essential": true,
            "entryPoint": [],
            "command": [],
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [],
            "volumesFrom": [],
            "secrets": [],
            "dnsServers": [],
            "dnsSearchDomains": [],
            "extraHosts": [],
            "dockerSecurityOptions": [],
            "dockerLabels": {},
            "ulimits": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/sampleapp",
                    "awslogs-region": "eu-central-1",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []
            },
            "systemControls": []
        },
        {
            "name": "aws-otel-collector",
            "image": "public.ecr.aws/aws-observability/aws-otel-collector:v0.14.1",
            "cpu": 0,
            "links": [],
            "portMappings": [],
            "essential": true,
            "entryPoint": [],
            "command": [
                "--config=/etc/ecs/ecs-amp-xray.yaml"
            ],
            "environment": [
                {
                    "name": "AWS_PROMETHEUS_ENDPOINT",
                    "value": "https://aps-workspaces.eu-central-1.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/remote_write"
                }
            ],
            "environmentFiles": [],
            "mountPoints": [],
            "volumesFrom": [],
            "secrets": [],
            "dnsServers": [],
            "dnsSearchDomains": [],
            "extraHosts": [],
            "dockerSecurityOptions": [],
            "dockerLabels": {},
            "ulimits": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "True",
                    "awslogs-group": "/ecs/ecs-aws-otel-sidecar-collector",
                    "awslogs-region": "eu-central-1",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []
            },
            "systemControls": []
        }
    ],
    "family": "sampleapp",
    "executionRoleArn": "arn:aws:iam::12345678910:role/ECS-Console-V2-TaskDefinition-ECSTaskExecutionRole-1WIFSJU2DO7K5",
    "networkMode": "awsvpc",
    "revision": 2,
    "volumes": [],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.17"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "ecs.capability.task-eni"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2",
        "FARGATE"
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "512",
    "memory": "1024",
    "registeredAt": "2021-10-28T20:19:37.226Z",
    "registeredBy": "arn:aws:iam::12345678910:user/user",
    "tags": [
        {
            "key": "ecs:taskDefinition:createdFrom",
            "value": "ecs-console-v2"
        }
    ]
}
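To reuse the rendered JSON, for example to register a new revision outside the console, the fields that describe an already registered revision generally need to be removed first. The following is a minimal sketch of that cleanup. The set of read-only fields is an assumption based on the output above rather than an exhaustive list, and the function names are illustrative.

```python
import json

# Fields that describe an existing revision and are not accepted as input
# when registering a new one (assumed list, based on the output above).
READ_ONLY_FIELDS = {
    "taskDefinitionArn",
    "revision",
    "status",
    "requiresAttributes",
    "compatibilities",
    "registeredAt",
    "registeredBy",
    "deregisteredAt",
}


def to_registerable(rendered: dict) -> dict:
    """Return a copy of the rendered task definition without read-only fields."""
    return {key: value for key, value in rendered.items() if key not in READ_ONLY_FIELDS}


def container_names(task_definition: dict) -> list:
    """List the container names so the sidecar's presence can be checked."""
    return [c["name"] for c in task_definition.get("containerDefinitions", [])]


if __name__ == "__main__":
    # A trimmed-down stand-in for the downloaded JSON file.
    rendered = {
        "taskDefinitionArn": "arn:aws:ecs:eu-central-1:12345678910:task-definition/sampleapp:2",
        "family": "sampleapp",
        "revision": 2,
        "status": "ACTIVE",
        "containerDefinitions": [{"name": "sampleapp"}, {"name": "aws-otel-collector"}],
    }
    cleaned = to_registerable(rendered)
    assert "aws-otel-collector" in container_names(cleaned)
    print(json.dumps(cleaned, indent=2))
```

The cleaned document can then be saved to a file and passed to `aws ecs register-task-definition --cli-input-json file://<path>`, or adapted for infrastructure-as-code templates.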

Amazon ECS setup

Running the AWS Distro for OpenTelemetry collector on Amazon ECS doesn’t require an Amazon ECS service. However, a service maintains the desired number of running tasks and also lets you run the application behind a load balancer that distributes traffic across the tasks associated with the service.

At this stage, we are almost ready to create the service and run the entire setup. To create the service, follow the documentation.

Select the load balancer and Amazon ECS task definition created previously. The application should be running, as shown in the following:

application is shown running in a screenshot
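For readers who prefer to script the service creation, the sketch below assembles the request parameters that correspond to the console choices above. The parameter names follow the Amazon ECS CreateService API. The helper name, the default desired count, and the public IP setting are assumptions made for illustration, and the cluster, subnets, security groups, and target group ARN are values you must supply.

```python
def build_service_params(
    cluster,
    service_name,
    task_definition,
    target_group_arn,
    subnets,
    security_groups,
    desired_count=1,
):
    """Assemble CreateService parameters for the sample application on Fargate."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                # Assumption: tasks run in public subnets and need a public IP
                # to pull images from the Amazon ECR public gallery.
                "assignPublicIp": "ENABLED",
            }
        },
        "loadBalancers": [
            {
                "targetGroupArn": target_group_arn,
                # Container name and port from the task definition in this post.
                "containerName": "sampleapp",
                "containerPort": 8080,
            }
        ],
    }


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    params = build_service_params(
        cluster="my-cluster",
        service_name="sampleapp",
        task_definition="sampleapp",
        target_group_arn="<TARGET_GROUP_ARN>",
        subnets=["<SUBNET_ID>"],
        security_groups=["<SECURITY_GROUP_ID>"],
    )
    print(params)
```

With the AWS SDK for Python installed and credentials configured, a dictionary like this could be passed as `boto3.client("ecs").create_service(**params)`.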

Test the application by opening the Amazon Elastic Compute Cloud (Amazon EC2) console, navigating to the load balancer details, and copying its associated DNS name. Then run the following script to generate traffic:

while true
do 
    curl -v http://<ALB_DNS_NAME>/outgoing-http-call
    sleep 2
done

Metrics and traces visualization with Amazon Managed Grafana

The application is now running alongside the AWS Distro for OpenTelemetry collector, which sends metrics to Amazon Managed Service for Prometheus and trace data to AWS X-Ray. Open the X-Ray console to view the service map that represents the application’s behavior. For every HTTP call, the application issues a call to aws.amazon.com, as shown in the following:

application issuing a call to aws.amazon.com

Alternatively, you can set up an Amazon Managed Grafana workspace and visualize both X-Ray traces and Prometheus metrics.

sample app visualization in Amazon Managed Grafana

Conclusion

Amazon ECS allows you to run applications with multiple observability options, depending on your use case.

In this blog post, we explained how the AWS Distro for OpenTelemetry integration with the Amazon ECS console lets you set up metrics and traces collection without diving into AWS Distro for OpenTelemetry configurations.

Refer to the documentation for possible scenarios with the AWS Distro for OpenTelemetry and Amazon ECS integration. We recently announced the general availability of tracing with AWS Distro for OpenTelemetry, and we are working with the upstream community to make metrics a stable feature of OpenTelemetry.

AWS Open Source Blog