Containers

Analyzing Java applications performance with async-profiler in HAQM EKS

This blog was authored by Sascha Möllering, Principal Specialist Solutions Architect Containers and Yuriy Bezsonov, Senior Partner Solutions Architect.

Introduction

Container startup performance presents a significant challenge for Java applications running on Kubernetes, particularly during scaling events and recovery scenarios. Although Java remains a top choice for both legacy modernization and new microservices development, especially with frameworks such as Spring Boot or Quarkus, containerized Java applications often face unique challenges around startup times and runtime behavior.

Performance profiling in containerized Java applications has long presented significant challenges. Async-profiler, a lightweight sampling solution from the HAQM Corretto team, offers an interesting approach for Java workloads running on HAQM Elastic Kubernetes Service (HAQM EKS). By eliminating the traditional safepoint bias issue (more information about Java safepoints can be found in this post), it enables more accurate performance analysis.

In this post we explore practical implementations for both on-demand and continuous profiling scenarios, using the Mountpoint for HAQM S3 Container Storage Interface (CSI) driver to efficiently manage profiling data in your Kubernetes environment.

Solution overview

To avoid depending upon the JVM reaching a safepoint before profiling, async-profiler features HotSpot-specific APIs to collect stack traces. It works with OpenJDK and other Java runtimes based on the HotSpot JVM.

async-profiler can trace the following kinds of events:

  • CPU cycles.
  • Hardware and software performance counters, such as cache misses, branch misses, page faults, and context switches.
  • Allocations in the Java heap.
  • Contended lock attempts, including both Java object monitors and ReentrantLocks.
  • Wall-clock time (also called wall time), which is the time it takes to run a block of code.
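For reference, the on-demand CLI selects the event to trace with the `-e` flag. The following is an illustrative sketch only, not a command from this walkthrough: the target PID and output path are placeholders, and the binary path assumes the container image layout built later in this post.

```shell
#!/bin/sh
# Illustrative only: profile a target process for 30 seconds.
# PID here is this shell's own PID; inside the application container
# you would resolve the JVM PID with `jps` instead.
PID=$$
if [ -x /async-profiler/bin/asprof ]; then
  # Swap "-e wall" for cpu, alloc, lock, or a perf counter such as
  # cache-misses to trace the other event types listed above.
  /async-profiler/bin/asprof -e wall -d 30 -f /tmp/profile-wall.html "$PID"
else
  echo "async-profiler not installed; skipping"
fi
```

The same event names appear later in the agent configuration (`event=wall`), so you can switch between on-demand and continuous profiling without relearning the options.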

We use UnicornStore as an example of a Java application to be profiled, as shown in the following figure. UnicornStore is a Spring Boot 3 Java application that provides a RESTful API. It stores data in a relational database running on HAQM Aurora Serverless with the HAQM Relational Database Service (HAQM RDS) for PostgreSQL engine, and then publishes an event about the performed actions to HAQM EventBridge.

We use HAQM EKS Auto Mode to deploy a ready-to-use Kubernetes cluster.

Figure 1: UnicornStore application architecture


Prerequisites

The solution is based on the infrastructure as code (IaC) of “Java on AWS Immersion Day”, which streamlines the setup of the environment. You only need an AWS account and AWS CloudShell to bootstrap the environment.

Walkthrough

The following steps walk you through this solution.

Setting up the environment

You can use the following setup to create a solution infrastructure with Visual Studio Code for the Web and with all the necessary tools installed:

  1. Navigate to CloudShell in the AWS console.
  2. Deploy the AWS CloudFormation template. You can also deploy the template directly in the CloudFormation console using the file from the provided link.
curl http://raw.githubusercontent.com/aws-samples/java-on-aws/main/infrastructure/cfn/unicornstore-stack.yaml > unicornstore-stack.yaml
CFN_S3=cfn-$(uuidgen | tr -d - | tr '[:upper:]' '[:lower:]')
aws s3 mb s3://$CFN_S3
aws cloudformation deploy --stack-name unicornstore-stack \
    --template-file ./unicornstore-stack.yaml \
    --s3-bucket $CFN_S3 \
    --capabilities CAPABILITY_NAMED_IAM
aws cloudformation describe-stacks --stack-name unicornstore-stack --query "Stacks[0].Outputs[?OutputKey=='IdeUrl'].OutputValue" --output text
aws cloudformation describe-stacks --stack-name unicornstore-stack --query "Stacks[0].Outputs[?OutputKey=='IdePassword'].OutputValue" --output text
  3. Wait until the command finishes successfully. The deployment takes about 15-20 minutes.

After successful creation of the CloudFormation stacks, you can access the VS Code instance using the IdeUrl and the IdePassword from the output of the preceding command.

All following commands must be run in the Terminal window of the VS Code instance.

Instrumenting a container image with a profiler

In this post you use wall-clock profiling. For functions specifically, wall-clock time measures the total duration from when the function starts until it completes. This measurement encompasses all delays, such as time spent waiting for locks to release and threads to synchronize. When wall-clock time exceeds CPU time, it suggests your code is spending time in a waiting state. A significant gap between these times often points to potential resource bottlenecks in your application.

Conversely, when CPU time closely matches wall-clock time, it indicates computationally heavy code, where the processor is actively working for most of the running period. These CPU-bound code segments that take considerable time to run may benefit from performance optimization efforts.
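You can see this distinction with the standard `time` utility, independent of any profiler. The commands below are a small illustration, not part of the walkthrough:

```shell
#!/bin/bash
# Wait-dominated work: wall-clock time far exceeds CPU time,
# because the process sleeps instead of computing.
start=$SECONDS
sleep 1
wall=$((SECONDS - start))
echo "sleep: ~${wall}s wall-clock, almost no CPU time"

# CPU-bound work: wall-clock and CPU time are nearly identical,
# because the shell is busy counting the whole time.
time bash -c 'i=0; while (( i < 200000 )); do ((i++)); done'
```

In `time` output, `real` is wall-clock time, while `user` + `sys` approximate CPU time; a large gap between them is the signature of waiting, which is exactly what wall-clock profiling surfaces in the flame graphs later in this post.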

  1. Add the profiler binaries to the container image. Use a multi-stage build to build the container image:
cat <<'EOF' > ~/environment/unicorn-store-spring/Dockerfile
FROM public.ecr.aws/docker/library/maven:3-amazoncorretto-21-al2023 AS builder

RUN yum install -y wget tar gzip
RUN cd /tmp && \
    wget http://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz && \
    mkdir /async-profiler && \
    tar -xvzf ./async-profiler-3.0-linux-x64.tar.gz -C /async-profiler --strip-components=1

COPY ./pom.xml ./pom.xml
COPY src ./src/

RUN mvn clean package && mv target/store-spring-1.0.0-exec.jar store-spring.jar
RUN rm -rf ~/.m2/repository

FROM public.ecr.aws/docker/library/amazoncorretto:21-al2023
RUN yum install -y shadow-utils procps tar

COPY --from=builder /async-profiler/ /async-profiler/
COPY --from=builder store-spring.jar store-spring.jar

RUN groupadd --system spring -g 1000
RUN adduser spring -u 1000 -g 1000
ENV SPRING_THREADS_VIRTUAL_ENABLED=false
USER 1000:1000

EXPOSE 8080
ENTRYPOINT ["java","-jar","-Dserver.port=8080","/store-spring.jar"]
EOF
  2. Build and push a new container image to HAQM Elastic Container Registry (HAQM ECR):
~/java-on-aws/infrastructure/scripts/deploy/containerize.sh
  3. Deploy the Java application to the EKS cluster:
~/java-on-aws/infrastructure/scripts/deploy/eks.sh
kubectl get pods -n unicorn-store-spring
POD_NAME=$(kubectl get pods -n unicorn-store-spring | grep Running | awk '{print $1}')
echo $POD_NAME

The deployment takes about 3-5 minutes.
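Optionally, before relying on the scripted build, you can verify locally that the profiler landed in the image. This check is a sketch: it assumes Docker is available in the development instance, and the image tag `unicorn-store-spring:profiler` is illustrative.

```shell
#!/bin/sh
# Optional local check: build the image and list the profiler binaries in it.
if command -v docker >/dev/null 2>&1 && [ -d ~/environment/unicorn-store-spring ]; then
  docker build -t unicorn-store-spring:profiler ~/environment/unicorn-store-spring
  # Listing /async-profiler/bin should show asprof and related tools.
  docker run --rm --entrypoint ls unicorn-store-spring:profiler /async-profiler/bin
  CHECK=done
else
  echo "docker or source folder not available; skipping local check"
  CHECK=skipped
fi
```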

On-demand profiling

Now you can start on-demand profiling and benchmark the Java application under load.

1. Start on-demand profiling in the container and get the profiler status, create load with Artillery for one minute with 200 concurrent POST requests to createUnicorn, then create a folder for the profiling results and stop the profiling when the benchmarking is finished:

kubectl exec -it $POD_NAME -n unicorn-store-spring -- /bin/bash -c "/async-profiler/bin/asprof start -e wall jps && /async-profiler/bin/asprof status jps"
SVC_URL=$(~/java-on-aws/infrastructure/scripts/test/getsvcurl.sh eks) && echo $SVC_URL
~/java-on-aws/infrastructure/scripts/test/benchmark.sh $SVC_URL 60 200
kubectl exec -it $POD_NAME -n unicorn-store-spring -- /bin/bash -c "mkdir -p /home/spring/profiling && /async-profiler/bin/asprof stop -f /home/spring/profiling/profile-%t.html jps"

2. The resulting file is stored in the /home/spring/profiling/ folder in the container. Copy it to the development instance, and save the Summary report output from the benchmarking tool for later comparison.

kubectl -n unicorn-store-spring cp $POD_NAME:/home/spring/profiling ~/environment/unicorn-store-spring

You can download the resulting file to your computer by right-clicking the file name. Choose Download ... and open the file in a browser.
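If the copied folder turns out to be empty, you can first confirm that the profile file exists inside the container. This snippet assumes the POD_NAME variable from the earlier step and a reachable cluster:

```shell
#!/bin/sh
# Confirm the profile file was written inside the running pod before copying.
if command -v kubectl >/dev/null 2>&1 && [ -n "$POD_NAME" ]; then
  kubectl exec "$POD_NAME" -n unicorn-store-spring -- ls -lh /home/spring/profiling || true
  LISTED=attempted
else
  echo "kubectl or POD_NAME not available; skipping"
  LISTED=skipped
fi
```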

Figure 2: Download on-demand profiling results


Analyzing the profiling results and performing optimization

As a result, you get a flame graph.

1. Choose the Search button in the top left corner and search for the UnicornController.createUnicorn method.

A significant portion of the run time is waiting for a database connection. You can improve this by increasing the number of database connections in the Java application properties file, as shown in the following figure.

Figure 3: Initial on-demand profiling results


2. Change the number of database connections from 1 to 10:

sed -i 's/spring\.datasource\.hikari\.maximumPoolSize=[0-9]*/spring.datasource.hikari.maximumPoolSize=10/' \
  ~/environment/unicorn-store-spring/src/main/resources/application.properties

Assuming that you have addressed the cause of the waiting time in the requests by increasing the datasource value to 10, you can run the benchmark with profiling again and compare the results.
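As a sanity check, the substitution above behaves like this on a throwaway file (runnable anywhere with GNU sed; the file path and starting value are illustrative):

```shell
#!/bin/sh
# Demonstrate the pool-size substitution on a temporary file;
# the real command edits application.properties in place the same way.
tmp=$(mktemp)
echo 'spring.datasource.hikari.maximumPoolSize=1' > "$tmp"
sed -i 's/spring\.datasource\.hikari\.maximumPoolSize=[0-9]*/spring.datasource.hikari.maximumPoolSize=10/' "$tmp"
pool=$(cat "$tmp")
echo "$pool"   # spring.datasource.hikari.maximumPoolSize=10
rm "$tmp"
```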

3. Build and push a new version of the container image to the HAQM ECR container registry:

~/java-on-aws/infrastructure/scripts/deploy/containerize.sh

4. Redeploy the application to the EKS cluster:

kubectl rollout restart deployment unicorn-store-spring -n unicorn-store-spring
kubectl rollout status deployment unicorn-store-spring -n unicorn-store-spring
sleep 15
kubectl get pods -n unicorn-store-spring
POD_NAME=$(kubectl get pods -n unicorn-store-spring | grep Running | awk '{print $1}') 

echo $POD_NAME

5. Repeat the benchmark test with profiling using the same command or with the following script, and download the results:

~/java-on-aws/infrastructure/scripts/test/profiling.sh

kubectl -n unicorn-store-spring cp $POD_NAME:/home/spring/profiling ~/environment/unicorn-store-spring

6. Open the downloaded file in a browser and search for UnicornController.createUnicorn, as shown in the following figure.

Figure 4: Improved on-demand profiling results


You can see that the waiting time for getting a connection is significantly decreased. The benchmarks for response time were improved, too, as shown in the following figure.

Figure 5: On-demand results comparison


Setting up the infrastructure to store results of continuous profiling

On-demand profiling helps find problems with the application, but those issues can happen at any time. It is a great advantage to be able to go back in time and investigate the state of a Java application when the issue happened. To achieve that, you can set up continuous profiling, create trace files, and store them in HAQM S3.

1. Create an S3 bucket to store the results of continuous profiling:

export S3PROFILING=profiling-data-$(uuidgen | tr -d - | tr '[:upper:]' '[:lower:]')
aws s3 mb s3://$S3PROFILING

2. Create an AWS Identity and Access Management (IAM) policy to allow pods with the Java application to access the S3 bucket:

cat <<EOF > service-account-s3-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::$S3PROFILING",
                "arn:aws:s3:::$S3PROFILING/*"
            ]
        }
    ]
}
EOF
aws iam create-policy --policy-name unicorn-eks-service-account-s3-policy --policy-document file://service-account-s3-policy.json
rm service-account-s3-policy.json

For access to the S3 bucket, use the Mountpoint for HAQM S3 Container Storage Interface (CSI) driver, which is deployed as an HAQM EKS add-on. This driver allows the Java application running in the EKS cluster to write files to HAQM S3 through a file system interface. Built on Mountpoint for HAQM S3, the Mountpoint CSI driver presents an S3 bucket as a storage volume accessible by containers in a Kubernetes cluster.

At the time of writing, there is no comparable CSI driver solution for HAQM Elastic Container Service (HAQM ECS). However, the documentation describes how a similar approach can be implemented for an ECS cluster with HAQM Elastic Compute Cloud (HAQM EC2) instances.

3. Associate IAM OIDC provider with the EKS cluster and create an IAM role for the add-on:

eksctl utils associate-iam-oidc-provider --region=$AWS_REGION \
   --cluster=unicorn-store --approve
eksctl create iamserviceaccount --cluster unicorn-store \
   --name s3-csi-driver-sa --namespace kube-system \
   --attach-policy-arn=$(aws iam list-policies --query 'Policies[?PolicyName==`unicorn-eks-service-account-s3-policy`].Arn' --output text) \
   --approve --region=$AWS_REGION \
   --role-name unicorn-eks-s3-csi-driver-role --role-only

We use IAM roles for service accounts (IRSA) and not EKS Pod Identities due to the current limitations of Mountpoint for HAQM S3 CSI driver. Refer to the official installation guide for the updated installation procedure.

4. Install the Mountpoint for HAQM S3 CSI driver as the add-on to the EKS cluster:

eksctl create addon --name aws-mountpoint-s3-csi-driver --cluster unicorn-store \
    --service-account-role-arn arn:aws:iam::$ACCOUNT_ID:role/unicorn-eks-s3-csi-driver-role --force

5. Create PersistentVolume and PersistentVolumeClaim to access the S3 bucket from pods:

~/java-on-aws/infrastructure/scripts/deploy/s3pv.sh
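The script's manifests are not reproduced in this post. As a hypothetical sketch of static provisioning with the Mountpoint for HAQM S3 CSI driver, the PV/PVC pair could look like the following; the capacity, uid/gid mount options, and bucket name placeholder are illustrative assumptions, while the claim name matches the one referenced by the deployment later in this post.

```shell
#!/bin/sh
# Write a sketch of a PV/PVC pair for the S3 CSI driver. The capacity value
# is required by the Kubernetes API but ignored by this driver.
S3PROFILING=${S3PROFILING:-profiling-data-example}
cat <<EOF > /tmp/persistence-sketch.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-profiling-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - allow-delete
    - uid=1000
    - gid=1000
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume
    volumeAttributes:
      bucketName: $S3PROFILING
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-profiling-pvc
  namespace: unicorn-store-spring
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1Gi
  volumeName: s3-profiling-pv
EOF
echo "wrote /tmp/persistence-sketch.yaml"
```

The uid/gid mount options matter because the application runs as user 1000, and Mountpoint mounts are owned by a single user.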

6. Deploy the manifests for persistent objects to the EKS cluster:

kubectl apply -f ~/environment/unicorn-store-spring/k8s/persistence.yaml

Instrumenting a container image and deployment for continuous profiling

1. Override the command and arguments from the Dockerfile in the deployment and start profiling using the launch-as-agent approach. With this approach you can change profiling parameters without needing to rebuild the container image. async-profiler starts together with the Java application and writes a new collapsed-stacks file every minute. Add the commands to deployment.yaml:

code ~/environment/unicorn-store-spring/k8s/deployment.yaml

~/environment/unicorn-store-spring/k8s/deployment.yaml
apiVersion: apps/v1
…
      containers:
        - name: unicorn-store-spring
          command: ["/bin/sh", "-c"]
          args:
            - mkdir -p /profiling/$HOSTNAME && cd /profiling/$HOSTNAME;
              java -agentpath:/async-profiler/lib/libasyncProfiler.so=start,event=wall,file=./profile-%t.txt,loop=1m,collapsed -jar -Dserver.port=8080 /store-spring.jar;
          …
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
          volumeMounts:
            - name: persistent-storage
              mountPath: /profiling
      volumes:
        - name: persistent-storage
          persistentVolumeClaim:
            claimName: s3-profiling-pvc

2. Deploy the manifests to the EKS cluster and restart the deployment:

kubectl apply -f ~/environment/unicorn-store-spring/k8s/deployment.yaml
kubectl rollout status deployment unicorn-store-spring -n unicorn-store-spring
sleep 15
kubectl get pods -n unicorn-store-spring

3. Check the state of the Java application pod and profiler:

POD_NAME=$(kubectl get pods -n unicorn-store-spring | grep Running | awk '{print $1}')
echo $POD_NAME
kubectl logs $POD_NAME -n unicorn-store-spring | grep "Profiling started"

4. The output of the command should be similar to the following output:
unicorn-store-spring-866847c8d8-rx82w
Profiling started

Analyzing the results of continuous profiling

1. Create a load for five minutes:

SVC_URL=$(~/java-on-aws/infrastructure/scripts/test/getsvcurl.sh eks)

echo $SVC_URL
~/java-on-aws/infrastructure/scripts/test/benchmark.sh $SVC_URL 300 200

Each minute the profiler creates a profile-YYYYMMDD-HHMMSS.txt file in the S3 bucket path corresponding to the pod name, as shown in the following figure.

Figure 6: Continuous profiling results on HAQM S3


You can convert any of those files to a flame graph using converter.jar from async-profiler.

2. Create a folder for profiling stacks and copy the files from the S3 bucket:

POD_NAME=$(kubectl get pods -n unicorn-store-spring | grep Running | awk '{print $1}')
mkdir -p ~/environment/unicorn-store-spring/stacks/$POD_NAME
aws s3 cp s3://$S3PROFILING/$POD_NAME ~/environment/unicorn-store-spring/stacks/$POD_NAME/ --recursive

3. Download async-profiler to the development instance:

cd ~/environment/unicorn-store-spring
wget http://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz
mkdir ~/environment/unicorn-store-spring/async-profiler
tar -xvzf ./async-profiler-3.0-linux-x64.tar.gz -C ~/environment/unicorn-store-spring/async-profiler --strip-components=1
rm ./async-profiler-3.0-linux-x64.tar.gz

4. Choose one of the files and convert it to a flame graph, for example the first file:

cd ~/environment/unicorn-store-spring
STACK_FILE=$(find ~/environment/unicorn-store-spring/stacks/$POD_NAME -type f -printf '%T+ %p\n' | sort | head -n 1| cut -d' ' -f2-)
java -cp ./async-profiler/lib/converter.jar FlameGraph $STACK_FILE ./profile.html
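To convert every collected stack file at once rather than just one, the same converter invocation can be wrapped in a loop. The paths assume the stacks folder and async-profiler download from the previous steps:

```shell
#!/bin/sh
# Convert all collapsed-stack files in the stacks folder to flame graphs.
STACKS=~/environment/unicorn-store-spring/stacks/$POD_NAME
mkdir -p "$STACKS"   # no-op if the folder already exists
converted=0
for f in "$STACKS"/*.txt; do
  [ -e "$f" ] || continue   # skip when no files were collected yet
  java -cp ~/environment/unicorn-store-spring/async-profiler/lib/converter.jar \
    FlameGraph "$f" "${f%.txt}.html"
  converted=$((converted + 1))
done
echo "converted $converted file(s)"
```

Each output HTML file sits next to its source stack file, so you can browse profiles minute by minute.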

5. Download profile.html to your computer, open it in a browser, and search for UnicornController.createUnicorn, as shown in the following figure.

Figure 7: Continuous profiling graph


This approach allows you to analyze the state of a Java application during a specific period of time, such as the startup phase.

Cleaning up

1. To avoid incurring future charges, delete the deployed AWS resources with the following commands in the VS Code terminal:

~/java-on-aws/infrastructure/scripts/cleanup/eks.sh
aws cloudformation delete-stack --stack-name eksctl-unicorn-store-addon-iamserviceaccount-kube-system-s3-csi-driver-sa
aws s3 rm s3://$S3PROFILING --recursive
aws s3 rb s3://$S3PROFILING

2. Close the tab with VS Code, open CloudShell, and run the following command to finish cleaning up:

aws cloudformation delete-stack --stack-name unicornstore-stack

The deletion of the stack can take about 20 minutes.

Delete the S3 bucket that you used to deploy the AWS CloudFormation template. Check the remaining resources and stacks and delete them manually if necessary.

Conclusion

In this post we demonstrated the use of async-profiler with HAQM EKS in both on-demand and continuous profiling modes. We initially set up the infrastructure with an EKS cluster and instrumented the UnicornStore Java application container image with async-profiler. We built and uploaded the container image to HAQM ECR, deployed it to the EKS cluster, and ran on-demand profiling under load. Moreover, we created an HAQM S3 bucket and connected it to persistent volumes in the EKS cluster using Mountpoint for HAQM S3 with the corresponding CSI driver. After successful deployment of a pod based on the created container image, the results of the continuous profiling of the Java application were stored in the S3 bucket.

With the help of async-profiler we found a bottleneck in the Java application and eliminated it. We also created a solution that helps to continuously create profiling data for the further analysis.

If you want to dive deeper into the internals of profiling with async-profiler, we recommend this three-hour playlist to learn about all of its features.

We hope we have given you some ideas on how you can profile your existing Java application using async-profiler. Feel free to submit enhancements to the sample application in the source repository.