AWS Open Source Blog

Configuring Grafana Cloud Agent for Amazon Managed Service for Prometheus

This post was written by Robert Fratto, Imaya Kumar Jagannathan, and Alolita Sharma.

The Grafana Cloud Agent is a lightweight alternative to running a full Prometheus server. It keeps the parts needed to discover and scrape Prometheus exporters and send metrics to a backend, which in this case is Amazon Managed Service for Prometheus (AMP), and removes subsystems such as the storage, query, and alerting engines. Grafana Cloud Agent is 100% compatible with Prometheus metrics and uses the Service Discovery, Scraping, Write-Ahead Log, and Remote Write mechanisms from the Prometheus project. It also supports basic sharding across the nodes of your Amazon Elastic Kubernetes Service (Amazon EKS) cluster: each Grafana Cloud Agent pod collects metrics only from workloads running on its own node, so you no longer have to choose between one giant machine that collects all of your Prometheus metrics and multiple manually managed Prometheus configurations. Finally, the Grafana Cloud Agent has native support for AWS Signature Version 4 (SigV4) for IAM authentication, so there is no need to run a sidecar SigV4 proxy, which reduces complexity as well as memory and CPU demand.

In this blog post, we will walk through the steps to configure an AWS Identity and Access Management (IAM) role to send Prometheus metrics to Amazon Managed Service for Prometheus. We will then install Grafana Cloud Agent on your Amazon Elastic Kubernetes Service cluster and forward metrics to AMP. We assume that you already have an AMP workspace configured in your environment. To learn about creating a new AMP workspace, refer to the documentation.

Configuring permissions

The Grafana Cloud Agent scrapes operational metrics from containerized workloads running in the Amazon EKS cluster and sends them to AMP for long-term storage as well as for subsequent querying by monitoring tools such as Grafana. Data sent to AMP must be signed with valid AWS credentials using the AWS Signature Version 4 algorithm to authenticate and authorize each client request for the managed service.

The Grafana Cloud Agent can be deployed to an Amazon EKS cluster to run under the identity of a Kubernetes service account. With IAM roles for service accounts (IRSA), you can associate an IAM role with a Kubernetes service account and thus provide IAM permissions to any pod that uses that service account. Configuring the Grafana Cloud Agent through IRSA follows the principle of least privilege: only the agent's pods receive the permissions that its built-in AWS Signature Version 4 signing needs to ingest Prometheus metrics into AMP.

The agent-permissions-aks shell script performs the following actions after you substitute the placeholder variable YOUR_EKS_CLUSTER_NAME with the name of your Amazon EKS cluster:

  • Creates an IAM role with an IAM policy that has permissions to remote-write into an AMP workspace.
  • Creates a Kubernetes service account that is associated with the IAM role.
  • Creates a trust relationship between the IAM role and the OIDC provider hosted in your Amazon EKS cluster.

The script requires that you have installed the CLI tools kubectl and eksctl and have configured them with access to your Amazon EKS cluster.
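As a rough illustration of the three steps listed above, the following sketch performs them with eksctl alone. It is not the contents of the agent-permissions-aks script, which may use different commands. The function name setup_irsa is made up for this example, and POLICY_ARN is a hypothetical placeholder for an existing IAM policy that allows aps:RemoteWrite. The role, service account, and namespace names are the ones used in this post.

```shell
#!/usr/bin/env bash
# Sketch only, not the agent-permissions-aks script.
# setup_irsa CLUSTER POLICY_ARN
#   CLUSTER     name of your EKS cluster
#   POLICY_ARN  hypothetical placeholder: ARN of a policy allowing aps:RemoteWrite
setup_irsa() {
  local cluster="$1" policy_arn="$2"
  if [ -z "$cluster" ] || [ -z "$policy_arn" ]; then
    echo "usage: setup_irsa CLUSTER POLICY_ARN" >&2
    return 1
  fi

  # Register the cluster's OIDC issuer with IAM so that roles can trust it.
  eksctl utils associate-iam-oidc-provider --cluster "$cluster" --approve || return 1

  # Create the IAM role with its trust relationship, attach the policy,
  # and create the Kubernetes service account bound to that role.
  eksctl create iamserviceaccount \
    --cluster "$cluster" \
    --namespace grafana-agent \
    --name grafana-agent \
    --role-name EKS-GrafanaAgent-AMP-ServiceAccount-Role \
    --attach-policy-arn "$policy_arn" \
    --approve
}
```

Calling setup_irsa with your cluster name and policy ARN runs both eksctl commands in order and stops if the first one fails.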

The agent-permissions-aks shell script creates an IAM Role named EKS-GrafanaAgent-AMP-ServiceAccount-Role and attaches it to the Kubernetes service account named grafana-agent under the grafana-agent namespace.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "aps:RemoteWrite"
      ],
      "Resource": "*"
    }
  ]
}
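The policy above allows remote write to any workspace. If you would rather limit it to a single workspace, one option is to set Resource to that workspace's ARN. The sketch below prints such a policy. The helper name amp_write_policy is made up for this example, and the account ID and workspace ID in the usage note are placeholders.

```shell
# amp_write_policy REGION ACCOUNT_ID WORKSPACE_ID
# Prints an IAM policy document allowing aps:RemoteWrite on one workspace only.
amp_write_policy() {
  local region="$1" account_id="$2" workspace_id="$3"
  if [ -z "$region" ] || [ -z "$account_id" ] || [ -z "$workspace_id" ]; then
    echo "usage: amp_write_policy REGION ACCOUNT_ID WORKSPACE_ID" >&2
    return 1
  fi
  # Unquoted heredoc delimiter so the three variables are expanded.
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["aps:RemoteWrite"],
      "Resource": "arn:aws:aps:${region}:${account_id}:workspace/${workspace_id}"
    }
  ]
}
EOF
}
```

For example, amp_write_policy us-west-2 111122223333 ws-EXAMPLE > policy.json writes a document that can be passed to aws iam create-policy with --policy-document file://policy.json.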

Deploying Grafana Cloud Agent

To deploy the Grafana Cloud Agent to your Amazon EKS cluster, run the command below, replacing these placeholders:

  • WORKSPACE: The AMP workspace ID to send metrics to.
  • ROLE_ARN: The ARN of the EKS-GrafanaAgent-AMP-ServiceAccount-Role you created earlier.
  • REGION: The region of the AMP workspace to send metrics to.

# Fill in WORKSPACE, ROLE_ARN, and REGION before running.
kubectl create namespace grafana-agent; \
WORKSPACE="" \
ROLE_ARN="" \
REGION="" \
NAMESPACE="grafana-agent" \
REMOTE_WRITE_URL="https://aps-workspaces.$REGION.amazonaws.com/workspaces/$WORKSPACE/api/v1/remote_write" \
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/grafana/agent/v0.11.0/production/kubernetes/install-sigv4.sh)" | kubectl apply -f -
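Because the placeholders in the command above start out empty, a forgotten value produces a malformed remote write URL rather than an obvious error. A small guard along these lines can fail early instead. The helper remote_write_url is hypothetical and is not part of install-sigv4.sh.

```shell
# remote_write_url REGION WORKSPACE
# Prints the AMP remote write URL, or fails if either value is missing.
remote_write_url() {
  local region="$1" workspace="$2"
  if [ -z "$region" ] || [ -z "$workspace" ]; then
    echo "REGION and WORKSPACE must both be set" >&2
    return 1
  fi
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/remote_write\n' \
    "$region" "$workspace"
}
```

You could then set REMOTE_WRITE_URL="$(remote_write_url "$REGION" "$WORKSPACE")" || exit 1 before invoking the install script.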

This script deploys a DaemonSet called grafana-agent and a Deployment (with one replica) called grafana-agent-deployment. The grafana-agent DaemonSet will collect metrics from pods on your cluster, whereas the grafana-agent-deployment will collect metrics from services that do not live on your cluster, such as the Amazon EKS control plane.

After a minute or two, metrics should start being collected and sent to AMP. You can query for metrics using an Amazon Managed Grafana workspace that has your AMP workspace configured as a Prometheus data source. To verify that metrics are being collected, query for up{job="grafana-agent/grafana-agent"}.
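Before querying, it can help to confirm that both workloads rolled out. The sketch below assumes the DaemonSet and Deployment names and the grafana-agent namespace given in this post. The function name check_agent_rollout and the 120-second timeout are arbitrary choices for this example.

```shell
# check_agent_rollout [NAMESPACE]
# Waits for the agent DaemonSet and Deployment, then lists the agent pods.
check_agent_rollout() {
  local ns="${1:-grafana-agent}"
  kubectl -n "$ns" rollout status daemonset/grafana-agent --timeout=120s &&
    kubectl -n "$ns" rollout status deployment/grafana-agent-deployment --timeout=120s &&
    kubectl -n "$ns" get pods -o wide
}
```

Running check_agent_rollout with no arguments checks the default namespace from this post and stops at the first workload that does not become ready in time.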

Pods that are scraped by default

By default, the Grafana Cloud Agent will collect metrics from the /metrics endpoint of any Amazon EKS pod that meets the following rules:

  • The pod has a label named name.
  • The pod does not have an annotation of "prometheus.io/scrape": "false".
  • The pod has a named port to scrape metrics from, where the named port ends in -metrics.

The Grafana Cloud Agent deployment follows these rules by default, and will therefore scrape itself.
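To make the three rules concrete, the sketch below prints a minimal Pod manifest that satisfies them. The pod name, container name, image, and port number are placeholders invented for this example, and the image would need to be replaced with a real one that serves /metrics.

```shell
# example_pod_manifest prints a Pod that meets the default scrape rules.
# All names, the image, and the port number are placeholders.
example_pod_manifest() {
  # Quoted heredoc delimiter: the manifest is printed literally.
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  labels:
    name: example-app          # rule 1: a label named "name"
  # rule 2: no prometheus.io/scrape: "false" annotation is set
spec:
  containers:
    - name: app
      image: example.com/example-app:latest   # placeholder image
      ports:
        - name: http-metrics   # rule 3: port name ends in -metrics
          containerPort: 8080
EOF
}
```

Piping the output into kubectl apply -f - would create the pod once a real image is substituted.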

For each discovered pod, the following labels will be injected when scraping metrics:

  • The job label will be set to the pod namespace and the value of the name label, separated by a forward slash (/).
  • The pod label will be set to the pod name.
  • The container label will be set to the pod container name.
  • The instance label will be set to the pod name, pod container name, and pod container port name, separated by a colon (:).
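The labeling rules above can be expressed as a small function. This is only an illustration of how the label values are composed, not code from the agent, and the function name agent_labels is made up for this example.

```shell
# agent_labels NAMESPACE NAME_LABEL POD CONTAINER PORT_NAME
# Prints the four labels the agent would attach, following the rules above.
agent_labels() {
  local namespace="$1" name_label="$2" pod="$3" container="$4" port_name="$5"
  printf 'job="%s/%s"\n' "$namespace" "$name_label"
  printf 'pod="%s"\n' "$pod"
  printf 'container="%s"\n' "$container"
  printf 'instance="%s:%s:%s"\n' "$pod" "$container" "$port_name"
}
```

For a hypothetical agent pod, agent_labels grafana-agent grafana-agent grafana-agent-abc12 agent http-metrics prints job="grafana-agent/grafana-agent" as its first line, which is the job value used in the verification query earlier in this post.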

Conclusion

The Grafana Cloud Agent makes it easier to collect Prometheus-compatible metrics and to distribute scrape load by deploying one process per node. This post outlined the steps involved in installing the Grafana Cloud Agent and forwarding metrics to a pre-configured Amazon Managed Service for Prometheus workspace.

Visit the websites to learn more about Amazon Managed Service for Prometheus and Amazon Managed Grafana. Please let us know about any questions, issues, or enhancements you come across in the steps outlined here as you install Grafana Cloud Agent to get started with AMP. And if you’re on Twitter, don’t hesitate to drop us a line with any questions or feedback.

Robert Fratto

Robert is a Senior Software Engineer at Grafana Labs. He is one of the maintainers of Grafana Loki and the primary maintainer for the Grafana Cloud Agent. When not writing code for work or fun, Robert likes to play with his dog, Data. You can reach him on Twitter @robertfratto.

Alolita Sharma

Alolita is a senior manager at AWS where she leads open source observability engineering and collaboration for OpenTelemetry, Prometheus, Cortex, and Grafana. Alolita is co-chair of the CNCF Technical Advisory Group for Observability, a member of the OpenTelemetry Governance Committee, and a board director of the Unicode Consortium. She contributes to open standards at OpenTelemetry, Unicode, and W3C. She has served on the boards of the OSI and SFLC.in. Alolita has led engineering teams at Wikipedia, Twitter, PayPal, and IBM. Two decades of doing open source continue to inspire her. You can find her on Twitter @alolita.

Imaya Kumar Jagannathan

Imaya Kumar Jagannathan is a Principal Solution Architect focused on AWS Observability services including Amazon CloudWatch, AWS X-Ray, Amazon Managed Service for Prometheus, Amazon Managed Grafana, and AWS Distro for OpenTelemetry. He is passionate about monitoring and observability and has a strong application development and architecture background. He likes working on distributed systems and is excited to talk about microservice architecture design. He loves programming in C# and working with containers and serverless technologies. LinkedIn: /imaya.