AWS Open Source Blog

Introducing Qonto’s Prometheus RDS Exporter – An Open Source Solution to Enhance Monitoring Amazon RDS

Databases are a critical part of most applications and essential to business continuity. To ensure performance, availability, and scalability, Amazon Relational Database Service (Amazon RDS) administrators typically monitor various metrics, such as the usage of CPU, RAM, IOPS, storage, or service quotas. Today, these metrics are found in several AWS services such as Amazon CloudWatch metrics, Amazon Elastic Compute Cloud (Amazon EC2), Amazon RDS, or Service Quotas. Medium to large scale companies usually have tens or hundreds of databases to monitor. Having a standardized approach to database monitoring can help administrators save time and help scale the business with lower risk. In November 2023, the Qonto SRE team published a unified framework for Amazon RDS monitoring which helps them deploy best practices at scale and monitor hundreds of databases with limited effort.

Qonto is a leading payment institution that offers a panel of banking services to small businesses with simplicity in mind. More than 450,000 companies have used Qonto in 2024. Qonto helps entrepreneurs focus on what matters the most for them: their core business. They have created automated tools that help companies motor through their accounting and expenses.

In this blog, you will learn how Qonto created the Prometheus RDS Exporter for Amazon RDS monitoring and why they decided to share it with the open source community under an MIT license. Qonto was looking for a solution to aggregate Amazon RDS key metrics and push them into Prometheus for monitoring and alerting purposes. This solution is RDS engine agnostic.

Since December 1, 2024, Amazon Aurora PostgreSQL-compatible and Amazon Aurora MySQL-compatible users can leverage Amazon CloudWatch Database Insights. It is a database observability solution that provides a curated experience designed for DevOps engineers, application developers, and database administrators (DBAs) to expedite database troubleshooting and gain a holistic view into their database fleet health.

Amazon CloudWatch Database Insights is a fully managed solution, while the Prometheus RDS Exporter is a self-hosted open source observability solution. In this blog, we will demonstrate how to set up the Prometheus RDS Exporter.

Overview of solution

The Prometheus RDS Exporter combines four different AWS APIs:

  • Amazon RDS to collect instance inventory and settings
  • Amazon CloudWatch to collect instance consumption metrics
  • Amazon EC2 to collect physical instance capacity (e.g., number of vCPU, max IOPS, etc.)
  • Service Quotas to track quota usage and anticipate limit-exceeded errors (e.g., available storage).

For metrics that are not available as an API, Qonto integrated AWS logic. For instance, finding the disk IOPS limit requires business logic as shown here:

switch storageType {
case "gp2":
    iops = ThresholdValue(gp2IOPSMin, allocatedStorage*gp2IOPSPerGB, gp2IOPSMax)
    if allocatedStorage >= gp2StorageThroughputVolumeThreshold {
        storageThroughput = gp2StorageThroughputLargeVolume
    } else {
        storageThroughput = gp2StorageThroughputSmallVolume
    }
case "gp3":
    storageThroughput = rawStorageThroughput
case "io1":
    switch {
    case iops >= io1HighIOPSThroughputThreshold:
        storageThroughput = io1HighIOPSThroughputValue
    case iops >= io1LargeIOPSThroughputThreshold:
        storageThroughput = converter.KiloByteToMegaBytes(iops * io1LargeIOPSThroughputValue)
    case iops >= io1MediumIOPSThroughputThreshold:
        storageThroughput = io1MediumIOPSThroughputValue
    default:
        storageThroughput = converter.KiloByteToMegaBytes(iops * io1DefaultIOPSThroughputValue)
    }
}
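The gp2 branch above bounds a size-derived value between a floor and a ceiling. As a rough illustration of that clamping step, here is a minimal shell sketch. It assumes that `ThresholdValue(min, value, max)` returns `value` limited to the range [min, max], which the excerpt does not show. The `clamp` helper and the numbers in the example call are hypothetical and are not the exporter's constants.

```shell
# clamp MIN VALUE MAX
# Prints VALUE limited to the inclusive range [MIN, MAX] (integers only).
# Assumption: this mirrors what ThresholdValue does in the excerpt above.
clamp() {
  if [ "$2" -lt "$1" ]; then
    echo "$1"
  elif [ "$2" -gt "$3" ]; then
    echo "$3"
  else
    echo "$2"
  fi
}

# Hypothetical example: 20 GiB at 3 IOPS per GiB, bounded to [100, 16000].
# 20 * 3 = 60 is below the floor, so this prints 100.
clamp 100 $((20 * 3)) 16000
```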

The inventory, consumption, settings and quotas metrics have been consolidated into Qonto’s Prometheus RDS Exporter open source project. For a comprehensive view, the project also includes Qonto’s Grafana dashboards to fully leverage all these metrics and quickly visualize any and all issues with your Amazon RDS instances at a glance.

For a fully integrated, production-ready solution crafted by SREs, Qonto’s Database Monitoring Framework provides 30 recommended Amazon RDS alerts, with documented runbooks explaining how to handle each alert.

Solution

The solution targets container environments, but could be deployed with Amazon EC2 instances as well. In this blog post, we’ll deploy the solution in a Kubernetes environment, assuming the Amazon Elastic Kubernetes Service (Amazon EKS) cluster, Prometheus operator, and Grafana are already deployed.

Prerequisites: the commands in the following steps use the AWS CLI, eksctl, kubectl, and Helm.

Step 1: Create an AWS Identity and Access Management (IAM) policy

From your development environment, use the AWS Command Line Interface (AWS CLI) to create an IAM policy with the permissions required to call the different AWS APIs.

IAM_POLICY_NAME=prometheus-rds-exporter

# Download policy payload
curl --fail --silent --write-out "Response code: %{response_code}\n" https://raw.githubusercontent.com/qonto/prometheus-rds-exporter/main/configs/aws/policy.json -o /tmp/prometheus-rds-exporter.policy.json

# Create IAM policy
aws iam create-policy --policy-name ${IAM_POLICY_NAME} --policy-document file:///tmp/prometheus-rds-exporter.policy.json
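
Before moving on, it can be useful to confirm that the policy exists under the ARN that the later steps rely on. The following is a minimal sketch: `policy_arn` and `show_policy` are hypothetical helper names, and the ARN format assumes the standard `aws` partition used elsewhere in this post.

```shell
# policy_arn ACCOUNT_ID POLICY_NAME
# Builds the ARN of a customer managed IAM policy (standard "aws" partition).
policy_arn() {
  printf 'arn:aws:iam::%s:policy/%s\n' "$1" "$2"
}

# show_policy ACCOUNT_ID POLICY_NAME
# Prints the policy metadata; returns a non-zero status if the policy is missing.
show_policy() {
  aws iam get-policy --policy-arn "$(policy_arn "$1" "$2")"
}

# Example call (requires AWS credentials):
#   show_policy "$(aws sts get-caller-identity --query Account --output text)" "${IAM_POLICY_NAME}"
```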

Step 2: Create an IAM role and Kubernetes service account

The Kubernetes pod running the Prometheus RDS Exporter uses a Kubernetes Service Account to obtain the permissions granted by the IAM policy created in the preceding step. As a security best practice, the exporter uses IAM Roles for Service Accounts (IRSA) to receive its IAM credentials. Create an IAM role and a Kubernetes Service Account for the Prometheus RDS Exporter:

EKS_CLUSTER_NAME=default # Replace with your EKS cluster name
KUBERNETES_NAMESPACE=monitoring # Replace with the namespace that you want to use
IAM_ROLE_NAME=prometheus-rds-exporter
KUBERNETES_SERVICE_ACCOUNT_NAME=prometheus-rds-exporter
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
 
eksctl create iamserviceaccount \
--cluster ${EKS_CLUSTER_NAME} \
--namespace ${KUBERNETES_NAMESPACE} \
--name ${KUBERNETES_SERVICE_ACCOUNT_NAME} \
--role-name ${IAM_ROLE_NAME} \
--attach-policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${IAM_POLICY_NAME} \
--approve
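
With IRSA, the link between the service account and the IAM role is the `eks.amazonaws.com/role-arn` annotation on the service account. A minimal sketch for checking it follows; `role_arn` and `check_irsa` are hypothetical helper names, and the check assumes `kubectl` is already pointed at the EKS cluster.

```shell
# role_arn ACCOUNT_ID ROLE_NAME
# Builds the ARN of an IAM role (standard "aws" partition).
role_arn() {
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

# check_irsa NAMESPACE SERVICE_ACCOUNT EXPECTED_ROLE_ARN
# Succeeds only if the service account carries the expected role annotation.
check_irsa() {
  actual=$(kubectl get serviceaccount "$2" --namespace "$1" \
    --output jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}')
  [ "$actual" = "$3" ]
}

# Example call:
#   check_irsa "${KUBERNETES_NAMESPACE}" "${KUBERNETES_SERVICE_ACCOUNT_NAME}" \
#     "$(role_arn "${AWS_ACCOUNT_ID}" "${IAM_ROLE_NAME}")" && echo "annotation OK"
```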

This is the default IAM policy for the Prometheus RDS Exporter, which is attached to the role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowInstanceAndLogDescriptions",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBInstances",
                "rds:DescribeDBLogFiles"
            ],
            "Resource": [
                "arn:aws:rds:*:*:db:*"
            ]
        },
        {
            "Sid": "AllowMaintenanceDescriptions",
            "Effect": "Allow",
            "Action": [
                "rds:DescribePendingMaintenanceActions"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowGettingCloudWatchMetrics",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowRDSUsageDescriptions",
            "Effect": "Allow",
            "Action": [
                "rds:DescribeAccountAttributes"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowQuotaDescriptions",
            "Effect": "Allow",
            "Action": [
                "servicequotas:GetServiceQuota"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowInstanceTypeDescriptions",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstanceTypes"
            ],
            "Resource": "*"
        }
    ]
}

Step 3: Deploy the Prometheus RDS Exporter

Next, deploy the Prometheus RDS Exporter using the official Helm chart available in the Amazon Elastic Container Registry (Amazon ECR) public gallery. The Helm chart deploys the Prometheus RDS Exporter and a Prometheus ServiceMonitor custom resource that instructs the Prometheus server to automatically discover the Prometheus RDS Exporter and collect its metrics. Deploy the Prometheus RDS Exporter with the following commands:

PROMETHEUS_RDS_EXPORTER_VERSION=0.3.0 # Replace with latest version
SERVICE_ACCOUNT_ANNOTATION="arn:aws:iam::${AWS_ACCOUNT_ID}:role/${IAM_ROLE_NAME}"

helm upgrade \
prometheus-rds-exporter \
oci://public.ecr.aws/qonto/prometheus-rds-exporter-chart \
--version ${PROMETHEUS_RDS_EXPORTER_VERSION} \
--install \
--namespace ${KUBERNETES_NAMESPACE} \
--set serviceAccount.annotations."eks\.amazonaws\.com\/role-arn"="${SERVICE_ACCOUNT_ANNOTATION}"

After a few minutes, Amazon RDS metrics should be available in your Prometheus server. From the Prometheus server CLI, you can execute the following Prometheus query to see the first metrics:

rds_instance_info{}

You can also open the Prometheus graphical interface at http://localhost:9090/.
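
The same query can also be sent to the Prometheus HTTP API, which is handy for scripted checks. The sketch below assumes Prometheus is reachable on local port 9090, for example through a port-forward to the `prometheus-operated` service that the Prometheus operator usually creates (the service name may differ in your cluster). `prom_query` is a hypothetical helper name.

```shell
# prom_query BASE_URL PROMQL
# Runs an instant query against the Prometheus HTTP API and prints the JSON reply.
prom_query() {
  curl --fail --silent --get "$1/api/v1/query" --data-urlencode "query=$2"
}

# Example calls (assumed service name; adjust to your cluster):
#   kubectl port-forward --namespace "${KUBERNETES_NAMESPACE}" svc/prometheus-operated 9090:9090 &
#   prom_query http://localhost:9090 'count(rds_instance_info)'
```

A `"status":"success"` reply with an empty result usually means that Prometheus is reachable but has not scraped the exporter yet.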

Step 4: Install Grafana dashboards

To visualize the Amazon RDS metrics, import the Grafana dashboards provided with the Prometheus RDS Exporter project into your Grafana deployment.

You can import preconfigured dashboards into your Grafana instance or cloud stack using the UI or the HTTP API.

To import a dashboard through the UI, follow these steps:

  • Click Dashboards in the primary menu.
  • Click New and select Import in the drop-down menu.
  • Perform one of the following steps:
    • Upload a dashboard JSON file.
    • Paste a dashboard URL or ID into the field provided.
    • Paste dashboard JSON text directly into the text area.
  • (Optional) Change the dashboard name, folder, or UID, and specify metric prefixes, if the dashboard uses any.
  • Select a data source, if required.
  • Click Import.
  • Save the dashboard.
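
For the HTTP API route, the sketch below posts a dashboard JSON file to Grafana's `POST /api/dashboards/db` endpoint. It assumes the file is a plain dashboard model with no `id` set and that you have a Grafana service account token with permission to create dashboards. The helper names, the Grafana URL, the token variable, and the file name are all hypothetical.

```shell
# dashboard_payload FILE
# Wraps a dashboard JSON model in the envelope expected by POST /api/dashboards/db.
dashboard_payload() {
  printf '{"dashboard": %s, "overwrite": true}\n' "$(cat "$1")"
}

# import_dashboard GRAFANA_URL TOKEN FILE
# Sends the wrapped dashboard to Grafana and prints the API response.
import_dashboard() {
  dashboard_payload "$3" | curl --fail --silent --request POST \
    --header "Authorization: Bearer $2" \
    --header "Content-Type: application/json" \
    --data-binary @- "$1/api/dashboards/db"
}

# Example call (hypothetical URL, token variable, and file name):
#   import_dashboard http://localhost:3000 "${GRAFANA_TOKEN}" ./rds-dashboard.json
```

Dashboards exported for external sharing may contain data source placeholders that this endpoint does not resolve; in that case the UI import described above is the simpler path.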

The following are examples of the metrics available within the Grafana dashboards:

Amazon RDS inventory

Use the RDS inventory dashboard to visualize Amazon RDS instances with pending maintenance or modifications.

RDS inventory screenshot

Amazon RDS instance overview

Use the RDS details dashboard to see Amazon RDS instance resource usage through the USE method (utilization, saturation, errors).

Amazon RDS instance resources dashboard

Step 5: Install Amazon RDS alerts

After gathering and visualizing the metrics, we can proceed to activate the alerts. Qonto’s Database Monitoring Framework contains:

  • 30 recommended alerts for Amazon RDS
  • A Helm chart to deploy Prometheus alerts
  • Documented runbooks to handle Amazon RDS alerts.

Install the RDS alerts Helm chart containing the recommended Amazon RDS alerts. These alerts are defined as PrometheusRule custom resources and will be automatically detected by Prometheus.

helm upgrade --install prometheus-rds-alerts-chart oci://public.ecr.aws/qonto/prometheus-rds-alerts-chart:latest --namespace ${KUBERNETES_NAMESPACE}

By default, the deployment includes 30 predefined alerts for Amazon RDS. To customize alerts (e.g. adjust alert threshold to your database workload), you can find a list of available configuration options in the Helm configuration file.
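
One way to work with those options is to dump the chart's default values to a file, edit it, and pass it back on upgrade. This is a minimal sketch with hypothetical helper names and file path. It assumes that `helm show values` resolves the newest chart version when no `--version` is given, and it does not assume any particular configuration key.

```shell
# fetch_alert_values FILE
# Saves the chart's default configuration to FILE for editing.
fetch_alert_values() {
  helm show values oci://public.ecr.aws/qonto/prometheus-rds-alerts-chart > "$1"
}

# apply_alert_values NAMESPACE FILE
# Installs or upgrades the alerts release using the edited values file.
apply_alert_values() {
  helm upgrade --install prometheus-rds-alerts-chart \
    oci://public.ecr.aws/qonto/prometheus-rds-alerts-chart:latest \
    --namespace "$1" --values "$2"
}

# Example calls (hypothetical file path):
#   fetch_alert_values /tmp/rds-alerts.values.yaml
#   # ...edit the file...
#   apply_alert_values "${KUBERNETES_NAMESPACE}" /tmp/rds-alerts.values.yaml
```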

Clean-up

1. Uninstall the Prometheus alerts:

helm uninstall prometheus-rds-alerts-chart --namespace ${KUBERNETES_NAMESPACE}

2. Uninstall the Prometheus RDS Exporter:

helm uninstall prometheus-rds-exporter --namespace ${KUBERNETES_NAMESPACE}

3. Delete the Kubernetes Service Account used by the Prometheus RDS Exporter:

kubectl delete serviceaccount ${KUBERNETES_SERVICE_ACCOUNT_NAME} --namespace ${KUBERNETES_NAMESPACE}

4. Delete the Kubernetes Service Account and the IAM role associated with it:

eksctl delete iamserviceaccount --cluster ${EKS_CLUSTER_NAME} --namespace ${KUBERNETES_NAMESPACE} --name ${KUBERNETES_SERVICE_ACCOUNT_NAME}

5. Delete the IAM policy:

aws iam delete-policy --policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${IAM_POLICY_NAME}

6. Remove Grafana Dashboards:

Finally, in your Grafana instance UI, go to Dashboards > Manage, select the dashboards you imported, and delete them.

Conclusion

Having a unified approach to database monitoring can be beneficial to database administrators with tens or hundreds of databases, saving time and deploying best practices at scale. Using Qonto’s framework, which builds on AWS services and Qonto’s years of experience, you can improve your observability tooling today and help your business scale further with lower risk.

PostgreSQL users might also be interested in deploying the PostgreSQL alerts.

Vivien de Saint Pern

Vivien de Saint Pern is a Senior Solution Architect at AWS and former CTO of a startup based in Paris. With more than 20 years of experience in the IT industry, he focuses on helping customers build highly scalable and resilient workloads in the cloud.

Camille Hoarau

Camille Hoarau is a Senior Technical Account Manager at AWS based in Volvic. He helps customers get the best from their AWS workloads. Seeking optimization and constantly challenging the status quo to raise the bar are part of his daily job.

Damien Cupif

Damien Cupif is a senior SRE engineer at Qonto who specializes in storage systems. Drawing from his extensive data and SRE expertise, he shapes Qonto's data-driven approach to database ecosystem management and evolution.

Vincent Mercier

Vincent Mercier is a Solution Architect in the AWS startup team. For 15 years, he has been assisting scale-ups in building efficient and resilient technical infrastructures. With deep expertise in platform supervision and observability, he helps companies achieve high availability and operational excellence.