AWS Database Blog
Understanding resource distribution and performance analysis using AWS DMS enhanced monitoring
When using AWS Database Migration Service (AWS DMS), replication lags, task stalls, or resource bottlenecks can occur—and identifying the root cause quickly can become critical.
Although AWS DMS provides Amazon CloudWatch metrics, sometimes information must be correlated across multiple tasks. Without a consolidated view, issue resolution can be delayed. This is where the enhanced monitoring dashboard becomes a valuable feature.
The enhanced monitoring dashboard is a monitoring tool that provides visibility into critical metrics for database migration tasks and replication instances. It offers two main views, Tasks and Replication Instance, that display performance metrics, resource utilization, and status information through visualizations and widgets. This overview of your AWS DMS landscape is available at no additional cost.
In this post, we discuss some use cases showcasing how you can use the enhanced monitoring dashboard.
Enhanced monitoring dashboard overview
In this section, we provide a breakdown of the different views available on the enhanced monitoring dashboard.
In the following screenshot, you can see the number of resources configured in the us-east-1 AWS Region along with the sections CloudWatch alarms and Service health.
You can also see the Task status section to get a breakdown of the status of the tasks.
Additionally, you can deep dive into the CloudWatch logs by accessing the log streams, as shown in the following screenshot.
In the following sections, we present use cases based on customer interactions to demonstrate how to use the enhanced monitoring dashboard.
Understanding resource distribution analysis
You can either run each task on a dedicated replication instance or run multiple AWS DMS tasks on a single replication instance. Understanding how task settings and customizations influence your migration helps you make sure the AWS DMS replication instance is suitably provisioned to handle the workload. With the enhanced monitoring dashboard, you can see how memory is distributed across the AWS DMS tasks on an instance. You can then either distribute the workload by moving a task to a different replication instance or consolidate the workload by modifying the task.
To illustrate this, we spun up a dms.r6i.large replication instance and created three AWS DMS tasks using the option Migrate existing data and replicate ongoing changes, each with different task settings.
The following screenshot shows that the task dmstaskflcdc is consuming more memory than the other two tasks. We can then decide whether to move the task dmstaskflcdc to its own dedicated replication instance or scale up the underlying replication instance if we plan to run more tasks on the same instance class in the near future.
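The same comparison can be expressed numerically once per-task memory readings are available. The following is a minimal sketch, not part of the dashboard itself. The function names are ours, the memory values are made up for illustration, and task-b and task-c are hypothetical names because the post names only dmstaskflcdc.

```python
def memory_share(task_memory_mb):
    """Return each task's fraction of the total memory used by all tasks."""
    total = sum(task_memory_mb.values())
    if total <= 0:
        raise ValueError("total memory must be positive")
    return {task: used / total for task, used in task_memory_mb.items()}


def heaviest_task(task_memory_mb):
    """Return the name of the task with the highest memory reading."""
    return max(task_memory_mb, key=task_memory_mb.get)


# Hypothetical readings in MB; these are NOT the values from the screenshot.
sample = {"dmstaskflcdc": 2048.0, "task-b": 512.0, "task-c": 512.0}

shares = memory_share(sample)
candidate = heaviest_task(sample)
print(f"{candidate} uses {shares[candidate]:.0%} of the memory used by tasks")
```

A task that holds a disproportionate share is the natural candidate for its own replication instance. What counts as disproportionate depends on the instance class and on how many tasks you plan to add.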
Performance troubleshooting
While comparing the CloudWatch metrics, you can add widgets to understand and troubleshoot pain points during a migration.
To illustrate this, we created an AWS DMS task using the option Replicate data changes from Amazon Relational Database Service (Amazon RDS) for SQL Server to Amazon Kinesis. While the task was running, we inserted some data on the source and found the following messages in the CloudWatch logs:
This warning indicates that the target can’t keep up with the rate at which the data is getting ingested at the source.
To better understand this, we can add the relevant task metric widgets (CDCLatencyTarget, CDCLatencySource, and CDCChangesDiskTarget) to see that the changes are accumulating on the underlying storage of the AWS DMS replication instance and waiting to be committed.
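The reasoning behind reading these three metrics together can be sketched as a small decision rule: when CDCLatencySource is low but CDCLatencyTarget is high, the delay is on the apply side, and a nonzero CDCChangesDiskTarget supports that reading. The function below is an illustration only. Its name, its return strings, and the 60-second threshold are our assumptions, not AWS DMS defaults.

```python
def classify_cdc_lag(source_latency_s, target_latency_s, disk_rows_target,
                     threshold_s=60.0):
    """Roughly classify where CDC lag originates from three task metrics.

    source_latency_s  -- CDCLatencySource reading, in seconds
    target_latency_s  -- CDCLatencyTarget reading, in seconds
    disk_rows_target  -- CDCChangesDiskTarget reading, in rows
    threshold_s       -- hypothetical cutoff for "high" latency
    """
    if source_latency_s > threshold_s:
        # Target latency includes source latency, so check the source first.
        return "source-side lag"
    if target_latency_s > threshold_s:
        if disk_rows_target > 0:
            return "target-side lag, changes accumulating on disk"
        return "target-side lag"
    return "no significant lag"


# Made-up readings that resemble the scenario described above.
print(classify_cdc_lag(2.0, 900.0, 15000))
```

In practice you would tune the threshold to the latency your workload tolerates and look at trends over time rather than a single data point.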
One possible cause could be that sufficient shards weren’t provisioned on the Kinesis stream. After increasing the shards on Kinesis, we can see near real-time replication again.
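As a rough way to size the stream, the following sketch estimates a minimum shard count from the expected write rate. It assumes a provisioned-mode stream and the per-shard write limits documented for Kinesis Data Streams (1,000 records per second and 1 MiB per second); confirm the current quotas before relying on it. The function name and the sample numbers are hypothetical.

```python
import math

# Documented per-shard write limits for provisioned streams (verify before use).
SHARD_WRITE_RECORDS_PER_S = 1000
SHARD_WRITE_BYTES_PER_S = 1024 * 1024


def min_shards(records_per_s, avg_record_bytes):
    """Estimate the fewest shards that can absorb the given write rate."""
    by_count = math.ceil(records_per_s / SHARD_WRITE_RECORDS_PER_S)
    by_bytes = math.ceil(records_per_s * avg_record_bytes / SHARD_WRITE_BYTES_PER_S)
    # A stream always has at least one shard.
    return max(1, by_count, by_bytes)


# Hypothetical workload: 2,500 changes per second at about 200 bytes each.
print(min_shards(2500, 200))
```

This is a lower bound. Uneven partition key distribution can overload individual shards even when the total capacity looks sufficient, so leave headroom.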
Benchmarking performance
You can benchmark different tasks and then compare their metrics to understand whether configuration changes are reflected in performance. For instance, the following example shows the full load CloudWatch metrics as we migrate 60 million records from a table in an RDS for SQL Server instance to an Amazon Aurora PostgreSQL-Compatible Edition cluster.
We compared two tasks: dmsfullloadtest with default settings and dmsfullloadtest-2 with MaxFullLoadSubTasks set to 16. This helps us understand how the MaxFullLoadSubTasks setting can impact the throughput during full load.
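For reference, MaxFullLoadSubTasks lives under the FullLoadSettings section of the task settings JSON. The sketch below builds that fragment for the second task. The helper name is ours, and the 1 to 49 range check reflects the limits in the AWS DMS documentation as we understand them, so verify them for your engine version.

```python
import json


def full_load_settings(max_subtasks):
    """Return a task settings JSON fragment that sets MaxFullLoadSubTasks."""
    # Range taken from the AWS DMS documentation (assumption: 1 to 49).
    if not 1 <= max_subtasks <= 49:
        raise ValueError("MaxFullLoadSubTasks must be between 1 and 49")
    settings = {"FullLoadSettings": {"MaxFullLoadSubTasks": max_subtasks}}
    return json.dumps(settings, indent=2)


print(full_load_settings(16))
```

The resulting string could be supplied as the task settings when you create or modify a task, for example through the console JSON editor. More parallel table loads also mean more load on the source, the target, and the replication instance, so raise the value in steps and watch the metrics.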
The following screenshot shows that with the default settings, the dmsfullloadtest task achieved a throughput of 235,722 rows per second.
However, by increasing MaxFullLoadSubTasks to 16 for the dmsfullloadtest-2 task, the throughput improved significantly to 515,599 rows per second.
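To put the two figures side by side, the short calculation below derives the speedup and a rough load time for the 60 million rows. It assumes, purely for illustration, that each task sustains its reported throughput for the whole load, which the post does not state. The variable and function names are ours.

```python
TOTAL_ROWS = 60_000_000
baseline_rps = 235_722   # dmsfullloadtest, default settings
tuned_rps = 515_599      # dmsfullloadtest-2, MaxFullLoadSubTasks = 16


def load_seconds(rows, rows_per_second):
    """Idealized load time if the given throughput were sustained."""
    return rows / rows_per_second


speedup = tuned_rps / baseline_rps
print(f"speedup: {speedup:.2f}x")
print(f"baseline: {load_seconds(TOTAL_ROWS, baseline_rps):.0f} s")
print(f"tuned: {load_seconds(TOTAL_ROWS, tuned_rps):.0f} s")
```

The ratio works out to a little under 2.2 times the baseline throughput. Real load times will differ because throughput varies over the course of a full load.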
This benchmarking exercise demonstrates the value of the enhanced monitoring dashboard in helping you optimize your AWS DMS configurations for maximum performance during full load migrations.
Conclusion
In this post, we discussed a few use cases for the enhanced monitoring dashboard. Enhanced monitoring complements your existing AWS DMS monitoring setup and gives you visibility into key metrics for your tasks and replication instances. For more details, refer to Enhanced monitoring dashboard.