AWS Storage Blog
Real-time monitoring of AWS Elastic Disaster Recovery using Amazon Q Developer
The ability to monitor and manage workloads in real time is a foundational requirement for meeting your resilience objectives. Visibility into key user activities and the performance of critical business functions enables you to automate responses to events that can impact business operations. Effective monitoring is crucial not only for achieving operational integrity but also for managing costs. For example, running unnecessary servers or launching resource-intensive systems without oversight can significantly increase costs. To mitigate these risks, implement a comprehensive monitoring strategy that tracks resource usage, monitors key performance indicators, and provides real-time notifications to support informed business decision-making.
Amazon Q Developer allows you to monitor and respond to operational events in Amazon Web Services (AWS). You can integrate Amazon Q Developer with Slack channels to receive real-time notifications about events and actions affecting your workload and changes in your infrastructure. AWS Elastic Disaster Recovery minimizes downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications using affordable storage, minimal compute, and point-in-time recovery. By using Amazon Q Developer to monitor your Elastic Disaster Recovery environment, you enable operations teams to respond quickly to potential issues before they become major problems.
In this post, we walk you through the process of integrating Amazon Q Developer with Slack to receive real-time notifications about critical events related to your resources protected by Elastic Disaster Recovery. This solution allows you to proactively monitor your disaster recovery environment, identify potential issues early, and respond quickly to these events. With real-time monitoring in place, you can improve your overall resilience posture and meet your business continuity objectives.
Solution overview
In this post, we are protecting an Amazon Elastic Compute Cloud (Amazon EC2) instance running in the eu-west-1 Region with Elastic Disaster Recovery in the eu-west-2 Region. We start by creating an Amazon Simple Notification Service (Amazon SNS) topic to monitor our Elastic Disaster Recovery environment. We then create a Slack channel to receive real-time notifications from Amazon Q Developer. To integrate Slack channels with Amazon Q Developer, we configure Slack to receive notifications from AWS and then configure Amazon Q Developer to send messages to Slack. Once that is complete, we configure Amazon CloudWatch rules to receive real-time notifications about actions and events affecting your workload. In this scenario, we create CloudWatch Events rules to track Elastic Disaster Recovery mutating APIs, such as StartRecovery, and CloudWatch Events such as “DRS Source Server Data Replication Stalled Change.” The overall solution is shown in the following figure 1.
Figure 1: Solution overview
Prerequisites
The following prerequisites are necessary to complete this solution:
- The Elastic Disaster Recovery service must be initialized in the AWS Region you decide to fail over to during a planned or unplanned event.
- You have a source server being protected with Elastic Disaster Recovery. Refer to this blog for specific guidance on setting up Elastic Disaster Recovery for a cross region use case.
- You have access to Slack and permissions to create a channel and integrate it with Amazon Q Developer.
Walkthrough
The following summarizes the high level steps required for this solution.
- Create an Amazon SNS topic
- Set up a dedicated Slack channel
- Integrate Slack with Amazon Q Developer
- Configure Amazon Q Developer with Slack
- Create Amazon CloudWatch rules
- Test notifications
1. Create an Amazon SNS topic
Create an Amazon SNS topic in the Region where you want to monitor the Elastic Disaster Recovery service environment. In this example, I create a topic with the name “DRS” in the eu-west-1 Region.
To create an SNS topic:
1.1. Sign in to the Amazon SNS console.
1.2. On the navigation panel, choose Topics.
1.3. On the Topics page, choose Create topic.
1.4. On the Create topic page, in the Details section, do the following:
1.4.1 For Type, choose Standard topic type.
1.4.2. Enter a Name for the topic.
1.4.3. (Optional) Enter a Display name for the topic.
1.5. Skip all other options and choose Create topic. If required, you can further customize the topic based on your organization’s policies. Refer to the documentation page to learn more about creating an Amazon SNS topic.
The topic is created and the topic details page is displayed. The topic’s Name, ARN, (optional) Display name, and Topic owner’s AWS account ID are displayed in the Details section, as shown in the following figure 2.
Figure 2: SNS topic details page
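The console steps above can also be scripted. The following is a minimal sketch, not part of the original walkthrough, that assumes the boto3 SDK and valid AWS credentials are available. The function name create_drs_topic is hypothetical; the client is passed in so the logic can be exercised without calling AWS.

```python
def create_drs_topic(sns_client, name="DRS", display_name=None):
    """Create a standard SNS topic and return its ARN.

    sns_client is assumed to behave like boto3.client("sns").
    Calling create_topic with only a Name yields a standard (non-FIFO) topic.
    """
    response = sns_client.create_topic(Name=name)
    topic_arn = response["TopicArn"]
    if display_name:
        # DisplayName is the optional attribute from step 1.4.3.
        sns_client.set_topic_attributes(
            TopicArn=topic_arn,
            AttributeName="DisplayName",
            AttributeValue=display_name,
        )
    return topic_arn


# Usage (requires boto3 and AWS credentials; shown as comments on purpose):
#   import boto3
#   sns = boto3.client("sns", region_name="eu-west-1")
#   print(create_drs_topic(sns))
```

Keep the returned ARN at hand, because later steps refer to the topic when selecting notification targets.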
2. Create a Slack channel
2.1. Create a new Slack channel or use an existing channel to receive notifications. Follow this guide to create a Slack channel. For this example, I create the drs-slack-notifications Slack channel, and set the Visibility to Private, as shown in the following figure 3.
Figure 3: Creating Slack channel
3. Integrate Slack channel with Amazon Q Developer
To allow Amazon Q Developer to send notifications, you must configure Amazon Q Developer with Slack.
3.1. Configure a Slack channel to receive notifications from Amazon Q Developer
3.1.1. In your Slack channel, enter “/invite @Amazon Q” and press the Enter key on your keyboard, as shown in the following figure 4.
Figure 4: Invite Amazon Q Developer in Slack
Once complete, Amazon Q Developer joins the Slack channel, as shown in the following figure 5.
Figure 5: Amazon Q Developer invited to Slack channel
3.2. Configure Amazon Q Developer to send notifications to a Slack channel
3.2.1. Open the Amazon Q Developer console and choose Configure new client. Under Configure a chat client, choose Slack, and then choose Configure client, as shown in the following figure 6.
Figure 6: Choosing Slack as a chat client
3.2.2. Choose your Slack workspace, then select Allow to enable Amazon Q Developer to access your Slack workspace, as shown in the following figure 7.
Figure 7: Allow Amazon Q Developer to access Slack workspace
If you aren’t logged in to Slack in your web browser, sign in to Slack first and make sure the right workspace is selected from the dropdown in the top-right corner of your web browser.
After you authorize Amazon Q Developer to access your Slack workspace, a green bar with the message “Slack successfully authorized Amazon Q Developer.” appears at the top of your AWS console, as shown in the following figure 8.
Figure 8: Authorize Amazon Q Developer to access Slack workspace
3.2.3. On the Workspace details page in the Amazon Q Developer console, choose Configure new channel. Under the Configuration details section, provide a name for the configuration to help you easily identify it in the AWS console.
In this example, I provide the configuration name as aws-drs-slack-notifications, as shown in the following figure 9.
Figure 9: Providing configuration name
3.2.4. If you want to enable logging for this configuration, then choose Publish logs to Amazon CloudWatch Logs. You can choose between Error only and All events based on your requirements, as shown in figure 10.
With CloudWatch Logs for Amazon Q Developer, you can see all the events handled by Amazon Q Developer. You can also see details of any error that may have prevented a notification from appearing in your Slack channel.
Figure 10: Publishing logs to CloudWatch Logs
There is an additional charge for using CloudWatch Logs. For more details, see Amazon CloudWatch Pricing.
3.2.5. For Slack channel, choose the channel previously created in this walkthrough within Slack. Amazon Q Developer supports both public and private channels.
To find the Slack Channel ID, right-click on the channel name in the left pane of Slack and choose View channel details. The channel ID is displayed at the bottom of the window, as shown in the following figure 11.
Figure 11: Viewing Slack channel ID
Copy your Slack channel ID and paste it in the Channel ID text box under the Slack channel section of Amazon Q Developer, as shown in figure 12.
Figure 12: Slack channel ID in Amazon Q Developer
3.2.6. Under the Permissions section, choose the permissions for channel members. Because channel members can also run AWS Command Line Interface (AWS CLI) commands from the Slack channel, choose Channel role if all channel members need the same set of permissions, or User-level roles if they need different permissions, as shown in the following figure 13.
3.2.6.1. For this walkthrough, I choose Channel role and choose Create an IAM role using a template under the Channel role dropdown list.
3.2.6.2. In the Role name box, provide a name for the AWS Identity and Access Management (IAM) role you want to create. For this walkthrough, I provided aws-drs-slack-role as the role name.
3.2.6.3. For Policy templates, the Notification permissions and Resource Explorer Permissions templates are chosen by default. You can choose any other templates you want to use, such as the Read-only command permissions template. Choosing policy templates results in the creation of IAM policies in your account. These policies are attached to the channel role specified in the previous step.
3.2.6.4. For Channel guardrail policies, you can choose up to five guardrail policies to secure your channel configuration. By default, the ReadOnlyAccess guardrail policy is chosen. This policy defines Get, List, and Describe permissions for the entire suite of AWS services, enabling Amazon Q Developer to use this role to access any of those services on your behalf.
Guardrail policies provide detailed control over what actions are available to your channel members and what actions Amazon Q Developer can perform on your behalf. They constrain and take precedence over both user roles and channel roles. For example, if a user has a user role that allows administrator access, and they belong to a channel where the channel role or the guardrail policies limit permissions on one or more services, the user will have the more restrictive permissions, resulting in less than administrator access.
Figure 13: Creating Channel role and selecting permissions for the role
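To make the guardrail behavior described above concrete, here is a deliberately simplified toy model. It is not how IAM actually evaluates policies (real evaluation also considers explicit denies, conditions, and resource scoping), and the wildcard patterns are illustrative stand-ins rather than the contents of the real ReadOnlyAccess policy. It only shows the core idea that an action must be allowed by both the role and the guardrail.

```python
from fnmatch import fnmatchcase


def is_allowed(action, role_patterns, guardrail_patterns):
    """Toy check: an action is usable only if the role AND the guardrail allow it."""
    in_role = any(fnmatchcase(action, p) for p in role_patterns)
    in_guardrail = any(fnmatchcase(action, p) for p in guardrail_patterns)
    return in_role and in_guardrail


# Hypothetical, simplified patterns for illustration only.
ADMIN_ROLE = ["*"]
READ_ONLY_GUARDRAIL = ["*:Get*", "*:List*", "*:Describe*"]

# An administrator in a read-only channel can describe but not start recovery.
can_describe = is_allowed("drs:DescribeSourceServers", ADMIN_ROLE, READ_ONLY_GUARDRAIL)
can_recover = is_allowed("drs:StartRecovery", ADMIN_ROLE, READ_ONLY_GUARDRAIL)
```

In this model the administrator ends up with read-only access in the channel, which mirrors the "more restrictive permissions" outcome described in the text.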
3.2.7. Under Notifications – optional, choose your AWS Region and the SNS topic created at the beginning of this walkthrough, as shown in the following figure 14. To monitor multiple AWS Regions, you can add multiple Regions, provided that you have an SNS topic created in each of them.
Figure 14: SNS notifications
3.2.8. Under Tags, add a custom tag to the Slack client. For this walkthrough, I add two tags, as shown in the following figure 15.
Figure 15: Adding tags to channel
3.2.9. Finally, choose Configure to create the client configuration.
The client configuration is complete. The success message “You successfully configured the Slack channel.” should appear in the green bar in the AWS console, as shown in the following figure 16.
Figure 16: Configure Slack client with Slack channel
You should see the following APIs executed in the following Regions, as shown in Table 1. This can help you troubleshoot any issues that arise during the exercise.
Region | APIs
us-east-2 | CreateSlackChannelConfiguration, GetAccountPreferences
us-east-1 | CreateRole, CreatePolicy, AttachRolePolicy
eu-west-1 | Subscribe
Table 1: API to regions mapping
The subscribe API is shown in eu-west-1 because I chose this Region to receive any notifications from the SNS topic.
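For readers who prefer to script the channel configuration, the following sketch assembles the request for the CreateSlackChannelConfiguration API listed in Table 1. It assumes that the Slack workspace was already authorized in the console (step 3.2.2), that the IAM role already exists, and that the console options "Error only" and "All events" correspond to the ERROR and INFO logging levels. The helper name and all example IDs are hypothetical.

```python
READ_ONLY_GUARDRAIL_ARN = "arn:aws:iam::aws:policy/ReadOnlyAccess"


def build_slack_channel_config(team_id, channel_id, role_arn, topic_arns,
                               name="aws-drs-slack-notifications",
                               logging_level="ERROR"):
    """Build keyword arguments for create_slack_channel_configuration."""
    if logging_level not in {"ERROR", "INFO", "NONE"}:
        raise ValueError("logging_level must be ERROR, INFO, or NONE")
    return {
        "ConfigurationName": name,
        "SlackTeamId": team_id,        # Slack workspace ID
        "SlackChannelId": channel_id,  # channel ID copied in step 3.2.5
        "IamRoleArn": role_arn,        # channel role from step 3.2.6
        "SnsTopicArns": list(topic_arns),
        "LoggingLevel": logging_level,
        "GuardrailPolicyArns": [READ_ONLY_GUARDRAIL_ARN],
    }


# Usage (requires boto3 and AWS credentials; shown as comments on purpose):
#   import boto3
#   chatbot = boto3.client("chatbot", region_name="us-east-2")
#   chatbot.create_slack_channel_configuration(
#       **build_slack_channel_config(
#           "T0EXAMPLE", "C0EXAMPLE",
#           "arn:aws:iam::123456789012:role/aws-drs-slack-role",
#           ["arn:aws:sns:eu-west-1:123456789012:DRS"]))
```

Building the arguments separately from the API call makes it easy to review the configuration before anything is created.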
4. Create Amazon CloudWatch rules
To receive Slack notifications about the mutating API actions of Elastic Disaster Recovery, we need to create a CloudWatch rule in the AWS Region where Elastic Disaster Recovery is configured. These rules are created in the Amazon EventBridge console.
4.1. Configure a CloudWatch rule
4.1.1. Open the Amazon EventBridge console. In the navigation pane, choose Rules. Choose Create rule.
4.1.2 Enter a Name and, optionally, a Description for the rule.
4.1.3. For Event bus, choose default event bus.
4.1.4. Keeping the toggle button for Enable the rule on the selected event bus on and Rule with an event pattern rule type chosen, choose Next, as shown in the following figure 17.
Figure 17: Defining CloudWatch rule details
4.2. Build an event pattern
4.2.1. Under the Build event pattern page, choose Other as Event source. Ignore the Sample event – optional section and proceed to the next section.
4.2.2. In Creation method section, choose the Custom pattern (JSON editor) option, as shown in the following figure 18.
Figure 18: Choosing CloudWatch Event pattern and Creation method
4.2.3. Under the Event pattern section, add the Elastic Disaster Recovery service APIs for which you want to receive notifications, and choose Next, as shown in the following figure 19. For this example, the custom event pattern in JSON is shown below:
{
  "source": ["aws.drs"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["drs.amazonaws.com"],
    "eventName": [
      "InitializeService",
      "CreateSourceServerForDrs",
      "DisconnectSourceServer",
      "StartRecovery",
      "StartReplication",
      "StopFailback",
      "DeleteSourceServer",
      "ReverseReplication",
      "StartFailbackLaunch",
      "DisconnectRecoveryInstance",
      "DeleteRecoveryInstance"
    ]
  }
}
Figure 19: Adding CloudWatch Event pattern
To learn more about all the Elastic Disaster Recovery service APIs, visit this Elastic Disaster Recovery documentation.
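To build intuition for how such an event pattern selects events, the following sketch implements a simplified matcher: every key in the pattern must be present in the event, and the event's value must be one of the listed values, with nested objects checked recursively. This is only an approximation of the real matching rules (it ignores array-valued event fields and operators such as prefix matching), and the sample event contains just the fields needed for the illustration.

```python
# Subset of the API-call pattern from step 4.2.3 (event names shortened).
API_PATTERN = {
    "source": ["aws.drs"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["drs.amazonaws.com"],
        "eventName": ["StartRecovery", "DeleteSourceServer"],
    },
}


def matches(pattern, event):
    """Return True if the event satisfies every field listed in the pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object: recurse into the matching part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must equal one of the listed values.
            return False
    return True


def sample_event(event_name):
    """Build a minimal, hypothetical CloudTrail-style event for testing."""
    return {
        "source": "aws.drs",
        "detail-type": "AWS API Call via CloudTrail",
        "detail": {"eventSource": "drs.amazonaws.com", "eventName": event_name},
    }
```

With this pattern, a StartRecovery call matches, while a read-only call such as DescribeJobs does not, which is why only mutating actions reach the SNS topic.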
4.2.4. Under the Select target(s) section, for Target types, choose AWS service as Target 1. In the Select a target dropdown, choose SNS topic as a target. In the Topic dropdown, choose the topic name you created in step 1 of this walkthrough. Ignore the Additional settings section, and choose Next as shown in the following figure 20.
Figure 20: Selecting SNS targets
4.2.5. Enter any desired tags for the rule, then choose Next.
4.2.6. Review your rule and choose Create rule to create the rule.
4.2.7. Follow these steps again to create another CloudWatch rule for receiving notifications about the CloudWatch Events supported by Elastic Disaster Recovery. To learn more about the supported events, visit this Elastic Disaster Recovery user guide.
In this example, I choose the CloudWatch Event related to stalled data replication of a source server along with other events. Data replication can stall due to various factors, including:
- Network connectivity issues: This may involve misconfigured security groups, network ACLs, or route tables that prevent communication between the source server and the staging subnet.
- AWS Replication Agent issues: The AWS Replication Agent for Elastic Disaster Recovery may not be running correctly on the source server due to problems on the source server itself.
- IAM permission problems: The replication server may lack the necessary IAM permissions, such as a missing IAM role, permission policies, or trust relationship policy.
Maintaining continuous data replication is crucial for business continuity. Monitoring CloudWatch Events for stalled replication allows you to proactively identify and resolve these issues, swiftly restoring continuous data replication to your servers.
In this example, I add the following JSON under Event pattern section to monitor CloudWatch Events:
{
  "source": ["aws.drs"],
  "detail-type": [
    "DRS Source Server Data Replication Stalled Change",
    "DRS Recovery Instance Failback State Change",
    "DRS Source Server Launch Result"
  ]
}
Follow the remaining steps outlined in section 4.2 to complete the creation of the rule. Use the same SNS topic created in the beginning of this walkthrough as the target.
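The rule creation in section 4 can also be sketched with boto3. The example below is an assumption-laden outline rather than the method used in this post: function names and the target ID are hypothetical, and clients are passed in so the logic can be tested without AWS. Note that when you create a rule outside the console, the SNS topic's access policy must allow EventBridge to publish; the first helper builds such a statement, which you would merge into the existing topic policy.

```python
import json


def build_topic_policy_statement(topic_arn):
    """Statement allowing EventBridge to publish to the topic."""
    return {
        "Sid": "AllowEventBridgePublish",
        "Effect": "Allow",
        "Principal": {"Service": "events.amazonaws.com"},
        "Action": "sns:Publish",
        "Resource": topic_arn,
    }


def create_rule_with_sns_target(events_client, rule_name, pattern, topic_arn):
    """Create an enabled rule on the default event bus and attach the topic.

    events_client is assumed to behave like boto3.client("events").
    """
    response = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "sns-target", "Arn": topic_arn}],
    )
    return response["RuleArn"]


# Usage (requires boto3 and AWS credentials; shown as comments on purpose):
#   import boto3
#   events = boto3.client("events", region_name="eu-west-1")
#   pattern = {"source": ["aws.drs"],
#              "detail-type": ["DRS Source Server Data Replication Stalled Change"]}
#   create_rule_with_sns_target(events, "drs-events", pattern,
#                               "arn:aws:sns:eu-west-1:123456789012:DRS")
```

Calling the function once per pattern reproduces the two rules created in this section, both pointing at the same SNS topic.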
5. Test the notifications
To test the Elastic Disaster Recovery notifications in your Slack channel, proceed to install the AWS Replication Agent for Elastic Disaster Recovery on one of your source servers. To learn how to install AWS Replication Agent on a server, visit this Elastic Disaster Recovery user guide.
When the AWS Replication Agent installation is successful, you receive a notification in the Slack channel for the CreateSourceServerForDrs API with additional details.
The following figure 21 shows a sample screenshot of the notifications received within Slack.
Figure 21: Slack notification about successful installation of Elastic Disaster Recovery service agent
A data replication stall in Elastic Disaster Recovery can disrupt business continuity. Replication from your source servers to the replication servers ceases, which means that the latest data isn’t being replicated. This is indicated by a Stalled status in the Data Replication Status column of the Elastic Disaster Recovery console and needs immediate attention, as shown in the following figure 22.
Figure 22: Elastic Disaster Recovery stalled replication
With a CloudWatch Events rule in place that monitors for “DRS Source Server Data Replication Stalled Change”, Amazon Q Developer now notifies you in your Slack channel, as shown in the following figure 23.
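If you want to confirm that the SNS topic reaches Slack without waiting for a real Elastic Disaster Recovery event, one option is to publish a custom notification to the topic. This is a supplementary sketch and not a step from the walkthrough: it assumes the custom notification format accepted by Amazon Q Developer in chat applications (version "1.0", source "custom", and a content object with a description), and the function names are hypothetical.

```python
import json


def build_test_notification(title, description):
    """Serialize a custom notification message for the SNS topic."""
    return json.dumps({
        "version": "1.0",
        "source": "custom",
        "content": {"title": title, "description": description},
    })


def publish_test_notification(sns_client, topic_arn, title, description):
    """Publish the test message and return the SNS message ID.

    sns_client is assumed to behave like boto3.client("sns").
    """
    response = sns_client.publish(
        TopicArn=topic_arn,
        Message=build_test_notification(title, description),
    )
    return response["MessageId"]


# Usage (requires boto3 and AWS credentials; shown as comments on purpose):
#   import boto3
#   sns = boto3.client("sns", region_name="eu-west-1")
#   publish_test_notification(sns, "arn:aws:sns:eu-west-1:123456789012:DRS",
#                             "DRS test", "Test message from the DRS topic")
```

If the message appears in the channel, the topic subscription and channel configuration are working, and any missing notifications are more likely caused by the rules or event patterns.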
Cleaning up
To minimize unnecessary AWS costs, delete any resources you’ve created, including Amazon Elastic Compute Cloud (Amazon EC2) instances, Elastic Disaster Recovery source servers, CloudWatch Event rules, the Slack client in Amazon Q Developer, and SNS topics. Leaving these resources running can result in unexpected charges on your AWS bill, even if they’re not in use. Make sure to review all provisioned resources, and terminate any that are no longer needed.
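Part of this cleanup can be scripted. The sketch below, which assumes boto3-style clients and uses a hypothetical function name, removes the rules and the SNS topic. A rule cannot be deleted while it still has targets, so the targets are removed first. Pagination of the target list is ignored for brevity, and EC2 instances, source servers, and the Slack client configuration still need to be removed separately.

```python
def delete_rules_and_topic(events_client, sns_client, rule_names, topic_arn):
    """Remove each rule's targets, delete the rules, then delete the topic."""
    for rule_name in rule_names:
        listed = events_client.list_targets_by_rule(Rule=rule_name)
        target_ids = [target["Id"] for target in listed.get("Targets", [])]
        if target_ids:
            # Targets must be detached before the rule can be deleted.
            events_client.remove_targets(Rule=rule_name, Ids=target_ids)
        events_client.delete_rule(Name=rule_name)
    sns_client.delete_topic(TopicArn=topic_arn)


# Usage (requires boto3 and AWS credentials; shown as comments on purpose):
#   import boto3
#   delete_rules_and_topic(
#       boto3.client("events", region_name="eu-west-1"),
#       boto3.client("sns", region_name="eu-west-1"),
#       ["drs-api-calls", "drs-events"],
#       "arn:aws:sns:eu-west-1:123456789012:DRS")
```

The rule names in the usage comment are placeholders; substitute the names you chose in step 4.1.2.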
Conclusion
In this post, we guided you through the steps to integrate Amazon Q Developer with Slack, enabling real-time notifications for critical events related to your resources protected by AWS Elastic Disaster Recovery. Elastic Disaster Recovery helps organizations maintain business continuity by quickly recovering applications and data to their most recent state. Real-time notifications for critical workload activities are crucial to ensure applications can meet the required Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). In a disaster recovery scenario, every second counts, and the ability to detect and respond to disruptions can make the difference between meeting these objectives and disappointing users.
Integrating Amazon Q Developer with CloudWatch Events rules significantly enhances your monitoring capabilities for Elastic Disaster Recovery. Amazon Q Developer allows you to receive real-time notifications directly in your preferred communication channels, such as Slack. In this post, we configured CloudWatch Event rules to automate the tracking of specific actions and events for Elastic Disaster Recovery, such as data replication changes to your source servers.
We hope you find this post useful and invite your feedback. Visit the Elastic Disaster Recovery page to learn more about the service. You can also explore case studies of users who are using Elastic Disaster Recovery to protect their workloads on AWS.