The ability to monitor and manage workloads in real time is a foundational requirement for meeting your resilience objectives. Visibility into key user activities and the performance of critical business functions enables you to automate responses to events that can impact business operations. Effective monitoring is crucial not only for achieving operational integrity but also for managing costs. For example, running unnecessary servers or launching resource-intensive systems without oversight can significantly increase costs. To mitigate these risks, implement a comprehensive monitoring strategy that tracks resource usage, monitors key performance indicators, and provides real-time notifications to support informed business decision-making.
AWS Chatbot allows you to monitor and respond to operational events in the AWS Cloud. You can integrate AWS Chatbot with Slack channels to receive real-time notifications about events and actions that affect your workload and about changes in your infrastructure. AWS Elastic Disaster Recovery minimizes downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications using affordable storage, minimal compute, and point-in-time recovery. By using AWS Chatbot to monitor an Elastic Disaster Recovery environment, operations teams can respond quickly to potential issues before they become major problems.
In this post, we walk you through the process of integrating AWS Chatbot with Slack to receive real-time notifications about critical events related to your resources protected by Elastic Disaster Recovery. This solution allows you to proactively monitor your disaster recovery environment, identify potential issues early, and respond quickly to these events. With real-time monitoring in place, you can improve your overall resilience posture and meet your business continuity objectives.
Solution overview
In this post, we protect an EC2 instance running in the eu-west-1 Region by replicating it to the eu-west-2 Region with the Elastic Disaster Recovery service. We start by creating an Amazon Simple Notification Service (Amazon SNS) topic to monitor our Elastic Disaster Recovery environment. We then create a Slack channel to receive real-time notifications from AWS Chatbot. To integrate the Slack channel with AWS Chatbot, we configure Slack to receive notifications from AWS and then configure AWS Chatbot to send messages to Slack. Once that is complete, we configure Amazon CloudWatch rules to receive real-time notifications about actions and events that affect your workload. In this scenario, we create CloudWatch Events rules to track Elastic Disaster Recovery mutating APIs, such as StartRecovery, and CloudWatch Events such as “DRS Source Server Data Replication Stalled Change.” The overall solution is shown in the following figure 1.
Figure 1: Solution overview
Prerequisites
The following prerequisites are necessary to complete this solution:
- Elastic Disaster Recovery service must be initialized in the AWS Region you decide to failover to during a planned or unplanned event.
- You have a source server being protected with Elastic Disaster Recovery. Refer to this blog for specific guidance on setting up Elastic Disaster Recovery for a cross region use case.
- You have access to Slack and permissions to create a channel and integrate with AWS Chatbot.
Walkthrough
The following list summarizes the high-level steps required for this solution.
- Create an Amazon SNS Topic
- Set up a dedicated Slack Channel
- Integrate Slack Channel with AWS Chatbot
  - Configure Slack with AWS Chatbot
  - Configure AWS Chatbot with Slack
- Create Amazon CloudWatch Rules
- Test Notifications
1. Create an Amazon SNS topic
Create an Amazon SNS topic in the Region where you want to monitor the Elastic Disaster Recovery service environment. In this example, I create a topic with the name “DRS” in the eu-west-1 Region.
To create an SNS topic:
1.1. Sign in to the Amazon SNS console.
1.2. On the navigation panel, choose Topics.
1.3. On the Topics page, choose Create topic.
1.4. On the Create topic page, in the Details section, do the following:
1.4.1 For Type, choose Standard topic type.
1.4.2. Enter a Name for the topic.
1.4.3. (Optional) Enter a Display name for the topic.
1.5 Skip all other options and choose Create topic. If required, you can further customize the topic based on your organization’s policies. Please refer to the documentation page to learn more about creating an Amazon SNS topic.
The topic is created and the MyTopic page is displayed. The topic’s Name, ARN, (optional) Display name, and Topic owner‘s AWS account ID are displayed in the Details section, as shown in the following figure 2.
Figure 2: SNS topic details page
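The console steps above can also be scripted. The following is a minimal sketch, assuming Python and the boto3 SDK; the function name create_drs_topic is illustrative and not part of the original walkthrough. The SNS client is passed in as an argument, so nothing is created until you call the function with a real client.

```python
def create_drs_topic(sns_client, name="DRS", display_name=None):
    """Create a standard SNS topic and return its ARN.

    sns_client is assumed to be a boto3 SNS client, for example
    boto3.client("sns", region_name="eu-west-1").
    """
    kwargs = {"Name": name}
    if display_name:
        # Optional display name, matching step 1.4.3 in the console.
        kwargs["Attributes"] = {"DisplayName": display_name}
    # Topics are standard (non-FIFO) unless FIFO attributes are supplied.
    response = sns_client.create_topic(**kwargs)
    return response["TopicArn"]
```

Keep the returned ARN at hand: it is the value you select later as the notification topic in AWS Chatbot and as the rule target.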
2. Creating a Slack channel
2.1. Create a new Slack channel or use an existing channel to receive notifications. Follow this guide to create a Slack channel. For this example, I create the drs-slack-notifications Slack channel, and set the Visibility to Private, as shown in the following figure 3.
Figure 3: Creating Slack channel
3. Integrate Slack channel with AWS Chatbot
To allow AWS Chatbot to send notifications, you must configure AWS Chatbot with Slack.
3.1. Configure a Slack channel to receive notifications from AWS Chatbot
3.1.1. In your Slack channel, enter “@aws” and hit the Enter key on your keyboard. The following message appears in the channel with the options Invite Them and Do Nothing as shown in the following figure 4.
Figure 4: Invite AWS Chatbot in Slack
3.1.2. Choose Invite Them.
3.1.3. You will receive a notification: “aws was added to <slack channel name> by <user name>”, as shown in the following Figure 5.
Figure 5: Integrate Slack channel with AWS Chatbot
3.2. Configure AWS Chatbot to send notifications to a Slack channel
3.2.1. Open the AWS Chatbot console and choose Configure new client. Under Configure a chat client, choose Slack, and then choose Configure client, as shown in the following figure 6.
Figure 6: Choosing Slack as a chat client
3.2.2. In this step, you will need to choose your Slack workspace. Choose your workspace and select Allow to enable AWS Chatbot to access your AWS Slack workspace, as shown in the following figure 7.
Figure 7: Allow AWS Chatbot to access Slack workspace
If you aren’t logged in to Slack in your web browser, sign in to Slack first and make sure the right workspace is selected from the dropdown in the top-right corner of your web browser.
After you authorize AWS Chatbot to access your Slack workspace, a green bar with the message “Slack successfully authorized AWS Chatbot.” appears on the top of your AWS console, as shown in the following figure 8.
Figure 8: Authorize AWS Chatbot to access Slack workspace
3.2.3. On the Workspace details page in the AWS Chatbot console, choose Configure new channel. Under the Configuration details section, provide a name for the configuration to help you easily identify it in the AWS console.
In this example, I provide the configuration name as aws-drs-slack-notifications, as shown in the following figure 9.
Figure 9: Providing configuration name
3.2.4. If you want to enable logging for this configuration, then choose Publish logs to Amazon CloudWatch Logs. You can choose from Error only and All events based on your requirements.
With CloudWatch Logs for AWS Chatbot, you can see all the events handled by AWS Chatbot. You can also see details of any error that may have prevented a notification from appearing in your Slack chat room, as shown in the following figure 10.
Figure 10: Publishing logs to CloudWatch Logs
There is an additional charge for using CloudWatch Logs. For more details, see Amazon CloudWatch Pricing.
3.2.5. For Slack channel, choose the channel created previously in the walkthrough within Slack. AWS Chatbot supports both public and private channels.
To find the Slack Channel ID, right-click on the channel name in the left pane of Slack and choose View channel details. The channel ID is displayed at the bottom of the window, as shown in the following figure 11.
Figure 11: Viewing Slack channel ID
Copy your Slack channel ID and paste it in the Channel ID text box under the Slack channel section of AWS Chatbot as shown in Figure 12.
Figure 12: Slack channel ID in AWS Chatbot
3.2.6. Under the Permissions section, choose the permissions for channel members. AWS Command Line Interface (AWS CLI) commands can also be executed in the Slack channel, thus you can either choose Channel role if all the channel members need the same set of permissions, or choose User-level roles if the channel members need different permissions, as shown in the following figure 13.
3.2.6.1. For this walkthrough, I choose Channel role and choose Create an IAM role using a template under the Channel role dropdown list.
3.2.6.2. In the Role name box, provide a name to the AWS Identity and Access Management (IAM) role you want to create. For this walkthrough, I provided aws-drs-slack-role as the role name.
3.2.6.3. For Policy templates, Notification permissions and Resource Explorer Permissions templates are chosen by default. You can choose any other templates you want to use, such as Read-only command permissions templates. Choosing the policy templates results in the creation of IAM permission in your account. These policies are attached to the channel role specified in the previous step.
3.2.6.4. For Channel guardrail policies, you can choose up to five guardrail policies to secure your channel configuration. By default, ReadOnlyAccess guardrail policy is chosen. This policy defines Get, List, and Describe permissions for the entire suite of AWS services, enabling AWS Chatbot to use this role to access any of those services on your behalf.
Guardrail policies provide detailed control over what actions are available to your channel members and what actions AWS Chatbot can perform on your behalf. They constrain and take precedence over both user roles and channel roles. For example, if a user has a user role that allows administrator access, and they belong to a channel where the channel role or the guardrail policies limit permissions on one or more services, the user has less than administrator-level access.
Figure 13: Creating Channel role and selecting permissions for the role
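The way guardrail policies constrain channel and user roles can be pictured as an intersection: an action is available only if both the role and the guardrail allow it. The following Python sketch illustrates that idea only. It is a deliberate simplification that ignores explicit Deny statements, resources, and conditions, and the action lists are hypothetical stand-ins rather than the contents of any real managed policy.

```python
from fnmatch import fnmatchcase


def is_allowed(action, role_actions, guardrail_actions):
    """Return True only if both the role and the guardrail allow the action."""
    def allowed_by(patterns):
        # IAM action names are case-insensitive and may use * wildcards.
        return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

    return allowed_by(role_actions) and allowed_by(guardrail_actions)


# Hypothetical example: an administrator-like role behind a read-only guardrail.
admin_role = ["*"]
read_only_guardrail = ["drs:Describe*", "drs:Get*", "drs:List*"]
```

With these inputs, is_allowed("drs:DescribeSourceServers", admin_role, read_only_guardrail) is true, while is_allowed("drs:StartRecovery", admin_role, read_only_guardrail) is false, which mirrors the administrator example in the paragraph above.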
3.2.7. Under Notifications – optional, choose your AWS Region and the SNS topic created at the beginning of this walkthrough, as shown in the following figure 14. To monitor multiple AWS Regions, you can add more Regions, provided that you have created an SNS topic in each of them.
Figure 14: SNS notifications
3.2.8. Under Tags, add a custom tag to the Slack client. For this walkthrough, I add two tags, as shown in the following figure 15.
Figure 15: Adding tags to channel
3.2.9. Finally, choose Configure to create client configuration.
The client configuration is complete. The success message “You successfully configured the Slack channel.” should appear in the green bar in the AWS console, as shown in the following figure 16.
Figure 16: Configure Slack client with Slack channel
You should see the following APIs executed in the following Regions as shown in Table 1. This can help troubleshoot any potential issues that arise during the exercise.
Region | APIs
us-east-2 | CreateSlackChannelConfiguration, GetAccountPreferences
us-east-1 | CreateRole, CreatePolicy, AttachRolePolicy
eu-west-1 | Subscribe
Table 1: API to regions mapping
The subscribe API is shown in eu-west-1 because I chose this Region for this example.
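For readers who prefer to script the channel configuration, the following sketch assembles the parameters for the CreateSlackChannelConfiguration call listed in Table 1. It assumes Python with boto3 and reflects my understanding of the parameter names of the boto3 chatbot client; verify them against the current SDK documentation before use. The helper name and all IDs and ARNs you pass in are hypothetical placeholders.

```python
def slack_channel_config_params(team_id, channel_id, role_arn, topic_arns,
                                name="aws-drs-slack-notifications",
                                logging_level="ERROR"):
    """Build keyword arguments for create_slack_channel_configuration."""
    if logging_level not in {"ERROR", "INFO", "NONE"}:
        raise ValueError("logging_level must be ERROR, INFO, or NONE")
    return {
        "ConfigurationName": name,
        "SlackTeamId": team_id,        # Slack workspace ID
        "SlackChannelId": channel_id,  # channel ID copied in step 3.2.5
        "IamRoleArn": role_arn,        # channel role from step 3.2.6
        "SnsTopicArns": list(topic_arns),
        "LoggingLevel": logging_level,
        # Default guardrail described in step 3.2.6.4.
        "GuardrailPolicyArns": ["arn:aws:iam::aws:policy/ReadOnlyAccess"],
    }
```

Assuming a client created with boto3.client("chatbot", region_name="us-east-2"), the call would then be client.create_slack_channel_configuration(**params). The Slack workspace must already be authorized in the console as described in step 3.2.2.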
4. Create Amazon CloudWatch rules
To receive Slack notifications about the mutating API actions of the Elastic Disaster Recovery service, create a CloudWatch rule in the AWS Region where Elastic Disaster Recovery is configured.
4.1. Configure a CloudWatch rule
4.1.1. Open the Amazon EventBridge console. In the navigation pane, choose Rules. Choose Create rule.
4.1.2 Enter a Name and, optionally, a Description for the rule.
4.1.3. For Event bus, choose default event bus.
4.1.4. Keeping the toggle button Enable the rule on the selected event bus on and Rule with an event pattern rule type chosen, choose Next, as shown in the following figure 17.
Figure 17: Defining CloudWatch rule details
4.2. Build an event pattern
4.2.1. Under the Build event pattern page, choose Other as Event source. Ignore the Sample event – optional section and proceed to the next section.
4.2.2. In Creation method section, choose the Custom pattern (JSON editor) option, as shown in the following figure 18.
Figure 18: Choosing CloudWatch Event pattern and Creation method
4.2.3. Under the Event pattern section, add the Elastic Disaster Recovery service APIs for which you want to receive notifications, and choose Next, as shown in the following figure 19. For this example, the custom event pattern in JSON is shown below:
{
  "source": ["aws.drs"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["drs.amazonaws.com"],
    "eventName": [
      "InitializeService",
      "CreateSourceServerForDrs",
      "DisconnectSourceServer",
      "StartRecovery",
      "StartReplication",
      "StopFailback",
      "DeleteSourceServer",
      "ReverseReplication",
      "StartFailbackLaunch",
      "DisconnectRecoveryInstance",
      "DeleteRecoveryInstance"
    ]
  }
}
Figure 19: Adding CloudWatch Event pattern
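To build intuition for how this pattern selects events, the following Python sketch implements a tiny local matcher. It is an illustration under simplifying assumptions, not the EventBridge matching engine: it handles only exact-value lists and nested objects, and ignores operators such as prefix or anything-but. The function name event_matches and the abbreviated sample events are hypothetical.

```python
API_CALL_PATTERN = {
    "source": ["aws.drs"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["drs.amazonaws.com"],
        "eventName": ["StartRecovery", "StartReplication", "DeleteSourceServer"],
    },
}


def event_matches(pattern, event):
    """Return True if every field in the pattern matches the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the corresponding event object.
            if not isinstance(actual, dict) or not event_matches(expected, actual):
                return False
        elif actual not in expected:
            # A list in the pattern means "any one of these values".
            return False
    return True


sample = {
    "source": "aws.drs",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "drs.amazonaws.com", "eventName": "StartRecovery"},
}
```

Here event_matches(API_CALL_PATTERN, sample) is true. Changing eventName to an API that is not listed, such as a read-only Describe call, makes it false, so only the mutating calls you name reach the SNS topic.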
To learn about all the Elastic Disaster Recovery service APIs, visit this Elastic Disaster Recovery documentation.
4.2.4. Under the Select target(s) section, for Target types, choose AWS service as Target 1. In the Select a target dropdown, choose SNS topic as a target. In the Topic dropdown, choose the topic name you created in step 1 of this walkthrough. Ignore the Additional settings section, and choose Next.
4.2.5. Enter any desired tags for the rule, then choose Next.
4.2.6. Review your rule and choose Create rule to create the rule.
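The rule and its target can also be created programmatically. The following is a minimal sketch, assuming Python with boto3 and an EventBridge client created with boto3.client("events"); the function name and the target Id "drs-sns-target" are hypothetical. Note one assumption in particular: unlike the console flow, this sketch does not modify the SNS topic's access policy, so the topic must already allow the events.amazonaws.com service principal to publish to it.

```python
import json


def create_rule_with_sns_target(events_client, rule_name, event_pattern, topic_arn):
    """Create an enabled rule on the default event bus and point it at an SNS topic."""
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(event_pattern),  # the API expects a JSON string
        State="ENABLED",
    )
    result = events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "drs-sns-target", "Arn": topic_arn}],
    )
    # put_targets reports partial failures in the response instead of raising.
    if result.get("FailedEntryCount", 0):
        raise RuntimeError(f"Could not add target: {result.get('FailedEntries')}")
    return rule["RuleArn"]
```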
4.2.7. Repeat these steps to create another CloudWatch rule that sends notifications about the CloudWatch Events supported by the Elastic Disaster Recovery service. To learn more about the supported events, visit this Elastic Disaster Recovery user guide.
In this example, I choose the CloudWatch Event related to the stalled data replication of a source server along with other events. Data replication can stall due to various factors, including:
- Network connectivity issues: This may involve misconfigured security groups, network ACLs, or route tables that prevent communication between the source server and the staging subnet.
- AWS Replication Agent issues: The AWS Replication Agent for Elastic Disaster Recovery may not be running correctly on the source server due to problems on the source server itself.
- IAM permission problems: The replication server may lack the necessary IAM permissions, such as a missing IAM role, permission policies, or trust relationship policy.
Data replication stalls can disrupt business continuity. When a stall occurs, replication from your source servers to the Elastic Disaster Recovery replication servers ceases, which means that the latest data isn’t being replicated. This is indicated by a Stalled status in the Data Replication Status column of the Elastic Disaster Recovery console and needs immediate attention.
Maintaining continuous data replication is crucial for business continuity. Monitoring CloudWatch Events for stalled replication allows you to proactively identify and resolve these issues, swiftly restoring continuous data replication to your servers.
In this example, I add the following JSON under Event pattern section to monitor CloudWatch Events:
{
  "source": ["aws.drs"],
  "detail-type": [
    "DRS Source Server Data Replication Stalled Change",
    "DRS Recovery Instance Failback State Change",
    "DRS Source Server Launch Result"
  ]
}
Follow the remaining steps to complete the rule. Use the same SNS topic created in the beginning of this walkthrough as the target.
5. Test the notifications
To test the Elastic Disaster Recovery service notifications in your Slack channel, proceed to install AWS Replication Agent for Elastic Disaster Recovery on one of your source servers. To learn how to install AWS Replication Agent on a server, visit this Elastic Disaster Recovery user guide.
When the AWS Replication Agent installation is successful, you receive a notification in the Slack channel for the CreateSourceServerForDrs API with additional details.
The following figure 20 shows a sample screenshot.
Figure 20: Slack notification about successful installation of Elastic Disaster Recovery service agent
Cleaning up
To minimize unnecessary AWS costs, delete any resources you’ve created, including Amazon Elastic Compute Cloud (Amazon EC2) instances, Elastic Disaster Recovery source servers, CloudWatch Event rules, Slack configuration clients in AWS Chatbot, and SNS topics after completing your exercise. Leaving these resources running can result in unexpected charges on your AWS bill, even if they’re not in use. Make sure to review all provisioned resources, and terminate any that are no longer needed.
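Part of this cleanup can be scripted. The following is a minimal sketch, assuming Python with boto3 clients for EventBridge and SNS passed in as arguments; the function name is hypothetical. It covers only the rules and the topic. The Elastic Disaster Recovery source servers, EC2 instances, and the AWS Chatbot channel configuration still need to be removed separately. A rule cannot be deleted while it has targets, so the targets are removed first.

```python
def delete_rule_and_topic(events_client, sns_client, rule_name, topic_arn):
    """Remove a rule's targets, delete the rule, then delete the SNS topic."""
    # Only the first page of targets is read, which is enough for this walkthrough.
    targets = events_client.list_targets_by_rule(Rule=rule_name)["Targets"]
    if targets:
        events_client.remove_targets(
            Rule=rule_name,
            Ids=[target["Id"] for target in targets],
        )
    events_client.delete_rule(Name=rule_name)
    # Delete the topic last; if several rules share it, delete all rules first.
    sns_client.delete_topic(TopicArn=topic_arn)
```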
Conclusion
In this post, we guided you through the steps to integrate AWS Chatbot with Slack, enabling real-time notifications for critical events related to your resources protected by AWS Elastic Disaster Recovery service.
Real-time notifications about critical workload activities are crucial when making sure that your business processes meet the required Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). In a disaster recovery scenario, every second counts, and the ability to instantly detect and respond to disruptions to the critical workload can make the difference between minimizing downtime and data loss. Elastic Disaster Recovery service helps organizations maintain business continuity by quickly recovering applications and data to their most recent state. Receiving real-time notifications allows the business operations teams to stay informed about critical events as they occur, thereby taking swift action, aligning with RPO and RTO targets, and making sure that their recovery strategy remains on track.
Integrating AWS Chatbot with CloudWatch Events rules significantly enhances your monitoring capabilities within the AWS environment. AWS Chatbot allows you to receive real-time notifications directly in your preferred communication channels, such as Slack, thus enabling your team to stay informed and respond quickly to critical events. Coupled with CloudWatch Event rules, you can automate the tracking of specific actions and states within your AWS services, such as data replication changes to your Elastic Disaster Recovery source servers.
Visit the Elastic Disaster Recovery page to get started and explore case studies of users who are using the Elastic Disaster Recovery service for disaster recovery of their workloads.