AWS Cloud Operations Blog
Tag: Cloud Operations
Using the Fault Tolerance Analyser Tool to Identify Potential Issues
Ensuring resilience, the ability of a system to recover from a failure induced by load, attacks, and other issues, is a shared responsibility that underpins the reliability of your workloads. While AWS provides the resilient underlying cloud infrastructure, customers are tasked with maintaining the resilience of their applications. In this landscape of joint responsibility, […]
Provision products and raise patch change requests in AWS via ServiceNow
ServiceNow is a popular cloud-based IT Service Management (ITSM) platform. Organizations use ServiceNow to manage incidents, track scheduled and planned infrastructure changes, manage new service requests, and track configuration items across IT systems. Common questions I’ve had from customers include how they can use ServiceNow to provision new instances, or how to use ServiceNow to […]
Build AWS Systems Manager Automation runbooks using AWS CDK
AWS Systems Manager Automation runbooks let you deploy, configure, and manage AWS resources safely and at scale. You can use AWS-published runbooks or build your own to enable AWS resource management across multiple accounts and regions. The AWS Cloud Development Kit (AWS CDK v2) is an open-source framework that can build applications with the expressive power of […]
Visualizing Resources with Workload Discovery on AWS
Operations Teams (Ops Teams) across enterprises typically rely on documented architecture diagrams to understand the dependencies of various workloads deployed on AWS. As enterprises continue to deploy large-scale multi-tiered workloads, it can become challenging for Ops Teams to track the ever-changing relationships between the deployed resources, often meaning that documentation can’t keep up with […]
Level up your Cloud Transformation with Experience-Based Acceleration (EBA)
For organizations moving to the cloud, fully embracing its benefits is not straightforward. Even with strong management buy-in and approved business cases, executional challenges are common. Do the challenges below resonate with what you are facing now in your cloud journey? No single-threaded owner of cloud initiatives, impacting velocity of decision-making; unable to effectively […]
Using Amazon CloudWatch metrics to monitor time to expiration for Reserved Instances
This post shows you how to monitor the days remaining for Amazon EC2 Reserved Instances. The solution uses a custom Amazon CloudWatch metric published via an AWS Lambda function. It creates a CloudWatch alarm and an Amazon Simple Notification Service (Amazon SNS) topic for notification when the alarm exceeds the user-defined threshold. CloudWatch allows you […]
How to use AWS Well-Architected with AWS Trusted Advisor to achieve data-driven cost optimization
Are you looking for ways to optimize your costs on AWS? Are you ensuring that you are taking advantage of all the cost-saving features and services that AWS offers? If not, you should be! In this blog post, we will discuss how to use AWS Well-Architected and AWS Trusted Advisor to achieve data-driven insights that […]
Use port forwarding in AWS Systems Manager Session Manager to connect to remote hosts
We recently announced a new capability within AWS Systems Manager Session Manager that allows forwarding connections from client machines to ports on remote hosts. This enables users to securely access and manage remote servers (databases, web servers, etc.) in private networks without needing to set up bastion hosts or open additional ports to the outside […]
Achieving Operational Excellence using automated playbook and runbook
An important aspect of operational readiness is having a well-defined process to perform activities in your workload for various scenarios, as indicated in Question 7 of the Operational Excellence pillar in the AWS Well-Architected Framework, which aims at evaluating your workload’s readiness for operation from a process and personnel perspective. In the case of incident response, a team […]
Streamline Automation with Outbound Webhooks for AWS Systems Manager Runbooks
Automation runbooks let you define a set of actions that automate various operations in your AWS environment. Runbooks allow our customers to simply configure automation workflows that they can execute based on either events or a scheduled cadence. These workflows commonly require integration with third-party systems, such as Slack, Jira, and ServiceNow. As of January […]