AWS Cloud Operations Blog

Lowering MTTR with Amazon CloudWatch and AWS X-Ray

Customers running microservice-based workloads in a serverless environment often struggle to troubleshoot incidents because the data they need can be distributed across hundreds or thousands of components. In this blog post, I will demonstrate how you can reduce the mean time to resolution (MTTR, or the average time it takes to repair or mitigate […]

Unlocking the power: The keys to delivering successful Cloud Migrations

Despite the many benefits of moving to the Cloud, large enterprises frequently struggle to deliver migrations (and the related business transformation) in the planned timeframe. Why? What are the key factors that ensure a successful migration that becomes an oft-quoted industry benchmark for a Cloud-driven transformation, rather than a moribund initiative where a number […]

Self-service Account Provisioning Using AWS Service Management Connector for ServiceNow

Many customers are looking to adopt a multi-account strategy within their AWS environment. This allows customers to isolate their workloads into different environments including test, dev, and production in addition to separating workloads based on regulatory requirements. As customers scale their multi-account environments, one strategy to increase agility is to offer business units their own […]

Best practices for managing AWS account meta-data at scale

As we all know, using multiple accounts in your AWS environment is one of the recommended best practices when organizing your workloads and your environment. Using multiple accounts brings multiple benefits, allowing you to better leverage AWS services. However, AWS accounts are additional resources that you need to manage. In this blog post, you will […]

How to download your AWS Resilience Hub assessment results

AWS Resilience Hub provides a central place to define, validate, and track the resilience of your application on AWS. It can help in assessing the impact of every application change on resiliency by automatically running the assessment on a daily basis or as part of a CI/CD pipeline. With AWS Resilience Hub, you can easily create resiliency […]

Using Tag-Based Filtering to Manage AWS Health Monitoring and Alerting at Scale

AWS provides customers regular updates of service notifications and planned activities via e-mail to the root account owners or the operational, security and billing contacts. AWS also provides granular notifications to customers via AWS Health allowing them to fine-tune their alerts on issues relating directly to them. Alongside Health Dashboard’s monitoring capabilities, customers can also […]

Designing a successful cloud migration: top five pitfalls and how to avoid a stall

Stalled cloud migrations can undermine cloud adoption’s business value. It is therefore important to watch out for early warning signs and take timely corrective action. This blog post looks at five big pitfalls every cloud migration leader should be aware of. The good news is you can spot these issues early and mitigate them to […]

Observe dynamic sites with Amazon CloudWatch Synthetics and AWS Systems Manager Parameter Store

Maintaining and improving end-user experience is key, and as your business grows, the number of endpoints you need to observe can grow quickly. It can become more challenging and time-consuming to build multiple canaries to observe them. This solution is designed to show how you can use a consistent and automated approach […]

Centralize image administration for virtual machines and containers using EC2 Image Builder

Customers may have different processes for image building across virtual machines, containers, or both. This variation in processes introduces operational overhead in managing images, including the initial configuration and the ongoing updates. According to the AWS Well-Architected Operational Excellence Pillar, section “Document and share lessons learned”, these images should be standardized, configured with the latest patches, […]

Observability using native Amazon CloudWatch and AWS X-Ray for serverless modern applications

In this blog post, we will share how you can use AWS-native observability tools to measure the current state of your modern serverless applications and how to get started with minimal effort. We will review tools like Amazon CloudWatch and AWS X-Ray and explore how these services can help you instrument your application […]