AWS Cloud Operations Blog

Category: AWS X-Ray

Respond to CloudWatch Alarms with Amazon Bedrock Insights

Overview: When operating complex, distributed systems in the cloud, quickly identifying the root cause of issues and resolving incidents can be a daunting task. Troubleshooting often involves sifting through metrics, logs, and traces from multiple AWS services, making it challenging to gain a comprehensive understanding of the problem. So how can you streamline this process […]

Using the unified CloudWatch Agent to send traces to AWS X-Ray

Today, applications are more distributed than ever before and they no longer run in isolation. This is especially the case when utilizing Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). A distributed workload or system is one that encompasses multiple small independent components, all working together to complete a task or job. […]

Analyze AWS Microservices architecture to identify and address performance issues

Amazon Payment Services (APS) is a payment service provider in the Middle East and North Africa. With its secure and seamless payment experience, it empowers businesses to build their online presence. Amazon Payment Services is based on a broad and complex microservice-based architecture that is dependent on multiple AWS services, including Amazon Elastic Compute […]

Lowering MTTR with Amazon CloudWatch and AWS X-Ray

Customers running microservice-based workloads in a serverless environment frequently have issues with troubleshooting incidents as the data they need can be distributed across hundreds or thousands of components. In this blog post, I will demonstrate how you can reduce the mean time to resolution (MTTR, or the average time it takes to repair or mitigate […]

Observability using native Amazon CloudWatch and AWS X-Ray for serverless modern applications

Introduction: In this blog post, we will share how you can use AWS-native observability tools to measure the current state of your modern serverless applications and how to get started with minimal effort. We will review tools like Amazon CloudWatch and AWS X-Ray and explore how these services can help you instrument your application […]

Announcing AWS CDK Observability Accelerator for Amazon EKS

Today we are happy to announce the all-new AWS CDK Observability Accelerator – a set of opinionated modules to help you set up observability for your AWS environments with AWS-native services and AWS-managed observability services such as Amazon Managed Service for Prometheus, Amazon Managed Grafana, AWS Distro for OpenTelemetry (ADOT) and Amazon CloudWatch. AWS […]

How Audible used Amazon CloudWatch cross-account observability to resolve severity tickets faster

This blog was co-written with Audible’s Apurva Jatakia, Kaushik S., and David Etler. Audible’s consumption services platform serves thousands of requests every second, and each incoming request is served by a distributed set of microservices owned by different teams. An Audible team, in charge of a platform called Stagg, is responsible for five separate microservices. […]

How CloudWatch cross-account observability helps JPMorgan Chase improve Federated Data Lake Monitoring

AWS best practices guide customers to deploy their applications across multiple AWS accounts to establish security and billing boundaries between teams and to reduce the impact of operational events. As enterprises grow and scale with large numbers of resources, customers often need a unified observability experience to help them search, visualize, and analyze their cross-account telemetry […]

How to develop an Observability strategy – Part 2

Your observability strategy starts with your business. “Observability” describes how well you can understand what’s happening in a system. Developing an observability strategy isn’t a one-time effort. It’s a continuous improvement effort that occurs throughout the lifecycle of your workloads. It enables your teams to determine whether or not the workloads they design and run […]

How to monitor hybrid environments with AWS services

As enterprises start migrating to the cloud, one challenge they will face is framing and implementing a holistic monitoring strategy for the hybrid environment. In our experience, there are three main reasons for this. First and foremost, an enterprise generally has multiple monitoring tools in place, but when the enterprises start moving to the cloud, […]