AWS Cloud Operations Blog
Category: Monitoring and observability
Real User Monitoring with Amazon CloudWatch RUM and Amazon Managed Grafana
In today’s fast-paced digital world, users expect fast and reliable web experiences. Slow-loading pages, errors, and other performance issues can lead to lower engagement and conversion rates, ultimately hurting a business’s bottom line. That’s where Real User Monitoring (RUM) comes in. Real User Monitoring (RUM) is a crucial aspect of modern web application development, allowing developers and […]
VTEX scales to 150 million metrics using Amazon Managed Service for Prometheus
VTEX is a multi-tenant platform with a distributed engineering operation. Observing hundreds of services in real time in an efficient manner is a technical challenge for the business. In this blog, we will show how VTEX created a resilient open source-based architecture aligned with a sharding strategy, using Amazon Managed Service for Prometheus (AMP) to […]
Automating Amazon EC2 Instances Monitoring with Prometheus EC2 Service Discovery and AWS Distro for OpenTelemetry
Traditionally, scraping application Prometheus metrics required manual updates to a configuration file, posing challenges in dynamic AWS environments where Amazon EC2 instances are frequently created or terminated. This not only proves time consuming but also introduces the risk of configuration errors, lacking the agility necessary in dynamic environments. In this blog post, we will demonstrate […]
Monitor your AWS resources on your mobile device with AWS Console Mobile Application
AWS customers are increasingly relying on AWS User Notifications to monitor and get real-time notifications about the AWS resources that are most important to them. The AWS Console Mobile Application can be configured as a notification delivery channel, where users can monitor AWS resources, get detailed resource notifications, diagnose issues, and take remedial actions, from […]
Accelerate troubleshooting with structured logs in Amazon CloudWatch
Troubleshooting often involves complex analysis across fragmented telemetry data. While alarms on metrics can signal high-level deviations, deeper context often resides in other areas such as log messages, which help uncover the root cause. This disjointed approach not only consumes time and effort, but also inflates telemetry costs. In this post, we’ll showcase how structured […]
Monitoring Windows services with Amazon CloudWatch
If you run Windows workloads on Amazon Elastic Compute Cloud (Amazon EC2), monitoring the health and performance of your Windows Services is essential for reliable systems administration. It’s not just about ensuring uptime; it’s about having a pulse on your system’s health and performance. With a variety of services operating in the background, each playing […]
How Unitary achieved automatic metric collection with Amazon Managed Service for Prometheus collector
This post was co-authored with Nicolas Fournier, Platform Engineer at Unitary. Every day, over 80 years’ worth of video content is uploaded online. Some of this content can also be harmful. Unitary knows that human moderators are the current gold standard for moderation, but this manual approach does not scale. While automated systems can scale, […]
Multi-tenant monitoring across accounts and regions using Amazon Managed Service for Prometheus
In this guest blog post, Nauman Noor (Managing Director), Fabio Dias (Cloud Developer), and Dylan Alibay (Cloud Developer) from the platform engineering team at State Street discuss their use of Amazon Managed Prometheus and AWS Distro for OpenTelemetry to enable monitoring in a multi-tenant, multi-account, and multi-region environment. In the ever-evolving financial services landscape, State […]
Introducing Amazon CloudWatch Alarm Recommendations
Amazon CloudWatch is a foundational AWS service that provides you with actionable insights into your cloud resources and applications. With Amazon CloudWatch Metrics, you can gain better visibility into your infrastructure and large-scale application performance. You can set up alarms using Amazon CloudWatch Alarms for metrics emitted by AWS services or your applications. Identifying which metrics […]
How to monitor application health using SLOs with Amazon CloudWatch Application Signals
Today, customers operate tens, hundreds, or even thousands of applications arranged in complex distributed systems composed of many interdependent services. These applications need to be continuously available and performant to maintain end-user satisfaction and business growth. Amazon CloudWatch Application Signals (now in Preview) makes it easy to automatically instrument and operate applications on AWS to […]