AWS Cloud Operations Blog
Category: Monitoring and observability
Get Disk Utilization of Your Fleet Using AWS Systems Manager Custom Inventory Types
Some of my customers need assistance while operating their Amazon Elastic Compute Cloud (Amazon EC2) infrastructure. They need to: Review the disk usage of various volumes/disks within an EC2 instance. To do it in a scalable way, one does not need to access the instance either through a Remote Desktop Session (RDP) or use […]
Automate CloudWatch Dashboard creation for your AWS Elemental MediaPackage and AWS Elemental MediaLive
Introduction Monitoring the health and performance of your media services is critical to ensuring a seamless viewing experience for your customers. Amazon CloudWatch provides powerful monitoring capabilities for Amazon Web Services (AWS) resources. Setting up comprehensive dashboards can be a time-consuming process, especially for organizations managing a large number of resources across multiple regions. The Automatic CloudWatch […]
Improve application reliability with effective SLOs
At AWS, we consider reliability as a capability of services to withstand major disruptions within acceptable degradation parameters and to recover within an acceptable timeframe. Service reliability goes beyond traditional disciplines, such as availability and performance, to achieve its goal. Components of a system or application will eventually fail over time. Like our CTO Werner Vogels […]
Respond to CloudWatch Alarms with Amazon Bedrock Insights
Overview When operating complex, distributed systems in the cloud, quickly identifying the root cause of issues and resolving incidents can be a daunting task. Troubleshooting often involves sifting through metrics, logs, and traces from multiple AWS services, making it challenging to gain a comprehensive understanding of the problem. So how can you streamline this process […]
Troubleshooting AWS Glue ETL Jobs using Amazon CloudWatch Logs Insights enhanced queries
Introduction In the realm of data integration and ETL (Extract, Transform, Load) processes, organizations often face challenges in ensuring the efficiency and performance of their ETL jobs. Monitoring the efficiency of ETL jobs becomes crucial in maintaining seamless data workflows. This is where Amazon CloudWatch Logs Insights comes into play, offering powerful log analytics to unearth […]
Testing and debugging Amazon CloudWatch Synthetics canary locally
Introduction Amazon CloudWatch Synthetics canaries are scripts that monitor your endpoints and APIs by simulating the actions of a user. These canaries run on a schedule, check the availability and latency of your applications, and alert you when there are issues. Canary scripts are written in Node.js and Python, and they run inside an AWS […]
Monitor Python apps with Amazon CloudWatch Application Signals (Preview)
AWS announced Amazon CloudWatch Application Signals during re:Invent 2023. It is a new feature to monitor and understand the health of Java applications. Today we are excited to announce that Application Signals now supports Python applications. Enabling Application Signals allows you to use AWS Distro for OpenTelemetry (ADOT) to instrument Python applications without code changes. […]
How to monitor AWS WAF logging centrally using Amazon Managed Grafana
It is important for cloud security operations teams to maintain a high level of cloud security and detect and respond to malicious web activity in near real-time. AWS WAF helps protect web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources. However, as your cloud environment scales with […]
Unlocking Insights: Turning Application Logs into Actionable Metrics
Modern software development teams understand the importance of observability as a critical aspect of building reliable and resilient applications. By implementing observability practices, software teams can proactively identify issues, uncover performance bottlenecks, and enhance system reliability. However, it is a fairly recent trend and still lacks industry-wide adoption. As organizations standardize on containers, they often […]
Announcing Amazon CloudWatch Container Insights for Amazon EKS Windows Workloads Monitoring
Monitoring containerized applications requires precision and efficiency. As your applications scale, collecting and summarizing application and infrastructure metrics can be challenging. One way to handle this challenge is to use Amazon CloudWatch Container Insights, which is a single-click native monitoring tool provided by AWS. Amazon CloudWatch Container Insights helps customers collect, aggregate, and summarize […]