AWS Cloud Operations Blog

Category: Monitoring and observability

How Hapag-Lloyd automated incident management using AWS Step Functions

This post is co-authored by Grzegorz Kaczor and Daniel Steenbock from Hapag-Lloyd AG and Michael Graumann and Daniel Moser from AWS. Introduction In today’s fast-paced digital landscape, efficient incident management is crucial for maintaining high-quality customer experiences. In our previous article we discussed how the Web and Mobile department at Hapag-Lloyd established observability for serverless […]

HEIDI-Q Featured Image

Get Operational Insights Fast with AWS Health and HAQM Q

For organizations with multiple AWS accounts, staying on top of planned AWS service changes and events is critical to keep operations and business running smoothly. Organizations use AWS Health for ongoing visibility into resource performance and the availability of AWS services and accounts, but the volume of notifications from AWS Health can sometimes be overwhelming. […]

How to detect and monitor HAQM Simple Storage Service (S3) access with AWS CloudTrail and HAQM CloudWatch

How to detect and monitor HAQM Simple Storage Service (S3) access with AWS CloudTrail and HAQM CloudWatch

While protection of data is critical, equally important is observing who accesses it.  AWS services allow you to control your data by determining where it’s stored, who has access, and how it’s secured. AWS CloudTrail provides an effective way to track data access activities.  You can detect access attempts, and identify potential unauthorized attempts. CloudTrail, […]

Detect and respond to security threats in near real-time using HAQM Managed Grafana

Security is “job zero” at AWS. It’s crucial to gain deeper insights into your AWS infrastructure’s security posture to respond quickly to threats. The ability to centrally monitor and visualize the security findings make it easier for you to identify any security threats or gaps and also keep the principle of least privilege in focus. […]

Know Before You Go – AWS re:Invent 2024 Monitoring and Observability

Planning to join us in Las Vegas from Dec 2 to Dec 6 at AWS re:Invent 2024 and looking to learn more about monitoring and observability? If you are, this blog highlights Cloud Operations sessions that focus on monitoring and observability at re:Invent 2024! Monitoring and Observability allows you to understand the health of your applications and […]

How Cigna Implemented a Multi-Region Centralized Alerting System on AWS

This post is co-written with Nicolas Trettel, Cloud Engineering Senior Advisor at Cigna. Monitoring applications and alerting on issues is crucial for building resilient systems. HAQM CloudWatch is a service that monitors applications, responds to performance changes, optimizes resource use, and provides insights into operational health. By collecting data across AWS resources, CloudWatch gives visibility […]

How Stripe architected massive scale observability solution on AWS

This post is co-written with Cody Rioux, Staff Engineer at Stripe and Michael Cowgill, Staff engineer at Stripe Stripe powers online and in-person payment processing and provides financial solutions for businesses of all sizes. Stripe operates a sophisticated microservice environment built on top of AWS. In this blog post we will cover the journey and […]

Sign-in to AWS Console Mobile Application with an AWS Access Portal or third-party IdP URL

AWS customers rely on the AWS Console Mobile Application to monitor, manage, and receive notifications to stay informed about their AWS resources while away from their desktop devices. Customers who use Single-Sign-On (SSO) can face a unique set of challenges while signing into the AWS Console Mobile Application. While SSO can offer enhanced security and […]

Title image that says managing access to AWS accounts from Microsoft Teams and Slack at scale using AWS Organizations and AWS Chatbot

Managing access to AWS accounts from Microsoft Teams and Slack at scale using AWS Organizations and AWS Chatbot

Customers use chat collaboration applications like Microsoft Teams and Slack to collaborate and manage their AWS applications. AWS Chatbot is a ChatOps service that enables customers to monitor, troubleshoot issues, and manage AWS applications from chat channels. AWS Chatbot provides autonomy and customizability to DevOps teams operating their AWS environments on the go from chat […]

Automating metrics collection on HAQM EKS with HAQM Managed Service for Prometheus managed scrapers

Managing and operating monitoring systems for containerized applications can be a significant operational burden for customers such as metrics collection. As container environments scale, customers have to split metric collection across multiple collectors, right-size the collectors to handle peak loads, and continuously manage, patch, secure, and operationalize these collectors. This overhead can detract from an […]