AWS Cloud Operations Blog

Build a Cloud Automation Practice for Operational Excellence: Best Practices from AWS Managed Services

In today’s fast-paced business environment, organizations are actively pursuing operational excellence to maintain a competitive edge. Automation is a critical foundation for achieving better efficiency, reliability, and scalability in operations. However, integrating automation into cloud practice entails more than simply implementing software or tools. Building a cloud automation practice requires a transformative journey that […]

Know Before You Go – AWS re:Invent 2023 Cloud Governance and Compliance

We are so excited to see you at our annual cloud computing conference, AWS re:Invent 2023, in Las Vegas from Nov 27 to Dec 1. Whether you’re a seasoned re:Invent veteran or a first-timer, the excitement and opportunities of AWS re:Invent never cease to amaze. With a total of 96 sessions covering the solution areas that […]

Monitoring GPU workloads on Amazon EKS using AWS managed open-source services

As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]

Announcing Amazon CloudWatch Container Insights with Enhanced Observability for Amazon EKS on EC2

Amazon CloudWatch Container Insights is a fully managed monitoring and observability service that provides DevOps engineers, developers, SREs, and IT managers with out-of-the-box visibility into their containerized applications and microservice environments. With Amazon CloudWatch Container Insights, you can monitor, isolate, and diagnose issues in your Kubernetes clusters with minimal effort. It delivers infrastructure telemetry like […]

Know Before You Go – AWS re:Invent 2023 Monitoring and Observability, and Centralized Operations Management

We are so excited to see you at our annual cloud computing conference, AWS re:Invent 2023, in Las Vegas from Nov 27 to Dec 1. Whether you’re a seasoned re:Invent veteran or a first-timer, the excitement and opportunities of AWS re:Invent never cease to amaze. With a total of 96 sessions covering the solution areas that […]

How to email your Amazon CloudWatch dashboard

Amazon CloudWatch enables customers to collect monitoring and operational data in the form of logs, metrics, alarms, and events, allowing easy workload visualization and notifications. Many customers use Amazon CloudWatch dashboards to monitor application and infrastructure insights in a single, unified view. Traditionally, operational health data access was only viewable for […]

Automating Amazon EC2 Auto Scaling with Amazon CloudWatch custom metrics and AWS CDK

As customers migrate legacy workloads to the AWS Cloud, they may need to rehost or replatform applications to Amazon EC2 servers. To benefit from the scalability of the cloud, customers need to be able to scale these EC2 servers up or down, on demand and on schedule. Amazon EC2 Auto Scaling groups provide the on-demand scaling […]

Amazon Connect real-time monitoring using Amazon Managed Grafana and Amazon Timestream

Amazon Connect is an easy-to-use cloud contact center solution that helps companies of any size deliver superior customer service at a lower cost. Connect has many real-time monitoring capabilities. For requirements that go beyond those supported out of the box, Amazon Connect also provides you with data and APIs you can use to implement your […]