AWS Cloud Operations Blog
Category: Amazon Elastic Kubernetes Service
Monitor EBS Detailed Performance Statistics with Amazon Managed Service for Prometheus
Today we are excited to announce that you can now easily ingest Amazon EBS detailed performance statistics from your Amazon Elastic Kubernetes Service (Amazon EKS) workloads into an Amazon Managed Service for Prometheus workspace. We recently announced the availability of EBS detailed performance statistics, which give you real-time visibility into the performance of your EBS […]
Getting insights from Amazon Managed Service for Prometheus using natural language powered by Amazon Bedrock
As applications scale, customers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Organizations allocate money and developer time to deploy and manage various monitoring tools, while also dedicating considerable effort to training teams on their usage. When issues arise, operators navigate through […]
Enhancing observability with a managed monitoring solution for Amazon EKS
Introduction Keeping a watchful eye on your Kubernetes infrastructure is crucial for ensuring optimal performance, identifying bottlenecks, and troubleshooting issues promptly. In the ever-evolving world of cloud-native applications, Amazon Elastic Kubernetes Service (EKS) has emerged as a popular choice for deploying and managing containerized workloads. However, monitoring Kubernetes clusters can be challenging due to their […]
How to automate application log ingestion from Amazon EKS on Fargate into AWS CloudTrail Lake
Customers often look for options to capture and centrally store application logs from Amazon Elastic Kubernetes Service on Fargate (Amazon EKS on Fargate) Pods to investigate root causes or analyze security incidents. Customers also like the capability to easily query the logs to assist with security investigations. In this blog post, we show you […]
How StormForge reduces complexity and ensures scalability with Amazon Managed Service for Prometheus
This blog post was co-written by Brent Eager, Senior Software Engineer, StormForge. StormForge is the creator of Optimize Live, a Kubernetes vertical rightsizing solution that is compatible with the Kubernetes HorizontalPodAutoscaler (HPA). Using cluster-based agents, machine learning, and Amazon Managed Service for Prometheus, Optimize Live is able to continuously calculate and apply optimal resource requests, […]
Unlocking Insights: Turning Application Logs into Actionable Metrics
Modern software development teams understand the importance of observability as a critical aspect of building reliable and resilient applications. By implementing observability practices, software teams can proactively identify issues, uncover performance bottlenecks, and enhance system reliability. However, it is a fairly recent trend and still lacks industry-wide adoption. As organizations standardize on containers, they often […]
Announcing Amazon CloudWatch Container Insights for Amazon EKS Windows Workloads Monitoring
Monitoring containerized applications requires precision and efficiency. As your applications scale, collecting and summarizing application and infrastructure metrics can be challenging. One way to handle this challenge is to use Amazon CloudWatch Container Insights, a single-click native monitoring tool provided by AWS. Amazon CloudWatch Container Insights helps customers collect, aggregate, and summarize […]
Enhance Kubernetes Operational Visibility with AWS Chatbot
Many customers run their mission-critical container workloads on Amazon Web Services (AWS) using Amazon Elastic Kubernetes Service (Amazon EKS). One of their key focus areas is analyzing and acting on operational events quickly. Getting real-time visibility into performance issues, traffic spikes, and infrastructure events can enable teams to quickly address issues and […]
Monitoring GPU workloads on Amazon EKS using AWS managed open-source services
As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]
Announcing Amazon CloudWatch Container Insights with Enhanced Observability for Amazon EKS on EC2
Amazon CloudWatch Container Insights is a fully managed monitoring and observability service that provides DevOps engineers, developers, SREs, and IT managers with out-of-the-box visibility into their containerized applications and microservice environments. With Amazon CloudWatch Container Insights, you can monitor, isolate, and diagnose issues in your Kubernetes clusters with minimal effort. It delivers infrastructure telemetry like […]