Containers

Tag: observability

Part 1: Introduction to observing machine learning workloads on Amazon EKS

This post was jointly authored by Elamaran Shanmugam (Senior Partner Specialist SA), Sanjeev Ganjihal (Senior Specialist SA), and Steven David (Principal SA). In this first part of a four-part series, titled Observability of MLOps on Amazon EKS, you get an overview of machine learning operations (MLOps) on Amazon Elastic Kubernetes Service (Amazon EKS). This includes understanding […]

Monitoring and automating recovery from AZ impairments in Amazon EKS with Istio and ARC Zonal Shift

Running microservice-style architectures in the cloud can quickly become a complex operation. Teams must account for a growing number of moving pieces, such as multiple instances of independent workloads, along with their infrastructure dependencies. These components can then be distributed across different topology domains, such as multiple Amazon Elastic Compute Cloud (Amazon EC2) instances, […]

Monitoring Windows pods with Prometheus and Grafana

This post was co-authored by Cezar Guimarães, Sr. Software Engineer, VTEX. Customers across the globe are increasingly adopting Amazon Elastic Kubernetes Service (Amazon EKS) to run their Windows workloads. Many have found that refactoring existing Windows-based applications for an open-source environment, while ideal, is a very complex task. It […]

Empowering Kubernetes Observability with eBPF on Amazon EKS

Post co-written by Shahar Azulay, CEO and Co-Founder at GroundCover. The abstraction introduced by Kubernetes allows teams to easily run applications at varying scale without worrying about resource allocation, autoscaling, or self-healing. However, this abstraction isn’t without cost: it adds complexity and makes it harder to track down the root cause of problems that Kubernetes users experience. To […]

Multi-cluster cost monitoring for Amazon EKS using Kubecost and Amazon Managed Service for Prometheus

Amazon Managed Service for Prometheus is a Prometheus-compatible service that monitors and provides alerts on containerized applications and infrastructure at scale. In the previous post, Integrating Kubecost with Amazon Managed Service for Prometheus, we discussed how you can integrate Kubecost with Amazon Managed Service for Prometheus (AMP) to get granular visibility into your Amazon […]

Observability for AWS App Runner VPC networking

With AWS App Runner, you can quickly deploy web applications and APIs at any scale. You can start with your source code or a container image, and App Runner will fully manage all infrastructure, including servers, networking, and load balancing for your application. If you want, App Runner can also configure a deployment pipeline for […]

Tracing an AWS App Runner service using AWS X-Ray with OpenTelemetry

AWS App Runner is a fully managed service that developers can use to quickly deploy containerized web applications and APIs at scale with little to no infrastructure experience. You can start with source code or a container image. App Runner will fully manage all infrastructure, including servers, networking, and load balancing, for your application. App […]

Fluent Bit Integration in CloudWatch Container Insights for EKS

By Ugur KIRA, Dejun Hu, and TP Kohli. CloudWatch Container Insights enables you to explore, analyze, and visualize your container metrics, Prometheus metrics, application logs, and performance log events through automated dashboards in the CloudWatch console. These dashboards summarize the performance and availability of clusters, nodes or EC2 instances, services, tasks, pods, and containers […]