AWS Machine Learning Blog

Category: Amazon Managed Service for Prometheus

Open source observability for AWS Inferentia nodes within Amazon EKS clusters

This post walks you through the Open Source Observability pattern for AWS Inferentia, which shows you how to monitor the performance of ML chips in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster whose data plane nodes run on Amazon Elastic Compute Cloud (Amazon EC2) Inf1 and Inf2 instances.