AWS Open Source Blog

Announcing Amazon CloudWatch for Ray

Amazon CloudWatch is now available for Ray on Amazon Elastic Compute Cloud (Amazon EC2). Ray is an open source framework (Apache 2.0 License) for building and scaling distributed applications. CloudWatch is a monitoring and observability service that provides data and actionable insights so you can monitor your applications, respond to system-wide performance changes, and optimize resource utilization. With CloudWatch for Ray, you can now deploy your Ray applications in production on Amazon EC2 and monitor their health with near real-time metrics, logs, and alarms.

Release highlights include support for extended Amazon EC2 metrics, Ray metrics, and Ray logs integrated with CloudWatch. The extended Amazon EC2 metrics available in CloudWatch cover key indicators of application health, such as memory utilization, disk utilization, and running process count. Ray metrics available in CloudWatch include both high-level aggregates at the cluster level and low-level detail at the individual Amazon EC2 instance level. These metrics are automatically integrated into default Ray application dashboards, which are configurable, so you can quickly identify high-level trends across your Ray clusters and drill into the health of a single Amazon EC2 instance. CloudWatch logs for your Ray applications provide a durable history of events, which is critical for troubleshooting problems in high-availability production environments.
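To make the alarm side of this concrete, here is a minimal sketch of how an alarm on memory utilization might be described for a single instance. It is not taken from the integration itself: the helper name `build_memory_alarm`, the alarm name format, the `CWAgent` namespace default, and the choice of the `InstanceId` dimension are all assumptions for illustration, and the namespace and dimensions your Ray cluster actually publishes under may differ. `mem_used_percent` is the CloudWatch agent's usual name for memory utilization on Linux. The function only builds the parameters, so it can be run without AWS credentials.

```python
from typing import Any, Dict


def build_memory_alarm(instance_id: str,
                       namespace: str = "CWAgent",
                       threshold_percent: float = 90.0) -> Dict[str, Any]:
    """Return keyword arguments for CloudWatch's PutMetricAlarm API.

    Hypothetical helper: the namespace default and alarm naming scheme
    are assumptions, not values defined by the Ray integration.
    """
    if not 0.0 < threshold_percent <= 100.0:
        raise ValueError("threshold_percent must be in (0, 100]")
    return {
        "AlarmName": f"ray-high-memory-{instance_id}",
        "Namespace": namespace,
        # Memory utilization as reported by the CloudWatch agent on Linux.
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,            # seconds per evaluation period
        "EvaluationPeriods": 3,  # alarm after 3 consecutive breaching periods
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


if __name__ == "__main__":
    params = build_memory_alarm("i-0123456789abcdef0")
    print(params["AlarmName"])
    # With AWS credentials configured, the alarm could then be created with:
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameter construction separate from the API call makes it easy to review or unit test alarm definitions before anything is created in your account.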

Figure 1: Sample Amazon CloudWatch dashboard for Ray applications

Getting Started

To learn more about this integration and start running your Ray applications on AWS, refer to the setup and usage guide in the Ray docs. If you have questions about the integration or run into problems, please file an issue.

Daniel Yeo

Daniel Yeo is a Senior Technical Program Manager at Amazon. He is passionate about advancing technologies that make machine learning scale seamlessly. His team actively contributes improvements and novel ideas to open source Ray, so customers can realize the full potential of Ray.

Yiqin (Miranda) Zhu

Miranda is a Software Development Engineer on the Ray team at Amazon. She is passionate about developing open source Ray and integrating Ray with Amazon technologies.