AWS HPC Blog

Category: Artificial Intelligence

Scale Reinforcement Learning with AWS Batch Multi-Node Parallel Jobs

Autonomous robots are increasingly used across industries, from warehouses to space exploration. While developing these robots requires complex simulation and reinforcement learning (RL), setting up training environments can be challenging and time-consuming. AWS Batch multi-node parallel (MNP) infrastructure, combined with NVIDIA Isaac Lab, offers a solution by providing scalable, cost-effective robot training capabilities for sophisticated behaviors and complex tasks.

Enhancing Equity Strategy Backtesting with Synthetic Data: An Agent-Based Model Approach

Developing robust investment strategies requires thorough testing, but relying solely on historical data can introduce biases and limit your insights. Learn how synthetic data from agent-based models can provide an unbiased testbed to systematically evaluate your strategies and prepare for future market scenarios. Part 1 of 2 covers the theoretical foundations of the approach.

Deploying Generative AI Applications with NVIDIA NIM Microservices on Amazon Elastic Kubernetes Service (Amazon EKS) – Part 2

Learn how to deploy AI models at scale on AWS using NVIDIA NIM microservices and Amazon EKS! This step-by-step guide shows you how to create a GPU cluster for inference in this second post of a two-part series.

Large scale training with NVIDIA NeMo Megatron on AWS ParallelCluster using P5 instances

Launching distributed GPT training? See how AWS ParallelCluster sets up a fast shared filesystem, SSH keys, host files, and more between nodes. Our guide has the details for creating a Slurm-managed cluster to train NeMo Megatron at scale.

Enhancing ML workflows with AWS ParallelCluster and Amazon EC2 Capacity Blocks for ML

No more guessing whether GPU capacity will be available when you launch ML jobs! EC2 Capacity Blocks for ML let you lock in GPU reservations so you can start tasks on time. Learn how to integrate Capacity Blocks into AWS ParallelCluster to optimize your workflow in our latest technical blog post.