AWS HPC Blog
Category: Compute
Building deep learning models for geoscience using MATLAB and NVIDIA GPUs on Amazon EC2 (Part 1 of 2)
In this blog post, we discuss how geoscientists can use shallow RNN-based algorithms with MATLAB to automatically recognize distinct geologic features in seismic images. We discuss the workflow for developing the AI models using MATLAB for seismic interpretation. In a second post, we will introduce the various compute resources from AWS and NVIDIA used to develop the models.
Second generation EFA: improving HPC and ML application performance in the cloud
Since launch, EFA has seen continuous improvements in performance. In this post, we talk about our 2nd generation of EFA, which takes another step in improving Machine Learning and High Performance Computing in the Cloud.
Launch self-supervised training jobs in the cloud with AWS ParallelCluster
In this post we describe the process to launch large, self-supervised training jobs using AWS ParallelCluster and Facebook’s Vision Self-Supervised Learning (VISSL) library.
Avoid overspending with AWS Batch using a serverless cost guardian monitoring architecture
Pay-as-you-go resources are compelling, but budget-limited researchers performing HPC workloads need help working within the bounds of their grants. In this post, we show how to build a real-time cost guardian for AWS Batch to help enforce those limits.
Support for Instance Allocation Flexibility in AWS ParallelCluster 3.3
AWS ParallelCluster 3.3.0 now lets you define a list of Amazon EC2 instance types for resourcing a compute queue. This gives you more flexibility to optimize the cost and total time to solution of your HPC jobs, especially when capacity is limited or you’re using Spot Instances.
How AWS Batch developed support for Amazon Elastic Kubernetes Service
Today, we discuss AWS Batch on Amazon EKS, the initial motivation and design choices the team made when developing the service, and some of the challenges they had to overcome.
Minimize HPC compute costs with all-or-nothing instance launching
In this post, we highlight a little-known configuration option for Slurm on AWS ParallelCluster that can reduce costs and increase your iteration speed by preventing idle batch instances from launching when EC2 capacity is limited.
BioContainers are now available in Amazon ECR Public Gallery
Today we are excited to announce that all 9000+ applications provided by the BioContainers community are available within ECR Public Gallery! You don’t need an AWS account to access these images, but having one allows many more pulls to the internet, and unmetered usage within AWS. If you perform any sort of bioinformatics analysis on AWS, you should check it out!
Optimize Protein Folding Costs with OpenFold on AWS Batch
In this post, we describe how to orchestrate protein folding jobs on AWS Batch. We also compare the performance of OpenFold and AlphaFold on a set of public targets. Finally, we will discuss how to optimize your protein folding costs.
Rearchitecting AWS Batch managed services to leverage AWS Fargate
AWS service teams continuously improve the underlying infrastructure and operations of managed services, and AWS Batch is no exception. The AWS Batch team recently moved most of their job scheduler fleet to a serverless infrastructure model leveraging AWS Fargate. I had a chance to sit with Devendra Chavan, Senior Software Development Engineer on the AWS Batch team, to discuss the move to AWS Fargate and its impact on the Batch managed scheduler service component.