AWS HPC Blog
Category: Compute
Easing your migration from SGE to Slurm in AWS ParallelCluster 3
This post will help you understand the tools available to ease the stress of migrating your cluster (and your users) from SGE to Slurm. The migration is necessary because the HPC community no longer maintains SGE's open-source codebase.
Simulating 44-Qubit quantum circuits using AWS ParallelCluster
A key part of developing quantum hardware and quantum algorithms is simulation on existing classical architectures using HPC techniques. In this blog post, we describe how to perform large-scale quantum circuit simulations using AWS ParallelCluster with QuEST, the Quantum Exact Simulation Toolkit. We demonstrate a simple and rapid deployment of up to 4,096 c5.18xlarge HAQM EC2 instances, which we used to simulate random quantum circuits with up to 44 qubits, completing a non-trivial 44-qubit circuit in less than 3.5 hours.
Running large-scale CFD fire simulations on AWS for HAQM.com
In this blog post, we discuss the AWS solution that HAQM's construction division used to conduct large-scale CFD fire simulations as part of its Fire Strategy work to demonstrate safety and fire mitigation strategies. We outline the five key steps that made the simulations 15-20x faster than the previous on-premises architecture, reducing the time to complete a run from up to twenty-one days to less than one day.
Expanded filesystems support in AWS ParallelCluster 3.2
AWS ParallelCluster version 3.2 introduces support for two new HAQM FSx filesystem types (NetApp ONTAP and OpenZFS). It also lifts the limit on the number of filesystem mounts you can have on your cluster. We'll show you how it works, and walk you through the details so you can get going right away.
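To give you a flavor, here's a minimal sketch of what mounting one of the new filesystem types could look like in a ParallelCluster configuration, assuming you already have an FSx for NetApp ONTAP volume (the volume ID below is a placeholder):

```yaml
# Hypothetical excerpt from a ParallelCluster 3.2 cluster configuration.
# Replace the VolumeId placeholder with your own existing ONTAP volume.
SharedStorage:
  - MountDir: /ontap
    Name: OntapStorage
    StorageType: FsxOntap
    FsxOntapSettings:
      VolumeId: fsvol-0123456789abcdef0
```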
Slurm-based memory-aware scheduling in AWS ParallelCluster 3.2
AWS ParallelCluster version 3.2 now supports memory-aware scheduling in Slurm to give you control over the placement of jobs with specific memory requirements. In this blog post, we’ll show you how it works, and explain why this will be really useful to people with memory-hungry workloads.
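For a flavor of what's involved: you switch the feature on in your cluster configuration, and jobs then declare their memory needs through Slurm's usual options. Here's a minimal sketch (the values are illustrative, not recommendations):

```yaml
# Hypothetical excerpt: enabling memory-based scheduling in ParallelCluster 3.2.
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    EnableMemoryBasedScheduling: true
```

Jobs can then request memory with Slurm's standard options, for example `sbatch --mem-per-cpu=4GB my_job.sh`, and Slurm will place them only on nodes with enough memory available.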
Analyzing Genomic Data using HAQM Genomics CLI and HAQM SageMaker
In this blog post, we demonstrate how to leverage the HAQM Genomics CLI and HAQM SageMaker to analyze large-scale exome sequences and derive meaningful insights. We use the bioinformatics workflow manager Nextflow, its open-source library of pipelines, NF-Core, and AWS Batch.
How Thermo Fisher Scientific Accelerated Cryo-EM using AWS ParallelCluster
In this blog post, we’ll walk you through the process of building a successful Cryo-EM benchmarking pilot using AWS ParallelCluster, HAQM FSx for Lustre, and cryoSPARC (from Structura Biotechnology) and explain some of our design decisions along the way.
Efficient and cost-effective rendering pipelines with Blender and AWS Batch
This blog post explains how to run parallel rendering workloads and produce an animation in a cost- and time-effective way using AWS Batch and AWS Step Functions. AWS Batch manages the rendering jobs on HAQM Elastic Compute Cloud (HAQM EC2), and AWS Step Functions coordinates the dependencies across the individual steps of the rendering workflow. Additionally, HAQM EC2 Spot Instances can be used to reduce compute costs by up to 90% compared to On-Demand prices.
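As a rough sketch of the fan-out pattern, the render step can be expressed as a single AWS Batch array job where each child job handles one frame. The queue and job definition names below are hypothetical:

```python
import boto3

batch = boto3.client("batch")

# Submit one array job; AWS Batch fans it out into 240 child jobs.
# Each child receives its index through the AWS_BATCH_JOB_ARRAY_INDEX
# environment variable, which the container maps to a frame number.
response = batch.submit_job(
    jobName="blender-render-animation",
    jobQueue="render-spot-queue",        # hypothetical Spot-backed queue
    jobDefinition="blender-render-job",  # hypothetical job definition
    arrayProperties={"size": 240},       # one child job per frame
)
print("Submitted array job:", response["jobId"])
```

Inside the container, the entrypoint might translate the index into a Blender invocation such as `blender -b scene.blend -f "$AWS_BATCH_JOB_ARRAY_INDEX"`.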
Getting Started with NVIDIA Clara Parabricks on AWS Batch using AWS CloudFormation
In this blog post, we'll show how you can run NVIDIA Clara Parabricks on AWS Batch using AWS CloudFormation templates. Parabricks is a GPU-accelerated tool for secondary genomic analysis. It reduces the runtime of variant calling on a 30x human genome from 30 hours to just 30 minutes, and the solution leverages AWS Batch to scale compute jobs across multiple instances in the cloud.
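For context, a CloudFormation template like the one in the post can be launched programmatically as well as through the console. Here's a minimal sketch with boto3; the stack name and template URL are placeholders:

```python
import boto3

cfn = boto3.client("cloudformation")

# Placeholder TemplateURL; point this at the template from the post.
cfn.create_stack(
    StackName="parabricks-batch-demo",
    TemplateURL="https://example-bucket.s3.amazonaws.com/parabricks-batch.yaml",
    Capabilities=["CAPABILITY_IAM"],  # the stack creates IAM roles for Batch
)

# Block until the stack (and the Batch environment it defines) is ready.
waiter = cfn.get_waiter("stack_create_complete")
waiter.wait(StackName="parabricks-batch-demo")
```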
Understanding the AWS Batch termination process
In this blog post, we help you understand the AWS Batch job termination process, and how you can act on it to terminate a job gracefully by capturing the SIGTERM signal inside your application. This gives you a clean way to exit your Batch jobs. You'll also learn how job timeouts occur, and how the retry operation works with both standard AWS Batch jobs and array jobs.
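As a minimal sketch of the graceful-shutdown pattern the post describes, your job's main process can register a SIGTERM handler so it gets the chance to checkpoint or clean up before the container is killed:

```python
import signal
import sys
import time

def handle_sigterm(signum, frame):
    # AWS Batch delivers SIGTERM on termination; checkpoint or clean up
    # here before the process is forcibly killed.
    print("Received SIGTERM, shutting down gracefully", flush=True)
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

# Stand-in for the job's real work loop.
while True:
    time.sleep(1)
```

Note that the signal goes to the container's main process, so if your application starts through a shell wrapper, make sure the signal reaches it (for example by launching the application with `exec`).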