AWS for Industries

Revolutionize Structural Biology Research With AWS

Microscopic differences that we could never see with the naked eye can separate a cutting-edge therapeutic from a lethal poison. In the not-too-distant past, it would have been impossible to imagine the treatments we have access to today and the broad range of conditions they can target. Key to many modern medical breakthroughs are structural biologists, who research the structure of biological compounds and their targets. Without an understanding of what these structures are and how they behave in the body, many of the new treatments we rely on to treat disease and save lives wouldn’t exist.

Data from structural biology experiments is critical across these lines of research, which are designed to explore how proteins and other molecules interact and how those interactions can be optimized to promote health. “We don’t know all the ways diseases develop, and therefore we don’t know all the ways to design treatments,” says Dr. Jamaine Davis, a structural biologist at Meharry Medical College in Nashville, Tennessee.

Research in structural biology, however, is often constrained by inadequate funding and computing power. To overcome these barriers, Davis teamed up with Piotr Sliz at Harvard Medical School. Dr. Sliz is founder and head of SBGrid, a structural biology consortium that offers a scientific software stack with over 600 commonly used tools to 171 member institutes and 548 laboratories spanning academia, public sector, and industry. Today, thanks to the work of Drs. Sliz and Davis, researchers can learn how to access high-performance computing infrastructure to run their structural biology research through SBGrid on Amazon Web Services (AWS).

In this post, we describe SBGrid on AWS, called “SBCloud” (built by Dr. Sliz’s team), which leverages AWS ParallelCluster. We also describe how Dr. Sliz’s team has been working with Meharry Medical College to conduct workshops. These workshops are designed to orient structural biologists, and newcomers to the field, to this new capability in order to democratize scalable, multimodal data analyses to power innovation.

SBCloud: Providing worldwide access to essential software

Many cutting-edge scientific programs are initially developed by scientists or as a small part of a larger research project, and are unsuitable for large-scale use in industry or research centers. SBGrid addresses this gap by offering curated, production-ready versions of these programs that can be run at scale.

SBCloud is crucial for research in crystallography, nuclear magnetic resonance, electron microscopy, computational chemistry, and structure visualization and analysis. By providing global access to this service, SBGrid is now well-known throughout the structural biology research community and among researchers at major pharmaceutical companies.

Traditionally, SBGrid’s offering was installed on premises at research facilities, limiting usage to an institution’s own computing infrastructure. With the SBGrid software stack on AWS, researchers now also have access to the full suite of tools on scalable, flexible cloud compute. This collaboration allows SBCloud users to leverage AWS hardware and services, potentially accelerating workloads and discovery. Significantly, the new offering, powered by AWS, eliminates the need to own a high-performance computing (HPC) cluster. Researchers can now access these capabilities with just an AWS account.

High-level architecture overview

Cryo-electron microscopy (Cryo-EM) allows structural biologists to determine the 3D structure of a molecule and is a key use case enabled by SBGrid. Cryo-EM datasets range in size from tens of GBs to tens of TBs, and the processing pipeline generally requires both CPUs and GPUs. Because of these unique storage and compute requirements, researchers are often constrained by existing on-premises hardware when processing Cryo-EM data.

With SBCloud, researchers gain access to a scalable Slurm cluster with both an HPC parallel file system for datasets and an NFS file system for managing applications and home directories. The HPC parallel file system, which is critical for massive Cryo-EM datasets, is backed by highly available object storage for long-term, cost-effective data management.
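To make the CPU and GPU split concrete, here is a minimal sketch of how a processing step might be routed to one Slurm partition or the other. It only renders a batch script as text. The partition names `cpu` and `gpu`, the job names, and the placeholder command are assumptions for illustration, not SBCloud's actual configuration; the `#SBATCH` options shown are standard Slurm directives.

```python
def render_sbatch_script(job_name, partition, command, cpus=4, gpus=0):
    """Render a Slurm batch script as a string.

    Partition names and commands are hypothetical; a real cluster
    defines its own partitions and software environment.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --cpus-per-task={cpus}",
    ]
    # Request GPUs only for steps that need them, so CPU-only steps
    # can run on the (typically cheaper) CPU partition.
    if gpus > 0:
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines.append("")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical GPU-bound step writing to the parallel file system.
    print(render_sbatch_script(
        job_name="refine",
        partition="gpu",
        command="echo 'placeholder for a GPU processing step'",
        cpus=8,
        gpus=1,
    ))
```

The resulting text could be saved to a file and submitted with `sbatch`. Keeping CPU-only and GPU steps in separate partitions lets each scale independently.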

Figure 1 shows a simplified view of the AWS architecture. AWS ParallelCluster is a solution for deploying and managing HPC clusters in the cloud. With AWS ParallelCluster, users can create clusters with login nodes, a head node, and multiple Slurm partitions, all from a single YAML template file. The template also allows users to create and dynamically mount file systems, including both Amazon FSx for Lustre parallel file systems and Amazon Elastic File System (Amazon EFS) NFS file systems. The diagram also shows a Data Repository Association (DRA) between the Lustre file system and an Amazon Simple Storage Service (Amazon S3) bucket, providing highly available and cost-effective object storage for long-term data management.
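As a rough illustration of what such a template contains, the sketch below assembles a cluster definition as a Python dictionary and prints it as JSON, which YAML parsers generally accept. The key names follow our understanding of the AWS ParallelCluster version 3 configuration format and should be checked against the official reference before use. The Region, instance types, subnet ID, key name, and bucket name are placeholders, not SBCloud's actual settings. For simplicity, the sketch links the Lustre file system to S3 through `ImportPath` rather than the DRA described above; that substitution is an assumption made to keep the example small.

```python
import json


def slurm_queue(name, instance_type, subnet_id, max_nodes=4):
    """Return one Slurm queue (partition) entry that scales up from zero nodes."""
    return {
        "Name": name,
        "ComputeResources": [
            {
                "Name": f"{name}-nodes",
                "InstanceType": instance_type,
                "MinCount": 0,
                "MaxCount": max_nodes,
            }
        ],
        "Networking": {"SubnetIds": [subnet_id]},
    }


def build_cluster_config(subnet_id, key_name, bucket):
    """Assemble a ParallelCluster-style config; all values are placeholders."""
    return {
        "Region": "us-east-1",
        "Image": {"Os": "alinux2"},
        "HeadNode": {
            "InstanceType": "c5.xlarge",
            "Networking": {"SubnetId": subnet_id},
            "Ssh": {"KeyName": key_name},
        },
        "Scheduling": {
            "Scheduler": "slurm",
            "SlurmQueues": [
                slurm_queue("cpu", "c5.4xlarge", subnet_id),
                slurm_queue("gpu", "g4dn.xlarge", subnet_id),
            ],
        },
        "SharedStorage": [
            {
                # Parallel file system for datasets, linked to an S3 bucket.
                "Name": "datasets",
                "MountDir": "/fsx",
                "StorageType": "FsxLustre",
                "FsxLustreSettings": {
                    "StorageCapacity": 1200,
                    "DeploymentType": "SCRATCH_2",
                    "ImportPath": f"s3://{bucket}",
                },
            },
            {
                # NFS file system for applications and home directories.
                "Name": "apps",
                "MountDir": "/shared",
                "StorageType": "Efs",
            },
        ],
    }


if __name__ == "__main__":
    config = build_cluster_config(
        "subnet-EXAMPLE", "example-key", "example-cryoem-bucket"
    )
    print(json.dumps(config, indent=2))
```

Generating the template from code is one way to keep the CPU and GPU queues consistent. A hand-written YAML file works just as well for a single cluster.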

Figure 1: High-Level Architecture of SBCloud on AWS. The architecture shows an Amazon EC2 head node configured to use a CPU queue and a GPU queue in a private subnet. For storage, Amazon FSx for Lustre, Amazon EFS, and an S3 bucket are used.

Educating the scientific community

Dr. Sliz’s team in the Department of Biological Chemistry and Molecular Pharmacology at Harvard Medical School and Meharry Medical College are conducting workshops that orient cohorts of structural biologists, from academia to industry, to the new SBCloud on AWS. Workshops have so far been conducted in Boston, New Zealand, and Africa, with plans for additional workshops throughout the world. Two types of workshops are offered: one for newcomers to the field and a second, more intensive training for faculty already engaged in structural biology research.

“We created this program because most of the current structural biologists have been trained in X-ray crystallography or nuclear magnetic resonance spectroscopy (NMR),” Dr. Davis says. Cryo-EM has revolutionized structural biology because it can capture molecular structures that are difficult to acquire by X-ray techniques, especially proteins that are part of or interact with membranes of cells and their organelles.

The program functions on a train-the-trainers philosophy. Back on their own campuses, workshop attendees can disseminate this new knowledge to students and others, building a supportive environment to conduct more structural biology.

In addition to these workshops, ongoing mentorship for cryo-EM data processing and modeling is provided to help expand and sustain the scientific community in this innovative technology.

Future improvements and next steps

Future iterations of SBCloud on AWS will explore expanding support for custom silicon, including AWS Graviton, AWS Inferentia, and AWS Trainium. These services aim to provide customers with improved performance and cost-effective options for running workloads. Many common bioinformatics tasks run more efficiently and cost-effectively on AWS Graviton processors, which have previously been shown to accelerate genome assembly.

AWS will continue to collaborate with SBGrid to integrate new service offerings into SBCloud, passing performance and price gains to customers. “As we gather data from institutions and organizations adopting the platform worldwide, we will refine the design to better serve the community,” says Dr. Davis.

The continuous improvement of AWS ParallelCluster, the compute engine, and other AWS services will benefit SBCloud customers through increased workload efficiency and cost-effectiveness.

Additional follow-up workshops are planned, where the faculty will dive more deeply into data analysis—ideally learning from a test run of their own protein samples. These extra train-the-trainer workshops will be offered worldwide.

In the future, additional SBGrid tools besides Cryo-EM will also be made available on AWS, further democratizing access to structural biology research—accelerating discoveries that can help treat diseases.

Conclusion

AWS Cloud computing is enabling the democratization of cryo-electron microscopy, structural biology, and biomedical breakthroughs to achieve advancements in precision, inclusive healthcare. The SBGrid service center is helping to deploy this new technology at research hospitals and biopharmas around the world, including at underfunded institutions. Only with many efforts being worked on in parallel can we shorten the timeframe for the drug discovery needed to cure the many types of cancer and other diseases that afflict us.

Contact an AWS Representative to learn how we can help accelerate your business.

We would like to acknowledge Carol Cruzan Morton, Michelle Ottaviano, and Jamaine Davis, PhD for their contributions to this blog.

To learn more about AWS for Healthcare & Life Sciences (curated AWS services and AWS Partner Network solutions used by thousands of healthcare and life sciences customers globally) visit the AWS for Healthcare & Life Sciences and AWS Healthcare Solutions webpages. You can also read more blogs about AWS healthcare stories.

Jacob Mevorach

Jacob Mevorach is a senior specialist for containers for healthcare and the life sciences at AWS. Jacob has a background in bioinformatics and machine learning. Prior to joining AWS, Jacob focused on enabling and conducting large scale analysis for genomics and other scientific areas.

Ben Eisenbraun

Ben rejoined SBGrid in 2023 to architect a structural biology-focused, workflow-optimized cloud infrastructure. He comes with a truly unique background, most recently working as a DevOps Engineer at Amazon, but also with deep institutional knowledge from 7 years building up SBGrid, curating software and developing relationships with scientists and software developers. Years ago he discovered the meaning of life, and he uses it as his SSH key.

Christine Tsien Silvers, MD, PhD

Christine Tsien Silvers, MD, PhD, serves as Healthcare Executive Advisor at AWS. Her research at MIT and Harvard Medical School since the 1990s focused on AI/ML and their use to improve patient care. Trained at Massachusetts General Hospital and Brigham and Women’s Hospital, she is Board certified in Emergency Medicine as well as Clinical Informatics. In the 20+ years prior to joining AWS, Chris worked clinically and then served as Chief Medical Officer at two healthcare technology startups. She is passionate about leveraging technology to improve health.

Jason Key, PhD

Jason Key, PhD, is Associate Director of Technology & Innovation and oversees all technical activities of SBGrid and BioGrids, including software curation and infrastructure development. Prior to joining SBGrid, Jason trained as a postdoctoral fellow with Kevin Garner, at the University of Texas Southwestern Medical School, and with Profs. Rienk van Grondelle and Klaas Hellingwerf at the University of Amsterdam. He completed his graduate studies in macromolecular crystallography in Prof. Keith Moffat's laboratory at the University of Chicago.

Marissa E. Powers, PhD

Marissa Powers is a specialist solutions architect at AWS focused on high performance computing and life sciences. She has a PhD in computational neuroscience and enjoys working with researchers and scientists to accelerate their drug discovery workloads. She lives in Boston with her family and is a big fan of winter sports and being outdoors.

Piotr Sliz, PhD

Piotr founded SBGrid in 2000 and continues to coordinate all SBGrid and BioGrids activities. He is an associate professor in Pediatrics at Boston Children's Hospital, and in the department of Biological Chemistry and Molecular Pharmacology at Harvard Medical School. He is also the Vice President of Research Informatics at Boston Children's Hospital and an Advisor for Research Data Technology Resources and Training at Harvard Medical School. Piotr obtained his Ph.D. degree in X-ray crystallography from University of Toronto where he trained with Emil Pai. After his Ph.D. he worked as an HHMI bioinformatician with Profs. Don Wiley and Stephen Harrison at Harvard University.