AWS Partner Network (APN) Blog

Streamlined Multiomics Data Analysis Leveraging Illumina Software on AWS

By: Eric Allen, Genomics Sr Solutions Architect – AWS
By: Olivia Choudhury, PhD, Sr Partner Solutions Architect – AWS
By: Christiana Dobrzynski, Staff Technical Product Manager – Illumina
By: Alex Rutkovsky, PhD, Sr Bioinformatics Applications Scientist – Illumina
By: Anoop Grewal, PhD, Sr Software Product Manager – Illumina

Accelerating Multiomics Research Using Illumina Connected Software on the AWS Cloud

Advances in next-generation sequencing (NGS) technologies have dramatically increased the rate at which omics data is generated. Understanding human health and diseases requires deriving meaningful insights from multiomics data, including genomic, epigenomic, transcriptomic, proteomic, and metabolomic information. However, storing, managing, and analyzing such large volumes of data creates significant challenges for research organizations.

Illumina offers comprehensive software solutions to address these challenges through intuitive analysis workflows hosted on AWS. Illumina’s suite of multiomics analysis tools includes capabilities for quality control, data aggregation, and advanced analytics. In this post, we show how software from Illumina, including DRAGEN™, Illumina Connected Analytics, Correlation Engine, and Illumina Connected Multiomics, enables researchers to build scalable analytical pipelines while maintaining security, data privacy, and compliance.

Solution Overview

Multiomics combines data sources from different biological levels to answer complex biological questions. By integrating diverse forms of biological data, multiomics can yield deeper insight, supporting the discovery of novel biomarkers, stratification of subject populations, and precision medicine. While there are many ways to combine multiple “-omes” into one experiment, they all require powerful analysis tools to yield meaningful results.

Illumina provides several powerful solutions running on AWS that form a comprehensive multiomics analysis platform:

DRAGEN™ (Dynamic Read Analysis for GENomics) provides secondary analysis with industry-leading accuracy through flexible deployment options. Available through Illumina Connected Analytics, local sequencing instruments, or the AWS Marketplace, DRAGEN combines high accuracy with speed across diverse pipeline types, including Whole Genome Sequencing, Targeted DNA Sequencing, RNA-Seq, ChIP-Seq, and single-cell analysis. Previous AWS blog posts have highlighted DRAGEN’s performance and award-winning accuracy.

Illumina Connected Analytics (ICA) provides a cloud-native platform for managing large-scale multiomics data. ICA accelerates workflow development, infrastructure scaling, and secure data management. The platform offers multiple interaction methods, including a web interface, RESTful APIs, and command-line tools. ICA’s modular components – Flow, Bench, and Base – enable pipeline execution, collaborative analysis, and automated data integration.

Correlation Engine powers one of the world’s largest biological databases using Amazon EMR (Elastic MapReduce), Amazon Aurora, Amazon ElastiCache, and Amazon MQ. Created by curating over 26,000 public omics studies, it offers comprehensive analytical tools, including Body Atlas, Disease Atlas, and Meta-Analysis. The platform enhances researchers’ ability to find meaningful associations across tissues, diseases, compounds, and genetic perturbations through an intuitive interface.

Illumina Connected Multiomics delivers interactive visualizations, powerful statistical analysis, and streamlined workflows through an easy-to-use user interface. Researchers can explore multiomics data using PCA plots, heat maps, and other rich, publication-ready visualizations. Connected Multiomics is optimized for Illumina multiomic library prep kits including single-cell RNA, spatial transcriptomics, epigenomics, and more. Connected Multiomics leverages Illumina’s recent acquisition of the multiomic analysis software, Partek Flow, and incorporates its capabilities into the Illumina Connected Software ecosystem.

The integration of Illumina’s solutions with AWS services creates a comprehensive and secure platform for multiomics analysis. Raw sequencing data and analysis results are stored in Amazon Simple Storage Service (Amazon S3), providing durable storage with fine-grained access control. Compute-intensive DRAGEN pipelines run on Amazon EC2 F1 instances, while containerized workflows execute through Amazon Elastic Container Service (Amazon ECS) with images stored in Amazon Elastic Container Registry (Amazon ECR). The platform also uses Amazon OpenSearch Service to enable rapid data queries across large datasets. Correlation Engine’s extensive biological database leverages Amazon EMR for large-scale data processing, while Amazon Aurora and Amazon ElastiCache optimize database performance. Service communication is handled through Amazon MQ, ensuring reliable message delivery across components.

Real-World Application: Multiomics Analysis of Psoriasis Studies

To demonstrate these solutions’ capabilities, we provide a real-world example of how researchers can analyze psoriasis through a multiomics approach. Psoriasis is a disease characterized by inflamed red patches on skin resulting in itchiness and pain. This study combines RNA expression analysis with histone modification data to understand disease mechanisms. Histones are proteins that wind DNA into compact structures when DNA is not actively being transcribed into RNA.

Using Gene Expression Omnibus (GEO) dataset GSE205748, we first processed RNA-seq data through the DRAGEN RNA-Seq pipeline on ICA. The pipeline’s gene expression quantification module estimated transcript and gene expression levels. We then ran DESeq2 to compare expression patterns between active psoriasis tissue and two controls: normal skin tissue from healthy individuals and uninflamed skin from psoriasis patients.
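The two-contrast design described above can be illustrated with a minimal Python sketch. This is not DESeq2, which fits a negative binomial model and applies its own normalization and shrinkage; it only shows how one case group is compared against each of two control groups using a simple log2 fold change. The group names, the expression values, and the `log2_fold_change` and `run_contrasts` helpers are all hypothetical and chosen for illustration.

```python
import math
from statistics import mean


def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of mean expression in case vs. control samples.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical normalized expression values for a single gene (invented numbers).
expression = {
    "lesional": [63.0, 63.0],          # active psoriasis tissue
    "healthy_control": [7.0, 7.0],     # normal skin from healthy individuals
    "uninvolved": [15.0, 15.0],        # uninflamed skin from psoriasis patients
}

# Each contrast is (case group, control group), mirroring the two comparisons in the text.
contrasts = [("lesional", "healthy_control"), ("lesional", "uninvolved")]


def run_contrasts(expr, pairs):
    """Compute the log2 fold change for every (case, control) pair."""
    return {f"{a}_vs_{b}": log2_fold_change(expr[a], expr[b]) for a, b in pairs}


results = run_contrasts(expression, contrasts)
for name, lfc in results.items():
    print(f"{name}: log2FC = {lfc:.2f}")
```

In a real analysis, the per-gene statistics and adjusted p-values would come from DESeq2 itself; the sketch only makes the structure of the comparison explicit.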

For histone modification analysis, we processed ChIP-seq data from GEO dataset GSE161076 using DRAGEN and MACS3 to identify regions with differentially modified histone H3 binding between psoriasis-affected samples and unaffected controls. This analysis helps clarify how histone modifications might regulate gene expression in disease states.
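As a rough illustration of working with peak-caller output, the sketch below parses records in the narrowPeak format (the tab-separated BED6+4 layout that MACS3 writes for narrow peaks) and keeps peaks above a significance threshold. The post does not describe this step, so the filtering itself, the threshold, the example records, and the `parse_narrowpeak` and `filter_peaks` helpers are assumptions made for illustration.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    signal: float        # column 7: fold enrichment (signalValue)
    neg_log10_q: float   # column 9: -log10(q-value)


def parse_narrowpeak(lines):
    """Parse tab-separated narrowPeak records, skipping blank and header lines."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        peaks.append(Peak(f[0], int(f[1]), int(f[2]), f[3], float(f[6]), float(f[8])))
    return peaks


def filter_peaks(peaks, min_neg_log10_q=2.0):
    """Keep peaks with -log10(q) at or above the threshold (2.0 means q <= 0.01)."""
    return [p for p in peaks if p.neg_log10_q >= min_neg_log10_q]


# Invented example records; real input would be read from a .narrowPeak file.
example_lines = [
    "chr1\t1000\t1500\tpeak_1\t250\t.\t8.5\t12.1\t9.8\t240",
    "chr1\t9000\t9300\tpeak_2\t30\t.\t1.9\t2.0\t0.7\t100",
    "chr2\t400\t900\tpeak_3\t120\t.\t5.2\t6.4\t4.1\t260",
]

peaks = parse_narrowpeak(example_lines)
kept = filter_peaks(peaks)
print([p.name for p in kept])
```

Comparing such filtered peak sets between affected and unaffected samples is one simple way to approach differential binding; dedicated differential analysis tools apply more rigorous statistics.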

Importing these results into the Correlation Engine revealed several significant insights. Meta-analysis identified the gene kynureninase (KYNU) as concordantly regulated across both RNA expression and histone modification datasets (Fig 1), with statistical analysis showing significant overlap between the two data types.

Figure 1 – Meta-analysis results on Correlation Engine showing KYNU as top-ranking gene
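The idea of concordant regulation and significant overlap between two result sets can be sketched with a generic hypergeometric test. The post does not specify which statistic Correlation Engine uses, so this is only a textbook stand-in, not the platform's method. Apart from KYNU, the gene names are placeholders, and the directions and universe size are invented.

```python
from math import comb


def hypergeom_overlap_pvalue(universe_size, set_a_size, set_b_size, overlap):
    """P(X >= overlap) when set_b_size genes are drawn at random from a
    universe that contains set_a_size genes of interest."""
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_b_size)
    tail = sum(
        comb(set_a_size, k) * comb(universe_size - set_a_size, set_b_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical significant genes and their direction of change in each data type.
rna_hits = {"KYNU": "up", "GENE_A": "up", "GENE_B": "down", "GENE_C": "up"}
histone_hits = {"KYNU": "up", "GENE_A": "down", "GENE_D": "up"}

shared = set(rna_hits) & set(histone_hits)

# Concordant genes change in the same direction in both datasets.
concordant = sorted(g for g in shared if rna_hits[g] == histone_hits[g])

# Assumed size of the background gene universe (invented for this toy example).
universe_size = 20
p_value = hypergeom_overlap_pvalue(
    universe_size, len(rna_hits), len(histone_hits), len(shared)
)

print("concordant genes:", concordant)
print(f"overlap p-value: {p_value:.4f}")
```

With real data, the universe would be the set of genes measured in both assays, and the resulting p-values would need correction for multiple testing.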

Further investigation through Correlation Engine’s knowledge base showed KYNU’s presence in 58 of 112 psoriasis studies (Fig 2). Additionally, the Literature Search feature revealed KYNU’s association with psoriasis in several publications.

Figure 2 – Correlation Engine interface showing psoriasis-related query results

Next, we used the statistical analysis capabilities of Connected Multiomics for additional validation. We built a data analysis pipeline using the built-in visual pipeline editor (Fig 3).

Figure 3 – The task graph view in Illumina Connected Multiomics, powered by Partek Flow, allows users to document the series of analyses applied to the study results.

Conclusion

The prospect of conducting multiomics research can be both exciting and daunting. With multiple ways to analyze tissues and biological samples, and an ever-increasing volume of multiomic data, scalable and accurate bioinformatics solutions are needed to keep pace and deepen our understanding of disease phenotypes and intervention plans. Further interrogation of new omics types promises to unlock new levels of understanding behind complex biological events.

Illumina’s suite of secondary and tertiary analysis solutions hosted on AWS, including DRAGEN pipelines on ICA, Correlation Engine, and Illumina Connected Multiomics, addresses the challenges of working with multiomics datasets at scale. The platform’s integration with AWS services provides reliable performance and scalability, enabling researchers to focus on advancing scientific understanding rather than managing infrastructure. Leveraging Illumina informatics on the AWS Cloud, users can analyze, explore, and interpret multiomics data while focusing on what matters most: next-generation scientific discovery. To learn more about Illumina informatics solutions for multiomics, refer to Illumina Informatics Products.

For Research Use Only. Not for use in diagnostic procedures.

M-GL-03363

Illumina – AWS Partner Spotlight

Illumina is an AWS Partner providing comprehensive genomics workflows through cloud-based software solutions. Their tools enable researchers worldwide to analyze complex biological data at scale while maintaining security and compliance requirements.

Contact Illumina | Partner Overview | AWS Marketplace