AWS for Industries

Reduce Genomic Discovery Time and Costs with AWS HealthOmics Run Analyzer

Bioinformatics researchers running production genomic workflows face a critical challenge: ensuring computational resources are properly allocated to maximize cost efficiency without sacrificing performance. Today, we’re excited to share significant enhancements to the AWS HealthOmics Run Analyzer tool that directly address this challenge.

These capabilities deliver substantial benefits to our customers, including the ability to:

  • Analyze multiple workflow runs in aggregate
  • Visualize task timelines interactively
  • Tune resource recommendations with configurable safety margins
  • Automatically generate configuration files

Together, these features help you reduce computational costs, increase workflow reliability, and speed up the optimization process—ultimately accelerating your path from genomic data to scientific discovery.

Run Analyzer Overview

AWS HealthOmics helps bioinformaticians, researchers, and scientists store, query, analyze, and generate insights from genomics and other biological data. The Run Analyzer tool is an open-source command-line interface (CLI) that provides comprehensive workflow analysis capabilities. It can identify over-provisioned resources and bottlenecks, and reduce costs by right-sizing computational resources.

At its core, Run Analyzer derives 25 metrics for each task in a workflow, including run times and memory, CPU, GPU, and storage utilization. It then makes data-driven recommendations for optimal resource allocation, along with the estimated cost and potential savings. The following is a basic example of the command:

python -m omics.cli.run_analyzer <run-id> -o analysis.csv

Run Analyzer delivers the most value when you apply it after completing initial workflow development and testing. Optimizing prematurely risks leaving your workflow under-provisioned, and therefore unreliable, at production scale. Run Analyzer excels in three key scenarios:

  1. Preparing for production deployment: It identifies bottlenecks and critical paths.
  2. Scaling up to larger datasets: It determines which workflow steps need additional resources.
  3. Optimizing high-volume or long-running workflows: It pinpoints the most cost-effective resource allocation for each task.

Based on customer feedback, we’ve added six additional features and improvements to the tool that enhance these core capabilities.

New Run Analyzer Features

1 – Enhanced Output and Usability
We’ve improved the clarity and usability of Run Analyzer outputs with several key enhancements. Units have been added to metric labels to clarify exactly what is being measured, while new columns provide recommended memory and CPU allocations that can be used directly in workflow definitions. We’ve also improved cost estimation to better reflect HealthOmics metering practices and costs.

These improvements help you understand analysis results more readily and implement optimizations more quickly, without needing to perform additional calculations or lookups. Output is comma-separated values (CSV), which can be easily imported into spreadsheets and analytics tools (Figure 1).

Selected Run Analyzer output in a spreadsheet, showing three rows of information (not all columns are shown). Recommended memory and CPU values are based on peak utilization, and minimum USD is the estimated cost that would be paid if the recommended resources were used for the sample runtime.

Figure 1 – A spreadsheet displaying Run Analyzer output
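Because the output is plain CSV, you can also take a quick look at it in a terminal before opening a spreadsheet. The following is a minimal sketch that assumes the analysis was written to analysis.csv and that the standard column and head utilities are available:

column -s, -t < analysis.csv | head -n 5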

2 – Batch Analysis Mode
One of the biggest challenges in workflow optimization is accounting for run-to-run variance. Some samples may contain more genetic variations or be sequenced to higher coverage, creating different resource requirements. The new batch mode allows you to analyze multiple runs simultaneously. The following is an example of a batch mode analysis command line:

python -m omics.cli.run_analyzer <run1_id> <run2_id> ... --batch

This mode computes 17 metrics, including the mean, maximum, and standard deviation of runtime, CPU and memory utilization, and estimated cost across all analyzed runs. Batch analysis enables you to make recommendations based on comprehensive data rather than single runs, confirming that your workflows have enough resources for your largest samples. It also helps you identify and investigate atypical samples that fall outside expected resource utilization patterns.
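As an illustration, you could gather the IDs of recently completed runs with the AWS CLI and pass them to batch mode. This is a sketch, assuming your AWS credentials and Region are configured and that the listed runs all executed the same workflow:

run_ids=$(aws omics list-runs --status COMPLETED \
  --query 'items[].id' --output text)
python -m omics.cli.run_analyzer $run_ids --batch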

3 – Intelligent Resource Recommendations with Headroom Controls
Different workflows have varying tolerance levels for resource constraints. The new headroom controls let you add configurable buffers to CPU and memory recommendations, balancing cost optimization against performance reliability. The following is an example of a headroom command line:

python -m omics.cli.run_analyzer --headroom 0.2 <run-id>

This example adds a 20 percent safety margin to calculations used to make resource recommendations.

Headroom controls help you account for unexpected variations between runs, provide a safety buffer for resource-sensitive tasks, and avoid over-optimizing workflows that process variable data. This strikes a balance between cost savings and reliable performance.
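Because headroom exists to absorb run-to-run variation, it pairs naturally with batch analysis. The following is a sketch that combines the options shown earlier to apply a 20 percent margin to recommendations derived from several runs:

python -m omics.cli.run_analyzer <run1_id> <run2_id> --batch --headroom 0.2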

4 – Interactive Timeline Visualization
Understanding workflow timeline patterns is crucial for optimization. The new visualization features help you track task timing and dependencies and identify bottlenecks and optimization opportunities. They can also visualize concurrent or sequential task execution and validate task sequencing and workflow logic. Generate these interactive visualizations with:

python -m omics.cli.run_analyzer -P timeline_output/ <run-id>

The tool creates dynamic HTML and JavaScript reports that open automatically in your browser. Timeline visualization is valuable when you need to identify long-running tasks that block workflow progress and understand task dependencies and execution patterns. It can also help you discover opportunities to increase parallelism in your workflows, all of which can lead to significant performance improvements.

In Figure 2, a timeline plot was used to identify a single task, indicated by the orange arrow, that was contributing most of the observed runtime of the workflow. This long-running task prevented dependent tasks from starting, leading to a ‘gap’ indicated by the orange brace. By using interactive zoom and the tooltip popup (Figure 3), we can identify the details of the long-running task.

Screenshot of a timeline visualization plot produced by Run Analyzer showing a single long-running task blocking the workflow. An annotation indicates the blocking task and the ‘gap’ it causes in the workflow timeline.

Figure 2: A timeline visualization plot of an nf-core/rnaseq run with a long-running task

A zoomed view of Figure 2 with a tooltip showing details of a long-running task.

Figure 3: A tooltip revealing the long-running task details

5 – Expanded Support for New HealthOmics Workflow Features
We’ve added support for analyzing workflows that use dynamic run storage, which grows and shrinks automatically with your data requirements. We’ve also added support for runs that use call caching, which allows a workflow run to resume from the last successful task.

Run Analyzer accounts for these features when calculating resource utilization metrics and making optimization recommendations, so analysis remains accurate for workflows that use these advanced capabilities.

6 – Automated Configuration File Generation for Nextflow
Implementing optimization recommendations is now streamlined with automatic generation of Nextflow configuration files. The following is an example of producing a Nextflow configuration file:

python -m omics.cli.run_analyzer <run-id> \
  --write-config=optimized.config

This produces a ready-to-use configuration file with recommended CPU and memory settings. The following is an example of the resulting configuration with recommended CPU and memory settings:

process {
    withName: 'alignment' {
        cpus = 4
        memory = '16 GB'
    }
    // Additional process configurations...
}

The configuration generation feature enables you to quickly implement optimization recommendations without manual configuration. It helps standardize resource allocations across your team, and create configuration variants for different sample types or computational environments—streamlining the entire optimization process.
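To try the recommendations locally before re-bundling your HealthOmics workflow, you can pass the generated file to Nextflow on the command line. This is a sketch that assumes a local Nextflow installation and a hypothetical entry script named main.nf; when packaging the workflow for HealthOmics, you would instead reference the file from your workflow's nextflow.config with includeConfig:

nextflow run main.nf -c optimized.config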

Get Started Today

The Run Analyzer tool is open source and available through GitHub and PyPI, and we welcome community contributions. To get started, download Run Analyzer from our GitHub repository or install it from PyPI.
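At the time of writing, Run Analyzer ships as part of the amazon-omics-tools package, so a sketch of the PyPI route (assuming a Python 3 environment with pip available) is:

pip install amazon-omics-tools

Next, review the new Run optimization for a private HealthOmics workflow documentation and try analyzing one of your existing workflow runs with: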

python -m omics.cli.run_analyzer <run-id>

To show all the analysis options, run:

python -m omics.cli.run_analyzer --help

Conclusion

The enhanced AWS HealthOmics Run Analyzer tool delivers a comprehensive solution for optimizing bioinformatics workflows on AWS HealthOmics. The new features provide batch analysis capabilities, interactive visualizations, configurable resource recommendations, and automated configuration generation.

Researchers can now make more informed decisions about resource allocation, leading to significant cost savings and improved workflow performance. Enhanced features and tools like this empower bioinformatics teams to focus more on scientific discovery and less on computational infrastructure management—accelerating the pace of genomic research and discovery of clinical applications.

We encourage you to engage with the project on GitHub or reach out to your AWS account team with questions. Don’t miss the opportunity to improve your workflow efficiency, reduce computational costs, and accelerate your scientific discoveries.

Contact an AWS Representative to learn how we can help accelerate your business.

Mark Schreiber

Mark is a Senior Genomics Consultant working in the AWS Health artificial intelligence (AI) team. Mark specializes in genomics and life sciences applications and data. He holds a PhD from the University of Otago in New Zealand. Prior to joining AWS, he worked for several years with pharmaceutical and biotech companies. Mark is also a frequent contributor to open-source projects.

Kevin Sayers

Kevin Sayers is a Delivery Consultant in the Health & Advanced Compute Professional Services team. He has a strong background in both HPC and bioinformatics and primarily focuses on life sciences HPC projects. He has an MSc in Bioinformatics from the Autonomous University of Barcelona.

Margo McDowall

Margo McDowall is the Principal Product Manager for AWS HealthOmics. She has a BS in Molecular, Cellular, and Developmental Biology from the University of Washington.