2025

University of Oxford’s APAD Uses AWS to Improve Air Quality and Aid Communities with ML

Learn how the University of Oxford’s APAD is advancing air pollution research with ML pipelines powered by HAQM EC2 instances.

1.2 million

satellite images processed

72x reduction

in ML runtime

About 17,600

compute hours saved

Up to 80%

reduction in infrastructure costs

90% reduction

in monitoring time and task runtime

Overview

Across the Indo-Gangetic Plain (IGP), millions face shortened lifespans from breathing some of the world’s most polluted air. For decades, the lack of precise data on pollution sources has hindered efforts to address this crisis, and communities in the IGP have been fighting an invisible enemy.

Air Pollution Asset-Level Detection (APAD), a research project started with an innovation award from the Smith School of Enterprise and the Environment, University of Oxford, is changing this reality. APAD built custom machine learning (ML) models to analyze satellite imagery and identify pollution sources, using HAQM Web Services (AWS) infrastructure to store and process this massive dataset. By creating a comprehensive map of pollution sources, the organization is giving communities the evidence they need to make targeted interventions.

Opportunity | Using AWS Infrastructure to Support ML Research for APAD

APAD identifies emission assets and analyzes the impact of harmful but often overlooked pollutants. The organization operates primarily in the IGP, which has a population of more than 400 million.

In the IGP, millions of people face cardiovascular and respiratory health issues because of air pollution. Schools in India and Pakistan frequently close when smog levels are too high, forcing children to remain indoors and miss educational opportunities. Because comprehensive data on pollution sources has been lacking, these issues have remained under-researched and often unaddressed.

Brick kilns, traditional brick-making factories that burn coal and other materials, are major contributors to poor air quality. In May 2024, APAD set out to create a visual map of all pollution sources using satellite imagery and ML to help communities make data-driven decisions.

APAD had run ML workloads on local computers, but the machines couldn’t support the scale of the project, which involves millions of data points. APAD needed an affordable solution that could scale with its ambitious goals. It adopted HAQM Elastic Compute Cloud (HAQM EC2), which provides secure and resizable compute capacity for virtually any workload. The organization also used HAQM Simple Storage Service (HAQM S3), an object storage service built to retrieve virtually any amount of data from anywhere, to store over 500 GB of image and result data.

“Our goals were to be efficient and to get the data out so that there was something tangible for governments and organizations in the IGP region to use,” says Suleman Hamdani, ML and deep learning (DL) specialist at APAD. “We started with a social goal, and AWS was the means to reach that goal.”

“We started with a social goal, and AWS was the means to reach that goal.”

Suleman Hamdani
Machine Learning and Deep Learning Specialist, Air Pollution Asset-Level Detection

Solution | Processing 1.2 Million Satellite Images with ML and DL Models

Over 5 months, APAD created a pipeline that helps turn satellite data into actionable environmental insights. Here’s how it works: First, APAD downloads massive amounts of low-resolution satellite imagery data to HAQM S3 buckets and then annotates and preprocesses the data. Afterward, it feeds the data into ML and DL models that run on HAQM EC2 C5 Instances, designed for compute-intensive workloads, and HAQM EC2 G4 Instances, powered by NVIDIA T4 Tensor Core GPUs, respectively. These models have been trained to recognize the visual signatures of pollution sources such as field burning. When the models detect potential pollution sources, they automatically map the coordinates using parallel processing. For greater accuracy, APAD also developed a second pipeline that uses DL to analyze high-resolution imagery.

[Figure 1. Workflow pipeline showing (a) brick kiln detection, (b) postprocessing to geolocate kilns, and (c) YOLOv8–based detection]
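
The story does not describe how detections are turned into map coordinates, so the following is only a minimal sketch of what a geolocation step like (b) in Figure 1 could look like. It assumes a north-up image tile whose corners are known in plain longitude and latitude, and it interpolates linearly across the tile. The names `TileBounds`, `pixel_to_lonlat`, and `geolocate_box` are hypothetical and are not taken from APAD's code; a production pipeline would more likely rely on a GIS library and the imagery's actual map projection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileBounds:
    """Geographic extent of a north-up image tile, in degrees (assumed)."""
    west: float
    south: float
    east: float
    north: float


def pixel_to_lonlat(px, py, width, height, bounds):
    """Map a pixel position to (lon, lat) by linear interpolation.

    Pixel (0, 0) is the top-left corner, so y grows southward.
    """
    lon = bounds.west + (px / width) * (bounds.east - bounds.west)
    lat = bounds.north - (py / height) * (bounds.north - bounds.south)
    return lon, lat


def geolocate_box(box, width, height, bounds):
    """Return the (lon, lat) of the center of a pixel bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels, the shape a typical
    object detector reports for each detection.
    """
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    return pixel_to_lonlat(center_x, center_y, width, height, bounds)
```

For example, on a hypothetical 1000 x 1000 pixel tile spanning 74°E to 75°E and 31°N to 32°N, `geolocate_box((400, 200, 600, 300), 1000, 1000, TileBounds(74.0, 31.0, 75.0, 32.0))` places the detection at the tile's horizontal midpoint and a quarter of the way down from its northern edge.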

To improve compute efficiency and reduce the load on its servers, APAD used HAQM Q, a generative artificial intelligence–powered assistant for accelerating software development. By using AWS rather than its own infrastructure, APAD cut ML runtimes by a factor of 72, image retrieval time by a factor of 100, and download time for high-resolution images by a factor of 5, saving several months of work.

Parallel processing also made model inference 50 times faster. “We had to use multiprocessing for these images because we didn’t have enough compute resources,” says Hamdani. “Processing such huge amounts of data would’ve taken much more time if it weren’t for AWS.” Using AWS, APAD saved about 17,600 compute hours and cut infrastructure costs by up to 80 percent. It also reduced monitoring time and task runtime by 90 percent, from several hours to a few minutes per day.
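
As a rough illustration of the multiprocessing approach Hamdani mentions, the sketch below fans a per-tile detection function out across a pool of worker processes using Python's standard `concurrent.futures` module. It is not APAD's code: `detect_sources` is a hypothetical placeholder that stands in for loading a tile and running a trained model, and the worker count and chunk size are arbitrary assumptions.

```python
import concurrent.futures as cf


def detect_sources(tile_id):
    """Placeholder detector: returns (tile_id, detections).

    A real pipeline would load the tile and run a trained model here.
    """
    # Hypothetical rule so the sketch produces deterministic output.
    detections = ["kiln"] if tile_id % 3 == 0 else []
    return tile_id, detections


def run_inference(tile_ids, executor_cls=cf.ProcessPoolExecutor, workers=4):
    """Run detect_sources over every tile using a pool of workers.

    Returns a dict of only the tiles that had at least one detection.
    """
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in input order; chunksize batches the
        # tiles sent to each worker process to cut messaging overhead.
        results = pool.map(detect_sources, tile_ids, chunksize=16)
        return {tile_id: found for tile_id, found in results if found}
```

When the default process pool is used, the call belongs under an `if __name__ == "__main__":` guard, for example `hits = run_inference(range(1000))`, so that worker processes can import the module safely. Passing `executor_cls=cf.ThreadPoolExecutor` swaps in threads, which can suit steps that mostly wait on downloads rather than compute.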

On AWS, APAD gained the power it needed to process and store 1.2 million satellite images. These images helped the organization map 1.5 million square miles in the IGP and identify over 50,000 pollution sources, including two types of brick kilns. Using AWS, APAD also increased processing throughput by over 400 percent, from around 5,000 to 30,000 square miles per day.
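
To put the throughput figures in perspective, the short calculation below works through the rounded numbers quoted above. It is a back-of-the-envelope sketch only: it assumes each rate is sustained at a constant level, and the story does not say that the full 1.5 million square miles was processed at either rate.

```python
AREA_SQ_MI = 1_500_000   # mapped area quoted in the story
RATE_BEFORE = 5_000      # approximate square miles per day, before
RATE_AFTER = 30_000      # approximate square miles per day, after


def percent_increase(before, after):
    """Relative growth from before to after, as a percentage."""
    return (after - before) / before * 100


def days_to_cover(area, rate_per_day):
    """Days needed to cover an area at a constant daily rate."""
    return area / rate_per_day


speedup = RATE_AFTER / RATE_BEFORE                     # 6.0
increase = percent_increase(RATE_BEFORE, RATE_AFTER)   # 500.0
days_before = days_to_cover(AREA_SQ_MI, RATE_BEFORE)   # 300.0
days_after = days_to_cover(AREA_SQ_MI, RATE_AFTER)     # 50.0
```

Under these assumptions, the rounded rates imply roughly a sixfold speedup, which is consistent with the "over 400 percent" claim, and would shorten a full pass over the mapped area from about 300 days to about 50.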

To help other organizations build on its work, APAD made all its data and technical pipelines open source through the Registry of Open Data on AWS in November 2024. “Our goal is to make our data more accessible and help researchers add value to it,” says Khizer Zakir, geographic information system specialist at APAD. Multiple organizations in Pakistan and India are using APAD’s work as part of local initiatives and projects that are aimed at air quality. APAD has also drawn interest from organizations outside the IGP for use cases beyond air quality research.

APAD conducted a workshop at a university in Islamabad to teach students how to use APAD’s code and data for their master’s projects. “An important part of our impact strategy is going beyond simply producing data and research,” says Hassan Sheikh, founder of APAD. “We want to empower local communities and researchers.”

Outcome | Advancing Air Quality Research Around the World

Using AWS, APAD created a blueprint for how small organizations with limited resources can tackle environmental challenges. Powerful ML solutions don’t require extensive hardware investments or years of development when they are built on reliable infrastructure. The organization has collected and processed about 1 million medium- and high-resolution images of pollution types and sources. It is now focused on helping researchers and policymakers use this data to make informed decisions. For example, the data can be used to estimate the pollution that vehicles emit in certain regions, which can inform vehicle and traffic regulations.

Moving forward, APAD plans to expand its work into Africa, starting with Uganda, Nigeria, Ghana, Malawi, and the Democratic Republic of the Congo. It will analyze the primary contributors to the target pollutants in these regions, with a particular focus on coal and cement plants, brick kilns, the pulp and paper industry, and formal and informal waste sites.

“We’re excited to connect with more and more people who are enthusiastic about the environment,” says Jaisha Mubashir, communications specialist at APAD. “We hope to maximize the impact of this open-source dataset.”

About Air Pollution Asset-Level Detection

Spun off from the University of Oxford, Air Pollution Asset-Level Detection is a research project that is focused on identifying air pollution emission assets. It collects and processes important data and makes it available as an open-source dataset.

AWS Services Used

HAQM S3

HAQM Simple Storage Service (HAQM S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.

HAQM EC2

HAQM Elastic Compute Cloud (HAQM EC2) offers the broadest and deepest compute platform, with over 750 instances and choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload. 

HAQM Q

The most capable generative AI–powered assistant for accelerating software development and leveraging companies' internal data.
