AWS Public Sector Blog
Building hybrid satellite imagery processing pipelines in AWS
From their unique vantage point, satellites acquire data that help us better understand our universe. Earth observation (EO) satellites orbiting our planet are constantly capturing imagery to monitor and understand Earth’s environment. This can support decision-making on a wide range of topics including climate change, disaster management, agriculture and food security, and infrastructure development.
But to extract actionable insights, raw satellite data must undergo several processing steps that transform it into higher-level products. These processing pipelines often use machine learning (ML) algorithms for some of those steps.
Aerospace and geospatial companies use the Amazon Web Services (AWS) Cloud to develop and deploy these processing workloads in a secure, scalable, and cost-optimized way. They can use Amazon Simple Storage Service (Amazon S3) to store petabytes of data durably and cost-efficiently, and choose their preferred compute solution to define an architecture capable of scaling dynamically based on demand. Some customers rely on the AWS serverless services portfolio, delegating infrastructure management and reducing operational overhead. Orbital Sidekick uses AWS to process satellite data with this type of architecture.
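As a minimal sketch of the storage side of such a pipeline, the following Python snippet uses boto3 to land a raw scene in Amazon S3 and list the objects for that scene before processing begins. The bucket name, prefix, and file names are placeholders for illustration only, not part of any specific customer architecture.

```python
import boto3

# Hypothetical bucket and scene names, for illustration only.
RAW_BUCKET = "example-raw-imagery-bucket"
SCENE_PREFIX = "scenes/2024/06/scene-001/"

s3 = boto3.client("s3")

# Upload one band of a raw scene to Amazon S3.
s3.upload_file("band-04.tif", RAW_BUCKET, SCENE_PREFIX + "band-04.tif")

# List the objects that make up the scene before triggering downstream processing.
response = s3.list_objects_v2(Bucket=RAW_BUCKET, Prefix=SCENE_PREFIX)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```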
However, in some cases, companies encounter particular use cases or end customers that require their processing pipeline to be deployed on-premises. The rationale behind these requirements varies: some organizations have data residency needs, while others want to maximize the return on investment of existing infrastructure. Even if these scenarios represent a small fraction of the total use cases, companies developing these types of workloads usually aim to design architectures that can support them without maintaining two parallel solutions: one for cloud deployments and one for on-premises use cases.
In this blog post, learn how companies operating in AWS can design highly flexible architectures that support both cloud and on-premises deployment use cases for their satellite imagery processing workloads with minimal modifications, using AWS services like Amazon Elastic Kubernetes Service (Amazon EKS) and AWS Outposts.
What does on-premises really mean?
When discussing deployments that must be performed on-premises, the specific requirements behind this need typically determine the most beneficial solution. Some may think that running on-premises necessarily implies running on customer infrastructure. However, this may not be the case. For instance, if this requirement is due to latency or data residency constraints, then organizations and companies can consider running on AWS Outposts as an alternative solution.
Outposts is a family of fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location. With Outposts, you can run some AWS services locally and connect to a broad range of services available in the AWS Region, achieving operational consistency and using familiar AWS services, tools, and APIs, which helps maintain the same pace of innovation as in the cloud. Outposts also significantly reduces operational overhead compared to running on customer infrastructure because, under the AWS Shared Responsibility Model, AWS is responsible for the hardware and software that run AWS services. AWS manages security patches, updates firmware, and maintains the Outposts equipment. AWS also monitors the performance, health, and metrics for your Outposts and determines whether any maintenance is required.
Consistent compute environments across disparate deployments
When organizations need to prioritize the ability to deploy data processing workflows across multiple environments (including cloud and on-premises), they can use Amazon EKS. As a container-based solution, Amazon EKS offers portability and operational consistency across different deployment environments. Plus, Amazon EKS is certified Kubernetes-conformant, so existing applications that run on upstream Kubernetes are compatible with Amazon EKS. This offers further advantages, such as increased interoperability, flexibility, and a growing pool of Kubernetes-literate IT professionals.
Companies can use Kubernetes to run data processing jobs and orchestrate ML pipelines in cloud-native or on-premises environments, as it supports job schedulers such as Airflow, Prefect, or Argo to manage complex workflows; frameworks like Spark for batch processing; and multiple ML platforms like Kubeflow or MLflow. Last year, AWS introduced the AWS Data on EKS (DoEKS) initiative to create and distribute resources, such as best practices, infrastructure as code (IaC) templates, and sample code, to simplify and speed up the process of building, deploying, and scaling data workloads on Amazon EKS.
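Because the pipeline steps run as containers, a single processing step can be expressed as a standard Kubernetes Job regardless of where the cluster runs. The following sketch uses the official Kubernetes Python client to submit one such Job; the container image URI, namespace, and scene identifier are hypothetical, and in practice a scheduler such as Argo or Airflow would create these Jobs for you.

```python
from kubernetes import client, config

# Load the kubeconfig for the target cluster (Amazon EKS, EKS on Outposts, or EKS Anywhere).
config.load_kube_config()

# Hypothetical container image that runs one processing step (for example, orthorectification).
container = client.V1Container(
    name="process-scene",
    image="123456789012.dkr.ecr.us-east-1.amazonaws.com/imagery-processor:latest",
    args=["--scene-id", "scene-001"],
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="process-scene-001"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never")
        ),
        backoff_limit=2,
    ),
)

# Submit the Job to the cluster; the same manifest works across all EKS deployment options.
client.BatchV1Api().create_namespaced_job(namespace="imagery", body=job)
```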
Amazon EKS provides a spectrum of Kubernetes deployment options ranging from AWS-managed to customer-managed infrastructure, so that you can adapt to many use cases.
Figure 1. Amazon EKS spectrum of deployment options from AWS-managed to customer-managed. Beginning at the AWS-managed end and moving toward more customer-managed options, customers can use Amazon EKS, Amazon EKS in Local Zones, Amazon EKS in Wavelength Zones, Amazon EKS on Outposts, Amazon EKS Anywhere, and Amazon EKS Distro.
Deploying in an AWS Region
When running workloads in an AWS Region, Amazon EKS helps run Kubernetes clusters at scale. Amazon EKS minimizes the operational effort required by providing a fully managed, highly available, and scalable Kubernetes control plane running across multiple AWS Availability Zones. You can then choose to use either Amazon Elastic Compute Cloud (Amazon EC2) instances or AWS Fargate for the data plane.
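In practice, most teams provision these clusters with IaC templates, but the same calls are available through the EKS API. The following boto3 sketch creates a Region-based cluster and, optionally, a Fargate profile so pods in a given namespace run without managing EC2 worker nodes; the role ARNs, subnet IDs, and names are placeholders.

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")

# Hypothetical IAM role and subnet IDs; in practice these come from your IaC templates.
eks.create_cluster(
    name="imagery-pipeline",
    roleArn="arn:aws:iam::123456789012:role/eks-cluster-role",
    resourcesVpcConfig={"subnetIds": ["subnet-aaa", "subnet-bbb"]},
)

# Wait until the control plane is active before attaching a data plane.
eks.get_waiter("cluster_active").wait(name="imagery-pipeline")

# Optionally run pods on AWS Fargate instead of EC2 worker nodes.
eks.create_fargate_profile(
    fargateProfileName="imagery-jobs",
    clusterName="imagery-pipeline",
    podExecutionRoleArn="arn:aws:iam::123456789012:role/eks-fargate-pod-role",
    subnets=["subnet-aaa", "subnet-bbb"],
    selectors=[{"namespace": "imagery"}],
)
```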
Deploying on AWS Outposts rack
When deploying a pipeline on-premises using an Outposts rack, you can use Amazon EKS on Outposts and keep using the same application programming interfaces (APIs), console, and tools you use to run Amazon EKS clusters in the cloud. With the extended clusters deployment option, you can continue running the Kubernetes control plane in an AWS Region and the worker nodes in the Outposts rack. However, if there is poor or intermittent connectivity to the AWS Region running Amazon EKS, AWS recommends using the local clusters deployment option, in which both the control plane and nodes run in the Outposts rack.
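A local cluster is requested through the same create-cluster API, with an additional Outposts configuration block that places the control plane on the rack. The sketch below illustrates this with boto3; the Outpost ARN, instance type, role, and Outpost-hosted subnet are placeholders.

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")

# Local cluster on an Outposts rack: the Kubernetes control plane runs on the Outpost itself.
# The Outpost ARN, instance type, IAM role, and subnet ID below are hypothetical.
eks.create_cluster(
    name="imagery-pipeline-outpost",
    roleArn="arn:aws:iam::123456789012:role/eks-cluster-role",
    resourcesVpcConfig={"subnetIds": ["subnet-on-outpost"]},
    outpostConfig={
        "outpostArns": ["arn:aws:outposts:us-east-1:123456789012:outpost/op-0abc123"],
        "controlPlaneInstanceType": "m5.xlarge",
    },
)
```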
Deploying on customer infrastructure
For on-premises use cases where Outposts rack is not a viable option, you can still use Amazon EKS Anywhere to create and operate Kubernetes clusters on-premises on customer infrastructure. Amazon EKS Anywhere uses Amazon EKS Distro, the same Kubernetes distribution deployed by Amazon EKS, allowing you to create clusters consistent with Amazon EKS best practices like the latest software updates and extended security patches.
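EKS Anywhere clusters are created with the eksctl anywhere CLI from a declarative cluster specification. As a minimal sketch, and assuming the CLI is installed and a provider-specific configuration file already exists, the same step could be scripted as follows (the file name is a placeholder):

```python
import subprocess

# Hypothetical cluster specification for your on-premises provider (for example, vSphere or bare metal).
CONFIG_FILE = "eks-anywhere-cluster.yaml"

# eksctl anywhere reads the full cluster spec from the configuration file and provisions the cluster.
subprocess.run(
    ["eksctl", "anywhere", "create", "cluster", "-f", CONFIG_FILE],
    check=True,
)
```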
How to implement machine learning operations (MLOps) in hybrid environments
Typically, satellite imagery processing pipelines include steps that perform ML inference, such as cloud detection and land cover classification. In these cases, it is important to establish robust MLOps practices and maintain traceability for the models deployed across multiple environments.
In hybrid scenarios that require ML inference to be performed on-premises, customers can choose between two main options:
- Deploy the complete MLOps pipeline as part of the on-premises workload, including building, training, deploying, and managing the ML models. Customers can deploy their preferred ML platform, such as Kubeflow, Metaflow, or MLflow, on the provisioned Amazon EKS cluster, either on Outposts or on customer infrastructure. These frameworks are open source and offer flexibility and portability.
- Build, train, and manage ML models in the AWS Region and deploy the models to run inference on-premises. In this case, you can still run your preferred open-source ML platform on an Amazon EKS cluster in the AWS Region; however, as an alternative, you can use Amazon SageMaker. SageMaker is an AWS service to prepare data and build, train, and deploy ML models with fully managed infrastructure, tools, and workflows. Models built and trained with SageMaker in the AWS Region can then be deployed on-premises, as illustrated in the sketch after this list.
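The following sketch outlines the second option using the SageMaker Python SDK: a model is trained in the Region against imagery stored in Amazon S3, and the resulting artifact location is retrieved so it can be pulled into the on-premises cluster for inference. The training image URI, IAM role, and S3 paths are hypothetical.

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Hypothetical training container, execution role, and S3 locations.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/landcover-training:latest",
    role="arn:aws:iam::123456789012:role/sagemaker-execution-role",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-models-bucket/landcover/",
    sagemaker_session=session,
)

# Train in the AWS Region against labeled imagery stored in Amazon S3.
estimator.fit({"training": "s3://example-training-data/landcover/"})

# The resulting model artifact (model.tar.gz) can be downloaded and served on-premises.
print("Model artifact:", estimator.model_data)
```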
How does everything fit together?
You can integrate the previously discussed on-premises deployments with the rest of your infrastructure in AWS. Figure 2 and Figure 3 show two reference architectures: one with an on-premises deployment on an Outposts rack and one with an on-premises deployment on customer infrastructure. Details of the implementation will vary depending on the particular use case and associated requirements.
Satellite imagery processing pipeline deployment on Outposts rack
Figure 2. Architectural diagram for a satellite imagery processing pipeline deployed on Outposts rack.
Figure 2 features a high-level architecture for creating a satellite imagery processing pipeline deployed on Outposts rack. Use the following steps to build the architecture:
- Create a continuous integration (CI) pipeline for your imagery processing workloads using AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild. Store the container images in Amazon Elastic Container Registry (Amazon ECR).
- Develop and train your ML models either using SageMaker in the AWS Region, or an alternative ML solution, either in the AWS Region or as part of the on-premises deployment.
- Use Amazon CloudWatch to centrally monitor AWS and on-premises resources (see the monitoring sketch after this list).
- Achieve a consistent hybrid experience and fully managed infrastructure using Outposts rack for the on-premises deployment.
- Host your processing pipeline in Amazon EKS on Outposts. Choose your preferred orchestration tool.
- Use a continuous delivery (CD) tool like FluxCD, an open source CD system developed by Weaveworks, to retrieve and deploy the latest container images.
- Run batch operations to optimize processing time using solutions such as Amazon EMR on EKS.
- Use the ML framework chosen during model development for the processing pipeline steps that require ML inference.
- Store your raw and processed satellite imagery data in Amazon S3 on Outposts. Maintain metadata in Amazon Relational Database Service (Amazon RDS).
- A service link connects the Outposts rack with its AWS Region. Optionally, this connection can run over AWS Direct Connect.
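To keep observability consistent across deployments, the on-premises pipeline can publish the same custom metrics to CloudWatch as the in-Region deployment, so dashboards and alarms work identically. The sketch below shows one such metric published with boto3; the namespace, metric name, and dimension values are hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Hypothetical namespace and dimensions; each pipeline deployment tags its metrics
# with its location so a single dashboard can compare them.
cloudwatch.put_metric_data(
    Namespace="ImageryPipeline",
    MetricData=[
        {
            "MetricName": "ScenesProcessed",
            "Dimensions": [{"Name": "Deployment", "Value": "outposts-rack"}],
            "Value": 1,
            "Unit": "Count",
        }
    ],
)
```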
Satellite imagery processing pipeline deployment on customer infrastructure
Figure 3. Architectural diagram for a satellite imagery processing pipeline deployed on customer infrastructure.
Figure 3 features a high-level architecture for creating a satellite imagery processing pipeline deployed on customer infrastructure. Use the following steps to build the architecture:
- Create a continuous integration (CI) pipeline for your imagery processing workloads using CodeCommit, CodePipeline, and CodeBuild. Store the container images in Amazon ECR.
- Develop and train your ML models either using SageMaker in the AWS Region, or an alternative ML solution either in the AWS Region or as part of the on-premises deployment.
- Use CloudWatch to centrally monitor AWS and on-premises resources.
- For cases where requirements do not allow for an Outposts rack deployment, customers can deploy this hybrid architecture directly on customer infrastructure.
- Host the processing pipeline in Amazon EKS Anywhere. Choose your preferred orchestration tool.
- Use a continuous delivery (CD) tool like FluxCD to retrieve and deploy the latest container images.
- Run batch operations to optimize processing time using your preferred solution.
- Use the ML framework chosen during model development for the processing pipeline steps that require ML inference.
- Store your raw and processed satellite imagery data in your chosen object storage solution. Maintain metadata in a PostgreSQL database (see the sketch after this list).
- Connect your AWS Region deployment with your corporate data center using AWS Site-to-Site VPN or Direct Connect.
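On customer infrastructure, the storage and metadata steps typically target an S3-compatible object store and a local PostgreSQL instance. The following sketch illustrates both; the object store endpoint, credentials, bucket, table schema, and database connection details are all placeholders for whatever your environment provides.

```python
import boto3
import psycopg2

# Hypothetical endpoint and credentials for an on-premises, S3-compatible object store.
object_store = boto3.client(
    "s3",
    endpoint_url="https://objects.example-datacenter.internal",
    aws_access_key_id="LOCAL_ACCESS_KEY",
    aws_secret_access_key="LOCAL_SECRET_KEY",
)

# Upload a processed product to the local object store.
object_store.upload_file("scene-001-l2a.tif", "processed-imagery", "scenes/scene-001/l2a.tif")

# Record product metadata in a local PostgreSQL database (connection details and schema are placeholders).
conn = psycopg2.connect(
    host="postgres.example-datacenter.internal",
    dbname="imagery",
    user="pipeline",
    password="example-password",
)
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO products (scene_id, object_key, processing_level) VALUES (%s, %s, %s)",
        ("scene-001", "scenes/scene-001/l2a.tif", "L2A"),
    )
conn.close()
```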
Learn more about AWS for aerospace and satellite
Aerospace organizations can use AWS to design architectures that maximize flexibility for their satellite imagery processing workloads and allow for both cloud and on-premises deployment use cases with minimal modifications. Find more curated solutions for other common use cases for the aerospace and satellite industry in the AWS Solutions Library.
Organizations of all sizes across all industries are transforming and delivering on their aerospace and satellite missions every day using AWS. Learn more about the cloud for aerospace and satellite solutions so you can start your own AWS Cloud journey today.
Get inspired. Watch our AWS in Space story.