AWS Big Data Blog

Category: Intermediate (200)

OpenSearch optimized instance (OR1) is game changing for indexing performance and cost

Amazon OpenSearch Service securely unlocks real-time search, monitoring, and analysis of business and operational data for use cases like application monitoring, log analytics, observability, and website search. In this post, we examine the OR1 instance type, an OpenSearch optimized instance introduced on November 29, 2023. OR1 is an instance type for Amazon OpenSearch Service that […]

AWS Glue mutual TLS authentication for Amazon MSK

In today’s landscape, data streams continuously from countless sources, from social media interactions to Internet of Things (IoT) device readings. This torrent of real-time information presents both a challenge and an opportunity for businesses. To harness the power of this data effectively, organizations need robust systems for ingesting, processing, and analyzing streaming data at […]

How Amazon GTTS runs large-scale ETL jobs on AWS using Amazon MWAA

The Amazon Global Transportation Technology Services (GTTS) team owns a set of products called INSITE (Insights Into Transportation Everywhere). These products are user-facing applications that solve specific business problems across different transportation domains: network topology management, capacity management, and network monitoring. As of this writing, GTTS serves around 10,000 customers globally on a monthly basis, […]

Get started with the new Amazon DataZone enhancements for Amazon Redshift

In today’s data-driven landscape, organizations are seeking ways to streamline their data management processes and unlock the full potential of their data assets, while controlling access and enforcing governance. That’s why we introduced Amazon DataZone. Amazon DataZone is a powerful data management service that empowers data engineers, data scientists, product managers, analysts, and business users […]

Manage Amazon Redshift provisioned clusters with Terraform

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it straightforward and cost-effective to analyze all your data using standard SQL and your existing extract, transform, and load (ETL); business intelligence (BI); and reporting tools. Tens of thousands of customers use Amazon Redshift to process exabytes of data per […]

Configure SAML federation with Amazon OpenSearch Serverless and Keycloak

Amazon OpenSearch Serverless is a serverless version of Amazon OpenSearch Service, a fully managed open search and analytics platform. With Amazon OpenSearch Service, you can run petabyte-scale search and analytics workloads without the heavy lifting of managing the underlying OpenSearch Service clusters, and Amazon OpenSearch Serverless supports workloads up to 30 TB of data for time-series […]

Streamline your data governance by deploying Amazon DataZone with the AWS CDK

Managing data across diverse environments can be a complex and daunting task. Amazon DataZone simplifies this so you can catalog, discover, share, and govern data stored across AWS, on premises, and third-party sources. Many organizations manage vast amounts of data assets owned by various teams, creating a complex landscape that poses challenges for scalable data […]

Protein similarity search using ProtT5-XL-UniRef50 and Amazon OpenSearch Service

A protein is a sequence of amino acids that, when chained together, creates a 3D structure. This 3D structure allows the protein to bind to other structures within the body and initiate changes. This binding is core to the working of many drugs. A common workflow within drug discovery is searching for similar proteins, because […]

Introducing Amazon MWAA support for Apache Airflow version 2.9.2

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that significantly improves security and availability, and reduces infrastructure management overhead when setting up and operating end-to-end data pipelines in the cloud. Today, we are announcing the availability of Apache Airflow version 2.9.2 environments on Amazon MWAA. Apache Airflow […]

Run Apache XTable on Amazon MWAA to translate open table formats

In this post, we show you how to get started with Apache XTable on AWS and how you can use it in a batch pipeline orchestrated with Amazon Managed Workflows for Apache Airflow (Amazon MWAA). To understand how XTable and similar solutions work, we start with a high-level background on metadata management in an open table format (OTF) and then dive deeper into XTable and its usage.