AWS Big Data Blog

In-place version upgrades for applications on Amazon Managed Service for Apache Flink now supported

Managed Service for Apache Flink is a fully managed, serverless experience for running Apache Flink applications, and it now supports Apache Flink 1.18.1, the latest released version of Apache Flink at the time of writing. In this post, we explore in-place version upgrades, a new feature offered by Managed Service for Apache Flink. We provide guidance on getting started and offer detailed insights into the feature. Later, we deep dive into how the feature works and walk through some sample use cases.
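As a rough illustration of what an in-place upgrade request can look like programmatically, the sketch below assumes the boto3 `kinesisanalyticsv2` client, its `describe_application` and `update_application` calls, and the `RuntimeEnvironmentUpdate` parameter with a `FLINK-1_18` target. The helper names `build_upgrade_request` and `upgrade_in_place` are hypothetical, and the client is passed in so the logic can be exercised without AWS credentials.

```python
from typing import Any, Dict


def build_upgrade_request(app_name: str, current_version_id: int,
                          target_runtime: str = "FLINK-1_18") -> Dict[str, Any]:
    """Assemble UpdateApplication arguments for an in-place runtime upgrade.

    Hypothetical helper. CurrentApplicationVersionId guards against
    concurrent updates; RuntimeEnvironmentUpdate names the target runtime
    (assumed value format, e.g. "FLINK-1_18").
    """
    if current_version_id < 1:
        raise ValueError("application version IDs start at 1")
    return {
        "ApplicationName": app_name,
        "CurrentApplicationVersionId": current_version_id,
        "RuntimeEnvironmentUpdate": target_runtime,
    }


def upgrade_in_place(client: Any, app_name: str,
                     target_runtime: str = "FLINK-1_18") -> Dict[str, Any]:
    """Read the current application version, then request the runtime upgrade.

    `client` is assumed to be boto3.client("kinesisanalyticsv2").
    """
    detail = client.describe_application(ApplicationName=app_name)["ApplicationDetail"]
    request = build_upgrade_request(app_name, detail["ApplicationVersionId"], target_runtime)
    return client.update_application(**request)
```

Calling `upgrade_in_place(boto3.client("kinesisanalyticsv2"), "my-app")` would submit the request, where `my-app` is a placeholder application name. Reading the version ID immediately before updating keeps the optimistic-concurrency check meaningful.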

Entity resolution and fuzzy matches in AWS Glue using the Zingg open source library

In this post, we explore how to use Zingg’s entity resolution capabilities within an AWS Glue notebook, which you can later run as an extract, transform, and load (ETL) job. By integrating Zingg in your notebooks or ETL jobs, you can effectively address data governance challenges and provide consistent and accurate data across your organization.

Introducing blueprint discovery and other UI enhancements for Amazon OpenSearch Ingestion

Amazon OpenSearch Ingestion is a fully managed serverless pipeline that allows you to ingest, filter, transform, enrich, and route data to an Amazon OpenSearch Service domain or Amazon OpenSearch Serverless collection. OpenSearch Ingestion is capable of ingesting data from a wide variety of sources and has a rich ecosystem of built-in processors to take care […]
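Blueprint discovery in the post is a console feature; as a hedged programmatic counterpart, the sketch below assumes the boto3 `osis` client's `list_pipeline_blueprints` call returns a `Blueprints` list whose entries carry a `BlueprintName` field. The helper `find_blueprints` is hypothetical and simply filters blueprint names by a keyword.

```python
from typing import Any, List


def find_blueprints(client: Any, keyword: str) -> List[str]:
    """Return blueprint names containing `keyword` (case-insensitive), sorted.

    `client` is assumed to be boto3.client("osis"); the response shape
    ({"Blueprints": [{"BlueprintName": ...}, ...]}) is an assumption.
    """
    response = client.list_pipeline_blueprints()
    needle = keyword.lower()
    names = [bp["BlueprintName"] for bp in response.get("Blueprints", [])]
    return sorted(name for name in names if needle in name.lower())
```

A matching name could then be passed to `get_pipeline_blueprint(BlueprintName=...)` to retrieve the blueprint body as a starting point for a pipeline configuration.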

Use AWS Data Exchange to seamlessly share Apache Hudi datasets

Apache Hudi was originally developed by Uber in 2016 to bring to life a transactional data lake that could quickly and reliably absorb updates to support the massive growth of the company’s ride-sharing platform. Apache Hudi is now widely used across the industry to build very large-scale data lakes. Today, Hudi is the […]


AVB accelerates search in LINQ with Amazon OpenSearch Service

AVB Marketing delivers custom digital solutions for their members across a wide range of products. LINQ, AVB’s proprietary product information management system, empowers their appliance, consumer electronics, and furniture retailer members to streamline the management of their product catalog. In this post, we share how AVB reduced their average search time from 3 seconds to 300 milliseconds in LINQ by adopting Amazon OpenSearch Service while processing 14.5 million record updates daily.

Understanding Apache Iceberg on AWS with the new technical guide

We’re excited to announce the launch of the Apache Iceberg on AWS technical guide. Whether you are new to Apache Iceberg on AWS or already running production workloads on AWS, this comprehensive technical guide offers detailed guidance, from foundational concepts to advanced optimizations, to help you build your transactional data lake with Apache Iceberg on AWS.

Amazon DocumentDB zero-ETL integration with Amazon OpenSearch Service is now available

Today, we are announcing the general availability of Amazon DocumentDB (with MongoDB compatibility) zero-ETL integration with Amazon OpenSearch Service. Amazon DocumentDB provides native text search and vector search capabilities. With Amazon OpenSearch Service, you can perform advanced search analytics, such as fuzzy search, synonym search, cross-collection search, and multilingual search, on Amazon DocumentDB data. Zero-ETL […]

Safely remove Kafka brokers from Amazon MSK provisioned clusters

Today, we are announcing broker removal capability for Amazon Managed Streaming for Apache Kafka (Amazon MSK) provisioned clusters, which lets you remove multiple brokers from your provisioned clusters. You can now reduce your cluster’s storage and compute capacity by removing sets of brokers, with no availability impact, data durability risk, or disruption to your data streaming […]
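A minimal sketch of removing whole sets of brokers, assuming a "set" means one broker per Availability Zone, that the AZ count equals the number of client subnets, and that the boto3 `kafka` client's `update_broker_count` accepts a lower `TargetNumberOfBrokerNodes` for removal. The helpers `plan_broker_removal` and `remove_broker_sets` are hypothetical; the service may also require that brokers being removed host no partitions, so check the MSK documentation before relying on this.

```python
from typing import Any, Dict


def plan_broker_removal(current: int, az_count: int, sets_to_remove: int) -> int:
    """Compute the target broker count after removing whole sets of brokers.

    Assumes one broker per AZ per set, so the count stays a multiple of az_count.
    """
    if az_count < 1 or sets_to_remove < 1:
        raise ValueError("az_count and sets_to_remove must be positive")
    if current % az_count:
        raise ValueError("current broker count is not a multiple of the AZ count")
    target = current - sets_to_remove * az_count
    if target < az_count:
        raise ValueError("must keep at least one broker per AZ")
    return target


def remove_broker_sets(client: Any, cluster_arn: str, sets_to_remove: int) -> Dict[str, Any]:
    """Request a smaller broker count; `client` is assumed to be boto3.client("kafka")."""
    info = client.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]
    # Assumption: one client subnet per Availability Zone.
    az_count = len(info["BrokerNodeGroupInfo"]["ClientSubnets"])
    target = plan_broker_removal(info["NumberOfBrokerNodes"], az_count, sets_to_remove)
    return client.update_broker_count(
        ClusterArn=cluster_arn,
        CurrentVersion=info["CurrentVersion"],  # optimistic-concurrency token
        TargetNumberOfBrokerNodes=target,
    )
```

For example, a 9-broker cluster spread over 3 AZs would go to 6 brokers when one set is removed; the validation refuses any plan that would leave an AZ without a broker.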

Introducing Amazon MWAA support for the Airflow REST API and web server auto scaling

Apache Airflow is a popular platform for enterprises looking to orchestrate complex data pipelines and workflows. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service that streamlines the setup and operation of secure and highly available Airflow environments in the cloud. In this post, we’re excited to introduce two new features that […]
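To give a feel for Airflow REST API access on MWAA, the sketch below assumes the boto3 `mwaa` client's `invoke_rest_api` call (with `Name`, `Path`, `Method`, and `QueryParameters`) proxies requests to Airflow's stable REST API and returns the Airflow payload under `RestApiResponse`. The helper `paused_dag_ids` is hypothetical, and it reads only the first page of results.

```python
from typing import Any, List


def paused_dag_ids(client: Any, env_name: str, limit: int = 100) -> List[str]:
    """Return IDs of paused DAGs from Airflow's GET /dags endpoint.

    `client` is assumed to be boto3.client("mwaa"). Only the first page of
    up to `limit` DAGs is read; a full version would page with "offset".
    """
    response = client.invoke_rest_api(
        Name=env_name,
        Path="/dags",
        Method="GET",
        QueryParameters={"limit": limit},
    )
    dags = response["RestApiResponse"].get("dags", [])
    return [dag["dag_id"] for dag in dags if dag.get("is_paused")]
```

Going through the MWAA API rather than the web server directly means the call is authorized with the caller's IAM credentials, which is one reason to prefer it for automation scripts.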