AWS Big Data Blog
Zero-copy, Coordination-free approach to OpenSearch Snapshots
In this blog post, we tell you how we enhanced the snapshot efficiency in Amazon OpenSearch Service while carefully maintaining these critical operational aspects. These snapshot optimizations are enabled for all OpenSearch optimized instance family (OR1, OR2, OM2) domains from version 2.17 onwards.
Enhance governance with asset type usage policies in Amazon SageMaker
In this post, we introduce authorization policies for custom asset types—a new governance capability in Amazon SageMaker that gives organizations fine-grained control over who can create and manage assets using specific templates. This feature enhances data governance by allowing teams to enforce usage policies that align with business and security requirements across the organization.
Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with Amazon EMR Serverless
In this post, we share how AppsFlyer successfully migrated their massive data infrastructure from self-managed Hadoop clusters to Amazon EMR Serverless, detailing their best practices, challenges to overcome, and lessons learned that can help guide other organizations in similar transformations.
Configure cross-account access of Amazon SageMaker Lakehouse multi-catalog tables using AWS Glue 5.0 Spark
In this post, we show you how to share an Amazon Redshift table and Amazon S3 based Iceberg table from the account that owns the data to another account that consumes the data. In the recipient account, we run a join query on the shared data lake and data warehouse tables using Spark in AWS Glue 5.0. We walk you through the complete cross-account setup and provide the Spark configuration in a Python notebook.
Introducing Amazon Q Developer in Amazon OpenSearch Service
Today we introduced Amazon Q Developer support in OpenSearch Service. With this AI-assisted analysis, both new and experienced users can navigate complex operational data without training, analyze issues, and gain insights in a fraction of the time. In this post, we share how to get started using Amazon Q Developer in OpenSearch Service and explore some of its key capabilities.
Accelerate lightweight analytics using PyIceberg with AWS Lambda and an AWS Glue Iceberg REST endpoint
In this post, we demonstrate how PyIceberg, integrated with the AWS Glue Data Catalog and AWS Lambda, provides a lightweight approach to harness Iceberg’s powerful features through intuitive Python interfaces. We show how this integration enables teams to start working with Iceberg tables with minimal setup and infrastructure dependencies.
Save big on OpenSearch: Unleashing Intel AVX-512 for binary vector performance
With OpenSearch version 2.19, Amazon OpenSearch Service now supports hardware-accelerated enhanced latency and throughput for binary vectors. In this post, we discuss the improvements these advanced processors provide to your OpenSearch workloads, and how they can help you lower your total cost of ownership (TCO).
Automate replication of row-level security from AWS Lake Formation to Amazon QuickSight
This post outlines a solution to automatically replicate the entitlements for readers from the source (AWS Lake Formation) to Amazon QuickSight. This solution can be used even when the authentication method in Amazon QuickSight is not using IAM Identity Center and can work with both direct query and SPICE datasets in Amazon QuickSight.
Amazon OpenSearch Service launches flow builder to empower rapid AI search innovation
The AI search flow builder is available in all AWS Regions that support OpenSearch 2.19+ on OpenSearch Service. In this post, we walk through a couple of scenarios to demonstrate the flow builder. First, we’ll enable semantic search on your old keyword-based OpenSearch application without client-side code changes. Next, we’ll create a multi-modal RAG flow, to showcase how you can redefine image discovery within your applications.
Build end-to-end Apache Spark pipelines with Amazon MWAA, Batch Processing Gateway, and Amazon EMR on EKS clusters
This post shows how to enhance the multi-cluster solution by integrating Amazon Managed Workflows for Apache Airflow (Amazon MWAA) with BPG. By using Amazon MWAA, we add job scheduling and orchestration capabilities, enabling you to build a comprehensive end-to-end Spark-based data processing pipeline.