AWS Big Data Blog

Accelerate data pipeline creation with the new visual interface in HAQM OpenSearch Ingestion

Today, we’re launching a new visual interface for OpenSearch Ingestion that makes it simple to create and manage your data pipelines from the AWS Management Console. With this new feature, you can build pipelines in minutes without writing complex configurations manually. In this post, we walk through how these new features work and how you can use them to accelerate your data ingestion projects.

Read and write Apache Iceberg tables using AWS Lake Formation hybrid access mode

In this post, we demonstrate how to use Lake Formation for read access while continuing to use AWS Identity and Access Management (IAM) policy-based permissions for write workloads that update the schema and upsert (insert and update combined) data records into the Iceberg tables.

Accelerate your analytics with HAQM S3 Tables and HAQM SageMaker Lakehouse

HAQM SageMaker Lakehouse is a unified, open, and secure data lakehouse that now seamlessly integrates with HAQM S3 Tables, the first cloud object store with built-in Apache Iceberg support. In this post, we show you how to use various analytics services with S3 Tables through the SageMaker Lakehouse integration.

Build unified pipelines spanning multiple AWS accounts and Regions with HAQM MWAA

In this blog post, we demonstrate how to use HAQM MWAA for centralized orchestration, while distributing data processing and machine learning tasks across different AWS accounts and Regions for optimal performance and compliance.

Integrate ThoughtSpot with HAQM Redshift using AWS IAM Identity Center

In this post, we walk you through the process of setting up ThoughtSpot integration with HAQM Redshift using IAM Identity Center authentication. The solution provides a secure, streamlined analytics environment that empowers your team to focus on what matters most: discovering and sharing valuable business insights.

Streamline data discovery with precise technical identifier search in HAQM SageMaker Unified Studio

We’re excited to introduce a new enhancement to the search experience in HAQM SageMaker Catalog, part of the next generation of HAQM SageMaker: exact match search using technical identifiers. In this post, we demonstrate how to streamline data discovery with precise technical identifier search in HAQM SageMaker Unified Studio.

Process millions of observability events with Apache Flink and write directly to Prometheus

In this post, we explain how the new connector works. We also show how you can manage your Prometheus metrics data cardinality by preprocessing raw data with Flink to build real-time observability with HAQM Managed Service for Prometheus and HAQM Managed Grafana.
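The post uses Flink and the new Prometheus connector; the sketch below uses neither. It is a plain-Python illustration of the preprocessing idea only, with hypothetical event fields (`service`, `status`, `request_id`) and a hypothetical `aggregate` helper. Dropping a per-request label and bucketing status codes into classes before emitting counters sharply reduces how many distinct time series reach Prometheus.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical raw observability event; field names are illustrative assumptions.
@dataclass(frozen=True)
class Event:
    timestamp_ms: int
    service: str
    status: int
    request_id: str  # High cardinality: unique per request, so it is never used as a label.


def status_class(status: int) -> str:
    """Collapse individual HTTP status codes into classes such as '2xx'."""
    return f"{status // 100}xx"


def aggregate(events, window_ms=60_000):
    """Count events per (window start, service, status class).

    Returns a dict mapping (window_start_ms, service, status_class) -> count.
    Each distinct key corresponds to one sample of one low-cardinality series.
    """
    counts = Counter()
    for e in events:
        # Tumbling window: align each timestamp to the start of its window.
        window_start = e.timestamp_ms - (e.timestamp_ms % window_ms)
        counts[(window_start, e.service, status_class(e.status))] += 1
    return dict(counts)


events = [
    Event(1_000, "checkout", 200, "req-a"),
    Event(2_000, "checkout", 201, "req-b"),
    Event(3_000, "checkout", 503, "req-c"),
    Event(61_000, "checkout", 200, "req-d"),
]
print(aggregate(events))
```

Four events with four unique request IDs collapse into three keyed counts, and the number of series stays bounded by services × status classes no matter how many requests arrive.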

Manage concurrent write conflicts in Apache Iceberg on the AWS Glue Data Catalog

This post demonstrates how to implement reliable handling of concurrent writes to Iceberg tables. We explore Iceberg’s concurrency model, examine common conflict scenarios, and provide practical implementation patterns both for automatic retries and for situations that require custom conflict resolution logic, so you can build resilient data pipelines. We also cover a pattern for handling conflicts when automatic compaction is enabled through AWS Glue Data Catalog table optimization.
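Iceberg uses optimistic concurrency: a writer commits against the snapshot it read, and the commit fails if another writer committed first. The plain-Python sketch below illustrates that idea and the automatic-retry pattern. It is not the Iceberg or AWS Glue API: `InMemoryTable`, `CommitConflict`, and `append_with_retry` are hypothetical names, and the retry and backoff parameters are assumptions loosely modeled on Iceberg’s `commit.retry.*` table properties.

```python
import random
import time
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when the table changed after the writer read its base snapshot."""


@dataclass
class InMemoryTable:
    """Hypothetical stand-in for a catalog entry that tracks the current snapshot.

    A real catalog swaps the table's metadata pointer atomically; this toy
    class imitates that compare-and-swap within a single process.
    """
    snapshot_id: int = 0
    rows: list = field(default_factory=list)

    def commit(self, base_snapshot_id, new_rows):
        # Optimistic check: succeed only if no one committed since the base was read.
        if self.snapshot_id != base_snapshot_id:
            raise CommitConflict(
                f"base snapshot {base_snapshot_id} != current {self.snapshot_id}"
            )
        self.rows = self.rows + list(new_rows)
        self.snapshot_id += 1
        return self.snapshot_id


def append_with_retry(table, new_rows, base_snapshot_id, max_retries=4,
                      min_wait_s=0.1, max_wait_s=2.0, sleep=time.sleep):
    """Retry an append after conflicts, rebasing onto the latest snapshot each time."""
    base = base_snapshot_id
    for attempt in range(max_retries + 1):
        try:
            return table.commit(base, new_rows)
        except CommitConflict:
            if attempt == max_retries:
                raise  # Out of retries: surface the conflict to the caller.
            # Exponential backoff with full jitter before trying again.
            wait = min(max_wait_s, min_wait_s * (2 ** attempt))
            sleep(random.uniform(0, wait))
            # An append does not read existing rows, so rebasing onto the
            # newest snapshot and retrying unchanged is safe.
            base = table.snapshot_id


# Writer A reads snapshot 0, then writer B commits before A does.
table = InMemoryTable()
stale_base = table.snapshot_id
table.commit(stale_base, ["row-from-B"])
print(append_with_retry(table, ["row-from-A"], stale_base), table.rows)
```

Blindly rebasing is only safe here because an append does not depend on existing data. For operations such as UPDATE, MERGE, or DELETE, the retry path would first need to check whether the rows being rewritten changed in the meantime, which is where custom conflict resolution logic comes in.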

Optimize multimodal search using the TwelveLabs Embed API and HAQM OpenSearch Service

In this blog post, we walk you through integrating the TwelveLabs Embed API with OpenSearch Service to create a multimodal search solution. You’ll learn how to generate rich, contextual embeddings from video content and use OpenSearch Service’s vector database capabilities to search that content. By the end of this post, you’ll be able to implement a system that can transform how your organization handles and extracts value from video content.
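At its core, vector search ranks stored embeddings by their similarity to a query embedding. The exhaustive cosine-similarity search below is a minimal sketch of that ranking step, assuming toy 3-dimensional vectors and hypothetical clip IDs in place of real Embed API output, which has far more dimensions. It does not call TwelveLabs or OpenSearch; OpenSearch Service’s k-NN search typically relies on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Exhaustive k-nearest-neighbor search over an {id: vector} mapping."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for three video clips and one query.
index = {
    "clip_beach": [0.9, 0.1, 0.0],
    "clip_city": [0.1, 0.9, 0.2],
    "clip_forest": [0.2, 0.3, 0.9],
}
query = [0.8, 0.2, 0.1]
for clip_id, score in top_k(query, index, k=2):
    print(clip_id, round(score, 3))
```

Because text, image, and video embeddings from a multimodal model share one vector space, the same ranking works whichever modality produced the query vector.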