AWS Big Data Blog

Enforce fine-grained access control on Open Table Formats via HAQM EMR integrated with AWS Lake Formation

With HAQM EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. This allows you to simplify security and governance over transactional data lakes by providing access controls at table-, column-, and row-level permissions with your Apache Spark jobs. Many […]

Power neural search with AI/ML connectors in HAQM OpenSearch Service

With the launch of the neural search feature for HAQM OpenSearch Service in OpenSearch 2.9, it’s now effortless to integrate with AI/ML models to power semantic search and other use cases. OpenSearch Service has supported both lexical and vector search since the introduction of its k-nearest neighbor (k-NN) feature in 2020; however, configuring semantic search […]

Backup and Restore - Pre

Disaster recovery strategies for HAQM MWAA – Part 1

In the dynamic world of cloud computing, ensuring the resilience and availability of critical applications is paramount. Disaster recovery (DR) is the process by which an organization anticipates and addresses technology-related disasters. For organizations implementing critical workload orchestration using HAQM Managed Workflows for Apache Airflow (HAQM MWAA), it is crucial to have a DR plan […]

os_glue_architecture

Detect, mask, and redact PII data using AWS Glue before loading into HAQM OpenSearch Service

Many organizations, small and large, are working to migrate and modernize their analytics workloads on HAQM Web Services (AWS). There are many reasons for customers to migrate to AWS, but one of the main reasons is the ability to use fully managed services rather than spending time maintaining infrastructure, patching, monitoring, backups, and more. Leadership […]

Enable metric-based and scheduled scaling for HAQM Managed Service for Apache Flink

Thousands of developers use Apache Flink to build streaming applications to transform and analyze data in real time. Apache Flink is an open source framework and engine for processing data streams. It’s highly available and scalable, delivering high throughput and low latency for the most demanding stream-processing applications. Monitoring and scaling your applications is critical […]

Achieve high availability in HAQM OpenSearch Multi-AZ with Standby enabled domains: A deep dive into failovers

HAQM OpenSearch Service recently introduced Multi-AZ with Standby, a deployment option designed to provide businesses with enhanced availability and consistent performance for critical workloads. With this feature, managed clusters can achieve 99.99% availability while remaining resilient to zonal infrastructure failures. In this post, we explore how search and indexing works with Multi-AZ with Standby and […]

HAQM OpenSearch Service search enhancements: 2023 roundup

What users expect from search engines has evolved over the years. Just returning lexically relevant results quickly is no longer enough for most users. Now users seek methods that allow them to get even more relevant results through semantic understanding or even search through image visual similarities instead of textual search of metadata. HAQM OpenSearch […]

Architectural patterns for real-time analytics using HAQM Kinesis Data Streams, part 1

We’re living in the age of real-time data and insights, driven by low-latency data streaming applications. Today, everyone expects a personalized experience in any application, and organizations are constantly innovating to increase their speed of business operation and decision making. The volume of time-sensitive data produced is increasing rapidly, with different formats of data being […]

HAQM OpenSearch Serverless now supports automated time-based data deletion 

We recently announced a new enhancement to OpenSearch Serverless for managing data retention of Time Series collections and Indexes. OpenSearch Serverless for HAQM OpenSearch Service makes it straightforward to run search and analytics workloads without having to think about infrastructure management. With the new automated time-based data deletion feature, you can specify how long they want […]

Solution design diagram

Run Kinesis Agent on HAQM ECS

February 9, 2024: HAQM Kinesis Data Firehose has been renamed to HAQM Data Firehose. Read the AWS What’s New post to learn more. Kinesis Agent is a standalone Java software application that offers a straightforward way to collect and send data to HAQM Kinesis Data Streams and HAQM Kinesis Data Firehose. The agent continuously monitors […]