AWS Big Data Blog
Category: Analytics
Integrate custom applications with AWS Lake Formation – Part 1
In this two-part series, we show how to integrate custom applications or data processing engines with Lake Formation using the third-party services integration feature. In this post, we dive deep into the required Lake Formation and AWS Glue APIs. We walk through the steps to enforce Lake Formation policies within custom data applications. As an example, we present a sample Lake Formation integrated application implemented using AWS Lambda.
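As a rough sketch of the API flow this post walks through, a custom application typically calls the AWS Glue GetUnfilteredTableMetadata API to learn which columns it is authorized to read under Lake Formation, and then requests scoped-down credentials with the Lake Formation GetTemporaryGlueTableCredentials API. The account ID, database, table, and Region below are placeholders, and the sequence is a simplified illustration rather than the post's exact implementation:

import boto3

glue = boto3.client("glue")
lakeformation = boto3.client("lakeformation")

# Hypothetical identifiers used for illustration only.
catalog_id = "111122223333"
database = "sales_db"
table = "orders"

# Ask the Data Catalog which columns and cell filters the caller is authorized for.
metadata = glue.get_unfiltered_table_metadata(
    CatalogId=catalog_id,
    DatabaseName=database,
    Name=table,
    SupportedPermissionTypes=["COLUMN_PERMISSION", "CELL_FILTER_PERMISSION"],
)
authorized_columns = metadata.get("AuthorizedColumns", [])

# Exchange the table ARN for temporary, scoped-down credentials to read the underlying data.
table_arn = f"arn:aws:glue:us-east-1:{catalog_id}:table/{database}/{table}"
creds = lakeformation.get_temporary_glue_table_credentials(
    TableArn=table_arn,
    Permissions=["SELECT"],
    DurationSeconds=900,
    SupportedPermissionTypes=["COLUMN_PERMISSION"],
)

print(authorized_columns)
print(creds["AccessKeyId"])

The temporary credentials returned by the second call can then be used by the application to read only the authorized data from Amazon S3.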
Integrate custom applications with AWS Lake Formation – Part 2
In this two-part series, we show how to integrate custom applications or data processing engines with Lake Formation using the third-party services integration feature. In this post, we explore how to deploy a fully functional web client application, built with JavaScript/React through AWS Amplify (Gen 1), that uses the same Lambda function as the backend. The provisioned web application provides a user-friendly and intuitive way to view the Lake Formation policies that have been enforced.
Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito
In this post, we show you how to manage user access to enterprise documents in generative AI-powered tools according to the access you assign to each persona. We illustrate how to build a document search Retrieval Augmented Generation (RAG) solution that makes sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. The solution combines OpenSearch Service with Amazon Cognito custom attributes to implement a tag-based access control mechanism that is straightforward to manage at scale.
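As a minimal sketch of the tag-based filtering idea, assuming a hypothetical enterprise-docs index with an access_tags field and a department value taken from a verified Amazon Cognito custom attribute such as custom:department, the search query combines the user's question with a hard filter so only documents tagged for that user's attributes can be returned:

from opensearchpy import OpenSearch

# Hypothetical values for illustration; in the post's architecture the department tag
# would come from a verified Amazon Cognito ID token custom attribute.
user_department = "finance"   # e.g., the "custom:department" token claim

# Authentication (SigV4 or Cognito-issued credentials) is omitted for brevity.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

# Combine the user's query with a filter on the document's access tags so documents
# outside the caller's department are never returned.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"content": "quarterly revenue summary"}}],
            "filter": [{"term": {"access_tags": user_department}}],
        }
    }
}
response = client.search(index="enterprise-docs", body=query)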
Enrich your AWS Glue Data Catalog with generative AI metadata using Amazon Bedrock
By harnessing the capabilities of generative AI, you can automate the generation of comprehensive metadata descriptions for your data assets based on their documentation, enhancing discoverability, understanding, and overall data governance within your AWS Cloud environment. This post shows you how to enrich your AWS Glue Data Catalog with dynamic metadata using foundation models (FMs) on Amazon Bedrock and your data documentation.
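A minimal sketch of that enrichment loop might look like the following, assuming hypothetical database and table names and an example Anthropic Claude model ID on Amazon Bedrock; the post's actual prompt and orchestration will differ:

import boto3

glue = boto3.client("glue")
bedrock = boto3.client("bedrock-runtime")

database, table_name = "sales_db", "orders"   # hypothetical names

# Read the current table definition from the AWS Glue Data Catalog.
table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
columns = table["StorageDescriptor"]["Columns"]

# Ask a foundation model on Amazon Bedrock to draft a business-friendly description.
prompt = f"Write a one-paragraph description of a table named {table_name} with columns: {columns}"
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # example model ID
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
description = response["output"]["message"]["content"][0]["text"]

# Write the generated description back to the catalog.
table_input = {k: v for k, v in table.items()
               if k in ("Name", "StorageDescriptor", "PartitionKeys", "TableType", "Parameters")}
table_input["Description"] = description
glue.update_table(DatabaseName=database, TableInput=table_input)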
Ingest telemetry messages in near real time with Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service
Many organizations use third-party satellite-powered terminal devices for remote monitoring, using telemetry and NMEA-0183 formatted messages generated in near real time. This post demonstrates how to implement a satellite-based remote alerting and response solution on the AWS Cloud to provide time-critical alerts and actionable insights, with a focus on telemetry message ingestion and alerts. Key services in the solution include Amazon API Gateway, Amazon Data Firehose, and Amazon Location Service.
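As an illustrative sketch of the ingestion hop (not the post's exact code), an AWS Lambda function behind Amazon API Gateway can forward each raw NMEA-0183 style message into an Amazon Data Firehose delivery stream; the stream name and payload handling here are assumptions:

import base64
import json
import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "telemetry-ingest-stream"   # hypothetical stream name

def lambda_handler(event, context):
    # API Gateway proxy integration delivers the device payload in the request body.
    body = event.get("body", "")
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")

    # Forward the raw telemetry/NMEA-0183 message to Firehose for buffering and delivery.
    firehose.put_record(
        DeliveryStreamName=DELIVERY_STREAM,
        Record={"Data": (body + "\n").encode("utf-8")},
    )
    return {"statusCode": 200, "body": json.dumps({"status": "accepted"})}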
Expand data access through Apache Iceberg using Delta Lake UniForm on AWS
Delta Lake UniForm is an open table format extension designed to provide a universal data representation that can be efficiently read by different processing engines. It aims to bridge the gap between various data formats and processing systems, offering a standardized approach to data storage and retrieval. With UniForm, you can read Delta Lake tables as Apache Iceberg tables. This post explores how to start using Delta Lake UniForm on Amazon Web Services (AWS).
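As a minimal sketch (assuming a Spark session already configured with the Delta Lake packages and a placeholder S3 location), UniForm is enabled through Delta table properties so that Iceberg-compatible metadata is generated alongside the Delta transaction log; the property names follow Delta Lake's UniForm documentation:

from pyspark.sql import SparkSession

# Assumes a Spark session already configured with the Delta Lake dependencies.
spark = SparkSession.builder.appName("uniform-example").getOrCreate()

# Create a Delta table with UniForm enabled; Iceberg metadata is written alongside
# the Delta log so Iceberg-compatible engines can read the same table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_uniform (id BIGINT, amount DOUBLE)
    USING DELTA
    LOCATION 's3://amzn-s3-demo-bucket/sales_uniform/'
    TBLPROPERTIES (
      'delta.enableIcebergCompatV2' = 'true',
      'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")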
How Volkswagen Autoeuropa built a data solution with a robust governance framework, simplifying access to quality data using Amazon DataZone
This is the second post of a two-part series detailing how Volkswagen Autoeuropa, a Volkswagen Group plant, together with AWS, built a data solution with a robust governance framework using Amazon DataZone to become a data-driven factory. Part 1 of this series focused on the customer challenges, overall solution architecture, and solution features, and how they helped Volkswagen Autoeuropa overcome their challenges. This post dives into the technical details, highlighting the robust data governance framework that enables ease of access to quality data using Amazon DataZone.
Streamlining AWS Glue Studio visual jobs: Building an integrated CI/CD pipeline for seamless environment synchronization
As data engineers increasingly rely on the AWS Glue Studio visual editor to create data integration jobs, the need for a streamlined development lifecycle and seamless synchronization between environments has become paramount. Additionally, managing versions of visual directed acyclic graphs (DAGs) is crucial for tracking changes, collaboration, and maintaining consistency across environments. This post introduces an end-to-end solution that addresses these needs by combining the power of the AWS Glue Visual Job API, a custom AWS Glue Resource Sync Utility, and a continuous integration and continuous deployment (CI/CD) pipeline.
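To give a feel for the export/import step such a pipeline automates, this hedged sketch reads a visual job definition, including its CodeGenConfigurationNodes DAG, from a source environment and recreates it in a target environment; the profile names, job name, and retained fields are assumptions rather than the utility's actual logic:

import boto3

# Hypothetical: separate credential profiles for the source (dev) and target (prod) environments.
source_glue = boto3.Session(profile_name="dev").client("glue")
target_glue = boto3.Session(profile_name="prod").client("glue")

job_name = "customer-ingest-visual-job"   # example job name

# Export the visual job, including the DAG stored in CodeGenConfigurationNodes.
job = source_glue.get_job(JobName=job_name)["Job"]

# Keep only fields accepted by create_job; the visual DAG travels with the definition.
job_definition = {k: v for k, v in job.items()
                  if k in ("Name", "Role", "Command", "DefaultArguments",
                           "GlueVersion", "NumberOfWorkers", "WorkerType",
                           "CodeGenConfigurationNodes")}

# Recreate the job in the target environment so it stays editable in AWS Glue Studio.
target_glue.create_job(**job_definition)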
Use Amazon Kinesis Data Streams to deliver real-time data to Amazon OpenSearch Service domains with Amazon OpenSearch Ingestion
In this post, we show how to use Amazon Kinesis Data Streams to buffer and aggregate real-time streaming data for delivery into Amazon OpenSearch Service domains and collections using Amazon OpenSearch Ingestion. You can use this approach for a variety of use cases, from real-time log analytics to integrating application messaging data for real-time search. Here, we focus on centralizing log aggregation for an organization that has a compliance need to archive and retain its log data.
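On the producer side, a minimal sketch of shipping log events onto the Kinesis data stream that the OpenSearch Ingestion pipeline consumes could look like the following; the stream name and event shape are assumptions:

import json
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "central-log-stream"   # hypothetical stream read by the OpenSearch Ingestion pipeline

def ship_log_event(service: str, message: str) -> None:
    # One JSON log event per record; the partition key spreads load across shards.
    event = {"timestamp": int(time.time() * 1000), "service": service, "message": message}
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=service,
    )

ship_log_event("checkout", "order 1234 processed")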
Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore
This post introduces an active-passive disaster recovery approach using a snapshot and restore strategy. The snapshot and restore strategy in OpenSearch Service involves creating point-in-time backups, known as snapshots, of your OpenSearch domain. These snapshots capture the entire state of the domain, including indexes, mappings, and settings. In the event of data loss or system failure, these snapshots can be used to restore the domain to a specific point in time. The post walks through the steps to set up this disaster recovery solution, including launching OpenSearch Service domains in primary and secondary Regions, configuring snapshot repositories, restoring snapshots, and failing over and failing back between the Regions.
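As a hedged sketch of the repository setup and restore calls, following the documented pattern of signing requests to the domain's _snapshot APIs, the following registers an S3-backed manual snapshot repository on the primary domain, takes a snapshot, and shows the restore call you would issue against the secondary Region's domain; the endpoints, bucket, and IAM role ARN are placeholders:

import boto3
import requests
from requests_aws4auth import AWS4Auth

region = "us-east-1"
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, "es",
                   session_token=credentials.token)

domain_endpoint = "https://search-primary-domain.us-east-1.es.amazonaws.com"  # placeholder

# Register a manual snapshot repository backed by an S3 bucket the domain can write to.
repo_payload = {
    "type": "s3",
    "settings": {
        "bucket": "amzn-s3-demo-snapshot-bucket",                    # placeholder bucket
        "region": region,
        "role_arn": "arn:aws:iam::111122223333:role/SnapshotRole",   # placeholder role
    },
}
requests.put(f"{domain_endpoint}/_snapshot/dr-repo", auth=awsauth, json=repo_payload)

# Take a point-in-time snapshot of the primary domain.
requests.put(f"{domain_endpoint}/_snapshot/dr-repo/snapshot-1", auth=awsauth)

# On the secondary domain, register the same repository and restore from the snapshot, e.g.:
# requests.post(f"{secondary_endpoint}/_snapshot/dr-repo/snapshot-1/_restore", auth=awsauth)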