AWS Big Data Blog

Category: Intermediate (200)

How Volkswagen Autoeuropa built a data solution with a robust governance framework, simplifying access to quality data using HAQM DataZone

This is the second post of a two-part series detailing how Volkswagen Autoeuropa, a Volkswagen Group plant, together with AWS, built a data solution with a robust governance framework using HAQM DataZone to become a data-driven factory. Part 1 of this series focused on the customer challenges, the overall solution architecture and solution features, and how they helped Volkswagen Autoeuropa overcome their challenges. This post dives into the technical details, highlighting the robust data governance framework that enables ease of access to quality data using HAQM DataZone.

Use HAQM Kinesis Data Streams to deliver real-time data to HAQM OpenSearch Service domains with HAQM OpenSearch Ingestion

In this post, we show how to use HAQM Kinesis Data Streams to buffer and aggregate real-time streaming data for delivery into HAQM OpenSearch Service domains and collections using HAQM OpenSearch Ingestion. You can use this approach for a variety of use cases, from real-time log analytics to integrating application messaging data for real-time search. In this post, we focus on the use case for centralizing log aggregation for an organization that has a compliance need to archive and retain its log data.
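As a rough illustration of the producer side of this pattern, the sketch below publishes JSON log events to a Kinesis data stream that an OpenSearch Ingestion pipeline (configured separately) can consume. The stream name, Region, and log fields are assumptions for illustration, not details from the post.

```python
"""Minimal sketch: publish application log events to a Kinesis data stream
that feeds an OpenSearch Ingestion pipeline configured elsewhere."""
import json
import time

import boto3

kinesis = boto3.client("kinesis", region_name="eu-west-1")

STREAM_NAME = "central-log-stream"  # hypothetical stream name


def publish_log_event(service: str, level: str, message: str) -> None:
    record = {
        "timestamp": int(time.time() * 1000),
        "service": service,
        "level": level,
        "message": message,
    }
    # The partition key groups records from the same service onto the same
    # shard, preserving relative ordering per service.
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=service,
    )


if __name__ == "__main__":
    publish_log_event("checkout-api", "ERROR", "payment gateway timeout")
```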

Achieve data resilience using HAQM OpenSearch Service disaster recovery with snapshot and restore

This post introduces an active-passive disaster recovery approach using a snapshot and restore strategy. The snapshot and restore strategy in OpenSearch Service involves creating point-in-time backups, known as snapshots, of your OpenSearch domain. These snapshots capture the entire state of the domain, including indexes, mappings, and settings. In the event of data loss or system failure, these snapshots can be used to restore the domain to a specific point in time. The post walks through the steps to set up this disaster recovery solution, including launching OpenSearch Service domains in primary and secondary Regions, configuring snapshot repositories, restoring snapshots, and failing over and failing back between the Regions.
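As a minimal sketch of the snapshot side of this strategy, the following example registers a manual S3 snapshot repository on the primary domain, takes a snapshot, and restores it on the secondary domain through the OpenSearch _snapshot API with SigV4-signed requests. The domain endpoints, bucket, snapshot role, and names are assumptions for illustration, not the post's exact setup.

```python
"""Minimal sketch: manual snapshot on the primary domain, restore on the secondary."""
import boto3
import requests
from requests_aws4auth import AWS4Auth

REGION = "us-east-1"
PRIMARY = "https://search-primary-domain.us-east-1.es.amazonaws.com"    # hypothetical endpoint
SECONDARY = "https://search-secondary-domain.us-west-2.es.amazonaws.com"  # hypothetical endpoint
REPO = "dr-snapshots"
SNAPSHOT = "snapshot-2024-01-01"

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    REGION,
    "es",
    session_token=credentials.token,
)

# Register an S3 repository; the IAM role must allow OpenSearch Service to
# read and write the bucket. The same repository is registered on the
# secondary domain so it can read the snapshots.
repo_body = {
    "type": "s3",
    "settings": {
        "bucket": "my-dr-snapshot-bucket",  # hypothetical bucket
        "region": REGION,
        "role_arn": "arn:aws:iam::111122223333:role/OpenSearchSnapshotRole",
    },
}
requests.put(f"{PRIMARY}/_snapshot/{REPO}", auth=awsauth, json=repo_body).raise_for_status()

# Take a point-in-time snapshot of the primary domain.
requests.put(f"{PRIMARY}/_snapshot/{REPO}/{SNAPSHOT}", auth=awsauth).raise_for_status()

# During failover, restore that snapshot on the secondary domain.
requests.post(f"{SECONDARY}/_snapshot/{REPO}/{SNAPSHOT}/_restore", auth=awsauth).raise_for_status()
```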

Incremental refresh for HAQM Redshift materialized views on data lake tables

HAQM Redshift now provides the ability to incrementally refresh your materialized views on data lake tables, including open file and table formats such as Apache Iceberg. In this post, we show you, step by step, which operations are supported on both open file formats and transactional data lake tables to enable incremental refresh of the materialized view.
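As a minimal sketch (not the post's exact walkthrough), the example below creates a materialized view over a data lake table exposed through an external schema and refreshes it using the HAQM Redshift Data API; the workgroup, schema, table, and view names are assumptions for illustration.

```python
"""Minimal sketch: materialized view over a data lake table, refreshed via the Redshift Data API."""
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")


def run_sql(sql: str) -> str:
    """Submit a statement to a Redshift Serverless workgroup and return the statement id."""
    resp = client.execute_statement(
        WorkgroupName="analytics-wg",  # hypothetical workgroup
        Database="dev",
        Sql=sql,
    )
    return resp["Id"]


# Materialized view over an Iceberg table exposed through an external schema.
run_sql(
    """
    CREATE MATERIALIZED VIEW mv_daily_sales AS
    SELECT sale_date, SUM(amount) AS total_amount
    FROM datalake_schema.sales_iceberg
    GROUP BY sale_date;
    """
)

# When the operations on the base table are supported for incremental refresh,
# Redshift applies only the changed data instead of recomputing the full view.
run_sql("REFRESH MATERIALIZED VIEW mv_daily_sales;")
```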

Fine-grained access control in HAQM EMR Serverless with AWS Lake Formation

In this post, we discuss how to implement fine-grained access control in EMR Serverless using Lake Formation. With this integration, organizations can achieve better scalability, flexibility, and cost-efficiency in their data operations, ultimately driving more value from their data assets.
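As a minimal sketch of the fine-grained access control piece, the example below uses Lake Formation to grant column-level SELECT to the IAM role that EMR Serverless jobs run as, so Spark queries from those jobs see only the permitted columns. The role ARN, database, table, and column names are assumptions for illustration.

```python
"""Minimal sketch: column-level Lake Formation grant for an EMR Serverless job role."""
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

lf.grant_permissions(
    Principal={
        # Hypothetical job runtime role used by EMR Serverless.
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/emr-serverless-job-role"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "sales_db",
            "Name": "orders",
            # Only non-sensitive columns are granted; PII columns are omitted.
            "ColumnNames": ["order_id", "order_date", "total_amount"],
        }
    },
    Permissions=["SELECT"],
)
```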

How Volkswagen Autoeuropa built a data mesh to accelerate digital transformation using HAQM DataZone

In this post, we discuss how Volkswagen Autoeuropa used HAQM DataZone to build a data marketplace based on data mesh architecture to accelerate their digital transformation. The data mesh, built on HAQM DataZone, simplified data access, improved data quality, and established governance at scale to power analytics, reporting, AI, and machine learning (ML) use cases. As a result, the data solution offers benefits such as faster access to data, expeditious decision making, accelerated time to value for use cases, and enhanced data governance.

Expanding data analysis and visualization options: HAQM DataZone now integrates with Tableau, Power BI, and more

HAQM DataZone has launched authentication support through the HAQM Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more. This integration empowers data users to access and analyze governed data within HAQM DataZone using familiar tools, boosting both productivity and flexibility.

Control your AWS Glue Studio development interface with AWS Glue job mode API property

The AWS Glue Jobs API is a robust interface that allows data engineers and developers to programmatically manage and run ETL jobs. To improve the customer experience with the AWS Glue Jobs API, we added a new property that describes the job mode: script, visual, or notebook. In this post, we explore how the updated AWS Glue Jobs API works in depth and demonstrate the new experience with the updated API.
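As a minimal sketch, assuming the job mode is surfaced as a JobMode field on the job as the post describes, the example below reads it back for an existing job to distinguish script-, visual-, and notebook-authored jobs; the job name is an assumption for illustration.

```python
"""Minimal sketch: read the job mode of an existing AWS Glue job."""
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Hypothetical job name; get_job returns the job definition.
job = glue.get_job(JobName="nightly-etl")["Job"]

# JobMode is expected to be SCRIPT, VISUAL, or NOTEBOOK, which controls the
# authoring interface AWS Glue Studio opens for the job.
print(job.get("JobMode", "SCRIPT"))
```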

Achieve the best price-performance in HAQM Redshift with elastic histograms for selectivity estimation

HAQM Redshift now offers enhanced query performance through optimizations such as elastic histograms for selectivity estimation, which rely on metadata statistics gathered during ingestion when fresh table statistics are unavailable. In this post, we cover the new performance optimizations in Redshift data warehouse query processing and explain how elastic histogram statistics improve selectivity estimation and the overall quality of query plans for HAQM Redshift data warehouse queries in the absence of fresh table statistics.

Demystify data sharing and collaboration patterns on AWS: Choosing the right tool for the job

The adoption of data lakes together with the data mesh framework has emerged as a powerful approach. By decentralizing data ownership and distribution, enterprises can break down silos and enable seamless data sharing. In this post, we discuss how to choose the right tool for building an enterprise data platform and enabling data sharing, collaboration, and access within your organization and with third-party providers. We address three business use cases using AWS Glue, AWS Data Exchange, AWS Clean Rooms, and HAQM DataZone.