AWS Big Data Blog
Category: Analytics
How Volkswagen Autoeuropa built a data mesh to accelerate digital transformation using Amazon DataZone
In this post, we discuss how Volkswagen Autoeuropa used Amazon DataZone to build a data marketplace based on data mesh architecture to accelerate their digital transformation. The data mesh, built on Amazon DataZone, simplified data access, improved data quality, and established governance at scale to power analytics, reporting, AI, and machine learning (ML) use cases. As a result, the data solution offers benefits such as faster access to data, expeditious decision making, accelerated time to value for use cases, and enhanced data governance.
Streamline AI-driven analytics with governance: Integrating Tableau with Amazon DataZone
Amazon DataZone recently announced the expansion of data analysis and visualization options for your project-subscribed data within Amazon DataZone using the Amazon Athena JDBC driver. In this post, you learn how the recent enhancements in Amazon DataZone facilitate a seamless connection with Tableau. By integrating Tableau with the comprehensive data governance capabilities of Amazon DataZone, we’re empowering data consumers to quickly and seamlessly explore and analyze their governed data.
Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more
Amazon DataZone has launched authentication support through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more. This integration empowers data users to access and analyze governed data within Amazon DataZone using familiar tools, boosting both productivity and flexibility.
Modernize your legacy databases with AWS data lakes, Part 3: Build a data lake processing layer
This is the final part of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to process data with Amazon Redshift Spectrum and create the gold (consumption) layer.
Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg
This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. We show how to build data pipelines using AWS Glue jobs, optimize them for both cost and performance, and implement schema evolution to automate manual tasks. To review the first part of the series, where we load SQL Server data into Amazon Simple Storage Service (Amazon S3) using AWS Database Migration Service (AWS DMS), see Modernize your legacy databases with AWS data lakes, Part 1: Migrate SQL Server using AWS DMS.
Improve OpenSearch Service cluster resiliency and performance with dedicated coordinator nodes
Today, we are announcing dedicated coordinator nodes for Amazon OpenSearch Service domains deployed on managed clusters. When you use Amazon OpenSearch Service to create OpenSearch domains, the data nodes serve dual roles of coordinating data-related requests, such as indexing and search requests, and of doing the work of processing the requests – indexing documents and […]
Control your AWS Glue Studio development interface with AWS Glue job mode API property
The AWS Glue Jobs API is a robust interface that allows data engineers and developers to programmatically manage and run ETL jobs. To improve customer experience with the AWS Glue Jobs API, we added a new property describing the job mode corresponding to script, visual, or notebook. In this post, we explore how the updated AWS Glue Jobs API works in depth and demonstrate the new experience with the updated API.
How BMW streamlined data access using AWS Lake Formation fine-grained access control
This post explores how BMW implemented AWS Lake Formation’s fine-grained access control (FGAC) in the Cloud Data Hub and how this saves them up to 25% on compute and storage costs. By using AWS Lake Formation fine-grained access control capabilities, BMW has transparently implemented finer data access management within the Cloud Data Hub. The integration of Lake Formation has enabled data stewards to scope and grant granular access to specific subsets of data, reducing costly data duplication.
Analyze Amazon EMR on Amazon EC2 cluster usage with Amazon Athena and Amazon QuickSight
In this post, we guide you through deploying a comprehensive solution in your Amazon Web Services (AWS) environment to analyze Amazon EMR on EC2 cluster usage. By using this solution, you will gain a deep understanding of resource consumption and associated costs of individual applications running on your EMR cluster.
Achieve the best price-performance in Amazon Redshift with elastic histograms for selectivity estimation
Amazon Redshift now offers enhanced query performance with optimizations such as Enhanced Histograms for Selectivity Estimation, which rely on metadata statistics gathered during ingestion when fresh statistics are absent. In this post, we cover new performance optimizations in Redshift data warehouse query processing and how elastic histogram statistics help enhance selectivity estimation and the overall quality of query plans for Amazon Redshift data warehouse queries in the absence of fresh table statistics.