AWS Big Data Blog

Category: Amazon Athena

Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more

Amazon DataZone has launched authentication support through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets from popular business intelligence (BI) and analytics tools such as Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more. This integration lets data users access and analyze governed data in Amazon DataZone with familiar tools, boosting both productivity and flexibility.

Analyze Amazon EMR on Amazon EC2 cluster usage with Amazon Athena and Amazon QuickSight

In this post, we guide you through deploying a comprehensive solution in your Amazon Web Services (AWS) environment to analyze Amazon EMR on EC2 cluster usage. By using this solution, you will gain a deep understanding of the resource consumption and associated costs of individual applications running on your EMR cluster.
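Attributing a shared cluster's cost to individual applications comes down to choosing an allocation rule. The sketch below assumes one simple rule, splitting the cluster cost in proportion to each application's vCore-seconds; the function name, the metric, and the rule are hypothetical illustrations, not the solution described in the post, which may also weight memory, runtime, or instance pricing.

```python
from typing import Dict


def allocate_cluster_cost(cluster_cost: float, vcore_seconds: Dict[str, float]) -> Dict[str, float]:
    """Split a cluster's cost across applications in proportion to vCore-seconds.

    Hypothetical allocation rule for illustration only.
    """
    total = sum(vcore_seconds.values())
    if total == 0:
        # No recorded usage: nothing can be attributed to any application.
        return {app: 0.0 for app in vcore_seconds}
    # Each application pays its share of the total compute it consumed.
    return {app: cluster_cost * used / total for app, used in vcore_seconds.items()}


if __name__ == "__main__":
    usage = {"spark-etl": 3000.0, "hive-report": 1000.0}
    print(allocate_cluster_cost(10.0, usage))  # {'spark-etl': 7.5, 'hive-report': 2.5}
```

A proportional rule keeps the per-application shares summing to the cluster total, which makes the numbers easy to reconcile against the bill.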

How AppsFlyer modernized their interactive workload by moving to Amazon Athena and saved 80% of costs

AppsFlyer develops a leading privacy-focused measurement solution that enables marketers to gauge the effectiveness of their marketing activities and integrates them with the broader marketing world; it handles a vast volume of 100 billion events every day. This post explores how AppsFlyer modernized their Audiences Segmentation product by using Amazon Athena.

Query AWS Glue Data Catalog views using Amazon Athena and Amazon Redshift

Glue Data Catalog views are a new feature of the AWS Glue Data Catalog that customers can use to create a common view schema and a single metadata container holding view definitions in different SQL dialects, so the same view can be used across engines such as Amazon Redshift and Amazon Athena. In this blog post, we show how you can define and query a Data Catalog view on top of open source table formats such as Iceberg across Athena and Amazon Redshift. We also show the configurations needed to restrict access to the underlying databases and tables. To follow along, we have provided an AWS CloudFormation template.
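The idea of one view with a shared schema and per-engine definitions can be modeled as a small data structure. This is a toy, in-memory sketch: the `CatalogView` class and its fields are hypothetical and do not reflect the AWS Glue API or how the Data Catalog actually stores views.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CatalogView:
    """Toy model of a single view holding one SQL definition per engine dialect.

    Hypothetical shape for illustration; not the AWS Glue API.
    """
    name: str
    columns: Dict[str, str]  # shared view schema: column name -> type
    definitions: Dict[str, str] = field(default_factory=dict)  # dialect -> SQL text

    def add_dialect(self, dialect: str, sql: str) -> None:
        # Register (or replace) the definition an engine with this dialect will use.
        self.definitions[dialect] = sql

    def definition_for(self, engine: str) -> str:
        # Each engine reads the definition written in its own dialect.
        try:
            return self.definitions[engine]
        except KeyError:
            raise LookupError(f"view {self.name!r} has no {engine!r} dialect") from None


if __name__ == "__main__":
    view = CatalogView("orders_summary", {"customer_id": "bigint", "total": "double"})
    view.add_dialect("athena", "SELECT customer_id, sum(amount) AS total FROM orders GROUP BY customer_id")
    view.add_dialect("redshift", "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id")
    print(view.definition_for("redshift"))
```

Keeping the schema outside the per-dialect SQL is what lets every engine agree on the view's columns even though each one runs its own definition.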

Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK

The post illustrates how to build a comprehensive change data capture (CDC) system that processes CDC data sourced from Amazon Relational Database Service (Amazon RDS) for MySQL. First, we create a raw data lake of all modified records in the database in near real time by using Amazon MSK and writing them to Amazon S3 as raw data. Then, we use an AWS Glue extract, transform, and load (ETL) job for batch processing of CDC data from the S3 raw data lake.
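The core of a CDC-based UPSERT is replaying ordered change records against a keyed table: inserts and updates overwrite the row for a key, and deletes remove it. In an open table format this is typically expressed as a `MERGE INTO` statement run by the ETL job; the sketch below only models those semantics in plain Python, and the `(operation, key, row)` record shape and `apply_cdc` name are hypothetical (real CDC events carry more metadata, such as timestamps and ordering positions).

```python
from typing import Dict, Iterable, Tuple

# Hypothetical CDC record shape: (operation, primary key, row image).
CdcRecord = Tuple[str, int, dict]


def apply_cdc(table: Dict[int, dict], changes: Iterable[CdcRecord]) -> Dict[int, dict]:
    """Apply ordered CDC changes to a keyed snapshot (UPSERT plus DELETE)."""
    result = dict(table)  # leave the input snapshot untouched
    for op, key, row in changes:
        if op in ("insert", "update"):
            result[key] = row  # upsert: add a new key or overwrite an existing one
        elif op == "delete":
            result.pop(key, None)  # deleting a key that is absent is a no-op
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return result


if __name__ == "__main__":
    snapshot = {1: {"name": "a"}, 2: {"name": "b"}}
    changes = [("update", 1, {"name": "a2"}), ("insert", 3, {"name": "c"}), ("delete", 2, {})]
    print(apply_cdc(snapshot, changes))  # {1: {'name': 'a2'}, 3: {'name': 'c'}}
```

Because changes are applied in order, several changes to the same key within one batch collapse to the last one, which is why preserving the source ordering matters when CDC data is processed in batches.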

How Zurich Insurance Group built a log management solution on AWS

This post is written in collaboration with Clarisa Tavolieri, Austin Rappeport and Samantha Gignac from Zurich Insurance Group. The volume of logs and the number of logging sources have been increasing exponentially over the last few years and will continue to increase in the coming years. As a result, customers across all industries are facing multiple […]

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

This post is co-written with Amit Gilad, Alex Dickman and Itay Takersman from Cloudinary. Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. Data-driven decisions lead to more effective responses to unexpected events, increase innovation and allow […]

Simplify data lake access control for your enterprise users with trusted identity propagation in AWS IAM Identity Center, AWS Lake Formation, and Amazon S3 Access Grants

Many organizations use external identity providers (IdPs) such as Okta or Microsoft Azure Active Directory to manage their enterprise user identities. These users interact with and run analytical queries across AWS analytics services. To enable them to use the AWS services, their identities from the external IdP are mapped to AWS Identity and Access Management […]

Use AWS Data Exchange to seamlessly share Apache Hudi datasets

Apache Hudi was originally developed by Uber in 2016 to bring to life a transactional data lake that could quickly and reliably absorb updates to support the massive growth of the company’s ride-sharing platform. Apache Hudi is now widely used across the industry to build very large-scale data lakes. Today, Hudi is the […]