AWS Big Data Blog
Category: Amazon Athena
Expanding data analysis and visualization options: Amazon DataZone now integrates with Tableau, Power BI, and more
Amazon DataZone has launched authentication support through the Amazon Athena JDBC driver, allowing data users to query their subscribed data lake assets from popular business intelligence (BI) and analytics tools such as Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more. This integration lets data users access and analyze governed data within Amazon DataZone using familiar tools, boosting both productivity and flexibility.
Analyze Amazon EMR on Amazon EC2 cluster usage with Amazon Athena and Amazon QuickSight
In this post, we guide you through deploying a comprehensive solution in your Amazon Web Services (AWS) environment to analyze Amazon EMR on EC2 cluster usage. By using this solution, you will gain a deep understanding of resource consumption and associated costs of individual applications running on your EMR cluster.
Enriching metadata for accurate text-to-SQL generation for Amazon Athena
In this post, we demonstrate the critical role of metadata in text-to-SQL generation through an example implemented for Amazon Athena using Amazon Bedrock. We discuss the challenges in maintaining the metadata as well as ways to overcome those challenges and enrich the metadata.
How AppsFlyer modernized their interactive workload by moving to Amazon Athena and saved 80% of costs
AppsFlyer develops a leading privacy-focused measurement solution that enables marketers to gauge the effectiveness of their marketing activities and integrates them with the broader marketing world, processing 100 billion events every day. This post explores how AppsFlyer modernized their Audiences Segmentation product by using Amazon Athena.
Query AWS Glue Data Catalog views using Amazon Athena and Amazon Redshift
AWS Glue Data Catalog views are a new feature of the AWS Glue Data Catalog. Customers can use them to create a common view schema and a single metadata container that holds view definitions in different dialects for use across engines such as Amazon Redshift and Amazon Athena. In this blog post, we show how you can define and query a Data Catalog view on top of open source table formats such as Iceberg across Athena and Amazon Redshift. We also show the configurations needed to restrict access to the underlying database and tables. To follow along, we have provided an AWS CloudFormation template.
Synchronize data lakes with CDC-based UPSERT using open table format, AWS Glue, and Amazon MSK
This post illustrates the construction of a comprehensive change data capture (CDC) system that processes CDC data sourced from Amazon Relational Database Service (Amazon RDS) for MySQL. First, we create a raw data lake of all modified records in the database in near real time, using Amazon MSK and writing to Amazon S3 as raw data. Then, we use an AWS Glue extract, transform, and load (ETL) job for batch processing of CDC data from the S3 raw data lake.
How Zurich Insurance Group built a log management solution on AWS
This post is written in collaboration with Clarisa Tavolieri, Austin Rappeport and Samantha Gignac from Zurich Insurance Group. The volume and number of logging sources have grown exponentially over the last few years and will continue to grow in the coming years. As a result, customers across all industries are facing multiple […]
How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics
This post is co-written with Amit Gilad, Alex Dickman and Itay Takersman from Cloudinary. Enterprises and organizations across the globe want to harness the power of data to make better decisions by putting data at the center of every decision-making process. Data-driven decisions lead to more effective responses to unexpected events, increase innovation and allow […]
Simplify data lake access control for your enterprise users with trusted identity propagation in AWS IAM Identity Center, AWS Lake Formation, and HAQM S3 Access Grants
Many organizations use external identity providers (IdPs) such as Okta or Microsoft Azure Active Directory to manage their enterprise user identities. These users interact with and run analytical queries across AWS analytics services. To enable them to use the AWS services, their identities from the external IdP are mapped to AWS Identity and Access Management […]
Use AWS Data Exchange to seamlessly share Apache Hudi datasets
Apache Hudi was originally developed by Uber in 2016 to bring to life a transactional data lake that could quickly and reliably absorb updates to support the massive growth of the company’s ride-sharing platform. Apache Hudi is now widely used to build very large-scale data lakes by many across the industry. Today, Hudi is the […]