AWS Big Data Blog

Category: Intermediate (200)

Query your Apache Hive metastore with AWS Lake Formation permissions

Apache Hive is a SQL-based data warehouse system for processing highly distributed datasets on the Apache Hadoop platform. There are two key components to Apache Hive: the Hive SQL query engine and the Hive metastore (HMS). The Hive metastore is a repository of metadata about the SQL tables, such as database names, table names, schema, […]

Dimensional modeling in Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse that is used by tens of thousands of customers to process exabytes of data every day to power their analytics workloads. By using a dimensional model, you can structure your data, measure business processes, and get valuable insights quickly. Amazon Redshift […]

Migrate data from Google Cloud Storage to Amazon S3 using AWS Glue

Today, we are pleased to announce a new AWS Glue connector for Google Cloud Storage that allows you to move data bi-directionally between Google Cloud Storage and Amazon Simple Storage Service (Amazon S3). In this post, we go over how the new connector works, introduce the connector’s functions, and provide you with key steps to set it up. We provide you with prerequisites, share how to subscribe to this connector in AWS Marketplace, and describe how to create and run AWS Glue for Apache Spark jobs with it.

Automate secure access to Amazon MWAA environments using existing OpenID Connect single-sign-on authentication and authorization

Customers use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to run Apache Airflow at scale in the cloud. They want to use their existing login solutions developed using OpenID Connect (OIDC) providers with Amazon MWAA; this allows them to provide a uniform authentication and single sign-on (SSO) experience using their adopted identity providers (IdP) […]

Introducing field-based coloring experience for Amazon QuickSight

Color plays a crucial role in visualizations. It conveys meaning, captures attention, and enhances aesthetics. You can quickly grasp important information when key insights and data points pop with color. However, it’s important to use color judiciously to enhance readability and ensure correct interpretation. Color should also be accessible and consistent to enable users to […]

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

Amazon Finance Automation (FinAuto) is the tech organization of Amazon Finance Operations (FinOps). Its mission is to enable FinOps to support the growth and expansion of Amazon businesses. It works as a force multiplier through automation and self-service, while providing accurate and on-time payments and collections. FinAuto has a unique position to look across FinOps […]

Configure end-to-end data pipelines with Etleap, Amazon Redshift, and dbt

This blog post is co-written with Zygimantas Koncius from Etleap. Organizations use their data to extract valuable insights and drive informed business decisions. With a wide array of data sources, including transactional databases, log files, and event streams, you need a simple-to-use solution capable of efficiently ingesting and transforming large volumes of data in real […]

Level up your React app with Amazon QuickSight: How to embed your dashboard for anonymous access

Using embedded analytics from Amazon QuickSight can simplify the process of equipping your application with functional visualizations without any complex development. There are multiple ways to embed QuickSight dashboards into an application. In this post, we look at how it can be done using React and the Amazon QuickSight Embedding SDK. Dashboard consumers often don’t have […]

Getting started guide for near-real time operational analytics using Amazon Aurora zero-ETL integration with Amazon Redshift

November 2023: This post was reviewed and updated to include the latest enhancements in Amazon Aurora MySQL zero-ETL integration with Amazon Redshift on general availability (GA). Amazon Aurora zero-ETL integration with Amazon Redshift was announced at AWS re:Invent 2022 and is now generally available (GA) for Aurora MySQL 3.05.0 (compatible with MySQL 8.0.32) and higher […]

Enforce boundaries on AWS Glue interactive sessions

AWS Glue interactive sessions allow engineers to build, test, and run data preparation and analytics workloads in an interactive notebook. Interactive sessions provide isolated development environments, take care of the underlying compute cluster, and allow for configuration to stop idling resources. Glue interactive sessions provide default recommended configurations, and also allow users to customize the […]