AWS Big Data Blog

Tag: data governance

Connect, share, and query where your data sits using Amazon SageMaker Unified Studio

In this blog post, we will demonstrate how business units can use Amazon SageMaker Unified Studio to discover, subscribe to, and analyze these distributed data assets. Through this unified query capability, you can create comprehensive insights into customer transaction patterns and purchase behavior for active products without the traditional barriers of data silos or the need to copy data between systems.

How EUROGATE established a data mesh architecture using Amazon DataZone

In this post, we show you how EUROGATE uses AWS services, including Amazon DataZone, to make data discoverable by data consumers across different business units so that they can innovate faster. Two use cases illustrate how this can be applied for business intelligence (BI) and data science applications, using AWS services such as Amazon Redshift and Amazon SageMaker.

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. A data mesh framework empowers business units with data ownership and facilitates seamless sharing. However, integrating datasets from different business units can present several […]

Implement data quality checks on Amazon Redshift data assets and integrate with Amazon DataZone

In this post, we show how to capture the data quality metrics for data assets produced in Amazon Redshift. With Amazon DataZone, the data owner can directly import the technical metadata of Redshift database tables and views to the Amazon DataZone project’s inventory. As these data assets get imported into Amazon DataZone, they bypass the AWS Glue Data Catalog, creating a gap in data quality integration. This post proposes a solution to enrich the Amazon Redshift data asset with data quality scores and KPI metrics.

Streamline your data governance by deploying Amazon DataZone with the AWS CDK

Managing data across diverse environments can be a complex and daunting task. Amazon DataZone simplifies this so you can catalog, discover, share, and govern data stored across AWS, on premises, and third-party sources. Many organizations manage vast amounts of data assets owned by various teams, creating a complex landscape that poses challenges for scalable data […]

Deploy DataHub using AWS managed services and ingest metadata from AWS Glue and Amazon Redshift – Part 2

In the first post of this series, we discussed the need for a metadata management solution for organizations. We used DataHub as an open-source metadata platform for metadata management and deployed it using AWS managed services with the AWS Cloud Development Kit (AWS CDK). In this post, we focus on how to populate technical metadata […]

Deploy DataHub using AWS managed services and ingest metadata from AWS Glue and Amazon Redshift – Part 1

Many organizations are establishing enterprise data warehouses, data lakes, or a modern data architecture on AWS to build data-driven products. As the organization grows, the number of publishers and subscribers to data and the volume of data keep increasing. Additionally, different varieties of datasets are introduced (structured, semistructured, and unstructured). This can lead to metadata […]