AWS Big Data Blog

Category: Database

How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions

At Open Universities Australia (OUA), we empower students to explore a vast array of degrees from renowned Australian universities, all delivered through online learning. In this post, we show you how we used AWS services to replace our existing third-party ETL tool, improving the team's productivity and significantly reducing our ETL operational costs.
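An orchestration like the one described is typically expressed as an HAQM States Language (ASL) definition that AWS Step Functions executes. The sketch below is illustrative only, not OUA's actual workflow: the state names, Glue job names, and retry settings are hypothetical placeholders.

```python
import json

# Minimal illustrative ASL definition for a two-step ETL state machine.
# All job names and retry values are hypothetical, not from the post.
definition = {
    "Comment": "Illustrative ETL orchestration with AWS Step Functions",
    "StartAt": "ExtractAndLoad",
    "States": {
        "ExtractAndLoad": {
            "Type": "Task",
            # .sync suffix makes Step Functions wait for the Glue job to finish
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "example-extract-job"},
            "Retry": [
                {"ErrorEquals": ["States.ALL"], "IntervalSeconds": 30, "MaxAttempts": 2}
            ],
            "Next": "Transform",
        },
        "Transform": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "example-transform-job"},
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

In practice this JSON would be generated and deployed by an AWS CDK stack rather than written by hand, which is what keeps the workflow definition versioned alongside the rest of the infrastructure.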

Building end-to-end data lineage for one-time and complex queries using Amazon Athena, Amazon Redshift, Amazon Neptune, and dbt

In this post, we use dbt for data modeling on both Amazon Athena and Amazon Redshift. dbt on Athena supports real-time queries, while dbt on Amazon Redshift handles complex queries, unifying the development language and significantly reducing the technical learning curve. Using a single dbt modeling language not only simplifies the development process but also automatically generates consistent data lineage information. This approach offers robust adaptability, easily accommodating changes in data structures.
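One reason a single dbt codebase yields consistent lineage is that every dbt run writes a manifest.json whose nodes record their depends_on relationships; walking those gives model-level lineage edges that can then be loaded into a graph store such as Amazon Neptune. The sketch below uses a hypothetical manifest fragment, not real dbt output, to show the idea.

```python
# Illustrative sketch: extract (upstream, downstream) lineage edges from a
# dbt manifest dict. The manifest fragment is a made-up example; real
# manifests carry many more fields per node.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {"depends_on": {"nodes": ["source.shop.raw_orders"]}},
        "model.shop.fct_sales": {"depends_on": {"nodes": ["model.shop.stg_orders"]}},
    }
}

def lineage_edges(manifest: dict) -> list[tuple[str, str]]:
    """Return (upstream, downstream) edges from a dbt manifest dict."""
    edges = []
    for node_id, node in manifest["nodes"].items():
        for upstream in node.get("depends_on", {}).get("nodes", []):
            edges.append((upstream, node_id))
    return edges

print(lineage_edges(manifest))
```

Because the same manifest format is produced whether the models run on Athena or on Amazon Redshift, the lineage extraction code does not need to know which engine executed each model.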

Accelerate SQL code migration from Google BigQuery to Amazon Redshift using BladeBridge

This post explores how you can use BladeBridge, a leading data environment modernization solution, to simplify and accelerate the migration of SQL code from BigQuery to Amazon Redshift. BladeBridge offers a comprehensive suite of tools that automate much of the complex conversion work, allowing organizations to quickly and reliably transition their data analytics capabilities to the scalable Amazon Redshift data warehouse.
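To make the conversion work concrete: a toy example of the kind of dialect rewrite any BigQuery-to-Redshift migration must perform (this is not BladeBridge's implementation). BigQuery's SAFE_DIVIDE(a, b) has no direct Amazon Redshift equivalent and is commonly rewritten as a NULL-guarding CASE expression.

```python
import re

# Toy dialect rewriter: BigQuery SAFE_DIVIDE(a, b) -> Redshift CASE
# expression. Handles only simple, non-nested arguments; a real converter
# needs a proper SQL parser.
def rewrite_safe_divide(sql: str) -> str:
    pattern = re.compile(r"SAFE_DIVIDE\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)", re.IGNORECASE)
    return pattern.sub(
        lambda m: f"CASE WHEN {m.group(2)} = 0 THEN NULL ELSE {m.group(1)} / {m.group(2)} END",
        sql,
    )

print(rewrite_safe_divide("SELECT SAFE_DIVIDE(revenue, orders) FROM t"))
# -> SELECT CASE WHEN orders = 0 THEN NULL ELSE revenue / orders END FROM t
```

Multiply this by hundreds of functions, data type differences, and procedural constructs, and the value of an automated conversion suite becomes clear.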

Modernize your legacy databases with AWS data lakes, Part 3: Build a data lake processing layer

This is the final part of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to process data with Amazon Redshift Spectrum and create the gold (consumption) layer.
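Querying data lake tables from Amazon Redshift Spectrum starts with registering the AWS Glue Data Catalog database as an external schema. The statement below is a generic sketch of that step; the schema, database, and IAM role names are placeholders, not values from the series.

```python
# Illustrative SQL for exposing Glue Data Catalog tables to Redshift
# Spectrum. All identifiers and the role ARN are placeholders.
external_schema_sql = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS lakehouse
FROM DATA CATALOG
DATABASE 'gold_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-spectrum-role'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
""".strip()

print(external_schema_sql)
```

Once the external schema exists, gold-layer tables can be built with ordinary CREATE TABLE AS statements that join external (lake) and local (warehouse) tables.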

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze your data using standard SQL and your existing business intelligence (BI) tools. Tens of thousands of customers today rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it […]
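Auto-copy extends the familiar COPY statement with a JOB CREATE clause, so Amazon Redshift keeps ingesting new files as they land under an S3 prefix instead of requiring a scheduled pipeline. The statement below is a hedged sketch of that syntax; the table, bucket, and role names are placeholders.

```python
# Illustrative auto-copy statement: a COPY with a JOB CREATE clause so new
# files under the prefix are ingested automatically. Names are placeholders.
auto_copy_sql = """
COPY sales
FROM 's3://example-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
FORMAT AS CSV
JOB CREATE sales_auto_copy_job
AUTO ON;
""".strip()

print(auto_copy_sql)
```

With AUTO ON, the copy job watches the prefix and loads each new object once, which replaces the Lambda- or scheduler-driven ingestion glue code many teams previously maintained.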

Get started with Amazon DynamoDB zero-ETL integration with Amazon Redshift

We’re excited to announce the general availability (GA) of Amazon DynamoDB zero-ETL integration with Amazon Redshift, which enables you to run high-performance analytics on your DynamoDB data in Amazon Redshift with little to no impact on production workloads running on DynamoDB. As data is written into a DynamoDB table, it’s seamlessly made available in Amazon Redshift, eliminating the need to build and maintain complex data pipelines.

Differentiate generative AI applications with your data using AWS analytics and managed databases

While the potential of generative artificial intelligence (AI) is increasingly under evaluation, organizations are at different stages in defining their generative AI vision. In many organizations, the focus is on large language models (LLMs), and foundation models (FMs) more broadly. This is just the tip of the iceberg, because what enables you to obtain differential […]

How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune

In this blog post, we highlight how ZS Associates used multiple AWS services to build a highly scalable, high-performance clinical document search platform. This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, […]

Evaluating sample Amazon Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis

In this post, we walk you through the process of testing a workload isolation architecture using Amazon Redshift data sharing and the Test Drive utility. We demonstrate how you can use SQL for advanced price-performance analysis and compare different workloads on different target Redshift cluster configurations.


Copy and mask PII between Amazon RDS databases using visual ETL jobs in AWS Glue Studio

In this post, I’ll walk you through how to copy data from one Amazon Relational Database Service (Amazon RDS) for PostgreSQL database to another while scrubbing PII along the way using AWS Glue. You will learn how to prepare a multi-account environment so AWS Glue can access the databases, and how to model an ETL data flow that automatically masks PII as part of the transfer process, so that no sensitive information is copied to the target database in its original form.
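The post builds the masking step visually in AWS Glue Studio; as a standalone illustration only, the sketch below shows the kind of per-row transform such a flow applies. The column names and masking rules are hypothetical.

```python
import hashlib
import re

# Illustrative per-row PII masking, not the post's actual Glue transform.
# Column names ("email", "phone") are hypothetical.
def mask_pii(row: dict) -> dict:
    masked = dict(row)
    # Replace the email with a deterministic hash so downstream joins on the
    # column still work without exposing the original address.
    if masked.get("email"):
        masked["email"] = hashlib.sha256(masked["email"].encode()).hexdigest()[:16]
    # Redact all but the last four digits of a phone number.
    if masked.get("phone"):
        digits = re.sub(r"\D", "", masked["phone"])
        masked["phone"] = "***-***-" + digits[-4:]
    return masked

print(mask_pii({"id": 1, "email": "jane@example.com", "phone": "+61 2 5550 1234"}))
```

Hashing rather than blanking the email is a common design choice: it preserves referential integrity across tables while remaining irreversible for the target environment.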