AWS Big Data Blog
Category: HAQM Simple Storage Service (S3)
Accelerate your analytics with HAQM S3 Tables and HAQM SageMaker Lakehouse
HAQM SageMaker Lakehouse is a unified, open, and secure data lakehouse that now seamlessly integrates with HAQM S3 Tables, the first cloud object store with built-in Apache Iceberg support. In this post, we walk you through using various analytics services with the integration of SageMaker Lakehouse and S3 Tables.
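As a quick, hedged illustration of what the integration enables (a sketch, not code from the post), the following queries a table in an S3 table bucket through HAQM Athena with boto3. The catalog, namespace, table, and results-bucket names are hypothetical placeholders for whatever you have registered in SageMaker Lakehouse.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Hypothetical names: the linked S3 Tables catalog, the namespace acting as
# the database, and the table all depend on your own registration.
response = athena.start_query_execution(
    QueryString='SELECT * FROM "daily_sales" LIMIT 10',
    QueryExecutionContext={
        "Catalog": "s3tablescatalog/analytics-table-bucket",  # linked catalog
        "Database": "retail",                                 # namespace
    },
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution with this ID
```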
Build unified pipelines spanning multiple AWS accounts and Regions with HAQM MWAA
In this blog post, we demonstrate how to use HAQM MWAA for centralized orchestration while distributing data processing and machine learning tasks across different AWS accounts and Regions for optimal performance and compliance.
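To give a flavor of the pattern (a minimal sketch under assumed names, not the post's actual DAG), an MWAA-hosted Airflow DAG can assume an IAM role in a processing account and start work in a specific Region; the role ARN, job name, account IDs, and Region here are hypothetical.

```python
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

REMOTE_ROLE = "arn:aws:iam::222233334444:role/etl-exec-role"  # hypothetical

def run_remote_job(region: str, **_):
    """Assume a role in the processing account, then trigger work in that Region."""
    creds = boto3.client("sts").assume_role(
        RoleArn=REMOTE_ROLE, RoleSessionName="mwaa-orchestrator"
    )["Credentials"]
    glue = boto3.client(
        "glue",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    glue.start_job_run(JobName="transform-orders")  # hypothetical job name

with DAG("cross_account_pipeline", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False) as dag:
    process_eu = PythonOperator(
        task_id="process_eu",
        python_callable=run_remote_job,
        op_kwargs={"region": "eu-west-1"},
    )
```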
Using HAQM S3 Tables with HAQM Redshift to query Apache Iceberg tables
In this post, we demonstrate how to get started with S3 Tables and HAQM Redshift Serverless for querying data in Iceberg tables. We show how to set up S3 Tables, load data, register them in the unified data lake catalog, set up basic access controls in SageMaker Lakehouse through AWS Lake Formation, and query the data using HAQM Redshift.
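As a hedged sketch of the querying step (the workgroup, database, and three-part table path are hypothetical, and the exact catalog naming depends on how you register the table bucket in SageMaker Lakehouse), the Redshift Data API can run the query without managing connections:

```python
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

# Hypothetical identifiers throughout; substitute your own workgroup and
# the catalog/namespace/table names from your Lake Formation registration.
resp = rsd.execute_statement(
    WorkgroupName="analytics-wg",  # Redshift Serverless workgroup
    Database="dev",
    Sql='SELECT COUNT(*) FROM "mylakehouse@s3tablescatalog"."retail"."daily_sales";',
)
print(resp["Id"])  # poll describe_statement / get_statement_result with this ID
```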
How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions
At Open Universities Australia (OUA), we empower students to explore a vast array of degrees from renowned Australian universities, all delivered through online learning. In this post, we show you how we used AWS services to replace our existing third-party ETL tool, improving the team’s productivity and significantly reducing our ETL operational costs.
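As an illustrative sketch of the building blocks (assumed names, not OUA's actual stack), a minimal AWS CDK app in Python can define a Step Functions state machine that invokes a Lambda extract task:

```python
from aws_cdk import App, Stack
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks
from constructs import Construct

class EtlStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Placeholder extract step; real handlers would live in their own assets.
        extract_fn = _lambda.Function(
            self, "ExtractFn",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_inline("def handler(event, ctx):\n    return event"),
        )

        extract = tasks.LambdaInvoke(self, "Extract", lambda_function=extract_fn)
        done = sfn.Succeed(self, "Done")

        sfn.StateMachine(
            self, "EtlPipeline",
            definition_body=sfn.DefinitionBody.from_chainable(extract.next(done)),
        )

app = App()
EtlStack(app, "EtlStack")
app.synth()
```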
Hybrid big data analytics with HAQM EMR on AWS Outposts
In this post, we dive into the transformative features of EMR on Outposts, showcasing its flexibility as a native hybrid data analytics service that allows seamless data access and processing both on premises and in the cloud.
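For orientation, a hedged sketch of how a cluster lands on premises: EMR on Outposts is selected by launching the cluster into a subnet associated with the Outpost. The subnet ID, release label, and instance settings below are hypothetical placeholders.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

resp = emr.run_job_flow(
    Name="hybrid-spark-cluster",
    ReleaseLabel="emr-7.1.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "Ec2SubnetId": "subnet-0abc1234outpost",  # subnet on the Outpost (hypothetical)
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(resp["JobFlowId"])
```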
How MuleSoft achieved cloud excellence through an event-driven HAQM Redshift lakehouse architecture
In our previous thought leadership blog post, Why a Cloud Operating Model, we defined a COE Framework and showed why MuleSoft implemented it and the benefits they received from it. In this post, we’ll dive into the technical implementation, describing how MuleSoft used HAQM EventBridge, HAQM Redshift, HAQM Redshift Spectrum, HAQM S3, and AWS Glue to implement it.
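As a small, hedged sketch of the event-driven piece (bucket name and target ARN are hypothetical; the post describes the full architecture), an EventBridge rule can match S3 Object Created events and route them to a loader function. The source bucket must have EventBridge notifications enabled for these events to flow.

```python
import json

import boto3

events = boto3.client("events")

# Match object-created events from a hypothetical landing bucket.
events.put_rule(
    Name="landing-object-created",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["landing-zone-bucket"]}},
    }),
    State="ENABLED",
)

# Route matched events to a hypothetical loader Lambda.
events.put_targets(
    Rule="landing-object-created",
    Targets=[{
        "Id": "load-to-redshift",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:load-to-redshift",
    }],
)
```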
Accelerate queries on Apache Iceberg tables through AWS Glue auto compaction
In this post, we explore new features of the AWS Glue Data Catalog, which now supports improved automatic compaction of Iceberg tables for streaming data, making it straightforward for you to keep your transactional data lakes consistently performant. Enabling automatic compaction reduces metadata overhead on your Iceberg tables and improves query performance.
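As a hedged sketch of enabling this (the database, table, account ID, and role are hypothetical), the Glue CreateTableOptimizer API turns on compaction for a catalog table; the role needs permission to read and rewrite the table's data files in HAQM S3.

```python
import boto3

glue = boto3.client("glue")

glue.create_table_optimizer(
    CatalogId="111122223333",          # account owning the catalog (hypothetical)
    DatabaseName="iceberg_db",
    TableName="clickstream_events",
    Type="compaction",
    TableOptimizerConfiguration={
        "roleArn": "arn:aws:iam::111122223333:role/GlueCompactionRole",
        "enabled": True,
    },
)
```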
Building end-to-end data lineage for one-time and complex queries using HAQM Athena, HAQM Redshift, HAQM Neptune, and dbt
In this post, we use dbt for data modeling on both HAQM Athena and HAQM Redshift. dbt on Athena supports real-time queries, while dbt on HAQM Redshift handles complex queries, unifying the development language and significantly reducing the technical learning curve. Using a single dbt modeling language not only simplifies the development process but also automatically generates consistent data lineage information. This approach offers robust adaptability, easily accommodating changes in data structures.
Read and write S3 Iceberg tables using the AWS Glue Iceberg REST Catalog from open source Apache Spark
In this post, we explore how to harness the power of open source Apache Spark and configure a third-party engine to work with the AWS Glue Iceberg REST Catalog. We include details on how to perform read and write operations against HAQM S3 tables, with AWS Lake Formation managing metadata and underlying data access using temporary credential vending.
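As a hedged configuration sketch (the account ID, Region, table bucket, namespace, and table are hypothetical, and the Iceberg Spark runtime and AWS bundle JARs are assumed to be on the classpath), a Spark session can point an Iceberg catalog at the Glue Iceberg REST endpoint with SigV4 request signing:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("s3tables-rest-catalog")
    # Register an Iceberg catalog named "s3tables" backed by the REST protocol.
    .config("spark.sql.catalog.s3tables", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.s3tables.type", "rest")
    .config("spark.sql.catalog.s3tables.uri", "https://glue.us-east-1.amazonaws.com/iceberg")
    # Warehouse points at the table bucket's catalog (hypothetical account/bucket).
    .config("spark.sql.catalog.s3tables.warehouse",
            "111122223333:s3tablescatalog/analytics-table-bucket")
    # Sign REST catalog requests with SigV4 against the Glue service.
    .config("spark.sql.catalog.s3tables.rest.sigv4-enabled", "true")
    .config("spark.sql.catalog.s3tables.rest.signing-name", "glue")
    .config("spark.sql.catalog.s3tables.rest.signing-region", "us-east-1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Hypothetical namespace and table names.
spark.sql("SELECT * FROM s3tables.retail.daily_sales LIMIT 10").show()
spark.sql("INSERT INTO s3tables.retail.daily_sales VALUES (DATE '2024-06-01', 42)")
```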
How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes
ANZ Institutional Division has transformed its data management approach by implementing a federated data platform based on data mesh principles. This shift aims to unlock untapped data potential, improve operational efficiency, and increase agility. The new strategy empowers domain teams to create and manage their own data products, treating data as a valuable asset rather than a byproduct. This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division.