AWS Storage Blog

Category: Analytics

Streamlining access to tabular datasets stored in Amazon S3 Tables with DuckDB

As businesses continue to rely on data-driven decision-making, there’s an increasing demand for tools that streamline and accelerate the process of data analysis. Efficiency and simplicity in application architecture can serve as a competitive edge when driving high-stakes decisions. Developers are seeking lightweight, flexible tools that seamlessly integrate with their existing application stack, specifically solutions […]

Seamless streaming to Amazon S3 Tables with StreamNative Ursa Engine

Organizations are modernizing data platforms to use generative AI by centralizing data from various sources and streaming real-time data into data lakes. A strong data foundation, such as scalable storage, reliable ingestion pipelines, and interoperable formats, is critical for businesses to discover, explore, and consume data. As organizations modernize their platforms, they often turn to […]

Connect Snowflake to S3 Tables using the SageMaker Lakehouse Iceberg REST endpoint

Organizations today seek data analytics solutions that provide maximum flexibility and accessibility. Customers need their data to be readily available using their preferred query engines, and break down barriers across different computing environments. At the same time, they want a single copy of data to be used across these solutions, to track lineage, be cost […]

Build a managed Apache Iceberg data lake using Starburst and Amazon S3 Tables

Managing large-scale data analytics across diverse data sources has long been a challenge for enterprises. Data teams often struggle with complex data lake configurations, performance bottlenecks, and the need to maintain consistent data governance while enabling broad access to analytics capabilities. Today, Starburst announces a powerful solution to these challenges by extending their Apache Iceberg […]

Build a data lake for streaming data with Amazon S3 Tables and Amazon Data Firehose

Businesses are increasingly adopting real-time data processing to stay ahead of user expectations and market changes. Industries such as retail, finance, manufacturing, and smart cities are using streaming data for everything from optimizing supply chains to detecting fraud and improving urban planning. The ability to use data as it is generated has become a critical […]

Access data in Amazon S3 Tables using PyIceberg through the AWS Glue Iceberg REST endpoint

Modern data lakes integrate with multiple engines to meet a wide range of analytics needs, from SQL querying to stream processing. A key enabler of this approach is the adoption of Apache Iceberg as the open table format for building transactional data lakes. However, as the Iceberg ecosystem expands, the growing variety of engines and languages has […]

Integrating custom metadata with Amazon S3 Metadata

Organizations of all sizes face a common challenge: efficiently managing, organizing, and retrieving vast amounts of digital content. From images and videos to documents and application data, businesses are inundated with information that needs to be stored securely, accessed quickly, and analyzed effectively. The ability to extract, manage, and use metadata from this content is […]

Analyzing Amazon S3 Metadata with Amazon Athena and Amazon QuickSight

UPDATE (1/27/2025): Amazon S3 Metadata is generally available. Object storage provides virtually unlimited scalability, but managing billions, or even trillions, of objects can pose significant challenges. How do you know what data you have? How can you find the right datasets at the right time? By implementing a robust metadata management strategy, you can answer these […]

Build a managed transactional data lake with Amazon S3 Tables

UPDATE (12/19/2024): Added guidance for Amazon EMR setup. Customers commonly use Apache Iceberg today to manage ever-growing volumes of data. Apache Iceberg’s relational database transaction capabilities (ACID transactions) help customers deal with frequent updates, deletions, and the need for transactional consistency across datasets. However, getting the most out of Apache Iceberg tables and running it […]

How Amazon Ads uses Iceberg optimizations to accelerate their Spark workload on Amazon S3

In today’s data-driven business landscape, organizations are increasingly relying on massive data lakes to store, process, and analyze vast amounts of information. However, as these data repositories grow to petabyte scale, a key challenge for businesses is implementing transactional capabilities on their data lakes efficiently. The sheer volume of data requires immense computational power and […]