AWS Storage Blog

Category: Amazon EMR

How to consume tabular data from Amazon S3 Tables for insights and business reporting

When was the last time you found yourself trying to look at rows and rows of data in a spreadsheet struggling to interpret and draw conclusions? Many analysts and engineers experience the same challenge every day. Whether it’s analyzing sales trends, monitoring operational metrics, or understanding customer behavior, the challenge lies not just in interpreting […]

How Pendulum achieves 6x faster processing and 40% cost reduction with Amazon S3 Tables

Pendulum is an AI-powered analytics platform that aggregates and analyzes real-time data from social media, news, and podcasts. Designed to help organizations stay ahead, it enables reputation monitoring, early crisis detection, and influencer activity tracking. Using machine learning (ML) enables Pendulum to surface key insights from multiple channels, providing a comprehensive view of the digital […]

Bringing more to the table: How Amazon S3 Tables rapidly delivered new capabilities in the first 5 months

Amazon S3 redefined data storage when it launched as the first generally available AWS service in 2006 to deliver highly reliable, durable, secure, low-latency storage with virtually unlimited scale. While designed to deliver simple storage, S3 has proven to be built to handle the explosive growth of data we have seen in the last 19 […]

Streamlining access to tabular datasets stored in Amazon S3 Tables with DuckDB

As businesses continue to rely on data-driven decision-making, there’s an increasing demand for tools that streamline and accelerate the process of data analysis. Efficiency and simplicity in application architecture can serve as a competitive edge when driving high-stakes decisions. Developers are seeking lightweight, flexible tools that seamlessly integrate with their existing application stack, specifically solutions […]

Build a data lake for streaming data with Amazon S3 Tables and Amazon Data Firehose

Businesses are increasingly adopting real-time data processing to stay ahead of user expectations and market changes. Industries such as retail, finance, manufacturing, and smart cities are using streaming data for everything from optimizing supply chains to detecting fraud and improving urban planning. The ability to use data as it is generated has become a critical […]

Build a managed transactional data lake with Amazon S3 Tables

UPDATE (12/19/2024): Added guidance for Amazon EMR setup. Customers commonly use Apache Iceberg today to manage ever-growing volumes of data. Apache Iceberg’s relational database transaction capabilities (ACID transactions) help customers deal with frequent updates, deletions, and the need for transactional consistency across datasets. However, getting the most out of Apache Iceberg tables and running it […]

How Amazon Ads uses Iceberg optimizations to accelerate their Spark workload on Amazon S3

In today’s data-driven business landscape, organizations are increasingly relying on massive data lakes to store, process, and analyze vast amounts of information. However, as these data repositories grow to petabyte scale, a key challenge for businesses is implementing transactional capabilities on their data lakes efficiently. The sheer volume of data requires immense computational power and […]

Use generative AI to query your Amazon S3 data lake for insights

Businesses store large volumes of data in their data lakes and rely on this data to extract insights and make important business decisions. However, business stakeholders sometimes lack the technical skills required to run complex queries against their data lakes. Instead, they rely on data scientists or analysts to build reports and dashboards or to […]

How to enforce Amazon S3 Access Grants with Immuta

Amazon Simple Storage Service (Amazon S3) is the most popular object storage platform for modern data lakes. Organizations today have evolved to adopt a lake house architecture that combines the scalability and cost effectiveness of data lakes with the performance and ease-of-use of data warehouses. Likewise, Amazon S3 plays an increasingly important role as the foundational […]

Maximizing price performance for big data workloads using Amazon EBS

Since the emergence of big data over a decade ago, Hadoop – an open-source framework that is used to efficiently store and process large datasets – has been crucial in storing, analyzing, and reducing that data to provide value for enterprises. Hadoop lets you store structured, partially structured, or unstructured data of any kind across […]