AWS Big Data Blog

Category: Serverless

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Developers, data scientists, and analysts can work across databases, data warehouses, and data lakes to build reporting and dashboarding applications, perform real-time analytics, share and collaborate on data, and even build and train machine learning (ML) […]

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

This post provides guidance on how to build scalable analytical solutions for gaming industry use cases using Amazon Redshift Serverless. It covers how to use a conceptual, logical architecture for some of the most popular gaming industry use cases like event analysis, in-game purchase recommendations, measuring player satisfaction, telemetry data analysis, and more. This post […]

Architecture diagram for the Athena WebSocket API. The user connects to the API through API Gateway. API Gateway uses Lambda and DynamoDB to store session data. SQL queries are routed to Amazon Athena, and a Step Function polls for query status and returns the results to the user.

Access Amazon Athena in your applications using the WebSocket API

In this post, we present a solution that can integrate with your front-end application to query data from Amazon S3 using an Athena asynchronous API invocation. With this solution, you can add a layer of abstraction over direct Athena API calls in your application and expose access through a WebSocket API built with Amazon API Gateway. The query results are returned to the application as Amazon S3 presigned URLs.

Achieve up to 27% better price-performance for Spark workloads with AWS Graviton2 on Amazon EMR Serverless

Amazon EMR Serverless is a serverless option in Amazon EMR that makes it simple to run applications using open-source analytics frameworks such as Apache Spark and Hive without configuring, managing, or scaling clusters. At AWS re:Invent 2022, we announced support for running serverless Spark and Hive workloads with AWS Graviton2 (Arm64) on Amazon EMR Serverless. […]

Amazon EMR Serverless supports larger worker sizes to run more compute and memory-intensive workloads

Amazon EMR Serverless allows you to run open-source big data frameworks such as Apache Spark and Apache Hive without managing clusters and servers. With EMR Serverless, you can run analytics workloads at any scale with automatic scaling that resizes resources in seconds to meet changing data volumes and processing requirements. EMR Serverless automatically scales resources up […]

Use fuzzy string matching to approximate duplicate records in Amazon Redshift

It’s common to ingest multiple data sources into Amazon Redshift to perform analytics. Often, each data source will have its own processes of creating and maintaining data, which can lead to data quality challenges within and across sources. One challenge you may face when performing analytics is the presence of imperfect duplicate records within the source data. This post presents one possible approach to addressing this challenge in an Amazon Redshift data warehouse using fuzzy matching.

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

Every day, Amazon devices process and analyze billions of transactions from global shipping, inventory, capacity, supply, sales, marketing, producers, and customer service teams. This data is used in procuring devices’ inventory to meet Amazon customers’ demands. With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics […]

Serverless logging with Amazon OpenSearch Serverless and Amazon Kinesis Data Firehose

February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. In this post, you will learn how you can use Amazon Kinesis Data Firehose to build a log ingestion pipeline to send VPC flow logs to Amazon OpenSearch Serverless. First, you create […]

Amazon OpenSearch Serverless is now generally available!

We ended 2022 on a high note with the preview release of Amazon OpenSearch Serverless at re:Invent. Today, we are happy to announce the general availability of Amazon OpenSearch Serverless, the serverless option for Amazon OpenSearch Service that makes it easier to run search and analytics workloads without even having to think about infrastructure management. […]

Build a serverless analytics application with Amazon Redshift and Amazon API Gateway

Serverless applications are a modernized way to perform analytics among business departments and engineering teams. Business teams can gain meaningful insights by simplifying their reporting through web applications and distributing it to a broader audience. Use cases can include the following: Dashboarding – A webpage consisting of tables and charts where each component can offer […]