AWS Big Data Blog

Category: Serverless

How to build a front-line concussion monitoring system using AWS IoT and serverless data lakes – Part 2

August 2024: This post was reviewed and updated for accuracy. In part 1 of this series, we demonstrated how to build a data pipeline in support of a data lake. We used key AWS services such as HAQM Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda. In part 2, we discuss […]

How to build a front-line concussion monitoring system using AWS IoT and serverless data lakes – Part 1

In this two-part series, we show you how to build a data pipeline in support of a data lake. We use key AWS services such as HAQM Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda. In part 2, we focus on generating simple inferences from that data that can support return-to-play (RTP) parameters.
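For a sense of what the ingestion step looks like, here is a minimal sketch (not taken from the posts themselves) that publishes a hypothetical impact reading to a Kinesis data stream with boto3; the stream name and record fields are placeholders, and the actual sensor schema in the series differs:

```python
import json
import time

import boto3  # AWS SDK for Python

# Hypothetical stream name and payload fields -- the series' real
# schema comes from the helmet's impact sensor and will differ.
STREAM_NAME = "impact-sensor-stream"

kinesis = boto3.client("kinesis", region_name="us-east-1")

def publish_impact_reading(device_id: str, linear_accel_g: float) -> None:
    """Send one sensor reading into the Kinesis data stream."""
    record = {
        "device_id": device_id,
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "linear_acceleration_g": linear_accel_g,
    }
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(record).encode("utf-8"),  # downstream consumers read JSON
        PartitionKey=device_id,  # keeps one device's readings ordered within a shard
    )

if __name__ == "__main__":
    publish_impact_reading("helmet-042", 9.8)
```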

How Pagely implemented a serverless data lake in AWS to facilitate customer support analytics

In this post, we discuss how Pagely worked with Beyondsoft, an AWS Advanced Consulting Partner, to use ConvergDB, an open-source tool developed by Beyondsoft, to build a DevOps-centric data pipeline. This pipeline uses AWS Glue to transform application logs into optimized tables that can be queried quickly and cost-effectively using HAQM Athena.
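As an illustration of the query side, the following sketch submits an Athena query over such an optimized table through boto3. The database, table, and S3 result location are hypothetical, since ConvergDB generates the actual Glue catalog entries:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Hypothetical database, table, and result bucket -- in the post,
# ConvergDB defines the real catalog tables from its own DDL files.
response = athena.start_query_execution(
    QueryString="""
        SELECT status_code, COUNT(*) AS requests
        FROM app_logs_optimized
        WHERE log_date = DATE '2018-06-01'
        GROUP BY status_code
        ORDER BY requests DESC
    """,
    QueryExecutionContext={"Database": "support_analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Query execution ID:", response["QueryExecutionId"])
```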

Analyze Apache Parquet optimized data using HAQM Kinesis Data Firehose, HAQM Athena, and HAQM Redshift

Kinesis Data Firehose can now save data to HAQM S3 in Apache Parquet or Apache ORC format. These optimized columnar formats are highly recommended for the best performance and cost savings when querying data in S3. This feature directly benefits you if you use HAQM Athena, HAQM Redshift, AWS Glue, HAQM EMR, or any of the other big data tools available from the AWS Partner Network and the open-source community.
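A minimal sketch of enabling this conversion when creating a delivery stream, assuming incoming JSON records and an existing AWS Glue Data Catalog table that supplies the schema (all names and ARNs below are placeholders):

```python
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# All ARNs, names, and the Glue table reference below are placeholders.
firehose.create_delivery_stream(
    DeliveryStreamName="events-to-parquet",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::example-data-lake",
        "Prefix": "events/",
        # Record format conversion requires a buffer of at least 64 MB.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            # Incoming records are deserialized as JSON ...
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            # ... and written to S3 as Parquet.
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
            # The target schema is read from an AWS Glue Data Catalog table.
            "SchemaConfiguration": {
                "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
                "DatabaseName": "events_db",
                "TableName": "events",
                "Region": "us-east-1",
                "VersionId": "LATEST",
            },
        },
    },
)
```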

Implement continuous integration and delivery of serverless AWS Glue ETL applications using AWS Developer Tools

In this post, I walk you through a solution that implements a CI/CD pipeline for serverless AWS Glue ETL applications supported by AWS Developer Tools (including AWS CodePipeline, AWS CodeCommit, and AWS CodeBuild) and AWS CloudFormation.
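For context, the Glue ETL script that such a pipeline versions in CodeCommit and deploys can be as small as the following sketch, which reads raw JSON from S3 and writes it back as Parquet; the job parameters and paths are placeholders, not the post's actual application:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Job parameters would be supplied at deploy time by CloudFormation;
# the source and target paths here are placeholders.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])

glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw JSON from S3 ...
frame = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [args["source_path"]]},
    format="json",
)
# ... and write it back out as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=frame,
    connection_type="s3",
    connection_options={"path": args["target_path"]},
    format="parquet",
)
job.commit()
```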

How to retain system tables’ data spanning multiple HAQM Redshift clusters and run cross-cluster diagnostic queries

In this blog post, I present a solution that exports system tables from multiple HAQM Redshift clusters into an HAQM S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the HAQM S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.
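The post's CloudFormation template wires up the actual export, but the core idea can be sketched with the HAQM Redshift Data API (an assumption for illustration, not the post's mechanism): query a system table, then write the rows to an S3 key partitioned by cluster name and date. UNLOAD isn't used here because it doesn't work against system tables. All cluster, database, and bucket names are placeholders:

```python
import csv
import datetime
import io
import time

import boto3

# Placeholder identifiers -- the post's template provisions the real export.
CLUSTER_ID = "analytics-cluster"
BUCKET = "example-system-tables"
TODAY = datetime.date.today().isoformat()

redshift_data = boto3.client("redshift-data", region_name="us-east-1")
s3 = boto3.client("s3")

# 1. Query a system table through the Redshift Data API.
stmt = redshift_data.execute_statement(
    ClusterIdentifier=CLUSTER_ID,
    Database="dev",
    DbUser="admin",
    Sql="SELECT query, starttime, endtime, aborted FROM stl_query",
)

# 2. Poll until the statement finishes.
while True:
    desc = redshift_data.describe_statement(Id=stmt["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(2)
if desc["Status"] != "FINISHED":
    raise RuntimeError(desc.get("Error", "statement did not finish"))

# 3. Write the rows (first result page only, for brevity) to an S3 key
#    partitioned by cluster name and date, so cross-cluster diagnostic
#    queries can prune partitions efficiently.
result = redshift_data.get_statement_result(Id=stmt["Id"])
buf = io.StringIO()
writer = csv.writer(buf)
for record in result["Records"]:
    writer.writerow([next(iter(col.values())) for col in record])
s3.put_object(
    Bucket=BUCKET,
    Key=f"stl_query/cluster={CLUSTER_ID}/dt={TODAY}/data.csv",
    Body=buf.getvalue(),
)
```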