Compute | AWS Big Data Blog

Build a blockchain analytic solution with AWS Lambda, Amazon Kinesis, and Amazon Athena

In this post, we’ll show you how to deploy an Ethereum blockchain using the AWS Blockchain Templates, deploy a smart contract, and build a serverless analytics pipeline for that contract based around AWS Lambda, Amazon Kinesis, and Amazon Athena.

Analyze Amazon Connect records with Amazon Athena, AWS Glue, and Amazon QuickSight

In this blog post, we focus on how to get analytics out of the rich set of data published by Amazon Connect. We make use of an Amazon Connect data stream and create an end-to-end workflow to offer an analytical solution that can be customized based on need.

Orchestrate Apache Spark applications using AWS Step Functions and Apache Livy

In this post, I’ll show you how to use AWS Step Functions to orchestrate your Spark jobs that are running on Amazon EMR.

How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

Create data science environments on AWS for health analysis using OHDSI

This blog post demonstrates how to combine some of the OHDSI projects (Atlas, Achilles, WebAPI, and the OMOP Common Data Model) with AWS technologies. By doing so, you can quickly and inexpensively implement a health data science and informatics environment.

Power from wind: Open data on AWS

Data that describe processes in a spatial context are everywhere in our day-to-day lives and they dominate big data problems. Map data, for instance, whether describing networks of roads or remote sensing data from satellites, get us where we need to go. Atmospheric data from simulations and sensors underlie our weather forecasts and climate models. […]

Best Practices for Running Apache Cassandra on Amazon EC2

In this post, we outline three Cassandra deployment options and provide guidance on determining the best practices for your use case.

Dynamically Create Friendly URLs for Your Amazon EMR Web Interfaces

This solution provides a serverless approach to automatically assigning a friendly name for your EMR cluster for easy access to popular notebooks and other web interfaces.

Preprocessing Data in Amazon Kinesis Analytics with AWS Lambda

Kinesis Analytics now gives you the option to preprocess your data with AWS Lambda. This gives you a great deal of flexibility in defining what data gets analyzed by your Kinesis Analytics application. In this post, I discuss some common use cases for preprocessing, and walk you through an example to help highlight its applicability.

Implement Continuous Integration and Delivery of Apache Spark Applications using AWS

In this post, we walk you through a solution that implements a continuous integration and deployment pipeline supported by AWS services. You can use the sample template and Spark application shared in this post and adapt them for the specific needs of your own application.
