AWS Big Data Blog
Category: Compute
Build a blockchain analytic solution with AWS Lambda, Amazon Kinesis, and Amazon Athena
In this post, we’ll show you how to deploy an Ethereum blockchain using the AWS Blockchain Templates, deploy a smart contract, and build a serverless analytics pipeline for that contract based around AWS Lambda, Amazon Kinesis, and Amazon Athena.
Analyze Amazon Connect records with Amazon Athena, AWS Glue, and Amazon QuickSight
In this blog post, we focus on how to get analytics out of the rich set of data published by Amazon Connect. We make use of an Amazon Connect data stream and create an end-to-end workflow to offer an analytical solution that can be customized based on need.
Orchestrate Apache Spark applications using AWS Step Functions and Apache Livy
In this post, I’ll show you how to use AWS Step Functions to orchestrate your Spark jobs that are running on Amazon EMR.
How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries
In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.
Create data science environments on AWS for health analysis using OHDSI
This blog post demonstrates how to combine some of the OHDSI projects (Atlas, Achilles, WebAPI, and the OMOP Common Data Model) with AWS technologies. By doing so, you can quickly and inexpensively implement a health data science and informatics environment.
Power from wind: Open data on AWS
Data that describe processes in a spatial context are everywhere in our day-to-day lives and they dominate big data problems. Map data, for instance, whether describing networks of roads or remote sensing data from satellites, get us where we need to go. Atmospheric data from simulations and sensors underlie our weather forecasts and climate models. […]
Best Practices for Running Apache Cassandra on Amazon EC2
In this post, we outline three Cassandra deployment options and provide guidance for determining the best practices for your use case.
Dynamically Create Friendly URLs for Your Amazon EMR Web Interfaces
This solution provides a serverless approach to automatically assigning a friendly name for your EMR cluster for easy access to popular notebooks and other web interfaces.
Preprocessing Data in Amazon Kinesis Analytics with AWS Lambda
Kinesis Analytics now gives you the option to preprocess your data with AWS Lambda. This gives you a great deal of flexibility in defining what data gets analyzed by your Kinesis Analytics application. In this post, I discuss some common use cases for preprocessing, and walk you through an example to help highlight its applicability.
Implement Continuous Integration and Delivery of Apache Spark Applications using AWS
In this post, we walk you through a solution that implements a continuous integration and deployment pipeline supported by AWS services. You can use the sample template and Spark application shared in this post and adapt them for the specific needs of your own application.