AWS Big Data Blog

Tag: Amazon EMR

Turbocharge Your Apache Hive Queries on Amazon EMR Using LLAP

NOTE: Starting with the emr-6.0.0 release, Hive LLAP is officially supported as a YARN service. Setting up LLAP with the bootstrap action script described in this post is therefore not needed for emr-6.0.0 and later releases.

Apache Hive is one of the most popular tools for analyzing large datasets stored in a Hadoop […]

Setting Up Read Replica Clusters with HBase on Amazon S3

Many customers have taken advantage of the numerous benefits of running Apache HBase on Amazon S3 for data storage, including lower costs, data durability, and easier scalability. Customers such as FINRA have lowered their costs by 60% by moving to an HBase on S3 architecture along with the numerous operational benefits that come with decoupling […]

Seven Tips for Using S3DistCp on Amazon EMR to Move Data Efficiently Between HDFS and Amazon S3

Although it’s common for Amazon EMR customers to process data directly in Amazon S3, there are occasions when you might want to copy data from S3 to the Hadoop Distributed File System (HDFS) on your Amazon EMR cluster. You might also have a use case that requires moving large amounts of data between buckets or Regions. In these cases, the datasets are too large for a simple copy operation.

Build a Healthcare Data Warehouse Using Amazon EMR, Amazon Redshift, AWS Lambda, and OMOP

In the healthcare field, data comes in all shapes and sizes. Despite efforts to standardize terminology, some concepts (e.g., blood glucose) are still often depicted in different ways. This post demonstrates how to convert an openly available dataset called MIMIC-III, which consists of de-identified medical data for about 40,000 patients, into an open source data […]

Visualize Big Data with Amazon QuickSight, Presto, and Apache Spark on Amazon EMR

February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more.

Last December, we introduced the Amazon Athena connector in Amazon QuickSight in the post Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight. The […]