Posted On: Apr 21, 2016
You can now use Apache HBase 1.2 on Amazon EMR release 4.6.0. Apache HBase is a massively scalable, distributed big data store in the Apache Hadoop ecosystem. It is an open-source, non-relational, versioned database that runs on top of the Hadoop Distributed File System (HDFS), and it is built for random, strictly consistent, real-time access to tables with billions of rows and millions of columns. It integrates tightly with Apache Hadoop, Apache Hive, and Apache Pig, so you can easily combine massively parallel analytics with fast data access. Apache HBase's data model, throughput, and fault tolerance are a good match for workloads in ad tech, web analytics, financial services, applications using time-series data, and many more.
You can create an Amazon EMR cluster with HBase 1.2 by choosing release label “emr-4.6.0” in the AWS Management Console, AWS CLI, or SDK and specifying HBase as an application. Note that HBase RegionServers, which manage and serve data in HBase, are installed only on Amazon EMR core nodes (not task nodes) because they must be colocated with HDFS DataNodes. Please visit the Amazon EMR documentation for more information about HBase on Amazon EMR.
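As an illustration of the SDK route, here is a minimal sketch using the AWS SDK for Python (boto3). The cluster name, instance type, node count, region, and the default role names EMR_EC2_DefaultRole and EMR_DefaultRole are assumptions made for the example and are not taken from this announcement; adjust them for your account. The helper names build_hbase_cluster_request and launch_hbase_cluster are hypothetical. Only master and core instance groups are requested, since RegionServers run on core nodes.

```python
def build_hbase_cluster_request(name="hbase-demo", core_nodes=2,
                                instance_type="m3.xlarge"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All defaults here are illustrative assumptions, not recommendations.
    """
    return {
        "Name": name,
        # Release label that ships HBase 1.2.
        "ReleaseLabel": "emr-4.6.0",
        # Request HBase as an application on the cluster.
        "Applications": [{"Name": "HBase"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                # Core nodes host HDFS DataNodes and HBase RegionServers.
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_nodes},
            ],
            # Keep the cluster running so HBase stays available.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed default EMR roles; they must already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_hbase_cluster(region="us-east-1"):
    """Submit the request; needs boto3 and AWS credentials (incurs cost)."""
    import boto3  # imported lazily so the builder works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_hbase_cluster_request())
    return response["JobFlowId"]
```

Calling launch_hbase_cluster() would return the new cluster's ID. Building the request separately from submitting it makes it possible to inspect or log the parameters before any resources are created.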