AWS Big Data Blog

Migrate to Apache HBase on Amazon S3 on Amazon EMR: Guidelines and Best Practices

This blog post provides guidance and best practices for migrating from Apache HBase on HDFS to Apache HBase on Amazon S3 on Amazon EMR.

Apache HBase on Amazon S3 on Amazon EMR

Amazon EMR version 5.2.0 or later lets you run Apache HBase on Amazon S3. By using Amazon S3 as the data store for Apache HBase, you can separate your cluster’s storage from its compute. This saves costs because you size the cluster for your compute requirements only, rather than paying to store your entire dataset with 3x replication in on-cluster HDFS.
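As a rough illustration of what "using S3 as the data store" looks like in practice, the sketch below builds the configuration classifications that an EMR cluster is typically given to point HBase at S3. It assumes the `hbase-site` and `hbase` classification names and the `hbase.rootdir` and `hbase.emr.storageMode` properties described in the EMR documentation; the function name and the bucket path are hypothetical, so check the values against the EMR release guide for your version before using them.

```python
import json


def hbase_on_s3_configurations(root_dir: str) -> list:
    """Build EMR configuration classifications for HBase on S3.

    root_dir is the S3 location of the HBase root directory
    (hypothetical example: s3://example-bucket/hbase-root).
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")
    return [
        {
            # Tells HBase where its root directory lives.
            "Classification": "hbase-site",
            "Properties": {"hbase.rootdir": root_dir},
        },
        {
            # Switches the EMR HBase storage mode from HDFS to S3.
            "Classification": "hbase",
            "Properties": {"hbase.emr.storageMode": "s3"},
        },
    ]


if __name__ == "__main__":
    # Print JSON that could be supplied as the cluster's configurations.
    configs = hbase_on_s3_configurations("s3://example-bucket/hbase-root")
    print(json.dumps(configs, indent=2))
```

The resulting JSON is the kind of document you would supply as the cluster's software configuration when creating it. Because the root directory lives in S3, the cluster can be resized or replaced without moving the data.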

Many customers have taken advantage of the benefits of running Apache HBase on Amazon S3 for data storage. These benefits include lower costs, data durability, and more efficient scalability. Customers such as the Financial Industry Regulatory Authority (FINRA) have lowered their costs by 60% by moving to an Apache HBase on Amazon S3 architecture. They have also experienced the operational benefits that come with decoupling storage from compute and using Amazon S3 as the storage layer.

Whitepaper on Migrating to Apache HBase on Amazon S3 on Amazon EMR

This whitepaper walks you through the stages of a migration. It also helps you determine when to choose Apache HBase on Amazon S3 on Amazon EMR, plan for platform security, tune Apache HBase and EMRFS to support your application SLA, identify options to migrate and restore your data, and manage your cluster in production.

For more information, see Migrating to Apache HBase on Amazon S3 on Amazon EMR.


Additional Reading

If you found this post useful, be sure to check out Setting up Read Replica Clusters with HBase on Amazon S3 and Tips for Migrating to Apache HBase on Amazon S3 from HDFS.

 


About the Author

Francisco Oliveira is a Senior Big Data Engineer with AWS Professional Services. He focuses on building big data solutions with open source technology and AWS. In his free time, he likes to try new sports, travel and explore national parks.