AWS Big Data Blog
Tag: S3DistCp
Migrate data from an on-premises Hadoop environment to Amazon S3 using S3DistCp with AWS Direct Connect
This post demonstrates how to migrate nearly any amount of data from an on-premises Apache Hadoop environment to Amazon Simple Storage Service (Amazon S3) by using S3DistCp on Amazon EMR with AWS Direct Connect. To transfer resources from a target EMR cluster, the traditional Hadoop DistCp must be run on the source cluster to move […]
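As a rough illustration of the approach in this excerpt, the following Python sketch only assembles an s3-dist-cp command line and does not run it. The NameNode host, port, paths, and bucket name are hypothetical placeholders. The sketch also assumes that the on-premises NameNode and DataNodes are reachable from the EMR cluster over Direct Connect.

```python
import shlex
from typing import List, Optional


def build_s3distcp_command(src: str, dest: str,
                           extra_args: Optional[List[str]] = None) -> List[str]:
    """Assemble (but do not run) an s3-dist-cp argument list."""
    cmd = ["s3-dist-cp", "--src", src, "--dest", dest]
    if extra_args:
        # Any further S3DistCp options are appended unchanged.
        cmd.extend(extra_args)
    return cmd


# Hypothetical on-premises NameNode, assumed reachable from EMR over Direct Connect.
SRC = "hdfs://onprem-namenode.example.internal:8020/data/logs/"
# Hypothetical destination bucket and prefix.
DEST = "s3://example-migration-bucket/data/logs/"

cmd = build_s3distcp_command(SRC, DEST)
# shlex.join quotes any argument that needs it, so the printed string is shell-safe.
print(shlex.join(cmd))
```

You would run the resulting command on the EMR primary node or submit it as a cluster step. Check the S3DistCp documentation for the options that your EMR release supports.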
Seven Tips for Using S3DistCp on Amazon EMR to Move Data Efficiently Between HDFS and Amazon S3
Although it’s common for Amazon EMR customers to process data directly in Amazon S3, there are occasions when you might want to copy data from S3 to the Hadoop Distributed File System (HDFS) on your Amazon EMR cluster. You might also have a use case that requires moving large amounts of data between buckets or Regions. In these cases, the datasets can be too large for a simple copy operation.
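For the S3-to-HDFS direction described in this excerpt, S3DistCp is commonly submitted as an EMR step that runs through `command-runner.jar`. The sketch below only builds a step definition in the shape that the EMR AddJobFlowSteps API accepts. The bucket, HDFS path, step name, and `--groupBy` pattern are hypothetical, and the 128 MiB `--targetSize` is an arbitrary example value.

```python
import json
from typing import Dict, List, Optional


def s3distcp_step(name: str, src: str, dest: str,
                  extra_args: Optional[List[str]] = None) -> Dict:
    """Build one EMR step definition that runs s3-dist-cp via command-runner.jar."""
    args = ["s3-dist-cp", "--src", src, "--dest", dest]
    if extra_args:
        args.extend(extra_args)
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


step = s3distcp_step(
    "copy-logs-to-hdfs",                  # hypothetical step name
    "s3://example-source-bucket/logs/",   # hypothetical source bucket
    "hdfs:///input/logs/",                # hypothetical HDFS destination
    # Hypothetical pattern: files with the same captured prefix are concatenated,
    # targeting output files of about 128 MiB (--targetSize is in MiB).
    ["--groupBy", r".*/(\w+)-.*", "--targetSize", "128"],
)
print(json.dumps(step, indent=2))
```

With boto3, you could pass the result as `emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])`, where `cluster_id` is your cluster's ID. Grouping small files this way reduces the number of files written to HDFS.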