AWS Big Data Blog
Encrypt Data At-Rest and In-Flight on Amazon EMR with Security Configurations
Customers running analytics, stream processing, machine learning, and ETL workloads on personally identifiable information, health information, and financial data have strict requirements for encryption of data at-rest and in-transit. The Apache Spark and Hadoop ecosystems lend themselves to these big data use cases, and customers have asked us to provide a quick and easy way to encrypt data at-rest and data in-transit between nodes in each execution framework.
With the release of security configurations for Amazon EMR releases 5.0.0 and 4.8.0, customers can now easily enable encryption for data at-rest in Amazon S3, HDFS, and local disk, and enable encryption for data in-flight in the Apache Spark, Apache Tez, and Apache Hadoop MapReduce frameworks.
Security configurations make it easy to specify the encryption keys and certificates to use, ranging from AWS Key Management Service (AWS KMS) keys to your own custom encryption materials provider (for an example of custom providers, see the Nasdaq post about EMRFS and Amazon S3 client-side encryption). Additionally, you can apply a security configuration to multiple clusters, which lets you standardize your security settings. For instance, customers can use one configuration to encrypt data across their HIPAA-compliant Amazon EMR workloads.
The following is an example security configuration specifying SSE-KMS for Amazon S3 encryption (using EMRFS), an AWS KMS key for local disk encryption (which will also encrypt HDFS blocks), and a set of TLS certificates in Amazon S3 for applications that require them for encryption in-transit:
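As a rough sketch of what such a configuration can look like in its JSON form, the Python snippet below builds the document and prints it. The key ARN and the certificate location are placeholder assumptions, `build_security_configuration` is a hypothetical helper name, and the field names follow the EMR security configuration format as we understand it, so check them against the current EMR documentation before relying on them.

```python
import json

# Placeholder values, assumed for illustration only.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"
CERTS_S3_OBJECT = "s3://example-bucket/certs/my-certs.zip"


def build_security_configuration(kms_key_arn, certs_s3_object):
    """Return a security configuration as a plain dict (hypothetical helper)."""
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # EMRFS data in Amazon S3, encrypted server-side with AWS KMS.
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "SSE-KMS",
                    "AwsKmsKey": kms_key_arn,
                },
                # Local disk encryption, which also covers HDFS blocks.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                # Zipped PEM certificates stored in Amazon S3.
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": certs_s3_object,
                }
            },
        }
    }


security_configuration = build_security_configuration(KMS_KEY_ARN, CERTS_S3_OBJECT)
print(json.dumps(security_configuration, indent=2))
```

The same KMS key is reused here for Amazon S3 and local disk encryption purely to keep the sketch short; separate keys can be passed in instead.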
After you create a security configuration, you can specify it when you create a cluster to apply its settings. Security configurations can also be created using the AWS CLI or SDK. For more information, see Encrypting Data with Amazon EMR. If you have any questions or would like to share an interesting use case about encryption on Amazon EMR, please leave a comment below.
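For readers who prefer the SDK route mentioned above, here is a minimal sketch using the AWS SDK for Python. It assumes `emr_client` is a boto3 EMR client (for example, `boto3.client("emr")`). The function names, instance types, instance count, and IAM role names are illustrative assumptions rather than recommendations.

```python
import json


def register_security_configuration(emr_client, name, configuration):
    """Create a named security configuration from a dict and return its name.

    emr_client is assumed to be a boto3 EMR client.
    """
    response = emr_client.create_security_configuration(
        Name=name,
        # The API expects the configuration as a JSON string.
        SecurityConfiguration=json.dumps(configuration),
    )
    return response["Name"]


def launch_encrypted_cluster(emr_client, cluster_name, security_configuration_name):
    """Start a cluster that uses the named security configuration.

    Instance settings and role names below are placeholder assumptions.
    """
    response = emr_client.run_job_flow(
        Name=cluster_name,
        ReleaseLabel="emr-5.0.0",
        Instances={
            "MasterInstanceType": "m4.large",
            "SlaveInstanceType": "m4.large",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
        # Reference the security configuration by name.
        SecurityConfiguration=security_configuration_name,
    )
    return response["JobFlowId"]
```

Because the security configuration is referenced by name, the same name can be passed to `launch_encrypted_cluster` for as many clusters as needed, which is how one configuration ends up standardizing settings across clusters.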