Posted On: Aug 27, 2019
Amazon SageMaker now supports Amazon Elastic File System (Amazon EFS) and Amazon FSx for Lustre file systems as data sources for training machine learning models. Amazon FSx for Lustre is a high-performance file system optimized for workloads such as machine learning, analytics, and high performance computing. Amazon EFS provides a simple, scalable, elastic file system for Linux-based workloads for use with AWS Cloud services and on-premises resources. Support for these file systems makes it faster and simpler to train models in Amazon SageMaker on data sets that are stored in them. Using a file system as the data source reduces start-up time by eliminating the data download step of the training process, and the file system's performance and throughput help the training job run faster.
Until today, when using the File input mode, Amazon SageMaker transparently downloaded the full training set from Amazon S3 to local file storage at the start of a training job. Now, with Amazon FSx for Lustre, customers can accelerate their File mode training jobs by avoiding the initial Amazon S3 download. When an Amazon FSx for Lustre file system is linked to an Amazon S3 bucket, it automatically copies objects from Amazon S3 to the file system the first time they are accessed. The same FSx file system can also be shared across multiple SageMaker jobs, so common objects are not downloaded repeatedly.
Also until today, customers could only use Amazon SageMaker with training sets stored on Amazon S3. Now, customers can also use training sets that are stored on Amazon EFS. Amazon SageMaker interacts directly with Amazon EFS, eliminating the need to copy data sets from Amazon EFS to Amazon S3 for use with Amazon SageMaker.
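As a rough illustration of the two options described above, the sketch below builds the input channel definitions that a training job request would carry when it reads from a file system instead of Amazon S3. The field names (`FileSystemDataSource`, `FileSystemId`, `FileSystemType`, `DirectoryPath`, `FileSystemAccessMode`) follow the SageMaker `CreateTrainingJob` API. The helper `file_system_channel`, the file system IDs, and the directory paths are hypothetical and used only for illustration.

```python
from typing import Any, Dict

# Values the CreateTrainingJob API accepts for a FileSystemDataSource.
FILE_SYSTEM_TYPES = {"EFS", "FSxLustre"}
ACCESS_MODES = {"ro", "rw"}


def file_system_channel(
    channel_name: str,
    file_system_id: str,
    file_system_type: str,
    directory_path: str,
    access_mode: str = "ro",
) -> Dict[str, Any]:
    """Build one InputDataConfig entry backed by EFS or FSx for Lustre.

    Hypothetical helper: it only assembles and validates the dictionary,
    it does not call AWS.
    """
    if file_system_type not in FILE_SYSTEM_TYPES:
        raise ValueError(f"unsupported file system type: {file_system_type!r}")
    if access_mode not in ACCESS_MODES:
        raise ValueError(f"unsupported access mode: {access_mode!r}")
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": file_system_type,
                "DirectoryPath": directory_path,
                "FileSystemAccessMode": access_mode,
            }
        },
    }


# Placeholder IDs and paths; substitute the values of a real file system.
input_data_config = [
    file_system_channel("train", "fs-0123456789abcdef0", "FSxLustre", "/fsx/training-data"),
    file_system_channel("validation", "fs-0fedcba9876543210", "EFS", "/training-data/validation"),
]
```

A list like `input_data_config` would be passed as the `InputDataConfig` parameter of a training job request, for example through boto3's `create_training_job`. The request would also need a VPC configuration that lets the training instances reach the file system. Read-only access (`"ro"`) is chosen as the default here on the assumption that training jobs usually only read their input data.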
Most Amazon SageMaker built-in machine learning algorithms support EFS and FSx for Lustre as an input data source. This feature is available in all regions where the respective file systems are available. For details on region availability, please check the AWS region table.
Visit the documentation for more information and read the blog post for how to use the feature.