Posted On: Oct 7, 2021

Amazon SageMaker now supports Fast File Mode for accessing data in training jobs. Fast File Mode provides high-performance data access by streaming data directly from Amazon S3, without requiring code changes to training jobs that already use File Mode. For example, training a K-Means clustering model on a 100 GB dataset took 28 minutes with File Mode but only 5 minutes with Fast File Mode, an 82% reduction.

Training machine learning models often requires large amounts of data, and accessing that data efficiently improves training performance. Until now, SageMaker offered two modes for reading training data from Amazon S3: File Mode and Pipe Mode. File Mode downloads the training data to an encrypted Amazon EBS volume attached to the training instance, and this download must finish before model training can start. Pipe Mode streams the data directly to the training algorithm, which can improve performance but requires changing the training code to consume the data as a stream.
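The difference for a training script can be sketched as follows. This is a minimal illustration, assuming SageMaker's standard container layout in which input channels appear under `/opt/ml/input/data`: in File Mode a channel is a directory of files, while in Pipe Mode each epoch is exposed as a named pipe such as `train_0`. Fast File Mode presents data as if it were local files, so a script reads the same directory path it would use in File Mode. The helper `channel_input_path` is hypothetical and not part of any SageMaker library.

```python
import posixpath

# Assumed root under which SageMaker exposes input channels in the container.
INPUT_ROOT = "/opt/ml/input/data"


def channel_input_path(channel: str, input_mode: str, epoch: int = 0) -> str:
    """Return the path a training script would read for one channel.

    Hypothetical helper for illustration. File and FastFile both expose the
    channel as a directory of files, so the script code is identical.
    Pipe exposes one named pipe (FIFO) per epoch, which must be read as a
    stream rather than browsed as files.
    """
    if input_mode in ("File", "FastFile"):
        return posixpath.join(INPUT_ROOT, channel)
    if input_mode == "Pipe":
        return posixpath.join(INPUT_ROOT, f"{channel}_{epoch}")
    raise ValueError(f"unknown input mode: {input_mode!r}")


if __name__ == "__main__":
    for mode in ("File", "FastFile", "Pipe"):
        print(mode, channel_input_path("train", mode))
```

The point of the sketch is that only Pipe Mode changes what the script has to open; switching between File and Fast File leaves the read path unchanged.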

Fast File Mode combines the ease of use of File Mode with the performance of Pipe Mode. Training code can access the data as if it were downloaded locally, while the data is actually streamed directly from Amazon S3. As a result, training can start without waiting for the entire dataset to be downloaded to the training instances. Fast File Mode is available at no additional charge.
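Because Fast File Mode is selected through the same input-mode setting as File Mode, switching is a configuration change rather than a code change. The sketch below builds a request body in the shape of the SageMaker `CreateTrainingJob` API, assuming a single `train` channel. The job name, image URI, role ARN, S3 URIs, and instance settings are placeholders, and `make_training_job_request` is a hypothetical helper, not part of any AWS library.

```python
def make_training_job_request(job_name, image_uri, role_arn,
                              train_s3_uri, output_s3_uri, input_mode="File"):
    """Build a CreateTrainingJob-style request dict (hypothetical helper).

    Only the two input-mode fields depend on `input_mode`; everything
    else is identical between File and FastFile.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            # Job-wide default input mode.
            "TrainingInputMode": input_mode,
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                # Per-channel input mode; overrides the job-wide default.
                "InputMode": input_mode,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",  # placeholder instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    args = ("kmeans-demo", "<training-image-uri>",
            "arn:aws:iam::123456789012:role/ExampleRole",
            "s3://example-bucket/train/", "s3://example-bucket/output/")
    fast = make_training_job_request(*args, input_mode="FastFile")
    print(fast["InputDataConfig"][0]["InputMode"])
```

A dictionary like this could be passed to boto3's `create_training_job(**request)` on a SageMaker client. In the SageMaker Python SDK, the corresponding setting is the `input_mode` argument, for example on `TrainingInput`.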

To learn more, see the documentation on accessing training data in SageMaker. To get started, log in to the Amazon SageMaker console.