Posted On: May 24, 2018
You can now run training jobs with the built-in HAQM SageMaker algorithms up to 35% faster by using Pipe input mode. In Pipe input mode, your training job streams data directly from HAQM Simple Storage Service (HAQM S3) to the algorithm container on the training instances. This shortens training job start times and increases throughput. In benchmarks, start times improved by up to 10 minutes on a 78 GB file, and some benchmarks showed throughput doubling.
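For readers who start training jobs through the low-level API, the sketch below shows where Pipe input mode is selected. It builds the keyword arguments for the SageMaker `CreateTrainingJob` call: `AlgorithmSpecification.TrainingInputMode` is set to `"Pipe"`, and the channel's content type is recordIO-wrapped protobuf. The helper name `build_pipe_mode_training_request`, the placeholder S3 URIs, role ARN and image, the default instance type, the volume size and the one-hour runtime limit are all hypothetical illustration values. A real built-in algorithm also needs its own required hyperparameters.

```python
from typing import Dict, Optional


def build_pipe_mode_training_request(
    job_name: str,
    training_image: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
    hyperparameters: Optional[Dict[str, object]] = None,
    instance_type: str = "ml.m4.xlarge",  # illustrative default only
    volume_size_gb: int = 10,  # illustrative default only
) -> dict:
    """Hypothetical helper: kwargs for a SageMaker create_training_job call."""
    request = {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": training_image,
            # "Pipe" streams the channel from S3; "File" copies it to EBS first.
            "TrainingInputMode": "Pipe",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                # recordIO-wrapped protobuf, the format the built-in algorithms prefer
                "ContentType": "application/x-recordio-protobuf",
                "CompressionType": "None",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            # With Pipe mode the volume no longer has to hold the full dataset.
            "VolumeSizeInGB": volume_size_gb,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
    if hyperparameters:
        # The API expects every hyperparameter value as a string.
        request["HyperParameters"] = {k: str(v) for k, v in hyperparameters.items()}
    return request
```

With AWS credentials configured, the resulting dict could be passed as `boto3.client("sagemaker").create_training_job(**request)`. Changing `"Pipe"` back to `"File"` is the only switch needed to compare the two modes on the same job definition.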
Most built-in HAQM SageMaker algorithms train fastest when the training data is in the optimized protobuf recordIO format. Using this format also lets you take advantage of Pipe input mode with the algorithms that support it. Before Pipe input mode, training jobs used File input mode, which loads all of your data from HAQM S3 onto the HAQM Elastic Block Store (HAQM EBS) volumes attached to your training instances. Those volumes therefore need enough disk space for both your full training dataset and your final model artifacts. File input mode is still preferred when the algorithm makes multiple passes (epochs) over a training dataset that is small enough to fit in memory, but Pipe input mode works better with large datasets.
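To make "protobuf recordIO" concrete, the sketch below shows only the recordIO framing layer. Each record is a 4-byte magic number, a 4-byte payload length, the payload itself, and zero padding up to a 4-byte boundary. As far as we understand, this mirrors the framing written by the SageMaker Python SDK's `sagemaker.amazon.common` helpers (for example `write_numpy_to_dense_tensor`), where each payload is a serialized protobuf record. Here the payload is treated as opaque bytes. The function names are hypothetical, and the sketch assumes little-endian integers and single-part records smaller than 2**29 bytes (the upper three bits of the length word, used for continuation in MXNet-style recordIO, are assumed to be zero).

```python
import io
import struct

# Magic number that begins every record in MXNet-style recordIO framing.
RECORDIO_MAGIC = 0xCED7230A
LENGTH_MASK = (1 << 29) - 1  # lower 29 bits carry the payload length


def write_recordio(stream, payload: bytes) -> None:
    """Append one record: magic, length, payload, zero padding to 4 bytes."""
    if len(payload) > LENGTH_MASK:
        raise ValueError("payload too large for a single-part record")
    stream.write(struct.pack("<II", RECORDIO_MAGIC, len(payload)))
    stream.write(payload)
    stream.write(b"\x00" * ((-len(payload)) % 4))


def read_recordio(stream):
    """Yield each payload from a stream produced by write_recordio."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated record header")
        magic, length_word = struct.unpack("<II", header)
        if magic != RECORDIO_MAGIC:
            raise ValueError("stream is not recordIO-framed")
        length = length_word & LENGTH_MASK  # continuation bits assumed zero
        payload = stream.read(length)
        stream.read((-length) % 4)  # skip the alignment padding
        yield payload


# Usage: round-trip two opaque payloads through an in-memory buffer.
buf = io.BytesIO()
write_recordio(buf, b"abc")
write_recordio(buf, b"hello!")
buf.seek(0)
records = list(read_recordio(buf))
```

Because each record carries its own length, a reader can consume records one at a time from a sequential stream without seeking. That property fits a streaming input such as Pipe input mode.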
Pipe input mode is available in HAQM SageMaker today in the US East (N. Virginia), US East (Ohio), EU (Ireland), and US West (Oregon) AWS Regions. Visit the documentation for more information on using Pipe input mode with select HAQM SageMaker algorithms, and read the blog post to learn how to use the Pipe input mode feature and to review benchmarks against File input mode.