Posted On: Dec 8, 2020

We’re excited to announce Amazon SageMaker Pipelines, a new capability of Amazon SageMaker for building, managing, automating, and scaling end-to-end machine learning (ML) workflows. SageMaker Pipelines brings automation and orchestration to ML workflows, enabling you to accelerate machine learning projects and scale up to thousands of models in production.

Machine learning is an iterative process that requires collaboration among stakeholders such as data engineers, data scientists, ML engineers, and DevOps engineers. Building a scalable process for creating models is challenging: the number of steps across data preparation, feature engineering, training, and model evaluation can grow large, which makes data dependencies harder to manage. As the number of models rises, managing model versions and deploying them to production requires automation that is easy to use and scales. Finally, tracking lineage across the end-to-end pipeline requires custom tooling to track data, model artifacts, and actions.

Amazon SageMaker Pipelines enables data science and engineering teams to collaborate seamlessly on ML projects and to streamline building, automating, and scaling end-to-end ML workflows. The Amazon SageMaker SDK makes it easy to construct model building pipelines by defining parameters and steps, which can include Amazon SageMaker Data Wrangler, Processing, Training, Batch Transform, conditional evaluation, and registering models to the central model registry. Once the pipelines are built, Amazon SageMaker takes care of executing them, and you can view the pipeline executions along with real-time metrics and logs for each step in Amazon SageMaker Studio. Models are registered to the new Amazon SageMaker model registry, which automatically versions new models generated from pipelines and offers built-in approval workflows for selecting which models are deployed to production.
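To make the idea of parameters, dependent steps, a conditional evaluation, and a versioning registry concrete, here is a minimal pure-Python sketch. It does not use the SageMaker SDK: `Step`, `ModelRegistry`, `run_pipeline`, the step names, and the `"Pending"`/`"Approved"` status strings are all hypothetical names chosen for illustration, and the "training" logic is a stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One unit of work in the toy pipeline (hypothetical, not a SageMaker class)."""
    name: str
    run: Callable[[Dict[str, object]], object]  # sees parameters and earlier step outputs
    depends_on: List[str] = field(default_factory=list)
    condition: Optional[Callable[[Dict[str, object]], bool]] = None


class ModelRegistry:
    """Toy registry: assigns a version to each model and tracks approval status."""

    def __init__(self) -> None:
        self.versions: List[Dict[str, object]] = []

    def register(self, artifact: object) -> int:
        version = len(self.versions) + 1
        self.versions.append(
            {"version": version, "artifact": artifact, "status": "Pending"}
        )
        return version

    def approve(self, version: int) -> None:
        self.versions[version - 1]["status"] = "Approved"


def run_pipeline(steps: List[Step], parameters: Dict[str, object]) -> Dict[str, object]:
    """Run steps in dependency order.

    A step is skipped when its condition is false or any dependency was skipped.
    """
    results: Dict[str, object] = dict(parameters)
    pending = {s.name: s for s in steps}
    skipped = set()
    while pending:
        ready = [
            s for s in pending.values()
            if all(d in results or d in skipped for d in s.depends_on)
        ]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for step in ready:
            del pending[step.name]
            blocked = any(d in skipped for d in step.depends_on)
            if blocked or (step.condition is not None and not step.condition(results)):
                skipped.add(step.name)
            else:
                results[step.name] = step.run(results)
    return results


registry = ModelRegistry()
steps = [
    Step("process", lambda r: [x * r["scale"] for x in [1, 2, 3]]),
    Step("train", lambda r: {"weights": sum(r["process"])}, depends_on=["process"]),
    Step("evaluate", lambda r: 0.9, depends_on=["train"]),  # stand-in accuracy score
    Step(
        "register",
        lambda r: registry.register(r["train"]),
        depends_on=["evaluate"],
        # Conditional evaluation: only register if the score clears the threshold.
        condition=lambda r: r["evaluate"] >= r["min_accuracy"],
    ),
]

results = run_pipeline(steps, {"scale": 2, "min_accuracy": 0.8})
print(results["register"], registry.versions[0]["status"])
```

In this sketch the registration step runs only when the evaluation score meets the `min_accuracy` parameter, and a registered model stays in a pending state until someone calls `approve`, which mirrors the approval-before-deployment flow described above in spirit only.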

Amazon SageMaker Pipelines applies the DevOps best practices of continuous integration and continuous delivery (CI/CD) to machine learning (known as MLOps) to automate and scale ML model building and deployment pipelines. It provides built-in MLOps templates so you can get started with CI/CD for ML projects, and it also lets you use custom MLOps templates. As a result, you can quickly scale your ML pipelines without relying on manual processes, and better ensure code consistency, integration and unit testing, and reliable model updates in production. Finally, Amazon SageMaker Pipelines automatically tracks lineage for each step of your ML pipeline, which may help with governance and audit requirements without the need to build custom tooling.
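As a rough illustration of what per-step lineage tracking means, the sketch below records each step's inputs and outputs and then walks backwards from an artifact to everything that contributed to it. This is an assumption-laden toy, not how SageMaker stores lineage: `LineageGraph`, its methods, and the step and file names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LineageGraph:
    """Toy lineage store: maps each node to the nodes it was directly derived from."""

    def __init__(self) -> None:
        self._parents: Dict[str, Set[str]] = defaultdict(set)

    def record(self, step: str, inputs: List[str], outputs: List[str]) -> None:
        """Record that `step` consumed `inputs` and produced `outputs`."""
        for out in outputs:
            self._parents[out].add(step)
        for inp in inputs:
            self._parents[step].add(inp)

    def upstream(self, node: str) -> Set[str]:
        """Return every step and artifact that `node` transitively depends on."""
        seen: Set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


lineage = LineageGraph()
lineage.record("step:process", ["raw.csv"], ["features.csv"])
lineage.record("step:train", ["features.csv"], ["model.tar.gz"])
lineage.record("step:evaluate", ["model.tar.gz", "features.csv"], ["report.json"])

# Everything that went into producing the model artifact.
print(sorted(lineage.upstream("model.tar.gz")))
```

An audit question such as "which data and steps produced this model?" then becomes a graph traversal rather than a manual search through logs.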

Amazon SageMaker Pipelines is now generally available in all commercial AWS Regions where Amazon SageMaker is available. The MLOps capabilities of Amazon SageMaker Pipelines are available only in the AWS Regions where AWS CodePipeline is also available. Read the documentation for more information and for sample notebooks. To learn how to use the feature, visit the blog post.