Posted On: Dec 8, 2020

HAQM SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With HAQM SageMaker Data Wrangler, you can simplify data preparation and feature engineering and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization, from a single visual interface.

For most ML models, aggregating and preparing data from different sources can take weeks or months: raw data must be converted, transformed, and validated into features that can be used to train models and make predictions. You need to write code to author the transformations that put data into formats a model can use efficiently, and then write additional code that can run at scale across a large number of data sources. That time is far better spent on higher-value tasks.
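As a rough illustration of the hand-written transformation code described above, here is a minimal Python sketch of two common preparation steps: filling in missing values and normalizing a numeric column. The function names and sample values are hypothetical, chosen only for this example, and are not part of any SageMaker API.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw column with one missing entry.
raw_ages = [20, None, 40, 60]
features = min_max_normalize(fill_missing(raw_ages))
print(features)  # [0.0, 0.5, 0.5, 1.0]
```

Even a small pipeline like this has to be written, tested, and then adapted to run at scale for each data source, which is the effort a visual tool with built-in transformations aims to remove.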

Using HAQM SageMaker Data Wrangler’s data selection tool, you can choose the data you want from various data sources, including HAQM S3, HAQM Athena, HAQM Redshift, AWS Lake Formation, and HAQM SageMaker Feature Store, and import it with a single click. HAQM SageMaker Data Wrangler contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code. With HAQM SageMaker Data Wrangler’s visualization templates, you can quickly preview these transformations and verify that they work as you intended by viewing them in HAQM SageMaker Studio, the first fully integrated development environment (IDE) for ML. Once your data is prepared, you can build fully automated ML workflows with HAQM SageMaker Pipelines and save the resulting features for reuse in the HAQM SageMaker Feature Store.

HAQM SageMaker Data Wrangler is generally available in all regions where HAQM SageMaker Studio is available. To get started with HAQM SageMaker Data Wrangler, visit our documentation.