Posted On: May 29, 2019

We are excited to announce the general availability of HAQM Textract, which has been in preview since re:Invent 2018. HAQM Textract is a managed machine learning service that automatically extracts text and structured data from virtually any document. Customers use HAQM Textract to automate document workflows, processing millions of document pages in a few hours.

HAQM Textract goes beyond simple optical character recognition (OCR) to identify the contents of fields in forms, information stored in tables, and the context in which the information is presented. HAQM Textract’s API accepts multiple kinds of input, such as scans, PDFs, and photos, and customers can use it with other AWS machine learning services like HAQM Comprehend, HAQM Comprehend Medical, and HAQM Translate to derive deeper meaning from the extracted text and data. The extracted text and data can also be used to build smart searches over large archives of documents, or loaded into a database for use by applications such as accounting, auditing, and compliance software. To learn more about HAQM Textract, please visit the HAQM Textract website.
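To make the form-field extraction described above concrete, here is a minimal Python sketch, not an official sample. It assumes the boto3 SDK's `textract` client and its `analyze_document` call with the `FORMS` and `TABLES` feature types, and it assumes the response is a list of `Blocks` in which `KEY_VALUE_SET` blocks link keys to values through `Relationships`. The helper names `extract_form_fields` and `analyze_local_file` are hypothetical and chosen for illustration.

```python
from typing import Dict, List


def _text_of(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the text of the WORD blocks listed as CHILD relationships of a block."""
    words: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_form_fields(response: dict) -> Dict[str, str]:
    """Map each detected form key to its value text (hypothetical helper).

    Assumes a response shaped like the output of analyze_document with the
    FORMS feature enabled.
    """
    blocks_by_id = {b["Id"]: b for b in response.get("Blocks", [])}
    fields: Dict[str, str] = {}
    for block in blocks_by_id.values():
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # A key block points at its value block(s) by Id.
                value_text = " ".join(
                    _text_of(blocks_by_id[value_id], blocks_by_id)
                    for value_id in rel["Ids"]
                )
        fields[_text_of(block, blocks_by_id)] = value_text
    return fields


def analyze_local_file(path: str) -> dict:
    """Send a local image file to the service (requires AWS credentials)."""
    import boto3  # imported lazily so the parsing helpers work without the SDK

    client = boto3.client("textract")
    with open(path, "rb") as f:
        return client.analyze_document(
            Document={"Bytes": f.read()},
            FeatureTypes=["FORMS", "TABLES"],
        )
```

With credentials configured, `extract_form_fields(analyze_local_file("form.png"))` would return a dictionary of field labels and their values, where `form.png` is a placeholder file name. Keeping the parsing separate from the network call means the parsing logic can be checked against a saved or hand-built response.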

HAQM Textract is now available in the following AWS Regions: Northern Virginia, Ohio, Oregon, and Ireland. To get started with HAQM Textract, read the Getting Started guide.