AWS Machine Learning Blog

Translate documents in real time with Amazon Translate

A critical component of business success is the ability to connect with customers. Businesses today want to offer their content in multiple languages in real time. For most customers, however, content creation is disconnected from the localization effort of translating that content into multiple target languages. These disconnected processes delay a business's ability to publish content in multiple languages simultaneously, which limits outreach and hurts time to market and revenue.

Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Now, Amazon Translate offers real-time document translation to seamlessly integrate and accelerate content creation and localization. You can submit a document from the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDK and receive the translated document in real time while maintaining the format of the original document. This feature eliminates the wait for documents to be translated in asynchronous batch mode.

Real-time document translation currently supports plain text and HTML documents. You can use other Amazon Translate features such as custom terminology, profanity masking, and formality as part of the real-time document translation.

In this post, we will show you how to use this new feature.

Solution overview

This post walks you through the steps required to use real-time document translation with the console, AWS CLI, and Amazon Translate SDK. In the console walkthrough, we translate a sample text file from English to French; the AWS CLI and SDK examples use Spanish as the target language.

Use Amazon Translate via the console

Follow these steps to try out real-time document translation on the console:

  1. On the Amazon Translate console, choose Real-time translation in the navigation pane.
  2. Choose the Document tab.
  3. Specify the language of the source file as English.
  4. Specify the language of the target file as French.

Note: Either the source or the target language must be English for real-time document translation.

  5. Select Choose file and upload the file you want to translate.
  6. Specify the document type.

Text and HTML formats are supported at the time of this writing.

  7. Under Additional settings, you can use other Amazon Translate features in conjunction with real-time document translation.

For more information about these features, such as custom terminology, profanity masking, and formality, refer to the Amazon Translate documentation.

  8. Choose Translate and Download.

The translated file is automatically saved to your browser's download folder, usually Downloads. The target language code is prefixed to the translated file's name. For example, if your source file is named lang.txt and your target language is French (fr), the translated file is named fr.lang.txt.
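The console's rules can also be mirrored in a script before a document is submitted. The following is a minimal sketch rather than part of the service: the function names `check_request` and `translated_name` are hypothetical, and the checks encode only the constraints stated in this post (English as source or target, text or HTML input, target language code prefixed to the file name).

```python
import os

# Content types this post lists as supported for real-time document translation.
SUPPORTED_CONTENT_TYPES = {"text/plain", "text/html"}


def check_request(source_lang: str, target_lang: str, content_type: str) -> None:
    """Raise ValueError if the request breaks a constraint described in this post."""
    if "en" not in (source_lang, target_lang):
        raise ValueError("Either the source or the target language must be English (en).")
    if content_type not in SUPPORTED_CONTENT_TYPES:
        raise ValueError(f"Unsupported document type: {content_type}")


def translated_name(source_path: str, target_lang: str) -> str:
    """Mimic the console's naming: prefix the target language code to the file name."""
    return f"{target_lang}.{os.path.basename(source_path)}"


if __name__ == "__main__":
    check_request("en", "fr", "text/plain")   # passes silently
    print(translated_name("lang.txt", "fr"))  # fr.lang.txt
```

Running the check locally gives a clear error message before any request is sent, which can be easier to act on than a service-side validation error.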

Use Amazon Translate with the AWS CLI

You can translate the contents of a file using the following AWS CLI command. In this example, the contents of source-lang.txt will be translated into target-lang.txt.

aws translate translate-document --source-language-code en --target-language-code es \
--document-content fileb://source-lang.txt \
--document ContentType=text/plain \
--query "TranslatedDocument.Content" \
--output text | base64 --decode > target-lang.txt
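The `base64 --decode` step is there because, as the pipe in the command implies, the AWS CLI prints the translated document's bytes as Base64 text. As a rough illustration of that step only, the following sketch decodes such a string in Python. The helper name `decode_cli_content` and the sample string are made up for the example, and no AWS call is made.

```python
import base64


def decode_cli_content(b64_text: str, encoding: str = "utf-8") -> str:
    """Decode the Base64 text printed by the CLI back into the document's text."""
    return base64.b64decode(b64_text.strip()).decode(encoding)


if __name__ == "__main__":
    # Stand-in for CLI output: "Hola, mundo" encoded as Base64 text.
    sample = base64.b64encode("Hola, mundo".encode("utf-8")).decode("ascii")
    print(decode_cli_content(sample))  # Hola, mundo
```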

Use the Amazon Translate SDK (Python Boto3)

You can use the following Python code to call the Amazon Translate API through the SDK and translate text or HTML documents synchronously:

import argparse
import os

import boto3

# Parse the three positional arguments: source language, target language, file path
parser = argparse.ArgumentParser()
parser.add_argument("SourceLanguageCode")
parser.add_argument("TargetLanguageCode")
parser.add_argument("SourceFile")
args = parser.parse_args()

translate = boto3.client("translate")

# Read the source document as bytes
localFile = args.SourceFile
with open(localFile, "rb") as file:
    data = file.read()

# Pick the content type from the file extension (HTML or plain text)
contentType = "text/html" if localFile.lower().endswith((".html", ".htm")) else "text/plain"

result = translate.translate_document(
    Document={
        "Content": data,
        "ContentType": contentType
    },
    SourceLanguageCode=args.SourceLanguageCode,
    TargetLanguageCode=args.TargetLanguageCode
)

if "TranslatedDocument" in result:
    fileName = os.path.basename(localFile)
    tmpfile = f"{args.TargetLanguageCode}-{fileName}"
    # The translated content is returned as bytes, so write it in binary mode
    with open(tmpfile, "wb") as f:
        f.write(result["TranslatedDocument"]["Content"])

    print("Translated document", tmpfile)

This program accepts three arguments: source language, target language, and file path. Use the following command to invoke this program:

python syncDocumentTranslation.py en es source-lang.txt
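The additional settings mentioned earlier (custom terminology, profanity masking, and formality) can be passed in the same call. The sketch below assumes the `TerminologyNames` and `Settings` parameters of the Boto3 `translate_document` call. The helper names and the terminology name `my-terminology` are hypothetical, and it is worth checking the Boto3 reference for which settings apply to a given language pair.

```python
from typing import Optional


def build_request(data: bytes, content_type: str, source_lang: str, target_lang: str,
                  formality: Optional[str] = None, mask_profanity: bool = False,
                  terminology: Optional[str] = None) -> dict:
    """Assemble keyword arguments for translate_document (assumed parameter names)."""
    request = {
        "Document": {"Content": data, "ContentType": content_type},
        "SourceLanguageCode": source_lang,
        "TargetLanguageCode": target_lang,
    }
    settings = {}
    if formality:          # "FORMAL" or "INFORMAL"
        settings["Formality"] = formality
    if mask_profanity:
        settings["Profanity"] = "MASK"
    if settings:
        request["Settings"] = settings
    if terminology:        # name of an existing custom terminology (hypothetical here)
        request["TerminologyNames"] = [terminology]
    return request


def translate_with_settings(**kwargs) -> bytes:
    """Send the request through Boto3 and return the translated document bytes."""
    import boto3  # imported here so build_request can be used without the SDK installed
    client = boto3.client("translate")
    response = client.translate_document(**build_request(**kwargs))
    return response["TranslatedDocument"]["Content"]
```

For example, `build_request(data, "text/plain", "en", "fr", formality="FORMAL")` adds `Settings={"Formality": "FORMAL"}` to the request and leaves out `TerminologyNames`. Keeping the request assembly separate from the network call makes the optional settings easy to inspect before anything is sent.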

Conclusion

The real-time document translation feature in Amazon Translate lets you integrate translation directly into content creation and localization workflows, which can shorten time to market. Because documents are translated synchronously and keep their original format, you don't have to wait for asynchronous batch jobs before publishing in multiple languages.

For more information about Amazon Translate, visit Amazon Translate resources to find video resources and blog posts, and refer to the Amazon Translate FAQs.


About the Authors

Sathya Balakrishnan is a Senior Consultant in the Professional Services team at AWS, specializing in data and ML solutions. He works with US federal financial clients. He is passionate about building pragmatic solutions to solve customers’ business problems. In his spare time, he enjoys watching movies and hiking with his family.

RG Thiyagarajan is a Senior Consultant in Professional Services at AWS, specializing in application migration, security, and resiliency with US federal financial clients.

Sid Padgaonkar is the Senior Product Manager for Amazon Translate, AWS's natural language processing service. On weekends, you will find him playing squash and exploring the food scene in the Pacific Northwest.