AWS Machine Learning Blog

Enhancing enterprise search with HAQM Kendra

HAQM Kendra is an easy-to-use enterprise search service that allows you to add search capabilities to your applications so end-users can easily find information stored in different data sources within your company. This could include invoices, business documents, technical manuals, sales reports, corporate glossaries, internal websites, and more. You can harvest this information from storage solutions like HAQM Simple Storage Service (HAQM S3) and OneDrive; applications such as Salesforce, SharePoint, and ServiceNow; or relational databases like HAQM Relational Database Service (HAQM RDS).

When you type a question, the service uses machine learning (ML) algorithms to understand the context and return the most relevant results, whether that’s a precise answer or an entire document. Most importantly, you don’t need to have any ML experience to do this—HAQM Kendra also provides you with the code that you need to easily integrate with your new or existing applications.

This post shows you how to create your internal enterprise search by using the capabilities of HAQM Kendra. This enables you to build a solution to create and query your own search index. For this post, you use HAQM.com help documents in HTML format as the data source, but HAQM Kendra also supports MS Office (.doc, .ppt), PDF, and text formats.

Overview of solution

This post provides the steps to help you create an enterprise search engine on AWS using HAQM Kendra. You can provision a new HAQM Kendra index in under an hour without much technical depth or ML experience.

The post also demonstrates how to configure HAQM Kendra for a customized experience by adding FAQs, deploying HAQM Kendra in custom applications, and synchronizing data sources. The subsequent sections cover each of these tasks.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Creating and configuring your document repository

Before you can create an index in HAQM Kendra, you need to load documents into an S3 bucket. This section contains instructions to create an S3 bucket, get the files, and load them into the bucket. After completing all the steps in this section, you have a data source that HAQM Kendra can use.

  1. On the AWS Management Console, in the Region list, choose US East (N. Virginia) or any Region of your choice that HAQM Kendra is available in.
  2. Choose Services.
  3. Under Storage, choose S3.
  4. On the HAQM S3 console, choose Create bucket.
  5. Under General configuration, provide the following information:
    • Bucket name: kendrapost-{your account id}.
    • Region: Choose the same Region that you use to deploy your HAQM Kendra index (this post uses US East (N. Virginia) us-east-1).
  6. Under Bucket settings for Block Public Access, leave everything with the default values.
  7. Under Advanced settings, leave everything with the default values.
  8. Choose Create bucket.
  9. Download amazon_help_docs.zip and unzip the files.
  10. On the HAQM S3 console, select the bucket that you just created and choose Upload.
  11. Upload the unzipped files.

Inside your bucket, you should now see two folders: amazon_help_docs (with 3,100 objects) and faqs (with one object).
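
The console steps above can also be scripted. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) is installed and your credentials are configured. The helper names (bucket_name_for, s3_keys_for, upload_folder) are illustrative and not part of the original walkthrough.

```python
import os


def bucket_name_for(account_id):
    """Build the bucket name used in this post: kendrapost-{your account id}."""
    return f"kendrapost-{account_id}"


def s3_keys_for(local_root):
    """Pair each file under local_root with an S3 key that keeps the folder layout."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            key = os.path.relpath(path, local_root).replace(os.sep, "/")
            pairs.append((path, key))
    return pairs


def upload_folder(local_root, account_id, region="us-east-1"):
    """Create the bucket and upload the unzipped files (makes real AWS calls)."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3", region_name=region)
    bucket = bucket_name_for(account_id)
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket)
    else:
        # Regions other than us-east-1 need an explicit location constraint.
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    for path, key in s3_keys_for(local_root):
        s3.upload_file(path, bucket, key)
    return bucket
```

Calling upload_folder with the folder that contains amazon_help_docs and faqs should reproduce the two-folder layout described above, because each key is the file path relative to that folder.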

The following screenshot shows the contents of amazon_help_docs.

The following screenshot shows the contents of faqs.

Creating an index

An index is the HAQM Kendra component that provides search results for documents and frequently asked questions. After completing all the steps in this section, you have an index ready to consume documents from different data sources. For more information about indexes, see Index.

To create your first HAQM Kendra index, complete the following steps:

  1. On the console, choose Services.
  2. Under Machine Learning, choose HAQM Kendra.
  3. On the HAQM Kendra main page, choose Create an Index.
  4. In the Index details section, for Index name, enter kendra-blog-index.
  5. For Description, enter My first Kendra index.
  6. For IAM role, choose Create a new role.
  7. For Role name, enter -index-role (your role name has the prefix HAQMKendra-YourRegion-).
  8. For Encryption, don’t select Use an AWS KMS managed encryption key.

(Your data is encrypted with an HAQM Kendra-owned key by default.)

  1. Choose Next.

For more information about the IAM roles HAQM Kendra creates, see Prerequisites.

HAQM Kendra offers two editions. Kendra Enterprise Edition provides a high-availability service for production workloads. Kendra Developer Edition is suited for building a proof-of-concept and experimentation. For this post, you use the Developer edition.

  1. In the Provisioning editions section, select Developer edition.
  2. Choose Create.

For more information on the free tier, document size limits, and total storage for each HAQM Kendra edition, see HAQM Kendra pricing.

The index creation process can take up to 30 minutes. When the creation process is complete, you see a message at the top of the page that you successfully created your index.
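
If you prefer to script this step, the sketch below shows one possible approach with boto3. It assumes an IAM role for HAQM Kendra already exists (the console creates one for you, the API does not), and the function names are hypothetical. The polling helper takes the describe call as a parameter so that it can be exercised without an AWS account.

```python
import time


def index_request(role_arn):
    """Request parameters mirroring the console values used in this post."""
    return {
        "Name": "kendra-blog-index",
        "Description": "My first Kendra index",
        "Edition": "DEVELOPER_EDITION",
        "RoleArn": role_arn,
    }


def wait_until_active(describe, index_id, delay_seconds=60, max_attempts=45):
    """Poll describe(Id=...) until the index leaves the CREATING state."""
    for _ in range(max_attempts):
        status = describe(Id=index_id)["Status"]
        if status != "CREATING":
            return status  # ACTIVE on success, FAILED otherwise
        time.sleep(delay_seconds)
    raise TimeoutError(f"index {index_id} is still being created")


def create_index(role_arn, region="us-east-1"):
    """Create the index and wait for it (makes real AWS calls)."""
    import boto3  # lazy import: only needed when talking to AWS

    kendra = boto3.client("kendra", region_name=region)
    index_id = kendra.create_index(**index_request(role_arn))["Id"]
    return index_id, wait_until_active(kendra.describe_index, index_id)
```

The default delay and attempt count are arbitrary choices sized for the creation time mentioned above; adjust them to your needs.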

Adding a data source

A data source is a location that stores the documents for indexing. You can synchronize data sources automatically with an HAQM Kendra index to make sure that searches correctly reflect new, updated, or deleted documents in the source repositories.

After completing all the steps in this section, you have a data source linked to HAQM Kendra. For more information, see Adding documents from a data source.

Before continuing, make sure that the index creation is complete and the index shows as Active.

  1. On the kendra-blog-index page, choose Add data sources.

HAQM Kendra supports six types of data sources: HAQM S3, SharePoint Online, ServiceNow, OneDrive, Salesforce online, and HAQM RDS. For this post, you use HAQM S3.

  1. Under HAQM S3, choose Add connector.

For more information about the different data sources that HAQM Kendra supports, see Adding documents from a data source.

  1. In the Define attributes section, for Data source name, enter amazon_help_docs.
  2. For Description, enter AWS services documentation.
  3. Choose Next.
  4. In the Configure settings section, for Enter the data source location, enter the S3 bucket you created: kendrapost-{your account id}.
  5. Leave Metadata files prefix folder location empty.

By default, metadata files are stored in the same directory as the documents. If you want to place these files in a different folder, you can add a prefix. For more information, see S3 document metadata.

  1. For Select decryption key, leave it deselected.
  2. For Role name, enter source-role (your role name is prefixed with HAQMKendra-).
  3. For Additional configuration, you can add a pattern to include or exclude certain folders or files. For this post, keep the default values.
  4. For Frequency, choose Run on demand.

This step defines the frequency with which the data source is synchronized with the HAQM Kendra index. For this walkthrough, you do this manually (one time only).

  1. Choose Next.
  2. On the Review and create page, choose Create.
  3. After you create the data source, choose Sync now to synchronize the documents with the HAQM Kendra index.

The duration of this process depends on the number of documents that you index. For this use case, it may take 15 minutes, after which you should see a message that the sync was successful.

In the Sync run history section, you can see that 3,099 documents were synchronized.
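
The same data source can be defined through the API. The sketch below is an assumption-labeled equivalent of the console steps using boto3: it omits the Schedule parameter, which corresponds to Run on demand, and then starts a single sync job. The role ARN and function names are placeholders.

```python
def s3_data_source_request(index_id, bucket, role_arn):
    """Parameters for an S3 data source with no schedule (synced on demand)."""
    return {
        "IndexId": index_id,
        "Name": "amazon_help_docs",
        "Type": "S3",
        "Description": "AWS services documentation",
        "RoleArn": role_arn,
        "Configuration": {"S3Configuration": {"BucketName": bucket}},
    }


def add_and_sync(index_id, bucket, role_arn, region="us-east-1"):
    """Create the data source and start one sync job (makes real AWS calls)."""
    import boto3  # lazy import: only needed when talking to AWS

    kendra = boto3.client("kendra", region_name=region)
    request = s3_data_source_request(index_id, bucket, role_arn)
    data_source_id = kendra.create_data_source(**request)["Id"]
    job = kendra.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return data_source_id, job["ExecutionId"]
```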

Exploring the search index using the search console

The goal of this section is to let you explore possible search queries via the built-in HAQM Kendra console.

To search the index you created above, complete the following steps:

  1. Under Indexes, choose kendra-blog-index.
  2. Choose Search console.

Kendra can answer three types of questions: factoid, descriptive, and keyword. For more information, see HAQM Kendra FAQs. You can ask some questions using the HAQM.com help documents that you uploaded earlier.

In the search field, enter What is HAQM music unlimited?

With a factoid question (who, what, when, where), HAQM Kendra can answer and also offer a link to the source document.

As a keyword search, enter shipping rates to Canada. The following screenshot shows the answer HAQM Kendra gives.
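
The search console is a front end for the Query API, so the same questions can be asked from code. The following sketch assumes boto3 and an active index; summarize_results and search are illustrative names. Each result item carries a Type that distinguishes extracted answers (ANSWER), FAQ matches (QUESTION_ANSWER), and documents (DOCUMENT).

```python
def summarize_results(response):
    """Flatten a Query response into (type, title, excerpt, uri) tuples."""
    rows = []
    for item in response.get("ResultItems", []):
        rows.append((
            item.get("Type"),
            item.get("DocumentTitle", {}).get("Text", ""),
            item.get("DocumentExcerpt", {}).get("Text", ""),
            item.get("DocumentURI", ""),
        ))
    return rows


def search(index_id, question, region="us-east-1"):
    """Send one question to the index (makes a real AWS call)."""
    import boto3  # lazy import: only needed when talking to AWS

    kendra = boto3.client("kendra", region_name=region)
    return summarize_results(kendra.query(IndexId=index_id, QueryText=question))
```

For example, search(index_id, "shipping rates to Canada") would return one tuple per result, each with a link back to the source document.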

Adding FAQs

You can also upload a list of FAQs to provide direct answers to common questions your end-users ask. To do this, you need to load a .csv file with the information related to the questions. This section contains instructions to create and configure that file and load it into HAQM Kendra.

  1. On the HAQM Kendra console, navigate to your index.
  2. Under Data management, choose FAQs.
  3. Choose Add FAQ.
  4. In the Define FAQ project section, for FAQ name, enter kendra-post-faq.
  5. For Description, enter My first FAQ list.

HAQM Kendra accepts .csv files formatted with each row beginning with a question followed by its answer. For example, see the following table.

Question                                   Answer      URL (optional)
What is the height of the Space Needle?    605 feet    http://www.spaceneedle.com/
How tall is the Space Needle?              605 feet    http://www.spaceneedle.com/
What is the height of the CN Tower?        1815 feet   http://www.cntower.ca/
How tall is the CN Tower?                  1815 feet   http://www.cntower.ca/

The .csv file included for this use case looks like the following:

"How do I sign up for the HAQM Prime free Trial?"," To sign up for the HAQM Prime free trial, your account must have a current, valid credit card. Payment options such as an HAQM.com Corporate Line of Credit, checking accounts, pre-paid credit cards, or gift cards cannot be used. "," http://www.haqm.com/gp/help/customer/display.html/ref=hp_left_v4_sib?ie=UTF8&nodeId=201910190"
  1. Under FAQ settings, for S3, enter s3://kendrapost-{your account id}/faqs/kendrapost.csv.
  2. For IAM role, choose Create a new role.
  3. For Role name, enter faqs-role (your role name is prefixed with HAQMKendra-).
  4. Choose Add.
  5. Wait until you see the status show as Active.
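
If you want to build your own FAQ file instead of using the one provided, the sketch below writes rows in the question, answer, URL layout described above and prepares the matching CreateFaq request. It assumes boto3 for the API call; the helper names and the role ARN are placeholders, and quoting every field is a choice made here to mirror the sample file.

```python
import csv
import io


def faq_csv(rows):
    """Render (question, answer, url) rows as CSV text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for question, answer, url in rows:
        writer.writerow([question, answer, url])
    return buffer.getvalue()


def faq_request(index_id, bucket, role_arn):
    """Parameters matching the console values used in this post."""
    return {
        "IndexId": index_id,
        "Name": "kendra-post-faq",
        "Description": "My first FAQ list",
        "S3Path": {"Bucket": bucket, "Key": "faqs/kendrapost.csv"},
        "RoleArn": role_arn,
    }


def add_faq(index_id, bucket, role_arn, region="us-east-1"):
    """Register the FAQ file with the index (makes a real AWS call)."""
    import boto3  # lazy import: only needed when talking to AWS

    kendra = boto3.client("kendra", region_name=region)
    return kendra.create_faq(**faq_request(index_id, bucket, role_arn))["Id"]
```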

You can now see how the FAQ works on the search console.

  1. Under Indexes, choose your index.
  2. Under Data management, choose Search console.
  3. In the search field, enter How do I sign up for the HAQM Prime free Trial?
  4. The following screenshot shows that HAQM Kendra added the FAQ that you uploaded previously to the results list, and provides an answer and a link to the related documentation.

Using HAQM Kendra in your own applications

You can add the following components from the search console in your application:

  • Main search page – The main page that contains all the components. This is where you integrate your application with the HAQM Kendra API.
  • Search bar – The component where you enter a search term and that calls the search function.
  • Results – The component that displays the results from HAQM Kendra. It has three components: suggested answers, FAQ results, and recommended documents.
  • Pagination – The component that paginates the response from HAQM Kendra.
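
To illustrate what the Results and Pagination components do with a Query response, here is a rough sketch in Python (the sample application itself is a Node.js/React app, so this is not its actual code). It assumes that suggested answers, FAQ results, and recommended documents correspond to the ANSWER, QUESTION_ANSWER, and DOCUMENT result types, and that paging uses the PageNumber and PageSize query parameters. All function names are illustrative.

```python
def page_request(index_id, query_text, page_number=1, page_size=10):
    """Query parameters for one page of results."""
    return {
        "IndexId": index_id,
        "QueryText": query_text,
        "PageNumber": page_number,
        "PageSize": page_size,
    }


def group_results(response):
    """Split result items into the three groups shown on the results page."""
    groups = {"ANSWER": [], "QUESTION_ANSWER": [], "DOCUMENT": []}
    for item in response.get("ResultItems", []):
        groups.setdefault(item.get("Type"), []).append(item)
    return groups


def page_count(response, page_size=10):
    """Number of pages needed for the reported total (ceiling division)."""
    total = response.get("TotalNumberOfResults", 0)
    return -(-total // page_size)
```

A search page would pass page_request(...) to the Query API, render each group from group_results in its own panel, and use page_count to draw the pagination control.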

HAQM Kendra provides source code that you can deploy on your website. This is offered free of charge under a modified MIT license so you can use it as is or change it for your own needs.

This section contains instructions to deploy HAQM Kendra search to your website. You use a Node.js demo application that runs locally on your machine. This use case is based on a macOS environment.

To run this demo, you need the following components:

  1. Download amazon_aws-kendra-sample-app-master.zip and unzip the file.
  2. Open a terminal window and go to the aws-kendra-sample-app-master folder:
    cd /{folder path}/aws-kendra-sample-app-master
  3. Create a copy of the .env.development.local.example file as .env.development.local:
    cp .env.development.local.example .env.development.local
  4. Edit the .env.development.local file and add the following connection parameters:
    • REACT_APP_INDEX – Your HAQM Kendra index ID (you can find this number on the Index home page)
    • REACT_APP_AWS_ACCESS_KEY_ID – Your account access key
    • REACT_APP_AWS_SECRET_ACCESS_KEY – Your account secret access key
    • REACT_APP_AWS_SESSION_TOKEN – Leave it blank for this use case
    • REACT_APP_AWS_DEFAULT_REGION – The Region that you used to deploy the Kendra index (for example, us-east-1)
  5. Save the changes.
  6. Install the Node.js dependencies:
    npm install
  7. Launch the local development server:
    npm start
  8. View the demo app at http://localhost:3000/. You should see the following screenshot.
  9. Enter the same question you used to test the FAQs: How do I sign up for the HAQM Prime free Trial?

The following screenshot shows that the result is the same as the one you got from the HAQM Kendra console, even though the demo webpage is running locally on your machine.
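
If the demo app fails to return results, a common cause is an incomplete .env.development.local file. The following sketch checks for the connection parameters listed above. It assumes the file contains simple KEY=VALUE lines, and the helper names are illustrative; REACT_APP_AWS_SESSION_TOKEN is not checked because this use case leaves it blank.

```python
REQUIRED_KEYS = (
    "REACT_APP_INDEX",
    "REACT_APP_AWS_ACCESS_KEY_ID",
    "REACT_APP_AWS_SECRET_ACCESS_KEY",
    "REACT_APP_AWS_DEFAULT_REGION",
)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text):
    """Return the required parameters that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

You could run it against the file contents, for example missing_keys(open(".env.development.local").read()), and fill in anything it reports before running npm start.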

Cleaning up

To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created: the HAQM Kendra index, S3 bucket, and corresponding IAM roles.

  1. To delete the HAQM Kendra index, under Indexes, choose kendra-blog-index.
  2. In the index settings section, from the Actions drop-down menu, choose Delete.
  3. To confirm deletion, enter Delete in the field and choose Delete.

Wait until you get the confirmation message; the process can take up to 15 minutes.

For instructions on deleting your S3 bucket, see How do I delete an S3 Bucket?
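
The index and bucket can also be removed from a script. This is a minimal sketch, assuming boto3 and a bucket without versioning enabled; the function names are hypothetical. The IAM roles are not touched here and still need to be deleted separately.

```python
def delete_resources(kendra, s3_resource, index_id, bucket_name):
    """Delete the index, then empty and delete the bucket."""
    kendra.delete_index(Id=index_id)
    bucket = s3_resource.Bucket(bucket_name)
    bucket.objects.all().delete()  # a bucket must be empty before deletion
    bucket.delete()


def clean_up(index_id, bucket_name, region="us-east-1"):
    """Build real clients and delete the resources (makes real AWS calls)."""
    import boto3  # lazy import: only needed when talking to AWS

    delete_resources(
        boto3.client("kendra", region_name=region),
        boto3.resource("s3", region_name=region),
        index_id,
        bucket_name,
    )
```

Passing the clients in as arguments keeps delete_resources easy to try out against stand-in objects before pointing it at a real account.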

Conclusion

In this post, you learned how to use HAQM Kendra to deploy an enterprise search service. You can use HAQM Kendra to improve the search experience in your company, powered by ML. You can enable fast, natural language search across your documents without any previous ML/AI experience. For more information about HAQM Kendra, see AWS re:Invent 2019 – Keynote with Andy Jassy on YouTube, HAQM Kendra FAQs, and What is HAQM Kendra?


About the Author

Leonardo Gómez is a Big Data Specialist Solutions Architect at AWS. Based in Toronto, Canada, he works with customers across Canada to design and build big data architectures.