AWS Machine Learning Blog
Use HAQM Q to find answers on Google Drive in an enterprise
HAQM Q Business is a generative AI-powered assistant designed to enhance enterprise operations. It’s a fully managed service that helps provide accurate answers to users’ questions while adhering to the security and access restrictions of the content. You can tailor HAQM Q Business to your specific business needs by connecting to your company’s information and enterprise systems using built-in connectors to a variety of enterprise data sources. It enables users in various roles, such as marketing managers, project managers, and sales representatives, to have tailored conversations, solve business problems, generate content, take action, and more, through a web interface. This service aims to help employees work smarter, move faster, and drive significant impact by providing immediate and relevant information for their tasks.
One such enterprise data repository you can use to store and manage content is Google Drive. Google Drive is a cloud-based storage service that provides a centralized location for storing digital assets, including documents, knowledge articles, and spreadsheets. This service helps your teams collaborate effectively by enabling the sharing and organization of important files across the enterprise. To use Google Drive within HAQM Q Business, you can configure the HAQM Q Business Google Drive connector. This connector allows HAQM Q Business to securely index files stored in Google Drive using access control lists (ACLs). These ACLs make sure that users only access the documents they’re permitted to view, allowing them to ask questions and retrieve information relevant to their work directly through HAQM Q Business.
This post covers the steps to configure the HAQM Q Business Google Drive connector, including authentication setup and verifying the secure indexing of your Google Drive content.
Index Google Drive documents using the HAQM Q Google Drive connector
The HAQM Q Google Drive connector can index Google Drive documents hosted in a Google Workspace account. The connector can’t index documents stored on Google Drive in a personal Google Gmail account. HAQM Q Business can authenticate with your Google Workspace using a service account or OAuth 2.0 authentication. A service account enables indexing files for user accounts across an enterprise in a Google Workspace. Using OAuth 2.0 authentication allows for crawling and indexing files in a single Google Workspace account. This post shows you how to configure HAQM Q Business to authenticate using a Google service account.
To index documents that belong to multiple users, Google requires the crawler to authenticate with a service account that has domain-wide delegation. This allows the connector to index the documents of all users in your drive and shared drives. HAQM Q Business connectors crawl and index only the content that the HAQM Q Business application administrator explicitly configures. Administrators can specify the paths to crawl, specific file name patterns, or file types. HAQM Q Business doesn’t use customer data to train any models, and all customer data is indexed only in the customer account.
You can configure the HAQM Q Google Drive connector to crawl and index file types supported by HAQM Q Business. During the crawling phase, Google Docs documents are exported as Microsoft Word files and Google Sheets documents are exported as Microsoft Excel files.
Metadata
Every document has structural attributes—or metadata—attached to it. Document attributes can include information such as document title, document author, time created, time updated, and document type.
When you connect HAQM Q Business to a data source, it automatically maps specific data source document attributes to fields within an HAQM Q Business index. If a document attribute in your data source doesn’t have an attribute mapping already available, or if you want to map additional document attributes to index fields, you can use the custom field mappings to specify how a data source attribute maps to an HAQM Q Business index field. You can create field mappings by editing your data source after your application and retriever are created.
There are four default metadata attributes indexed for each Google Drive document: authors, source URL, creation date, and last update date. You can also select additional reserved data field mappings.
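To make the idea of field mappings concrete, the following is a minimal sketch of how data source attributes could be translated into index fields. The Google Drive attribute names, the index field names (`_authors`, `_source_uri`, `_created_at`, `_last_updated_at`), and the `map_attributes` helper are assumptions chosen for illustration; they are not taken from the connector’s configuration.

```python
from datetime import datetime

# Hypothetical mapping: data source attribute name -> index field name.
# Both sides are illustrative assumptions, not the connector's actual schema.
FIELD_MAPPINGS = {
    "owners": "_authors",
    "webViewLink": "_source_uri",
    "createdTime": "_created_at",
    "modifiedTime": "_last_updated_at",
}

# Index fields whose values should be parsed as timestamps.
DATE_FIELDS = {"_created_at", "_last_updated_at"}


def map_attributes(drive_file: dict) -> dict:
    """Return only the mapped attributes, keyed by index field name."""
    indexed = {}
    for source_name, index_name in FIELD_MAPPINGS.items():
        if source_name not in drive_file:
            continue  # unmapped or absent attributes are simply skipped
        value = drive_file[source_name]
        if index_name in DATE_FIELDS:
            # Accept RFC 3339 timestamps that end in "Z" (UTC).
            value = datetime.fromisoformat(value.replace("Z", "+00:00"))
        indexed[index_name] = value
    return indexed


example = map_attributes({
    "name": "annual-report.pdf",  # not mapped, so it is dropped
    "owners": ["user1@example.com"],
    "webViewLink": "https://drive.example.com/file/1",
    "createdTime": "2024-01-15T10:30:00Z",
})
print(sorted(example))
```

A custom field mapping would amount to adding another entry to `FIELD_MAPPINGS` in this sketch.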
HAQM Q Business crawls Google Drive ACLs defined in a Google Workspace for document security. Google Workspace users and groups are mapped to the _user_id and _group_ids fields associated with the HAQM Q Business application in AWS IAM Identity Center. These user and group associations are persisted in the user store associated with the HAQM Q Business index created for crawled Google Drive documents.
Overview of ACLs in HAQM Q Business
In the context of knowledge management and generative AI chatbot applications, ACLs play a crucial role in managing who can access information and what actions they can perform within the system. They also facilitate knowledge sharing within specific groups or teams while restricting access to others.
In this solution, we deploy an HAQM Q web experience to demonstrate that two business users can only ask questions about documents they have access to according to the ACL. With the HAQM Q Business Google Drive connector, the Google Workspace ACL will be ingested with documents. This enables HAQM Q Business to control the scope of documents that each user can access in the HAQM Q web experience.
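The effect of an ingested ACL can be illustrated with a small, self-contained model. This is a conceptual sketch only: the `User` and `Document` classes and the `can_access` rule are hypothetical and do not describe how HAQM Q Business evaluates permissions internally. It assumes a user may see a document if the user is listed directly or belongs to a listed group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    group_ids: frozenset = frozenset()


@dataclass(frozen=True)
class Document:
    title: str
    allowed_users: frozenset = frozenset()
    allowed_groups: frozenset = frozenset()


def can_access(user: User, doc: Document) -> bool:
    """Allow access on a direct user match or any shared group."""
    return user.user_id in doc.allowed_users or bool(user.group_ids & doc.allowed_groups)


def visible_titles(user: User, docs: list) -> list:
    """Titles of the documents this user could receive answers from."""
    return [doc.title for doc in docs if can_access(user, doc)]


# Hypothetical users, groups, and documents.
user1 = User("user1@example.com", frozenset({"group1"}))
user2 = User("user2@example.com", frozenset({"group2"}))
docs = [
    Document("report-2020", allowed_users=frozenset({"user1@example.com"})),
    Document(
        "report-2021",
        allowed_users=frozenset({"user2@example.com"}),
        allowed_groups=frozenset({"group1"}),  # shared with the other group
    ),
    Document("report-2022", allowed_users=frozenset({"user2@example.com"})),
]

print(visible_titles(user1, docs))  # ['report-2020', 'report-2021']
print(visible_titles(user2, docs))  # ['report-2021', 'report-2022']
```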
Authentication types
An HAQM Q Business application requires you to use IAM Identity Center to manage user access. Although it’s recommended to have an IAM Identity Center instance configured (with users federated and groups added) before you start, you can also choose to create and configure an IAM Identity Center instance for your HAQM Q Business application using the HAQM Q console.
You can also add users to your IAM Identity Center instance from the HAQM Q Business console, if you aren’t federating identity. When you add a new user, make sure that the user is enabled in your IAM Identity Center instance and that they have verified their email ID. They need to complete these steps before they can log in to your HAQM Q Business web experience.
Your identity source in IAM Identity Center defines where your users and groups are managed. After you configure your identity source, you can look up users or groups to grant them single sign-on access to AWS accounts, applications, or both.
You can have only one identity source per organization in AWS Organizations. You can choose one of the following as your identity source:
- IAM Identity Center directory – When you enable IAM Identity Center for the first time, it’s automatically configured with an IAM Identity Center directory as your default identity source. This is where you create your users and groups, and assign their level of access to your AWS accounts and applications. For more details, see Manage identities in IAM Identity Center.
- Active Directory – Choose this option if you want to continue managing users in either your AWS Managed Microsoft AD directory using AWS Directory Service or your self-managed directory in Active Directory (AD).
- External identity provider – Choose this option if you want to manage users in other external identity providers (IdPs) through the SAML 2.0 standard, such as Okta.
- IAM identity provider – HAQM Q Business applications can now federate with an enterprise’s IAM IdP. For more information, refer to Build private and secure enterprise generative AI applications with HAQM Q Business using IAM Federation.
Overview of solution
With HAQM Q Business, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index Google Drive data using the HAQM Q Business Google Drive connector. We complete the following steps:
- Configure Google Workspace prerequisites.
- Configure an HAQM Q Business application.
- Connect Google Drive to HAQM Q Business.
- Create users and index the data in the Google Drive.
- Run a sample query to test the solution.
Configure Google Workspace prerequisites
For this solution, HAQM Q will connect to a Google Workspace and crawl Google Drive documents owned by business users in different groups using a service account. Complete the following steps to configure your Google Workspace:
- Log in to the Google API console as an admin user.
- Choose the dropdown menu next to the search box, then choose New Project.
- Enter the project name, choose the Google organization, and choose Create.
The Google Drive and Admin SDK APIs need to be enabled for HAQM Q to crawl Google Drive files.
- Search for each API on the Google Cloud console and choose Enable.
- Search for Service Accounts to access the IAM & Admin navigation pane and choose Create Service Account.
- Enter the service account name, service account ID, and description, and choose Done.
- Choose the email of the service account created in the previous step.
- On the Keys tab, choose Add Key, then choose Create New Key.
- For Key type, select JSON, and choose Create to download and locally save a new private key.
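Before moving on, it can be worth checking that the downloaded JSON key looks complete. The following sketch assumes the key file uses the usual Google service account key fields (`type`, `project_id`, `private_key_id`, `private_key`, `client_email`, `client_id`); the `check_service_account_key` helper and the sample values are hypothetical.

```python
import json

# Fields we expect in a Google service account JSON key (an assumption
# based on the common key file layout).
REQUIRED_KEY_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
)


def check_service_account_key(key_json: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    key = json.loads(key_json)
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]
    if key.get("type") not in (None, "", "service_account"):
        problems.append(f"unexpected key type: {key['type']}")
    return problems


# Placeholder key for illustration; real keys are downloaded from the console.
sample_key = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key_id": "abc123",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "crawler@example-project.iam.gserviceaccount.com",
    "client_id": "1234567890",
})
print(check_service_account_key(sample_key))  # []
```

The `client_id` value in the key is the client ID you enter on the Domain-wide Delegation page.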
Now we enable domain-wide delegation for the five required API scopes on the Domain-wide Delegation page.
- Choose Add new.
- Add the following comma-delimited API scopes for the client ID associated with the service account key created in the previous step:
https://www.googleapis.com/auth/drive.readonly,
https://www.googleapis.com/auth/drive.metadata.readonly,
https://www.googleapis.com/auth/admin.directory.group.readonly,
https://www.googleapis.com/auth/admin.directory.user.readonly,
https://www.googleapis.com/auth/cloud-platform
- Choose Authorize.
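Because the scopes are pasted as a single comma-delimited value, a stray space or a mistyped prefix is easy to miss. The sketch below builds that value from a list. It assumes Google OAuth scope identifiers use the `https://www.googleapis.com/auth/` prefix; the `delegation_scope_string` helper is hypothetical.

```python
SCOPE_PREFIX = "https://www.googleapis.com/auth/"

REQUIRED_SCOPES = [
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/drive.metadata.readonly",
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
    "https://www.googleapis.com/auth/cloud-platform",
]


def delegation_scope_string(scopes: list) -> str:
    """Join scopes with commas after trimming, validating, and de-duplicating."""
    cleaned = []
    for scope in scopes:
        scope = scope.strip().rstrip(",")
        if not scope.startswith(SCOPE_PREFIX):
            raise ValueError(f"unexpected scope format: {scope!r}")
        if scope not in cleaned:
            cleaned.append(scope)
    return ",".join(cleaned)


print(delegation_scope_string(REQUIRED_SCOPES))
```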
Now we create users and add them to groups.
- Navigate to the Google Workspace Admin console and choose Users in the navigation pane.
- Choose Add new user to create two new business users.
- Choose Groups in the navigation pane.
- Choose Create group to create two Google groups and add one business user to each group.
- Upload files that HAQM Q supports into each business user’s Google Drive.
In this solution, we upload the HAQM 2020 annual report to the first business user’s Google Drive and upload the HAQM 2021 annual report and HAQM 2022 annual report to the second business user’s Google Drive.
The business user that uploaded the HAQM 2021 annual report can also share it with the other business user’s Google group.
- Choose the options menu (three vertical dots) for the Google Drive file and choose Share.
- Enter the name of the other Google group and choose Send.
Create an HAQM Q Business application with a Google Drive connector
An HAQM Q Business application needs to be created with a Google Drive connector to crawl and index Google Drive files. To create an HAQM Q application, complete the following steps:
- On the HAQM Q console, choose Applications in the navigation pane.
- Choose Create application.
- For Application name, enter a name.
- Leave application configuration settings as defaults.
- Choose Create.
- After the application is created, choose Data Sources.
- Then choose Select retriever and Confirm to use a Native retriever and Enterprise provisioning.
- After confirming retriever settings, choose Add data source, and then choose the plus sign next to Google Drive.
- Under Name and description, enter a data source name and optional description.
- Under Authentication, select Google service account, then on the AWS Secrets Manager secret dropdown menu, choose Create a new secret.
- Enter a secret name, admin account email, client email, and the JSON key you downloaded earlier, then choose Save.
- Under IAM role, choose Create a new service role.
- Under Additional Configuration, choose User email, and add the two recently created Google Workspace business user email addresses.
- Under Sync run schedule, for Frequency, choose Run on demand.
- Choose Add data source.
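The values entered for the secret come from two places: the admin account email from your Google Workspace, and the client email and private key from the downloaded JSON key. The following sketch shows one way to assemble them into a single JSON document. The secret field names (`clientEmail`, `adminAccountEmailId`, `privateKey`) are an assumption about the layout the connector expects, and `build_connector_secret` is a hypothetical helper; confirm the field names against the connector documentation before relying on them.

```python
import json


def build_connector_secret(key_json: str, admin_email: str) -> str:
    """Combine the admin email with fields from the service account key.

    The output field names are assumptions for illustration.
    """
    key = json.loads(key_json)
    secret = {
        "clientEmail": key["client_email"],
        "adminAccountEmailId": admin_email,
        "privateKey": key["private_key"],
    }
    return json.dumps(secret)


# Placeholder values for illustration only.
example_key = json.dumps({
    "client_email": "crawler@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
})
secret_string = build_connector_secret(example_key, "admin@example.com")
print(sorted(json.loads(secret_string)))
```

Building the secret from the key file, rather than retyping values, avoids the mismatches between the secret and the Google Workspace attributes that the troubleshooting section asks you to check for.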
Create and manage users
To create an HAQM Q web experience accessible by Google Workspace users, you need to create corresponding users in IAM Identity Center. HAQM Q applications are only accessible by IAM Identity Center users with user identities that own indexed documents. To create the IAM Identity Center users, complete the following steps:
- On the IAM Identity Center console, choose Users in the navigation pane.
- Choose Add user.
- Create IAM Identity Center users that mirror your Google Workspace users by entering the required user information.
- Accept the IAM Identity Center invitation sent through email to each new business user and set each business user’s IAM Identity Center password.
- On the HAQM Q Business console, navigate to the application with the Google Drive data source.
- Choose Manage user access.
- Choose Add groups and users, select Assign existing users and groups, and choose Next.
- Assign users to the HAQM Q application, choose Assign, and choose Confirm if each business user is subscribed to Q Business Pro.
After you add IAM Identity Center users to your HAQM Q application, its web experience URL will appear in the Q Business applications list. You can use the URL to connect to the HAQM Q web experience with either of your Google business users. By default, each user can only ask questions about documents in their Google Drive.
Run sample queries in HAQM Q
To test the HAQM Q application with the HAQM annual reports you uploaded to Google Drive, complete the following steps:
- On the HAQM Q Business console, navigate to the data source you created.
- Run an on-demand sync of the data source by choosing Sync now.
- Navigate to the web experience URL in a new private browser window and log in as the first business user.
- Ask HAQM Q a question, such as how many employees work at HAQM.
The source documents should be the HAQM 2020 and 2021 annual reports, assuming the first business user uploaded the HAQM 2020 annual report and the second business user shared the HAQM 2021 annual report with the first business user.
- Navigate to the web experience URL in a new private browser window and log in as the second business user.
- Ask HAQM Q the same question (how many employees work at HAQM).
The source documents should be the HAQM 2021 and 2022 annual reports.
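The on-demand sync can also be started programmatically instead of choosing Sync now. The following is a sketch that assumes the AWS SDK for Python (Boto3) `qbusiness` client provides `start_data_source_sync_job` and `list_data_source_sync_jobs` with the parameter and response names used here. The in-progress status names and the `run_sync_and_wait` helper are assumptions as well. The client is passed in as an argument so the logic can be exercised without AWS credentials.

```python
import time

# Status values treated as "still running" (assumed names).
IN_PROGRESS = {"SYNCING", "SYNCING_INDEXING", "STOPPING"}


def run_sync_and_wait(client, application_id, index_id, data_source_id,
                      poll_seconds=30.0, max_polls=60):
    """Start a sync job and poll until it reaches a final status."""
    ids = {
        "applicationId": application_id,
        "indexId": index_id,
        "dataSourceId": data_source_id,
    }
    execution_id = client.start_data_source_sync_job(**ids)["executionId"]
    for _ in range(max_polls):
        history = client.list_data_source_sync_jobs(**ids).get("history", [])
        for job in history:
            if job.get("executionId") != execution_id:
                continue  # ignore earlier sync runs
            if job.get("status") not in IN_PROGRESS:
                return job["status"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"sync job {execution_id} did not finish in time")
```

With real credentials, you would pass `boto3.client("qbusiness")` as `client` along with the IDs of your application, index, and data source.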
Troubleshooting
In this section, we share some common issues and troubleshooting tips.
IAM Identity Center login error
You might receive an error on the IAM Identity Center login page that says “We couldn’t verify your sign-in credentials.”
To troubleshoot, complete the following steps:
- Confirm that the business users that mirror the Google Workspace users were created in IAM Identity Center.
- If the users exist, navigate to the user in IAM Identity Center and choose Reset password, then select Generate a one-time password and share the password with the user.
A password will be provided for login and the user will be asked to change their password after a successful login.
Google Drive data source crawling or indexing failure
If the Google Drive data source crawling or indexing fails, complete the following steps:
- Confirm the business users provisioned in the Google Workspace are members of the Google groups.
- Inspect the HAQM CloudWatch logs from the most recent crawl of the Google Drive data source, looking for entries about users with Google Drive files in the Google Workspace.
- If the crawler didn’t successfully log the indexing of an expected user’s files, check the IAM Identity Center users, then compare the attributes in the Secrets Manager secret to the corresponding Google Workspace attributes, including client ID, service account email, and service account private key.
- Use the HAQM Q Business document-level sync reports to confirm the intended Google Drive documents were indexed by HAQM Q.
Google Drive data source crawling and indexing job doesn’t crawl and index documents
If the Google Drive data source crawling and indexing job doesn’t crawl and index any documents, complete the following steps:
- Confirm the business users provisioned in the Google Workspace are members of the Google groups.
- Confirm there are IAM Identity Center users that mirror the Google Workspace users.
- Confirm both IAM Identity Center users subscribe to Q Business Pro.
- Confirm the Google Workspace admin user has enabled the Google Drive API.
HAQM Q web experience doesn’t return expected answers from the expected source
If the HAQM Q web experience doesn’t return expected answers from the expected source, complete the following steps:
- Upload the expected source document into an HAQM Q Business chat session by choosing the paperclip icon in the HAQM Q chat interface and then choosing the file.
After you upload the document into the session, if the expected answers are generated from the expected document, the document wasn’t successfully indexed from the Google Drive data source.
- If HAQM Q doesn’t return the expected answer for the uploaded document, modify the prompt used to ask the question.
Clean up
To avoid incurring additional costs, remove the resources created during the implementation of this solution. Deleting the HAQM Q application also removes the associated index and data connectors. However, any Secrets Manager secrets created during the HAQM Q application setup process need to be deleted separately.
Complete the following steps to delete the HAQM Q application, secret, and IAM Identity Center users in your AWS account:
- On the HAQM Q Business console, choose Applications in the navigation pane.
- Select the application that you created and on the Actions menu, choose Delete and confirm the deletion.
- On the Secrets Manager console, choose Secrets in the navigation pane.
- Select the secret that was created for the Google Drive connector and on the Actions menu, choose Delete.
- Specify the waiting period as 7 days and choose Schedule deletion.
- On the IAM Identity Center console, choose Users in the navigation pane.
- Select the two users that you created and choose Delete users to remove these users.
Additionally, you should remove the business users added to your Google Workspace during the implementation of this solution because Google Workspace costs are billed on a per-user basis.
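The AWS-side deletions of the application and the secret can be scripted as well. The following sketch assumes the Boto3 `qbusiness` client provides `delete_application` and the `secretsmanager` client provides `delete_secret` with a `RecoveryWindowInDays` parameter. The `clean_up` helper is hypothetical, and the 7 to 30 day bound is an assumption about the allowed recovery window. Both clients are passed in so the function can be tried with stand-ins first.

```python
def clean_up(qbusiness, secretsmanager, application_id, secret_id, recovery_days=7):
    """Delete the application, then schedule the secret for deletion."""
    if not 7 <= recovery_days <= 30:
        # Assumed bounds for the recovery window.
        raise ValueError("recovery_days must be between 7 and 30")
    # Deleting the application also removes its index and data connectors.
    qbusiness.delete_application(applicationId=application_id)
    # The secret is not removed with the application, so schedule it separately.
    response = secretsmanager.delete_secret(
        SecretId=secret_id,
        RecoveryWindowInDays=recovery_days,
    )
    return response.get("DeletionDate")
```

With real credentials, the two clients would be `boto3.client("qbusiness")` and `boto3.client("secretsmanager")`.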
Conclusion
In this post, you created an HAQM Q application that indexed Google Drive documents using the Google Drive connector. You were able to connect to the HAQM Q conversational interface as each of your business users and ask questions about the documents each user could access in accordance with the ACL.
You can continue to experiment by adding more PDF documents to your business users’ Google Drives and re-syncing your HAQM Q Google Drive data source.
HAQM Q Business offers other connectors, such as for Confluence Cloud. To learn more about the HAQM Q Business Confluence Cloud connector, refer to Connecting Confluence (Cloud) to HAQM Q Business.
About the Authors
Glen Ireland is a Senior Enterprise Account Engineer at AWS in the Worldwide Public Sector. Glen’s areas of focus include empowering customers interested in building generative AI solutions using HAQM Q.
Julia Hu is a Specialist Solutions Architect who helps AWS customers and partners build generative AI solutions using HAQM Q Business on AWS. Julia has over 4 years of experience developing solutions for customers adopting AWS services on the forefront of cloud technology.