Assisting People at Haptik Using HAQM Polly

This is a guest blog post by Swapan Rajdev, Co-Founder & CTO, and Ranvijay Jamwal, Lead DevOps Engineer, at Haptik Inc.

Given the busy lives we all live, our to-do lists keep growing, and it gets harder and harder to keep track of all the things we need to accomplish daily. From remembering our meetings to making sure we buy our next flight ticket, from remembering to drink enough water to making sure we make it to the gym, our lists never end, and their maintenance gets exhausting.

Haptik is India’s first personal-assistant app. Users can use the app to plan travel, check in for flights, book taxis, and set reminders. Of all these features, the most important and most frequently used is Reminders. People use Haptik to set wake-up calls, set up reminders to drink water, call people at different times, send greetings to others on different occasions, and much more. Through the Reminders feature, users receive a notification in the app along with a phone call at the requested time that relays the reminder message.

In this post, we will cover how we use machine learning and text-to-speech (TTS) to set reminders for users – to call them at the given time to remind them of their tasks. We will cover how HAQM Polly helped us make personalized calls to our users and helped us scale our reminders feature to millions of users.

Reminders at Haptik

To get anything done by the personal assistant, the user comes onto the Haptik app and sends the bot a message. Every message in our system goes through a message pipeline in which we try to detect the following:

The domain the user is talking about (reminders, travel, nearby, etc.)
The task (intent) the user wants to get done
The entities (different data required to complete the task of the user)

At the end of this pipeline, if the bot has all the information it needs, it goes ahead and completes the task. Otherwise, it replies with relevant questions to gather the missing information.

Apart from this basic pipeline, we have many other algorithms that use deep learning to learn from historical chats so that we can better complete users’ tasks without their intervention.
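The decision at the end of the pipeline can be illustrated with a minimal sketch. The names here (`ParsedMessage`, `REQUIRED_ENTITIES`, `next_action`) and the entity list are hypothetical and only illustrate the "complete or ask" step; they are not Haptik's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical entity requirements per (domain, intent); not Haptik's real schema.
REQUIRED_ENTITIES = {
    ("reminders", "set_reminder"): ["task", "date", "time"],
}


@dataclass
class ParsedMessage:
    """Output of the detection steps: domain, intent, and extracted entities."""
    domain: str
    intent: str
    entities: dict = field(default_factory=dict)


def next_action(parsed):
    """Return ("complete", entities) or ("ask", names_of_missing_entities)."""
    required = REQUIRED_ENTITIES.get((parsed.domain, parsed.intent), [])
    missing = [name for name in required if name not in parsed.entities]
    if missing:
        return "ask", missing
    return "complete", parsed.entities


# A message such as "wake me up at 7" yields a task and a time but no date,
# so the bot would ask a follow-up question about the date.
parsed = ParsedMessage(
    "reminders", "set_reminder", {"task": "wake-up call", "time": "07:00"}
)
action, detail = next_action(parsed)
```

In this sketch the bot keeps asking until `next_action` returns `"complete"`, at which point the collected entities are handed to the task handler.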

Why Call Users?

To remind users about their tasks, we send them a notification on the app along with a phone call. Although Haptik uses common notification techniques to remind users, we believe that calling people works more effectively for a few reasons:

First, in today’s smartphone age, we are all going through a notification overload from every other app, which leads to missing out on some important notifications. A phone call from an unknown number or Haptik is more effective than a regular alarm which often gets the snooze treatment.

Second, we are able to provide a much better user experience by changing the content and voice of the call based on the type of reminder task. For example, for the morning wake-up calls we send, we use a soft and calm voice. Occasionally, we add a motivational quote towards the end of the call to make sure the user wakes up pleasantly and is charged up. Implementing such TTS use cases is simple and reliable with HAQM Polly.

How it works

Reminders is one of the most complex domains at Haptik: many different technologies come together to make sure we can call our users in a timely and personalized manner. To successfully set a reminder, we capture the following data points from the user:

The reason for setting the reminder (wake-up call, meeting reminder, etc.)
The date of reminder
The time of reminder
The frequency, if it’s a recurring reminder

All of this information is used to derive metadata and is passed to a scheduler whose job is to call the user. The following code snippet shows how a reminder is created:

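def create_reminder(user, reminder_task, date, time, repeat_pattern=None):
    # Reject reminders whose date, time, or repeat pattern is invalid
    is_valid = check_reminder_date_time_validity(date, time, repeat_pattern)
    if not is_valid:
        return False
    # Prepare the in-app notification text and the script for the phone call
    notification_content = get_notification_content_for_reminder(user, reminder_task)
    call_script = get_call_script_for_reminder(user, reminder_task)
    # Synthesize the call audio in advance; the task type determines the voice
    audio_url = generate_audio_using_polly(user, call_script, reminder_task)
    return schedule_job(user, audio_url, notification_content)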
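The post does not show the validity check, so the following is only a sketch of what a function like `check_reminder_date_time_validity` might do. The allowed repeat patterns and the extra `now` parameter (added to make the function testable) are assumptions, and time zones are ignored for brevity.

```python
from datetime import date, datetime, time

# Hypothetical set of supported repeat patterns; the real values are not given.
ALLOWED_REPEAT_PATTERNS = {"daily", "weekly", "monthly"}


def check_reminder_date_time_validity(reminder_date, reminder_time,
                                      repeat_pattern=None, now=None):
    """Return True if the reminder lies in the future and the repeat pattern is known."""
    now = now or datetime.now()
    if repeat_pattern is not None and repeat_pattern not in ALLOWED_REPEAT_PATTERNS:
        return False
    # Combine the separate date and time values into one timestamp to compare
    return datetime.combine(reminder_date, reminder_time) > now


# Example: a 7:00 wake-up call for the next day, checked at noon
is_valid = check_reminder_date_time_validity(
    date(2024, 1, 2), time(7, 0), "daily", now=datetime(2024, 1, 1, 12, 0)
)
```

A production check would also need to handle the user's time zone, which this sketch leaves out.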

Using HAQM Polly for TTS

Before we schedule a reminder, we first fetch the script that we want HAQM Polly to synthesize. For this we have a function that fetches the call script based on the type of reminder and the user.

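import random

def get_call_script_for_reminder(user, reminder_task):
    # Fetch every script template stored for this type of reminder
    all_call_scripts = list(
        CallScriptStore.objects.filter(task_name=reminder_task.name)
        .values_list('script', flat=True))
    # Pick one template at random so that calls do not all sound the same
    call_script = random.choice(all_call_scripts)
    # Fill in the user's name, e.g. "Rise and shine, {user_name}!"
    return call_script.format(user_name=user.name)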

Example output:

Rise and shine, Swapan! It's a beautiful day - Time to wake up!

After we have the script, we call HAQM Polly to convert the text to speech and upload the audio file to HAQM S3, so that we can play it later during the call. Use the following code to create the audio file (mp3) and upload it to HAQM S3:

from contextlib import closing

from boto3 import Session

def generate_audio_using_polly(user, call_script, reminder_task):
    session = Session()
    polly = session.client("polly", region_name=POLLY_REGION)
    # The voice is chosen based on the type of reminder
    response = polly.synthesize_speech(
        Text=call_script,
        OutputFormat="mp3",
        VoiceId=get_polly_voice_for_task(reminder_task))

    # Write the audio stream to a local file. The local path and the S3 key
    # are fixed here for brevity; use a unique name per reminder in practice.
    mp3_file_path = "file.mp3"
    with closing(response["AudioStream"]) as stream:
        with open(mp3_file_path, "wb") as file:
            file.write(stream.read())

    # Upload to S3
    s3 = session.client("s3", region_name=AUDIO_BUCKET_REGION)
    s3.upload_file(mp3_file_path, AUDIO_BUCKET_NAME, "file.mp3")

    # Build the URL that is played during the call
    url = "http://s3-{0}.amazonaws.com/{1}/{2}".format(
        AUDIO_BUCKET_REGION,
        AUDIO_BUCKET_NAME,
        "file.mp3")

    return url

At the time of the actual reminder, our scheduler system makes an API call to our calling partner using the mobile number and the URL of the call script. A call is then made to the user during which the call script is played; this completes the reminder. We have received a lot of positive feedback on the content and behavior of the call. On any given day we send out more than 100,000 reminders.
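As a rough illustration of this last step, the sketch below dispatches every reminder that has become due. `ReminderJob`, `dispatch_due_reminders`, and the `place_call` callback are hypothetical names; the callback stands in for the calling partner's API, which is not described in this post.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReminderJob:
    """A scheduled call: who to call, what to play, and when."""
    mobile_number: str
    audio_url: str
    due_at: datetime
    done: bool = False


def dispatch_due_reminders(jobs, now, place_call):
    """Call place_call(mobile_number, audio_url) for every pending job that is due."""
    dispatched = []
    for job in jobs:
        if not job.done and job.due_at <= now:
            place_call(job.mobile_number, job.audio_url)
            job.done = True  # mark as handled so the job is not dispatched twice
            dispatched.append(job)
    return dispatched
```

Because the audio URL is already stored on the job, the dispatch step involves no synthesis work of its own; it only hands the number and URL to the calling partner.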

Adding personality using HAQM Polly

Using HAQM Polly, you can generate call scripts in 51 different voices across 25 languages. This helps you provide a breadth of user experiences. In the previous function, while synthesizing the call script, we call a function `get_polly_voice_for_task` to select a VoiceId. You can get the different voices supported by HAQM Polly by using the following code:

session = Session()
polly = session.client("polly", region_name=POLLY_REGION)
response = polly.describe_voices()
voice_ids = [item['Id'] for item in response['Voices']]

Since most of our audience is in India, we use “Raveena” (English female Indian voice) frequently because this voice resonates with many of our users.
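The body of `get_polly_voice_for_task` is not shown in this post. One simple way to write it is a lookup table with a default, sketched below. The task names and the mapping are assumptions made for illustration; "Raveena" and "Aditi" are existing Polly voice IDs, and the optional `supported_voice_ids` argument could be filled from the `describe_voices` call above.

```python
from types import SimpleNamespace

# Hypothetical mapping from reminder type to Polly voice ID.
VOICE_BY_TASK = {
    "wake_up_call": "Raveena",
    "birthday_greeting": "Aditi",
}
DEFAULT_VOICE = "Raveena"


def get_polly_voice_for_task(reminder_task, supported_voice_ids=None):
    """Pick a voice for the task, falling back to the default voice."""
    voice = VOICE_BY_TASK.get(reminder_task.name, DEFAULT_VOICE)
    # Optionally guard against a configured voice that is not available
    if supported_voice_ids is not None and voice not in supported_voice_ids:
        return DEFAULT_VOICE
    return voice


# Stand-in for a reminder task object with a `name` attribute
voice_id = get_polly_voice_for_task(SimpleNamespace(name="birthday_greeting"))
```

Keeping the mapping in data rather than in code fits the goal, described later in this post, of changing voices without code changes.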

Sample of a wake-up call reminder (audio, voiced by HAQM Polly)

Sample of a birthday greeting (audio, voiced by HAQM Polly)

Why HAQM Polly

We have experimented with a number of different services for TTS, but HAQM Polly was the frontrunner by miles. Some of the reasons we chose HAQM Polly are:

Speed of development and iteration – The HAQM Polly API is simple and very robust. It took us less than a day to implement the HAQM Polly API calls, and we designed our system so that almost all the configuration can be changed on the fly without any code changes. We have a tool to change the call scripts and a tool to change the different voices. This allowed us to experiment and perform A/B testing with many different scripts and voices until we were satisfied with the experience.

Scalability – Based on the architecture described previously, we create the call audio in advance and store it on HAQM S3 so that the audio is ready when we have to make the call. This helps us scale and trigger thousands of calls at the same time without hindering the user experience.

Reliability and Monitoring – HAQM provides a lot of great tools to monitor our HAQM Polly requests. We have experienced near 100% reliability and availability, and so far we have never faced any downtime with HAQM Polly. To be on the safe side, we have created alarms that go off whenever we have more than 5 failed requests in a period of 5 minutes. You can easily set up such alarms using HAQM CloudWatch, which we have synced with PagerDuty.

Latency – HAQM CloudWatch provides some great ways to monitor different metrics of HAQM Polly. In our graphs, an HAQM Polly audio file is created in 17 ms on average, for an average of 85 characters per file. This is really fast, and helps us deliver a very good user experience on thousands of concurrent calls.
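The alarm described under Reliability and Monitoring (more than 5 failed requests in 5 minutes) could be set up roughly as follows. This is a sketch under assumptions: it watches only server-side errors through the `5XXCount` metric in the `AWS/Polly` namespace, the alarm name is made up, and the SNS topic is assumed to be the one forwarding to PagerDuty. Check the metric and dimension names against the Polly CloudWatch documentation before relying on them.

```python
def build_polly_failure_alarm(sns_topic_arn, threshold=5, period_seconds=300):
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": "polly-synthesize-speech-failures",  # hypothetical name
        "Namespace": "AWS/Polly",
        "MetricName": "5XXCount",  # assumed metric for failed requests
        "Dimensions": [{"Name": "Operation", "Value": "SynthesizeSpeech"}],
        "Statistic": "Sum",
        "Period": period_seconds,       # 5-minute window
        "EvaluationPeriods": 1,
        "Threshold": threshold,         # alarm when failures exceed this count
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data means no failures
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(cloudwatch_client, sns_topic_arn):
    """Create the alarm using a boto3 CloudWatch client passed in by the caller."""
    cloudwatch_client.put_metric_alarm(**build_polly_failure_alarm(sns_topic_arn))
```

To use it, pass a client created with `Session().client("cloudwatch")` and the ARN of your notification topic to `create_alarm`. Building the parameters in a separate function keeps them easy to review and test without AWS credentials.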

Conclusion

Every day, we at Haptik try to make our users’ lives easier by providing a simple way to get things done. In the future, we plan to add support for multiple languages, and an advantage of HAQM Polly is that it already supports 24 different languages. Beyond that, we are always tuning our machine learning algorithms to understand more of what users ask, and we keep looking for new ways to use technology to serve our users better. We hope you found this post useful.

Additional Reading

Be sure to read the Haptik Case Study, “Haptik Supports 30% Monthly Increase in App Downloads Using AWS.”

About the Authors

Swapan Rajdev is the Co-Founder & CTO and Ranvijay Jamwal is the Lead DevOps Engineer at Haptik Inc. In their own words, “Haptik is a company that specializes in chatbots, with the flagship product being the Android and iOS personal assistant app with the same name. Other than the consumer app, we also work with enterprises to help build chatbots solutions for customer service, lead generation, marketing and much more.”

AWS Machine Learning Blog