AWS Machine Learning Blog
HAQM Personalize can now use 10X more item attributes to improve relevance of recommendations
March 2025: This blog post was reviewed and updated for accuracy
HAQM Personalize is a machine learning service that enables you to personalize your website, app, ads, emails, and more with custom machine learning models that you can create in HAQM Personalize with no prior machine learning experience. AWS is pleased to announce that HAQM Personalize now supports ten times more item attributes for modeling. Previously, you could use up to five item attributes while building an ML model in HAQM Personalize; this limit is now 50 attributes. You can now use more information about your items, for example, category, brand, price, duration, size, author, and year of release, to increase the relevance of recommendations.
In this post, you learn how to add item metadata with custom attributes to HAQM Personalize and create a model using this data and user interactions. This post uses the HAQM customer reviews data for beauty products. For more information and to download this data, see HAQM Customer Reviews Dataset. We will use the history of which items users have reviewed, along with user and item metadata, to generate product recommendations for them.
Pre-processing the data
To model the data in HAQM Personalize, you need to break it into the following datasets:
- Users – Contains metadata about the users
- Items – Contains metadata about the items
- Interactions – Contains interactions (for this post, reviews) and metadata about the interactions
For each respective dataset, this post uses the following attributes:
- Users – customer_id, helpful_votes, and total_votes
- Items – product_id, product_category, and product_parent
- Interactions – product_id, customer_id, review_date, and star_rating
This post does not use the other attributes available, which include marketplace, review_id, product_title, vine, verified_purchase, review_headline, and review_body.
Additionally, to conform with the keywords in HAQM Personalize, this post renames customer_id to USER_ID, product_id to ITEM_ID, and review_date to TIMESTAMP.
To make getting started easier, you can use AWS CloudShell to experiment with this procedure. To do this, choose a Region from the AWS Regional Services List that supports both AWS CloudShell and HAQM Personalize. If you are not using CloudShell, be sure your environment includes the AWS CLI.
To download and process the data for input to HAQM Personalize, use the following example code blocks. The Python code blocks assume Python 3.
#Downloading data
#If using AWS CloudShell, use the /tmp directory for more space to work
cd /tmp
aws s3 cp s3://amazon-reviews-pds/tsv/amazon_reviews_us_Beauty_v1_00.tsv.gz .
gunzip amazon_reviews_us_Beauty_v1_00.tsv.gz
#Adding Pandas package
pip3 install pandas
For the Users dataset, enter the following code:
#Generating the user dataset
import pandas as pd
fields = ['customer_id', 'helpful_votes', 'total_votes']
df = pd.read_csv('amazon_reviews_us_Beauty_v1_00.tsv', sep='\t', usecols=fields)
df = df.rename(columns={'customer_id':'USER_ID'})
df.to_csv('User_dataset.csv', index = None, header=True)
The following screenshot shows the Users dataset. This output can be generated by df.head().
Delete the Users dataset dataframe to free up memory by running del [df].
For the Items dataset, enter the following code:
#Generating the item dataset
import pandas as pd
fields = ['product_id', 'product_category', 'product_parent']
df1 = pd.read_csv('amazon_reviews_us_Beauty_v1_00.tsv', sep='\t', usecols=fields)
df1= df1.rename(columns={'product_id':'ITEM_ID'})
#Clip category names to 999 characters to conform to Personalize limits
maxlen = 999
for index, row in df1.iterrows():
    product_category = row['product_category'][:maxlen]
    df1.at[index, 'product_category'] = product_category
# End of for loop - hit enter here if running interactive mode
df1.to_csv('Item_dataset.csv', index = None, header=True)
The following screenshot shows the Items dataset. This output can be generated by df1.head().
Delete the Items dataset dataframe to free up memory by running del [df1].
For the Interactions dataset, enter the following code:
#Generating the interactions dataset
import pandas as pd
from datetime import datetime
fields = ['product_id', 'customer_id', 'review_date', 'star_rating']
df2 = pd.read_csv('amazon_reviews_us_Beauty_v1_00.tsv', sep='\t', usecols=fields)
#Note that you can ignore the "...DtypeWarning..." message if you are running this process in CloudShell
df2= df2.rename(columns={'product_id':'ITEM_ID', 'customer_id':'USER_ID', 'review_date':'TIMESTAMP'})
#Converting the timestamp to a UNIX timestamp (rounded to whole seconds)
num_errors = 0
for index, row in df2.iterrows():
    time_input = row["TIMESTAMP"]
    try:
        time_input = datetime.strptime(time_input, "%Y-%m-%d")
        timestamp = round(datetime.timestamp(time_input))
        df2.at[index, "TIMESTAMP"] = timestamp
    except:
        print("exception at index: {}".format(index))
        num_errors += 1
# End of for loop - hit enter here if running interactive mode
# You should receive a series of "exception at index..." outputs
print("Total rows in error: {}".format(num_errors))
df2.to_csv("Interaction_dataset.csv", index = None, header=True)
The following screenshot shows the Interactions dataset. This output can be generated by df2.head().
If using interactive mode, quit Python and return to the bash shell by running quit().
Uploading the data
Note that if your CloudShell session is lost at any point in the procedure, you can resume your work by restoring the previously set variables from the persistent file with the Bash command source ~/local_variables.txt.
Also note that CloudShell is a regional instance, so make sure you log back into CloudShell in the same Region where you started.
After pre-processing is complete, upload the data to your HAQM S3 bucket. Be sure to replace <your_bucket_name_here> with a globally unique S3 bucket name that follows the S3 bucket naming rules.
demo_bucket_name="<your_bucket_name_here>"\
&&echo demo_bucket_name=$demo_bucket_name \
>> ~/local_variables.txt
demo_key_prefix="train/demo"\
&&echo demo_key_prefix=$demo_key_prefix \
>> ~/local_variables.txt
aws s3 mb s3://$demo_bucket_name
aws s3api put-object \
--bucket $demo_bucket_name \
--key "${demo_key_prefix}/users/user_dataset.csv" \
--body User_dataset.csv
aws s3api put-object \
--bucket $demo_bucket_name \
--key "${demo_key_prefix}/items/item_dataset.csv" \
--body Item_dataset.csv
aws s3api put-object \
--bucket $demo_bucket_name \
--key "${demo_key_prefix}/interactions/interaction_dataset.csv" \
--body Interaction_dataset.csv
Ingesting the data
After you process the preceding data, you can ingest it into HAQM Personalize.
Creating a dataset group
To create a dataset group to store events (user interactions) sent by your application and the metadata for users and items, complete the following commands:
dataset_group_name="demo-dataset"\
&&echo dataset_group_name=$dataset_group_name \
>> ~/local_variables.txt
aws personalize create-dataset-group \
--name $dataset_group_name
dataset_group_arn=$(aws personalize list-dataset-groups \
--query 'datasetGroups[?name==`demo-dataset`].datasetGroupArn' \
--output=text)\
&&echo dataset_group_arn=$dataset_group_arn \
>> ~/local_variables.txt
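Dataset group creation is asynchronous. Before creating datasets, you can optionally verify that the dataset group has reached the "ACTIVE" status with a describe call, following the same status-check pattern used later in this post:
# Optional: check status of the dataset group for "ACTIVE" status before proceeding
aws personalize describe-dataset-group \
    --dataset-group-arn $dataset_group_arn \
    --query 'datasetGroup.[name, status]' \
    --output text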
Creating a dataset and defining schema
After you create the dataset group, create a dataset and define a schema for each of the three dataset types. The following commands apply to your three datasets:
Create schemas for Items, Users, and Interactions:
# Create the Items Schema
aws personalize create-schema \
--name 'demo-items-schema' \
--schema ' {
"type": "record",
"name": "Items",
"namespace": "com.amazonaws.personalize.schema",
"fields": [
{
"name": "ITEM_ID",
"type": "string"
},
{
"name": "product_parent",
"type": "string",
"categorical": true
},
{
"name": "product_category",
"type": "string",
"categorical": true
}
],
"version": "1.0"
}'
items_schema_arn=$(aws personalize list-schemas \
--query 'schemas[?name==`demo-items-schema`].schemaArn' \
--output text)\
&&echo items_schema_arn=$items_schema_arn \
>> ~/local_variables.txt
# Create the Users Schema
aws personalize create-schema \
--name 'demo-users-schema' \
--schema ' {
"type": "record",
"name": "Users",
"namespace": "com.amazonaws.personalize.schema",
"fields": [
{
"name": "USER_ID",
"type": "string"
},
{
"name": "helpful_votes",
"type": "float"
},
{
"name": "total_votes",
"type": "float"
}
],
"version": "1.0"
}'
users_schema_arn=$(aws personalize list-schemas \
--query 'schemas[?name==`demo-users-schema`].schemaArn' \
--output text)\
&&echo users_schema_arn=$users_schema_arn \
>> ~/local_variables.txt
# Create the Interactions Schema
aws personalize create-schema \
--name 'demo-interactions-schema' \
--schema ' {
"type": "record",
"name": "Interactions",
"namespace": "com.amazonaws.personalize.schema",
"fields": [
{
"name": "USER_ID",
"type": "string"
},
{
"name": "ITEM_ID",
"type": "string"
},
{
"name": "star_rating",
"type": "string",
"categorical": true
},
{
"name": "TIMESTAMP",
"type": "long"
}
],
"version": "1.0"
}'
interactions_schema_arn=$(aws personalize list-schemas \
--query 'schemas[?name==`demo-interactions-schema`].schemaArn' \
--output text)\
&&echo interactions_schema_arn=$interactions_schema_arn \
>> ~/local_variables.txt
Create the datasets for Items, Users, and Interactions:
# Create Items datasets
aws personalize create-dataset \
--name "demo-items" \
--schema-arn $items_schema_arn \
--dataset-group-arn $dataset_group_arn \
--dataset-type Items
items_dataset_arn=$(aws personalize list-datasets \
--query 'datasets[?name==`demo-items`].datasetArn' \
--output=text)\
&&echo items_dataset_arn=$items_dataset_arn \
>> ~/local_variables.txt
# Create Users datasets
aws personalize create-dataset \
--name "demo-users" \
--schema-arn $users_schema_arn \
--dataset-group-arn $dataset_group_arn \
--dataset-type Users
users_dataset_arn=$(aws personalize list-datasets \
--query 'datasets[?name==`demo-users`].datasetArn' \
--output=text)\
&&echo users_dataset_arn=$users_dataset_arn \
>> ~/local_variables.txt
# Create Interactions datasets
aws personalize create-dataset \
--name "demo-interactions" \
--schema-arn $interactions_schema_arn \
--dataset-group-arn $dataset_group_arn \
--dataset-type Interactions
interactions_dataset_arn=$(aws personalize list-datasets \
--query 'datasets[?name==`demo-interactions`].datasetArn' \
--output=text)\
&&echo interactions_dataset_arn=$interactions_dataset_arn \
>> ~/local_variables.txt
Importing the data
After you create the datasets, import the data from HAQM S3. To import your Items, Users, and Interactions data, complete the following commands.
Set up policies and roles to allow S3 and Personalize interactions:
# Create IAM Execution Role for Personalize service to read data from bucket
personalize_iam_policy_name="Demo-Personalize-ExecutionPolicy"\
&&echo personalize_iam_policy_name=$personalize_iam_policy_name \
>> ~/local_variables.txt
personalize_iam_role_name="Demo-Personalize-ExecutionRole"\
&&echo personalize_iam_role_name=$personalize_iam_role_name \
>> ~/local_variables.txt
personalize_managed_iam_service_policy_arn=\
"arn:aws:iam::aws:policy/service-role/HAQMPersonalizeFullAccess"\
&&echo personalize_managed_iam_service_policy_arn=\
$personalize_managed_iam_service_policy_arn \
>> ~/local_variables.txt
printf -v personalize_iam_policy_json '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::%s",
"arn:aws:s3:::%s/*"
]
}
]
}' "$demo_bucket_name" "$demo_bucket_name"
aws iam create-policy \
--policy-name $personalize_iam_policy_name \
--policy-document "$personalize_iam_policy_json"
personalize_iam_policy_arn=$(aws iam list-policies \
--query 'Policies[?PolicyName==`Demo-Personalize-ExecutionPolicy`].Arn' \
--output text)\
&&echo personalize_iam_policy_arn=$personalize_iam_policy_arn \
>> ~/local_variables.txt
aws iam create-role \
--role-name $personalize_iam_role_name \
--assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "personalize.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}'
personalize_iam_role_arn=$(aws iam list-roles \
--query 'Roles[?RoleName==`Demo-Personalize-ExecutionRole`].Arn' \
--output text)\
&&echo personalize_iam_role_arn=$personalize_iam_role_arn \
>> ~/local_variables.txt
aws iam attach-role-policy \
--role-name $personalize_iam_role_name \
--policy-arn $personalize_iam_policy_arn
aws iam attach-role-policy \
--role-name $personalize_iam_role_name \
--policy-arn $personalize_managed_iam_service_policy_arn
# Create S3 bucket policy and attach to bucket for Personalize to access S3
printf -v s3_bucket_policy_json '{
"Version": "2012-10-17",
"Id": "PersonalizeS3BucketAccessPolicy",
"Statement": [
{
"Sid": "PersonalizeS3BucketAccessPolicy",
"Effect": "Allow",
"Principal": {
"Service": "personalize.amazonaws.com"
},
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::%s",
"arn:aws:s3:::%s/*"
]
}
]
}' "$demo_bucket_name" "$demo_bucket_name"
aws s3api put-bucket-policy \
--bucket $demo_bucket_name \
--policy "$s3_bucket_policy_json"
Create dataset import jobs:
# Create dataset import jobs
aws personalize create-dataset-import-job \
--role-arn $personalize_iam_role_arn \
--job-name "demo-initial-items-import" \
--dataset-arn $items_dataset_arn \
--data-source "dataLocation=s3://${demo_bucket_name}/${demo_key_prefix}/items/item_dataset.csv"
aws personalize create-dataset-import-job \
--role-arn $personalize_iam_role_arn \
--job-name "demo-initial-users-import" \
--dataset-arn $users_dataset_arn \
--data-source "dataLocation=s3://${demo_bucket_name}/${demo_key_prefix}/users/user_dataset.csv"
aws personalize create-dataset-import-job \
--role-arn $personalize_iam_role_arn \
--job-name "demo-initial-interactions-import" \
--dataset-arn $interactions_dataset_arn \
--data-source "dataLocation=s3://${demo_bucket_name}/${demo_key_prefix}/interactions/interaction_dataset.csv"
Check status of the dataset import jobs. This may take several minutes.
# Check status of dataset import jobs for "ACTIVE" status before proceeding
aws personalize list-dataset-import-jobs \
--query 'datasetImportJobs[?jobName==`demo-initial-items-import`].[jobName, status]' \
--output text&&\
aws personalize list-dataset-import-jobs \
--query 'datasetImportJobs[?jobName==`demo-initial-users-import`].[jobName, status]' \
--output text&&\
aws personalize list-dataset-import-jobs \
--query 'datasetImportJobs[?jobName==`demo-initial-interactions-import`].[jobName, status]' \
--output text
# Once status of dataset import jobs are all "ACTIVE," proceed to next step.
Training a model
After you ingest the data into HAQM Personalize, you are ready to train a model (solutionVersion). To do so, map the recipe (algorithm) you want to use to your use case. The following are your available options:
- For user personalization, such as recommending items to a user, use one of the recipes described in the user personalization recipes documentation pages.
- For recommending items similar to an input item, use SIMS.
- For reranking a list of input items for a given user, use Personalized-Ranking.
This post uses the User-Personalization recipe to define a solution and then train a solutionVersion (model). Complete the following commands.
# Create solution and train solution version
# Note that you may need to source saved variables from file:
source ~/local_variables.txt
aws personalize create-solution \
--name demo-user-personalization \
--dataset-group-arn $dataset_group_arn \
--recipe-arn arn:aws:personalize:::recipe/aws-user-personalization
personalize_solution_arn=$(aws personalize list-solutions \
--query 'solutions[?name==`demo-user-personalization`].solutionArn' \
--output text)\
&&echo personalize_solution_arn=$personalize_solution_arn \
>> ~/local_variables.txt
aws personalize create-solution-version \
--solution-arn $personalize_solution_arn \
--training-mode FULL
personalize_solution_version_arn=$(aws personalize describe-solution \
--solution-arn $personalize_solution_arn \
--query 'solution.latestSolutionVersion.solutionVersionArn' \
--output text)\
&&echo personalize_solution_version_arn=$personalize_solution_version_arn \
>> ~/local_variables.txt
You can also change the default hyperparameters or perform hyperparameter optimization for a solution.
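As a minimal, optional sketch, the following command creates an alternate solution with hyperparameter optimization enabled; the solution name demo-user-personalization-hpo is only illustrative, and you can instead pass fixed hyperparameter values through the --solution-config option's algorithmHyperParameters map.
# Optional: create a second solution with hyperparameter optimization enabled
aws personalize create-solution \
    --name demo-user-personalization-hpo \
    --dataset-group-arn $dataset_group_arn \
    --recipe-arn arn:aws:personalize:::recipe/aws-user-personalization \
    --perform-hpo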
Check status of the solution version. This may take an hour or longer as it is running full training on the datasets.
# Check status of solution version training for "ACTIVE" before proceeding
# Note that you may need to source saved variables from file:
source ~/local_variables.txt
aws personalize describe-solution-version \
--solution-version-arn $personalize_solution_version_arn \
--query 'solutionVersion.[solutionVersionArn, status]'
Getting recommendations
To get recommendations, create a campaign using the solution and solution version you just created. Complete the following steps:
# Create Campaign from solution version
# Note that you may need to source saved variables from file:
source ~/local_variables.txt
aws personalize create-campaign \
--name demo-user-personalization-test \
--solution-version-arn $personalize_solution_version_arn \
--min-provisioned-tps 1
personalize_campaign_arn=$(aws personalize list-campaigns \
--solution-arn $personalize_solution_arn \
--query 'campaigns[?name==`demo-user-personalization-test`].campaignArn' \
--output text)\
&&echo personalize_campaign_arn=$personalize_campaign_arn \
>> ~/local_variables.txt
Check status of the campaign. This may take several minutes.
# Check status of Campaign for "ACTIVE" status
# Note that you may need to source saved variables from file:
source ~/local_variables.txt
aws personalize describe-campaign \
--campaign-arn $personalize_campaign_arn \
--query 'campaign.[name, status]'
After you set up the campaign, you can programmatically call the campaign to get recommendations in the form of item IDs. You can also use the console to get recommendations and perform spot checks. Additionally, HAQM Personalize offers the ability to batch process recommendations. For more information, see Now available: Batch Recommendations in HAQM Personalize.
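As a rough, optional sketch of the batch option, you could upload a JSON Lines file of user IDs and create a batch inference job against the solution version. The batch/ prefix and file name below are placeholders, and the bucket policy created earlier would also need to allow s3:PutObject for the personalize.amazonaws.com principal so the job can write its output.
# Optional sketch: batch recommendations for a list of users
printf '{"userId": "19551372"}\n{"userId": "1955137000"}\n' > batch_input.json
aws s3 cp batch_input.json s3://${demo_bucket_name}/batch/input/batch_input.json
aws personalize create-batch-inference-job \
    --job-name "demo-batch-recommendations" \
    --solution-version-arn $personalize_solution_version_arn \
    --role-arn $personalize_iam_role_arn \
    --num-results 5 \
    --job-input "{\"s3DataSource\": {\"path\": \"s3://${demo_bucket_name}/batch/input/batch_input.json\"}}" \
    --job-output "{\"s3DataDestination\": {\"path\": \"s3://${demo_bucket_name}/batch/output/\"}}"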
One way to test the campaign is with the following commands, which request recommendations for both an existing user and a nonexistent user.
# Test campaign results with a known user ID and a new user ID
# Note that you may need to source saved variables from file:
source ~/local_variables.txt
# Known user from dataset
test_user_id_1="19551372"
# New nonexistent user
test_user_id_2="1955137000"
# View recommendations for known user with previous recorded interactions
aws personalize-runtime get-recommendations \
--campaign-arn $personalize_campaign_arn \
--user-id $test_user_id_1 \
--num-results 5
# View recommendations for new user with no previous recorded interactions
aws personalize-runtime get-recommendations \
--campaign-arn $personalize_campaign_arn \
--user-id $test_user_id_2 \
--num-results 5
You should see the top five recommended item IDs for each user, ranked in descending order of relevance.
Filtering recommendations
With filters, you can exclude specific items, such as previously purchased products, or limit recommendations to particular categories, price ranges, or content types. This targeted approach helps ensure your users receive relevant recommendations.
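As a minimal sketch, the following commands create a filter on the item metadata defined earlier in this post and apply it to a recommendation request; the filter name and expression are illustrative, and the filter must reach the "ACTIVE" status before you use it. If you create a filter, delete it with aws personalize delete-filter before removing the dataset group in the cleanup steps.
# Optional sketch: create a filter on item metadata and use it at inference time
demo_filter_arn=$(aws personalize create-filter \
    --name demo-category-filter \
    --dataset-group-arn $dataset_group_arn \
    --filter-expression 'INCLUDE ItemID WHERE Items.product_category IN ("Beauty")' \
    --query 'filterArn' --output text)
# Check status of the filter for "ACTIVE" before requesting recommendations
aws personalize describe-filter \
    --filter-arn $demo_filter_arn \
    --query 'filter.[name, status]'
# Get recommendations restricted by the filter
aws personalize-runtime get-recommendations \
    --campaign-arn $personalize_campaign_arn \
    --filter-arn $demo_filter_arn \
    --user-id "19551372" \
    --num-results 5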
Updating Datasets to Maintain Recommendation Relevance
In today’s fast-paced digital marketplace, ensuring your recommendations stay relevant is crucial for customer engagement and conversion. HAQM Personalize offers powerful tools to keep your recommendation models up-to-date without the need for constant retraining. Let’s discuss strategies for ensuring the ongoing accuracy and effectiveness of your personalized recommendations.
Why Update Your Datasets?
As user preferences evolve and new items are added to your catalog, your recommendation models need to adapt. Regular updates to your datasets allow HAQM Personalize to capture these changes, so your recommendations stay current and reflect the latest user interests and available items.
Strategies for Keeping Recommendations Current
HAQM Personalize offers two methods to update your recommendation datasets:
- Real-Time Event Tracking
Record customer interactions as they happen to power dynamic recommendations. When you implement real-time event tracking, HAQM Personalize instantly captures user actions such as clicks, views, and purchases, then automatically refines recommendations to match current interests. This approach helps you deliver relevant experiences that adapt to changing customer preferences.
To implement real-time event tracking, see Recording real-time events in the HAQM Personalize Developer Guide. A short example of both approaches follows this list.
- Periodic Dataset Imports
Update your recommendation engine according to your schedule by importing your latest data in batches. With periodic imports, you can upload user interactions, product catalogs, and user profiles at intervals that suit your business needs. This method works well for businesses whose recommendations rely primarily on historical patterns.
To learn more about batch updates, see Updating data in datasets in the HAQM Personalize developer guide.
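The following commands are a brief, optional sketch of both approaches using the AWS CLI; the event tracker name, item ID, sentAt timestamp, session ID, and update file path are placeholders. If you create an event tracker, delete it with aws personalize delete-event-tracker before removing the dataset group in the cleanup steps.
# Real-time: create an event tracker for the dataset group and record an interaction
tracking_id=$(aws personalize create-event-tracker \
    --name demo-event-tracker \
    --dataset-group-arn $dataset_group_arn \
    --query 'trackingId' --output text)
# The item ID and sentAt value below are placeholders for a real interaction
aws personalize-events put-events \
    --tracking-id $tracking_id \
    --user-id "19551372" \
    --session-id "demo-session-1" \
    --event-list '[{"eventType": "review", "itemId": "B00EXAMPLE", "sentAt": 1735689600}]'
# Periodic: add new rows to an existing dataset with an incremental import job
aws personalize create-dataset-import-job \
    --role-arn $personalize_iam_role_arn \
    --job-name "demo-incremental-interactions-import" \
    --dataset-arn $interactions_dataset_arn \
    --data-source "dataLocation=s3://${demo_bucket_name}/${demo_key_prefix}/interactions/interaction_dataset_update.csv" \
    --import-mode INCREMENTAL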
Choosing the Right Strategy
Still wondering which tracking method is right for you? Think about your business needs this way: Real-time tracking shines when your customers expect instant, dynamic recommendations based on their current behavior. It’s your best choice when immediate responsiveness drives your customer experience, such as in e-commerce or streaming platforms.
Periodic imports work well when your recommendations rely primarily on historical patterns and stable user preferences. This approach makes sense if your user behavior data changes gradually or when you want to optimize your resource usage.
Remember: Your choice shapes how HAQM Personalize delivers personalized experiences to your customers, directly affecting customer engagement and business outcomes.
Removing the created resources
If you would like to remove the resources that you created in this post, run the following commands:
# Note that you may need to source saved variables from file:
source ~/local_variables.txt
#Delete the campaign
aws personalize delete-campaign --campaign-arn $personalize_campaign_arn
#Wait for the campaign deletion to complete before deleting the solution
#Delete the solution
aws personalize delete-solution --solution-arn $personalize_solution_arn
#Delete datasets
aws personalize delete-dataset --dataset-arn $items_dataset_arn
aws personalize delete-dataset --dataset-arn $users_dataset_arn
aws personalize delete-dataset --dataset-arn $interactions_dataset_arn
#Delete the dataset group
aws personalize delete-dataset-group --dataset-group-arn $dataset_group_arn
#Delete schemas
aws personalize delete-schema --schema-arn $items_schema_arn
aws personalize delete-schema --schema-arn $users_schema_arn
aws personalize delete-schema --schema-arn $interactions_schema_arn
#Delete the S3 bucket and contents
aws s3 rb s3://$demo_bucket_name --force
#Delete IAM Role and Policy
aws iam detach-role-policy --role-name $personalize_iam_role_name --policy-arn $personalize_managed_iam_service_policy_arn
aws iam detach-role-policy --role-name $personalize_iam_role_name --policy-arn $personalize_iam_policy_arn
aws iam delete-role --role-name $personalize_iam_role_name
aws iam delete-policy --policy-arn $personalize_iam_policy_arn
#To clear the file saving variables
rm -f ~/local_variables.txt
Conclusion
You can now use these recommendations to create personalized experiences across your platforms. For example, customize your beauty website homepage based on individual user preferences, or enhance your promotional emails with targeted product suggestions. Get started with HAQM Personalize today!
About the authors
Vaibhav Sethi is the Product Manager for HAQM Personalize. He focuses on delivering products that make it easier to build machine learning solutions. In his spare time, he enjoys hiking and reading.
Brian Soper is a Solutions Architect at HAQM Web Services helping AWS customers transform and architect for the cloud since 2018. Brian has a 20+ year background building out physical and virtual infrastructure for both on-premises and cloud.
Rob Percival is an Account Manager in the AWS Games organization. He works with operators, game developers, and software providers in the US Real Money Gaming (online sports betting and casino gambling) industry to increase speed to market, gain deeper insight on their players, and accelerate experimentation and innovation using AWS.
Shashank Chinchli is a Solutions Architect at AWS. With over 13 years of industry experience in cloud architecture, systems engineering, and application development, he specializes in helping businesses build efficient and scalable solutions on AWS. Notably, he has earned 13 AWS certifications and a Golden Jacket, reflecting his exceptional expertise. Outside of work, Shashank enjoys staying active with HIIT workouts and exploring his passion for music.
Rohit Raj is a Solution Architect at AWS, specializing in Serverless and a member of the Serverless Technical Field Community. He continually explores new trends and technologies. He is passionate about guiding customers build highly available, resilient, and scalable solutions on cloud. Outside of work, he enjoys travelling, music, and outdoor sports.