AWS Business Intelligence Blog

Enhance API throttling visibility with Amazon QuickSight

As enterprises scale their workloads on Amazon Web Services (AWS), managing API rate limits becomes a critical challenge. These limits, which vary across services and APIs, define the maximum allowed request rate. Exceeding them triggers throttling, leading to error responses that, if not handled properly, can result in system failures, increased retries, and higher costs.

In the post, Managing and monitoring API throttling in your workloads, we outlined strategies for capturing API usage metrics in Amazon CloudWatch and obtaining throttled API data from AWS CloudTrail. However, consolidating and visualizing this data across multiple AWS accounts and AWS Regions can be complex.

In this post, we show you how to use Amazon QuickSight to create a centralized dashboard that provides comprehensive visibility into API throttling across your organization. By aggregating throttling data from different accounts and Regions, you can better assess the impact and improve your response strategies.

We walk through the step-by-step deployment of this solution, demonstrate how to consolidate throttled API data from multiple accounts and Regions, further analyze the data, and visualize it with a QuickSight dashboard, as shown in the following screenshot. This dashboard centralizes throttling data, including the affected APIs, associated AWS services, accounts, and AWS Identity and Access Management (IAM) principals. This approach simplifies the assessment of throttling impact and improves response strategies.

Solution overview

This solution uses a hub-and-spoke model to capture API throttling data from different accounts and Regions. The hub is the data collection account, which aggregates all throttling data and displays it in a QuickSight dashboard. The spokes are the linked accounts, where throttling events occur. The architecture of this solution is shown in the following diagram.

When an API is throttled in a linked account, CloudTrail records the event as an API call with an errorCode parameter. This parameter specifies the error type, such as throttling, and can vary across APIs. CloudTrail sends management API call data as events to the Amazon EventBridge default event bus, where an EventBridge rule filters and captures only the throttling events. For the purposes of this post, we refer to this as the filtering rule. The following code is an example filtering pattern:

{
	"detail-type":["AWS API Call via CloudTrail"],
	"detail":{
		"errorCode":[
			"Client.RequestLimitExceeded",
			"ThrottlingException"
		]
	}
}
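
To make the pattern's behavior concrete, the following Python sketch checks sample events against it locally. It is a simplified illustration, not EventBridge's actual matching engine: it handles only exact-value matching (a list in the pattern means "any of these values") and none of the other pattern operators. The two sample events and the function name matches are hypothetical.

```python
# Filtering pattern copied from the example above.
FILTER_PATTERN = {
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "errorCode": ["Client.RequestLimitExceeded", "ThrottlingException"]
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: nested dicts recurse, lists mean 'any of these values'.

    This covers exact-value matching only; it is an approximation for
    illustration and does not implement other EventBridge operators.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical events for illustration only.
throttled = {
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventName": "DescribeInstances",
        "errorCode": "Client.RequestLimitExceeded",
    },
}
denied = {
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventName": "DescribeInstances", "errorCode": "AccessDenied"},
}

print(matches(FILTER_PATTERN, throttled))  # True
print(matches(FILTER_PATTERN, denied))     # False
```

Only the event carrying one of the listed errorCode values passes, which is why additional throttling error codes used by other services need to be added to the pattern explicitly.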

This solution only supports throttled APIs from CloudTrail management events because at the time of writing, CloudTrail doesn’t deliver data events to EventBridge.

The EventBridge rule in each linked account targets a custom event bus in the data collection account. This custom event bus consolidates throttling events from all linked accounts and Regions. Another EventBridge rule, referred to in this post as the collection rule, forwards all collected events to Amazon Data Firehose, which buffers and concatenates the event data before storing it in an Amazon Simple Storage Service (Amazon S3) bucket.
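
Because the events are concatenated before delivery, a single stored object can hold many events. The following Python sketch shows one way to split such content back into individual events if you want to inspect a delivered object by hand. It assumes the records are JSON objects written back to back, optionally separated by whitespace or newlines; the actual layout depends on the Firehose configuration, which this post doesn't show. The function name and sample data are hypothetical.

```python
import json
from typing import Iterator


def iter_concatenated_json(blob: str) -> Iterator[dict]:
    """Yield each JSON object from a string of objects written back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(blob)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip any separators.
        while pos < length and blob[pos].isspace():
            pos += 1
        if pos >= length:
            break
        obj, pos = decoder.raw_decode(blob, pos)
        yield obj


# Hypothetical content of one delivered object: two events, no delimiter.
blob = (
    '{"detail":{"errorCode":"ThrottlingException"}}'
    '{"detail":{"errorCode":"Client.RequestLimitExceeded"}}\n'
)
events = list(iter_concatenated_json(blob))
print(len(events))
```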

After data is stored in Amazon S3, AWS Glue creates logical groupings of the data, organizing it into databases and tables. QuickSight queries these tables using Amazon Athena and uses the results to provide a unified view for visualization and analysis.
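
As a rough illustration of the kind of grouping such a unified view relies on, the following Python sketch counts throttling events by account, Region, service, and API. It is not the query that QuickSight or Athena runs; the field names come from the CloudTrail event detail shown later in this post, and the sample records and function name are hypothetical.

```python
from collections import Counter
from typing import Iterable


def summarize(details: Iterable[dict]) -> Counter:
    """Count throttling events per (account, Region, service, API)."""
    counts = Counter()
    for detail in details:
        key = (
            detail.get("userIdentity", {}).get("accountId", "unknown"),
            detail.get("awsRegion", "unknown"),
            detail.get("eventSource", "unknown"),
            detail.get("eventName", "unknown"),
        )
        counts[key] += 1
    return counts


# Hypothetical event details for illustration only.
details = [
    {"userIdentity": {"accountId": "123456789012"}, "awsRegion": "eu-central-1",
     "eventSource": "lambda.amazonaws.com", "eventName": "Invoke"},
    {"userIdentity": {"accountId": "123456789012"}, "awsRegion": "eu-central-1",
     "eventSource": "lambda.amazonaws.com", "eventName": "Invoke"},
    {"userIdentity": {"accountId": "210987654321"}, "awsRegion": "us-east-1",
     "eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
]

counts = summarize(details)
for key, total in counts.most_common():
    print(total, key)
```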

Prerequisites

The following prerequisites are required for this solution:

Deploy solution resources in the data collection account

Identify the data collection account that will gather all throttling events and follow the steps in this section to deploy the resources using a CloudFormation stack.

This deployment includes a sample filtering rule for the specified data collection account and the CloudFormation stack deployment Region.

  1. Sign in to the AWS Management Console for the data collection account.
  2. Choose
  3. Choose Next.
  4. Configure the stack parameters:
    • For Stack name, enter a unique identifier for your CloudFormation stack.
    • For AWSOrganizationID, enter a comma-separated list of AWS Organizations IDs authorized to send events to the custom event bus.
    • For ApiRateDataCollectionDB, enter the name of the AWS Glue database for the API rate data collection. The default is apiratedatacollectiondb.
    • For AthenaResultBucket, enter the name of the S3 bucket for storing Athena query results. The default is aws-athena-query-results-*.
    • For AthenaResultBucketKmsArn, enter the Amazon Resource Name (ARN) of the AWS Key Management Service (AWS KMS) key for encrypting the AthenaResultsBucket. The default is na (no encryption with AWS KMS).
    • For DataCollectionBucket, enter the name of the S3 bucket designated for data collection, as created or identified in the prerequisites.
    • For DataCollectionBucketKmsArn, enter the ARN of the KMS key for encrypting the DataCollectionBucket. The default is na (no encryption with AWS KMS).
    • For ErrorCodeFilter, enter a comma-separated list of errorCode values to filter. Include any additional errorCode patterns to filter. The default is Client.RequestLimitExceeded,ThrottlingException.
    • For GlueSourcePartition, enter the partition information for the AWS Glue Data Catalog. Each service has its own partition; include all relevant service partitions. The default is apirate.test,aws.ec2,aws.ecs,aws.sts,aws.rds,aws.eks,aws.lambda. The apirate.test partition is used specifically for testing purposes in the Verify the solution section later in this post.
    • For QuickSightAnalysisAuthor, enter the QuickSight user with author permissions to create an analysis. QuickSight users are managed under the QuickSight account identity configuration.
    • For QuickSightIdentityRegion, enter the QuickSight identity Region. The identity Region is the Region where users and groups are managed for your QuickSight account. Refer to Exercises for more information.
    • For QuickSightServiceRole, enter the IAM role for QuickSight service access. The default is aws-quicksight-service-role-v0.
    • For ResourcePrefix, enter the prefix for resource names created by this stack. The default is ApiRateLimit-.

Example AWS CloudFormation stack parameters

  5. Choose Next.
  6. Select the acknowledgement check box, as shown in the following screenshot. Choose Next.

  7. Review all the stack details and choose Submit.

The deployment process completes in approximately 5 minutes. Upon completion, a QuickSight analysis is created without data to display. To populate the QuickSight analysis, follow the steps in the Verify the solution section.

To collect data from additional accounts and Regions, refer to the next section regarding deploying resources in linked accounts. Record the data collection account number and the current deployment Region, because these will be required during the linked account deployment process.
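
The ErrorCodeFilter parameter is a comma-separated list of errorCode values. The following Python sketch shows how such a value could be expanded into an event pattern shaped like the filtering pattern in the solution overview. How the CloudFormation template actually processes the parameter isn't shown in this post, so treat the function below as a hypothetical illustration rather than the template's logic.

```python
import json


def build_filter_pattern(error_code_filter: str) -> dict:
    """Turn a comma-separated errorCode list into an EventBridge-style pattern."""
    # Trim whitespace and ignore empty entries such as a trailing comma.
    codes = [code.strip() for code in error_code_filter.split(",") if code.strip()]
    if not codes:
        raise ValueError("ErrorCodeFilter must contain at least one errorCode")
    return {
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {"errorCode": codes},
    }


# The default ErrorCodeFilter value from the parameter list.
pattern = build_filter_pattern("Client.RequestLimitExceeded,ThrottlingException")
print(json.dumps(pattern, indent=2))
```

Using the same ErrorCodeFilter value in every account keeps the filtering rules consistent, so the dashboard compares like with like.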

Deploy resources in linked accounts

Follow these steps to deploy resources to linked accounts and Regions through CloudFormation StackSets:

  1. Sign in to the account with delegated administrator access. This account should have permission to deploy CloudFormation StackSets to linked accounts and Regions.
  2. Open the AWS CloudFormation console.
  3. Download LinkedAccountSetup.yaml and create a CloudFormation StackSet using the downloaded template.
  4. Enter the data collection account number, deployment Region, and the same ErrorCodeFilter value used in the data collection account deployment.

  5. Set deployment options such as entire organizations or specific organizational units (OUs) and specify the Regions of interest, as shown in the following screenshot.
  6. Choose Next.
  7. Review the settings and choose Submit.
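
The data collection account number and Region entered in the StackSet identify the custom event bus that each filtering rule targets. As a minimal sketch, the following Python function assembles an event bus ARN from those two values. The bus name ApiRateLimit-example-bus is hypothetical (the actual name is created by the stack and isn't listed in this post), and the default partition aws is an assumption that doesn't hold for other AWS partitions.

```python
import re


def event_bus_arn(account_id: str, region: str, bus_name: str,
                  partition: str = "aws") -> str:
    """Build the ARN of an EventBridge event bus from its parts."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit string")
    return f"arn:{partition}:events:{region}:{account_id}:event-bus/{bus_name}"


# Hypothetical values for illustration only.
arn = event_bus_arn("123456789012", "us-east-1", "ApiRateLimit-example-bus")
print(arn)
```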

Verify the solution

You can verify the solution by sending a sample event through EventBridge. This can be done within an account or Region that has the filtering rule configured.

Make sure that apirate.test is included in your GlueSourcePartition inputs for the CloudFormation stack deployment in the data collection account. Note that apirate.test is a custom event source. Users aren’t authorized to send sample events with an AWS event source, for example, aws.sts or aws.ec2.

Follow these steps to send a sample event:

  1. Sign in to the console of any linked account.
  2. On the EventBridge console, choose Event buses under Buses in the navigation pane and choose Send events.

  3. Fill in Event entry 1 with the following fields:
    • For Event source, enter apirate.test.
    • For Detail type, enter AWS API Call via CloudTrail.
    • Enter the following JSON content in the Event detail field:
{
   "eventVersion":"1.10",
   "userIdentity":{
      "type":"AssumedRole",
      "principalId":"ABCDEDCBA",
      "arn":"arn:aws:sts::123456789012/Operator/automation-bot",
      "accountId":"123456789012"
   },
   "eventTime":"2024-10-01T08:45:13Z",
   "eventSource":"lambda.amazonaws.com",
   "eventName":"Invoke",
   "awsRegion":"eu-central-1",
   "sourceIPAddress":"192.0.2.34",
   "userAgent":"Boto3/1.18.53",
   "errorCode":"Client.RequestLimitExceeded",
   "errorMessage":"Request limit exceeded.",
   "requestID":"d1e2f3g4-h5i6-7890-a1b2-c3d4e5f6g7h8"
}
  4. Choose Send to generate the sample event, as shown in the following screenshot.

  5. Sign in to the data collection account.
  6. On the QuickSight console, choose Datasets in the navigation pane.
  7. Choose the dataset named ApiRateLimit-<AccountID>-<Region> and choose REFRESH NOW, as shown in the following screenshot.

  8. On the QuickSight Analyses page, open the analysis named ApiRateLimit-<AccountID>-<Region>. The sample event should now be visible on the QuickSight dashboard, as shown in the following screenshot.
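
If you prefer to script the sample event instead of using the console, the following Python sketch builds an equivalent entry for the EventBridge PutEvents API. It assumes the event is sent to the default event bus of a linked account, where the filtering rule is configured, and it uses an abbreviated version of the event detail shown in the preceding steps. The send function relies on boto3 and valid credentials and is not called here; the helper names are hypothetical.

```python
import json

# Abbreviated copy of the sample event detail from the steps above.
SAMPLE_DETAIL = {
    "eventVersion": "1.10",
    "userIdentity": {"type": "AssumedRole", "accountId": "123456789012"},
    "eventTime": "2024-10-01T08:45:13Z",
    "eventSource": "lambda.amazonaws.com",
    "eventName": "Invoke",
    "awsRegion": "eu-central-1",
    "errorCode": "Client.RequestLimitExceeded",
    "errorMessage": "Request limit exceeded.",
}


def build_entry(detail: dict, event_bus_name: str = "default") -> dict:
    """Build one PutEvents entry that mirrors the console fields."""
    return {
        "Source": "apirate.test",                    # custom event source
        "DetailType": "AWS API Call via CloudTrail",
        "Detail": json.dumps(detail),                # Detail must be a JSON string
        "EventBusName": event_bus_name,              # assumption: default bus
    }


def send(entry: dict) -> dict:
    """Send the entry with boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_entry works without boto3 installed

    return boto3.client("events").put_events(Entries=[entry])


entry = build_entry(SAMPLE_DETAIL)
print(entry["Source"], "|", entry["DetailType"])
```

After sending the event, refresh the dataset as described in the preceding steps to see it in the analysis.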

Clean up

To avoid incurring future charges, go to the CloudFormation console, select the CloudFormation stack or StackSets, and delete them.

Conclusion

By implementing an API throttling dashboard solution, you can gain a clear view of API throttling events across multiple AWS accounts and Regions. This approach helps provide effective throttling impact assessment and response strategies. With insights into the types of APIs being throttled and their frequency, you can adjust application logic, optimize API call rates, and implement more robust retry mechanisms.

Implement this solution to verify and monitor API throttling, and to improve the operational efficiency and cost-effectiveness of enterprise workloads on AWS.

If you have any questions, comments, or suggestions, please leave a comment. You can also visit AWS re:Post.


About the authors

Kanwar Bajwa is a Principal Enterprise Account Engineer at AWS who works with customers to optimize their use of AWS services and achieve their business objectives.

Xiaoxue Xu is a Solutions Architect for AWS based in Toronto. She primarily works with Financial Services customers to help secure their workload and design scalable solutions on the AWS Cloud.

Felicia Ulloa is a Principal Customer Solutions Manager at AWS, focused on helping financial services customers through their cloud transformation and maximizing business value from AWS services.