AWS Cloud Operations Blog

Exporting a subset of AWS CloudTrail Lake events to Amazon S3

Introduction

Monitoring and managing your AWS environment is critical to maintaining security and operational excellence. With the availability of AWS CloudTrail Lake data for zero-ETL analysis in Amazon Athena, you can use Athena to query your activity logs in CloudTrail Lake without the operational complexity of moving data or building data processing pipelines. CloudTrail Lake is a managed data lake that lets you aggregate, immutably store, and analyze activity logs for audit, security, and operational investigations. Athena is an interactive query service that makes it simple to analyze data in Amazon S3 and other data stores using SQL. Using Athena, security engineers can correlate activity logs in CloudTrail Lake with application and traffic logs in data stores such as Amazon S3 for security incident investigations.

AWS CloudTrail offers a robust logging and continuous monitoring solution for account activity. Reviewing all CloudTrail log data to identify relevant events can be time-consuming. For example, consider an organization that wants to monitor changes to its S3 buckets for security and compliance purposes. It can set up a CloudTrail trail to capture all events and then create a mechanism to export only S3 change events to an Amazon S3 bucket. This subset can then be monitored, analyzed, and stored for compliance reporting, or ingested into a third-party tool. Exporting a subset of CloudTrail events to an Amazon S3 bucket enables focused analysis of specific activities, streamlining event management.

In this blog post, we walk you through the process of exporting a filtered set of CloudTrail Lake events to an Amazon S3 bucket. You may have a use case where you only want to ingest a subset of CloudTrail Lake events into a third-party tool. Whether you’re focusing on security-related activities, compliance requirements, or operational monitoring, this step-by-step tutorial will help you harness the full potential of CloudTrail Lake without drowning in data. Join us as we delve into the prerequisites, configurations, and the mechanism to make your AWS environment more manageable and secure. In addition, native AWS services provide centralized management through a single console, eliminating the complexity of multiple interfaces and streamlining operations.

Overview of the Solution

Figure 1: Export a subset of filtered events within CloudTrail Lake to an Amazon S3 bucket in Parquet format

Deploy the solution in an AWS account where you have configured your S3 buckets to store your CloudTrail logs. The resources are deployed using an AWS CloudFormation template, which creates an Amazon EventBridge scheduled rule that triggers an AWS Lambda function to run an Athena query. The template defines parameters for scheduling, Athena configuration, and S3 bucket details. It creates resources including an EventBridge rule, an IAM role for Lambda execution with the necessary permissions, and a Lambda function. The Lambda function reads a custom SQL query from an S3 bucket, combines it with a CREATE TABLE statement, executes the query in Athena, and stores the results in a specified S3 bucket. The function also tracks execution time using AWS Systems Manager Parameter Store. Together, these resources automate scheduled Athena queries with the required IAM permissions and error handling.

1. EventBridge Rule: Schedules a job to trigger the Lambda function at defined intervals.

2. Lambda Function:

  • Executes the Athena query.
  • Converts the results to Parquet format.
  • Saves the results to the Amazon S3 bucket.

3. Athena Database and Table: Stores the organizational CloudTrail Lake event data.

4. Amazon S3 Bucket: Stores the Parquet format output from the Athena query.

5. CloudFormation Template: Defines all required resources, including the Athena workgroup, Athena query execution, Amazon S3 bucket, Lambda function, and EventBridge rule.
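To make the "combine the custom query with a create table statement" step more concrete, here is a minimal Python sketch of how a SELECT statement could be wrapped in an Athena CREATE TABLE AS (CTAS) statement that writes Parquet. It is not the code shipped in DeployResources.yaml. The function name `build_ctas`, the `temp_table_<timestamp>` naming format, and the per-run output prefix are assumptions made for illustration.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def build_ctas(select_sql: str, database: str, output_location: str,
               now: Optional[datetime] = None) -> str:
    """Wrap a SELECT statement in an Athena CTAS statement that writes Parquet."""
    now = now or datetime.now(timezone.utc)
    # Assumed naming scheme: temp_table_<UTC timestamp>.
    table_name = f"temp_table_{now:%Y%m%d%H%M%S}"
    # Drop surrounding whitespace and a trailing semicolon so the query
    # can be embedded after the AS keyword.
    body = select_sql.strip().rstrip(";").strip()
    if not re.match(r"(?i)^(select|with)\b", body):
        raise ValueError("custom query must start with SELECT or WITH")
    # Give each run its own S3 prefix: Athena CTAS fails when the
    # external location already contains data.
    location = f"{output_location.rstrip('/')}/{table_name}/"
    return (
        f"CREATE TABLE {database}.{table_name}\n"
        f"WITH (format = 'PARQUET', external_location = '{location}')\n"
        f"AS\n{body}"
    )


if __name__ == "__main__":
    print(build_ctas("SELECT 1;", "default", "s3://example-output-bucket/results"))
```

Writing each run to a fresh prefix keeps earlier exports intact and avoids the CTAS error that occurs when the target location is not empty.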

Pre-requisites

· Create an event data store with CloudTrail Lake query federation enabled. Federation lets you query your event data using Athena.

· Make a note of the event data store ID.

Walkthrough

Download the CloudFormation template DeployResources.yaml.

1. In the AWS Management Console, navigate to CloudFormation > Stacks and select “Create Stack with new resources”.

Figure 2: Create Stack with new resources

2. Enter the following parameters for the CloudFormation template:

a) Under “Specify stack details”, provide the stack name.

b) Provide the account number and Region for “Athena Query Output Location”.

c) Provide the “Event Data Store ID” captured in the prerequisites.

Figure 3: Stack details

The storage configuration parameters are auto-filled.

Figure 4: These Parameters are auto filled out

3. Leave the default settings for “Configure stack options”.

4. Review the settings, acknowledge that the stack creates IAM resources, and create the stack.

Figure 5: Review and Submit

The demo.sql file in this example contains a query that filters the logs with eventName = ‘GetBucketACL’ to export only the ‘GetBucketACL’ actions performed on the Amazon S3 buckets. You can customize this query to suit your requirements. Some example queries are listed in the “Sample Queries” section.

5. Upload the demo.sql file to the “Custom Query Bucket name” bucket.

Testing the Solution

Instead of waiting for the scheduled job to run, you can test the query execution right away. Browse to the Lambda function deployed by the CloudFormation stack and create a “Test” event.

Figure 6: Test Lambda Function
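If you want the test payload to resemble what the scheduled rule sends, the following Python sketch builds an event in the standard EventBridge "Scheduled Event" shape. Whether the deployed function reads any of these fields is not stated in this post, so treat the payload as illustrative. The account ID, Region, and rule name are placeholders, and `scheduled_test_event` is a hypothetical helper.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def scheduled_test_event(account_id: str, region: str, rule_name: str,
                         now: Optional[datetime] = None) -> dict:
    """Build a payload shaped like an EventBridge scheduled event."""
    now = now or datetime.now(timezone.utc)
    return {
        "version": "0",
        "id": str(uuid.uuid4()),
        "detail-type": "Scheduled Event",
        "source": "aws.events",
        "account": account_id,
        "time": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "region": region,
        # ARN of the rule that would have triggered the function.
        "resources": [f"arn:aws:events:{region}:{account_id}:rule/{rule_name}"],
        "detail": {},
    }


if __name__ == "__main__":
    # Placeholder values; paste the printed JSON into the Lambda test event editor.
    event = scheduled_test_event("111122223333", "us-east-1", "example-schedule-rule")
    print(json.dumps(event, indent=2))
```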

Capture the execution ID and confirm in Athena that the query ran successfully.

Figure 7: Validate the execution output in Athena
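The same check can be scripted. The sketch below polls the Athena `GetQueryExecution` API until the query reaches a terminal state. The client is passed in as an argument (in practice `boto3.client("athena")`), and the helper name `wait_for_query`, the polling interval, and the attempt limit are assumptions for illustration.

```python
import time

# States after which Athena will not change the query status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena_client, execution_id: str,
                   poll_seconds: float = 2.0, max_attempts: int = 30) -> str:
    """Poll Athena until the query finishes, then return its final state."""
    for _ in range(max_attempts):
        response = athena_client.get_query_execution(QueryExecutionId=execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        # Still QUEUED or RUNNING: wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"query {execution_id} did not finish after {max_attempts} checks"
    )
```

With boto3 installed and credentials configured, a call might look like `wait_for_query(boto3.client("athena"), "<execution-id>")`, using the execution ID captured from the Lambda test run.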

1. Navigate to Amazon Athena > Query Editor. Under recent queries, you will see the execution ID captured in the previous step.

Figure 8: Athena query editor

2. In the Amazon Athena console, you will see the temp_table_<time-stamp> table created by the query execution.

Figure 9: Temp table view

3. The filtered CloudTrail logs that match the query in demo.sql are available in the temp table of the default database, as shown below. In this example, we are looking at “GetBucketACL” events.

Figure 10: GetBucketACL example output

Sample Queries

You can provide other SQL queries based on the data you would like to export. Here are a few more examples you can use in your demo.sql file.

1. To filter the logs for the event source “dynamodb”:

SELECT *
FROM "aws:cloudtrail"."<EventDataStore_ID>"
WHERE recipientaccountid = '469097113486'
    AND cast(eventtime as TIMESTAMP) > cast('2024-06-01 12:11:22.000' as TIMESTAMP)
    AND cast(eventtime as TIMESTAMP) < cast('2024-06-04 12:11:22.000' as TIMESTAMP)
    AND eventSource = 'dynamodb.amazonaws.com'

2. To list the users who have turned off multi-factor authentication:

SELECT userIdentity.arn,
    userIdentity.userName,
    userIdentity.accountId,
    userIdentity.principalId
FROM "aws:cloudtrail"."<EventDataStore_ID>"
WHERE eventSource = 'iam.amazonaws.com'
    AND eventName in ('DeactivateMFADevice', 'DeleteVirtualMFADevice')
    -- The upper bound was truncated in the original query; '{end_time}' is an assumed placeholder name.
    AND cast(eventtime as TIMESTAMP) > timestamp '{start_time}'
    AND cast(eventtime as TIMESTAMP) < timestamp '{end_time}'
GROUP BY userIdentity.arn,
    userIdentity.userName,
    userIdentity.accountId,
    userIdentity.principalId
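The query above uses a `{start_time}` placeholder in its time filter. As a rough illustration of how such placeholders could be filled in before the query is sent to Athena, here is a small Python sketch. It is not the deployed Lambda code. The `{end_time}` placeholder name and the `render_query` helper are assumptions, and the timestamp format mirrors the literal used in the first sample query.

```python
from datetime import datetime

# Matches the timestamp literals used in the first sample query,
# for example '2024-06-01 12:11:22.000'.
ATHENA_TS_FORMAT = "%Y-%m-%d %H:%M:%S.000"


def render_query(template: str, start: datetime, end: datetime) -> str:
    """Replace the time-window placeholders in a query template."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    values = {
        "{start_time}": start.strftime(ATHENA_TS_FORMAT),
        "{end_time}": end.strftime(ATHENA_TS_FORMAT),  # assumed placeholder name
    }
    rendered = template
    # Plain string replacement leaves any other braces in the SQL untouched,
    # which str.format would not.
    for placeholder, value in values.items():
        rendered = rendered.replace(placeholder, value)
    return rendered


if __name__ == "__main__":
    template = (
        "SELECT eventname FROM t "
        "WHERE cast(eventtime as TIMESTAMP) > timestamp '{start_time}' "
        "AND cast(eventtime as TIMESTAMP) < timestamp '{end_time}'"
    )
    print(render_query(template, datetime(2024, 6, 1, 12, 11, 22),
                       datetime(2024, 6, 4, 12, 11, 22)))
```

A scheduled job could pass the previous run time as `start` and the current time as `end`, so that each export covers only new events.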

Cleaning up

To avoid incurring future charges, delete the resources created in this demonstration, including the IAM policies, IAM roles, CloudFormation stack, and CloudTrail Lake event data store.

Conclusion

In summary, this post demonstrates a practical solution for granular CloudTrail API monitoring that overcomes the traditional challenges of bulk log analysis. By implementing targeted event filtering and efficient query mechanisms, this approach enables precise tracking of specific API calls across your AWS infrastructure. The solution improves operational visibility by allowing teams to isolate and analyze individual events while maintaining access to complete contextual information. It also reduces the time and effort required for API activity monitoring. Whether you are a cloud architect, security professional, or DevOps engineer, this method helps you gain deeper visibility and control over your AWS operations.

Call to Action

Ready to simplify your AWS CloudTrail Lake log analysis? Don’t wait until a security incident forces your hand. With the CloudTrail Lake and Amazon Athena integration, you can query activity logs without complex ETL pipelines and focus on the specific security or audit events that matter most. Start implementing these filtering techniques today to gain immediate visibility into your AWS CloudTrail events.

To learn more and get started, please refer to the following resources:

Understanding CloudTrail Events

CloudTrail Event Data Store

Collection of sample queries for AWS CloudTrail Lake

About the Authors

Ravindra Kori

Ravindra Kori is a Sr. Solutions Architect and GenAI ambassador at AWS based in Arlington, specializing in Cloud Operations and Serverless technologies. He works extensively with Enterprise and Startup segments, architecting solutions and facilitating AWS modernization and migrations. Outside of work, he finds joy in playing drums and spending quality time with family.

Anjani Reddy

Anjani is a Sr. Solutions Architect at AWS. She works with Enterprise customers and provides technical guidance to help them innovate and build a secure, scalable cloud on the AWS platform. Outside of work, she is an Indian classical and salsa dancer, loves to travel, and volunteers for the American Red Cross and Hands On Atlanta.

Isaiah Salinas

Isaiah Salinas is a Senior Specialist Solution Architect with the Cloud Operations Team. With over 10 years of experience working with AWS technology, Isaiah works with customers to design, implement, and support complex cloud infrastructures. He also enjoys talking with others about how to use AWS services to provide solutions to their problems.