AWS Database Blog

Export and analyze HAQM QLDB journal data using AWS Glue and HAQM Athena

HAQM Quantum Ledger Database (HAQM QLDB) is a fully managed ledger database that maintains a complete, immutable record of every change committed to the database. As transactions are committed to the database, they are appended to a transaction log called a journal and are cryptographically hash-chained to the previous transaction. Once committed, the record of the transaction is permanent and the integrity of the transaction history can be verified with cryptography. This gives you a tamper-evident audit of change history for all documents in the ledger.

HAQM QLDB is optimized for writes and for targeted reads using indexes. It is not optimized for scan operations over large datasets, making it an ideal choice for transaction processing but not for reporting and analytics. In this post, we discuss how you can use the export feature in HAQM QLDB with AWS Glue and HAQM Athena to provide reporting and analytical capabilities to your ledger-based architectures. Optionally, you can use HAQM QuickSight with Athena to build interactive dashboards, paginated reports, embedded analytics, and natural language queries.

Solution overview

The following diagram depicts the architecture of the solution.

The solution uses several AWS services:

  • HAQM Athena – Athena is an interactive query service that makes it easy to analyze data in HAQM Simple Storage Service (HAQM S3) using standard SQL. Athena is serverless, so there is no infrastructure to set up or manage, and you pay only for the queries you run. In this solution, we use Athena to run queries against our transactional data exported from HAQM QLDB.
  • AWS Glue – AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. In this solution, AWS Glue extracts, transforms, and loads from the data source (HAQM S3) into target tables. We also use the following features:
    • AWS Glue Data Catalog – The Data Catalog contains references to data that are used as sources and targets of your extract, transform, and load (ETL) jobs in AWS Glue. To create your data warehouse or data lake, you must catalog this data. The Data Catalog is an index to the location, schema, and runtime metrics of your data. You use the information in the Data Catalog to create and monitor your ETL jobs.
    • AWS Glue crawler – You can use a crawler to populate the Data Catalog with tables. A crawler can crawl multiple data stores in a single run. Upon completion, the crawler creates or updates one or more tables in your Data Catalog. The ETL job reads from and writes to the data stores that are specified in the source and target Data Catalog tables.
  • HAQM S3 – HAQM S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. In this solution, HAQM QLDB exports the transactional data to an S3 bucket. The data in this S3 bucket is our data source for AWS Glue.
    • Bucket – A bucket is a container for objects stored in HAQM S3. You can store any number of objects in a bucket and can have up to 100 buckets in your account. Every object is contained in a bucket.
  • HAQM QLDB – HAQM QLDB is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable change history log owned by a central trusted authority. You can use HAQM QLDB to track all application data changes, and maintain a complete and verifiable history of changes over time. For this post, this is the heart of the financial transactional data and single source of truth.
  • HAQM QuickSight (optional) – QuickSight allows everyone in your organization to understand your data by asking questions in natural language, exploring through interactive dashboards, or automatically looking for patterns and outliers powered by ML. You can connect QuickSight to Athena for building paginated reports, embedded analytics and natural language queries.
  • Other data sources (optional) – AWS Glue can connect to other file-based and table-based data stores as well.

Create resources with AWS CloudFormation

To get started, we create our solution resources using an AWS CloudFormation template.

  1. Launch the CloudFormation stack by choosing Launch Stack:
  2. Choose Create stack. The stack creation starts with the status CREATE_IN_PROGRESS and takes approximately 5 minutes to complete.
  3. Open the Outputs tab and take note of the following resource names (you can also retrieve them programmatically, as sketched after this list):
    • HAQM QLDB ledger walletledger
    • S3 bucket walletledger-s3-export-bucket-xXXXXX
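
If you prefer to capture these values programmatically, the following Python sketch (using boto3) prints the stack outputs. It's a minimal sketch that assumes the stack is named qldb-blog-resources (the name referenced in the cleanup section) and that your AWS credentials and Region are configured.

import boto3

# Print the outputs of the CloudFormation stack created by the Launch Stack button
cloudformation = boto3.client("cloudformation")
stack = cloudformation.describe_stacks(StackName="qldb-blog-resources")["Stacks"][0]
for output in stack.get("Outputs", []):
    print(f"{output['OutputKey']}: {output['OutputValue']}")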

Create a table for your sample data in the HAQM QLDB ledger

To create a table for sample data, complete the following steps:

  1. On the HAQM QLDB console, in the navigation pane, choose Getting Started and scroll down to the Sample application data section.
  2. Under Manual option, choose Open query editor.
  3. In the PartiQL editor, under Choose a ledger, choose walletledger.

  4. In the query editor, enter the following statement, then choose Run:
    CREATE TABLE wallettable

You can refresh the tables section in the left pane to see the table.

Load the sample data in the ledger

Now that the table is created, we’re ready to load sample data. Complete the following steps to load the initial sample data:

  1. In a query editor window, enter the following statement, then choose Run:
    INSERT INTO wallettable
    
    << {
    'email' : 'arnav.desai@dbmail.com',
    'username' : 'arnavd',
    'PreviousBalance' : '0',
    'CurrentBalance' : '34234'
    },
    {
    'email' : 'carlos.salazar@ghmail.com',
    'username' : 'carloss',
    'PreviousBalance' : '0',
    'CurrentBalance' : '873746'
    },
    {
    'email' : 'diego.ramirez@kbmail.com',
    'username' : 'diegor',
    'PreviousBalance' : '0',
    'CurrentBalance' : '354698'
    },
    {
    'email' : 'jane.doe@jhmail.com',
    'username' : 'janed',
    'PreviousBalance' : '0',
    'CurrentBalance' : '84905'
    },
    {
    'email' : 'john.doe@kbmail.com',
    'username' : 'johndh',
    'PreviousBalance' : '0',
    'CurrentBalance' : '994758'
    },
    {
    'email' : 'jorge.souza@mwmail.com',
    'username' : 'jorges',
    'PreviousBalance' : '0',
    'CurrentBalance' : '87458'
    },
    {
    'email' : 'akua.mansa@mhemail.com',
    'username' : 'akuam',
    'PreviousBalance' : '0',
    'CurrentBalance' : '45215'
    },
    {
    'email' : 'martha.rivera@sdemail.com',
    'username' : 'marthar',
    'PreviousBalance' : '0',
    'CurrentBalance' : '254589'
    },
    {
    'email' : 'mary.major@ksemail.com',
    'username' : 'marym',
    'PreviousBalance' : '0',
    'CurrentBalance' : '65847'
    },
    {
    'email' : 'nikki.wolf@jbemail.com',
    'username' : 'nikkiw',
    'PreviousBalance' : '0',
    'CurrentBalance' : '12368'
    } >>
  2. Verify the sample data by opening a new query window, making sure walletledger is chosen in the left pane under Choose a ledger, and running the following statement:
    select * from wallettable

Add sample data for advanced analysis

For advanced querying in Athena in later sections, we generate a financial statement for the wallet account of a user (akua.mansa@mhemail.com). For that wallet statement to be meaningful, we need to add a few more sample transaction updates for the user.

In three separate query editor windows (make sure that walletledger is chosen in the left pane), enter the following statements in sequence, choosing Run after each one. These statements simulate transactions performed by the user akua.mansa@mhemail.com on the wallet account at different timestamps.

Use the following code for the first query:

UPDATE wallettable
SET PreviousBalance = 45215, CurrentBalance = 55215
WHERE email = 'akua.mansa@mhemail.com'

Use the following code for the second query:

UPDATE wallettable
SET PreviousBalance = 55215, CurrentBalance = 54234
WHERE email = 'akua.mansa@mhemail.com'

Use the following code for the third query:

UPDATE wallettable
SET PreviousBalance = 54234, CurrentBalance = 69234
WHERE email = 'akua.mansa@mhemail.com'

These query statements update the sample user’s wallet with transaction data that we can use to query the financial transaction record using Athena.
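
If you prefer to apply these updates programmatically rather than through the query editor, the following Python sketch uses the HAQM QLDB driver for Python (the pyqldb package) to run the same three statements. It's a minimal sketch that assumes the driver is installed (pip install pyqldb) and that your AWS credentials and Region are configured.

from pyqldb.driver.qldb_driver import QldbDriver

# Connect to the walletledger ledger and apply the three balance updates in sequence
driver = QldbDriver(ledger_name="walletledger")

updates = [(45215, 55215), (55215, 54234), (54234, 69234)]
for previous_balance, current_balance in updates:
    driver.execute_lambda(
        lambda executor: executor.execute_statement(
            "UPDATE wallettable SET PreviousBalance = ?, CurrentBalance = ? WHERE email = ?",
            previous_balance,
            current_balance,
            "akua.mansa@mhemail.com",
        )
    )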

Create an HAQM QLDB export job

To create your export job, complete the following steps:

  1. On the HAQM QLDB console, choose Export in the navigation pane.
  2. Choose Create export job.
  3. In the Journal blocks section, choose the walletledger ledger.
  4. Enter 1900/01/01 and 00:00:00 in the Start date and time (UTC) fields.
  5. The End date and time (UTC) field contains the current date and time. Leave this value as it is.
  6. In the Write journal blocks to S3 section, choose Browse S3.
  7. Choose the S3 bucket you created with AWS CloudFormation (walletledger-s3-export-bucket), then choose Choose.
  8. Change the Export Format field to JSON.
  9. In the Grant service access to S3 section, choose Create and use a new service role.
  10. Ensure that the S3 bucket name is populated with the correct S3 bucket. (Take note of the service role name that is created.)
  11. Choose Create export job.

The export job shows as In progress, and takes several minutes to complete.

You can verify the export files by navigating to the HAQM S3 console. In the list of buckets, choose the bucket walletledger-s3-export-bucket. Navigate through the directories in the export bucket and explore the files exported by HAQM QLDB.
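
You can also start and monitor an export programmatically. The following Python sketch (using boto3) is a minimal example of the same export job; it assumes your boto3 version supports the JSON output format for QLDB exports, and the bucket name, prefix, and role ARN shown are placeholders that you would replace with the values from your account.

from datetime import datetime, timezone
import time

import boto3

qldb = boto3.client("qldb")

# Export all journal blocks written so far to the export bucket as JSON
export = qldb.export_journal_to_s3(
    Name="walletledger",
    InclusiveStartTime=datetime(1900, 1, 1, tzinfo=timezone.utc),
    ExclusiveEndTime=datetime.now(timezone.utc),
    S3ExportConfiguration={
        "Bucket": "walletledger-s3-export-bucket-xXXXXX",  # replace with your bucket
        "Prefix": "",  # optional key prefix
        "EncryptionConfiguration": {"ObjectEncryptionType": "SSE_S3"},
    },
    RoleArn="arn:aws:iam::111122223333:role/your-qldb-export-role",  # replace with your role
    OutputFormat="JSON",
)

# Poll the export job until it finishes
export_id = export["ExportId"]
while True:
    description = qldb.describe_journal_s3_export(Name="walletledger", ExportId=export_id)
    status = description["ExportDescription"]["Status"]
    print(f"Export {export_id}: {status}")
    if status != "IN_PROGRESS":
        break
    time.sleep(15)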

Set up an AWS Glue crawler

In this step, we set up the AWS Glue components required to make our HAQM QLDB data in HAQM S3 available for querying via Athena. Complete the following steps to set up the AWS Glue crawler:

  1. On the AWS Glue console, choose Crawlers in the left navigation pane.
  2. Choose Create crawler.
  3. Give the crawler a name, such as qldb_s3_json_crawler.
  4. Choose Next.
  5. Under Data source configuration, for Is your data already mapped to Glue tables, select Not yet.
  6. Under Data sources, choose Add a data source.

  7. Under Data source, select S3.
  8. Under Location of S3 data, browse to the S3 bucket you created (walletledger-s3-export-bucket-), choose it, and navigate into it.
  9. Choose the current year (for this post, 2022).

  10. Select the current month (for this post, 10), then choose Choose.

  11. Choose Add an S3 data source.

  12. Select the HAQM S3 data source, then choose Next.

  13. Under IAM role, choose Create new IAM role.
  14. Enter a name for the new AWS Identity and Access Management (IAM) role, keeping the existing prefix, for example AWSGlueServiceRole-QLDB-GLUECRAWL-IAM.
  15. Choose Create.
  16. Choose Next.
  17. Under Target Database, choose Add database (the Add database window launches in a new browser tab).
  18. Enter walletledger_glue_qldb_export_db, then choose Create database.
  19. Switch back to the AWS Glue crawler configuration browser tab and refresh the Target database, then select the database you created (walletledger_glue_qldb_export_db).
  20. Under Table name prefix – optional, enter qldb_db_table_
  21. Under Advanced options, select Update all new and existing partitions with metadata from the table
  22. Choose Next.
  23. Review the crawler properties on the next page and choose Create crawler.
  24. Select the newly created crawler on the next page and choose Run on the Options tab.

The crawler may take a few minutes to complete. The state changes from Ready to Running to Stopping, and back to Ready when the crawler is complete.
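
The same crawler setup can be scripted. The following Python sketch (using boto3) is a minimal equivalent of the console steps above; the S3 path is a placeholder you would replace with your export bucket and the year/month you exported, and the IAM role is the one created in step 14.

import time

import boto3

glue = boto3.client("glue")

# Create the target database and a crawler over the exported JSON files
glue.create_database(DatabaseInput={"Name": "walletledger_glue_qldb_export_db"})
glue.create_crawler(
    Name="qldb_s3_json_crawler",
    Role="AWSGlueServiceRole-QLDB-GLUECRAWL-IAM",
    DatabaseName="walletledger_glue_qldb_export_db",
    TablePrefix="qldb_db_table_",
    Targets={"S3Targets": [{"Path": "s3://walletledger-s3-export-bucket-xXXXXX/2022/10/"}]},
)

# Start the crawler and wait for it to return to the READY state
glue.start_crawler(Name="qldb_s3_json_crawler")
time.sleep(30)  # give the crawler time to leave the READY state
while glue.get_crawler(Name="qldb_s3_json_crawler")["Crawler"]["State"] != "READY":
    time.sleep(15)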

Query with Athena

We use Athena to query data in HAQM S3 that was exported from HAQM QLDB as JSON. Because we’re querying the data stored in HAQM S3, our queries don’t impose any load on our HAQM QLDB ledger or interfere with ongoing wallet transactions. We can therefore use HAQM QLDB financial transaction data for analytics-style queries that HAQM QLDB isn’t optimized for.

  1. On the Athena console, under Query Editor in the navigation pane, choose the database you created (walletledger_glue_qldb_export_db). The Tables list is populated, and you will see different tables with the prefix qldb_db_table_.
  2. Before proceeding to the next step, verify that your Query Results Location is configured:
    1. In the query editor, choose the Settings tab.
    2. Choose Manage.
    3. Choose Browse S3 under Query result location and encryption and select the bucket walletledger-s3-export-bucket-.
  3. Go back to the editor by choosing Editor.
  4. Choose the options menu (three dots) next to the table name and choose Preview table.

The query runs and returns results from the table in the AWS Glue Data Catalog database.
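
Athena queries can also be run through the API. The following Python sketch (using boto3) runs the same table preview; table_name is a placeholder for one of your qldb_db_table_ tables, and the output location is a placeholder you would replace with your query results bucket.

import time

import boto3

athena = boto3.client("athena")

# Run a preview query against the cataloged table
query = athena.start_query_execution(
    QueryString='SELECT * FROM "table_name" LIMIT 10',  # replace table_name with your table
    QueryExecutionContext={"Database": "walletledger_glue_qldb_export_db"},
    ResultConfiguration={"OutputLocation": "s3://walletledger-s3-export-bucket-xXXXXX/athena-results/"},
)
query_id = query["QueryExecutionId"]

# Wait for the query to finish
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

# Print the result rows (the first row contains the column headers)
for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
    print([col.get("VarCharValue") for col in row["Data"]])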

Sample financial transaction queries to run with Athena

Now that you have successfully previewed the table created by the AWS Glue crawler, your Data Catalog is ready for running advanced queries for analysis as needed. In this section, we provide sample queries that you can run.

Query all transaction statements run on our HAQM QLDB ledger

In your Athena query editor, choose the plus sign to open a new query window. Enter the following statement in the query window and choose Run. Make sure to substitute the database name and table name from your Data Catalog. The following statement returns all transactions run on HAQM QLDB through our AWS Glue Data Catalog table:

SELECT unnest_statements.starttime,
       unnest_statements.statement
FROM "walletledger_glue_qldb_export_db"."table_name", UNNEST(transactioninfo.statements) AS t(unnest_statements)

The following screenshot shows the results from our query.

Query all INSERT statements on the HAQM QLDB ledger

We now run the following query in the query editor, which returns all INSERT statements run on HAQM QLDB:

SELECT unnest_statements.starttime,
unnest_statements.statement
FROM "walletledger_glue_qldb_export_db"."table_name" , UNNEST(transactioninfo.statements) AS t(unnest_statements)
WHERE UPPER(unnest_statements.statement) LIKE 'INSERT%'

The following screenshot shows the results from our query.

Query raw data by creating a table view

In this next query, we create a view in Athena that represents the HAQM QLDB ledger table. The query pulls the data from the database exported table and creates a view in Athena. This allows for analysis on a table structure similar to an HAQM QLDB table.

Run the following statement in a new query window:

CREATE VIEW walletview AS
SELECT DISTINCT listed_revisions.data.email,
listed_revisions.data.username,
listed_revisions.data.PreviousBalance,
listed_revisions.data.CurrentBalance,
listed_revisions.metadata.version,
listed_revisions.metadata.txTime timestamp
FROM "walletledger_glue_qldb_export_db"."table_name"
CROSS JOIN UNNEST(revisions) AS t(listed_revisions)

When the query is complete, you can see the new view created.

Inspect the newly created view with a timestamp of the transactions

Inspect the newly created view with the following statement:

SELECT * FROM "walletledger_glue_qldb_export_db"."walletview" limit 10;

You can see the results from the walletview.

Generate a wallet statement of financial transactions for a specific user

Now that our view is created, we can generate a financial transaction record for a specific user. In a new query window, run the following query:

SELECT * FROM "walletledger_glue_qldb_export_db"."walletview" WHERE email='akua.mansa@mhemail.com' order by version desc;

The following screenshot shows our results.

The results returned are the revisions of the financial transactions made on the user's wallet account, each with a different timestamp. This revision history helps generate a financial statement of all additions to and deductions from the wallet.

Analyze count of financial transactions

Financial services companies use HAQM QLDB to process transactions, authorize payments, and manage account balances. These companies need robust reporting and analytics capabilities. As a result, financial services customers need efficient data lineage and audit capabilities, and must prove the trustworthiness of their data and operations to regulators.

One common requirement when analyzing financial transactions is to count the transactional queries that have been run on the financial data. This is important for verifying that transactions completed, for example payments, deductions, and credits made through external applications, because senders or receivers often report that money was never credited to or debited from their accounts.

This query returns counts for each statement type. In a new query window, run the following query:

SELECT UPPER(SPLIT_PART(unnest_statements.statement,' ',1)) AS "Statement type", count(*) AS "Number of Statements"
FROM "walletledger_glue_qldb_export_db"."table_name" , UNNEST(transactioninfo.statements) AS t(unnest_statements)
GROUP BY UPPER(SPLIT_PART(unnest_statements.statement,' ',1))

The following screenshot shows our results.

Clean up

When you’re done testing, delete the resources you created so that you’re no longer billed for them. You can follow the console and CLI steps below, or run the scripted cleanup sketched after this list.

  1. Launch AWS CloudShell.
  2. Replace the bucket name in the following command with the bucket created by the CloudFormation stack (walletledger-s3-export-bucket-xXXXXX) and press Enter:
    $ aws s3 rb s3://walletledger-s3-export-bucket-xXXXXX --force

  3. On the AWS CloudFormation console, select the stack qldb-blog-resources and choose Delete.
  4. On the IAM console, choose Roles in the navigation pane.
  5. Select the roles you created as part of the export job (if you followed the instructions, you can enter qldb in the search box to filter on the roles that were created) and choose Delete.
  6. On the AWS Glue console, choose Crawlers in the navigation pane.
  7. Select the crawler you created (qldb_s3_json_crawler) and on the Action menu, choose Delete crawler.
  8. Choose Databases in the left navigation pane.
  9. Select the database that you created (walletledger_glue_qldb_export_db) and on the Action menu, choose Delete database.
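
If you prefer to script the cleanup, the following Python sketch (using boto3) removes the same resources. It's a minimal sketch that assumes the bucket, crawler, database, and stack names used earlier in this post; the IAM roles created for the export job and crawler still need to be deleted as described in the console steps above.

import boto3

# Empty and delete the export bucket (replace with your bucket name)
bucket = boto3.resource("s3").Bucket("walletledger-s3-export-bucket-xXXXXX")
bucket.objects.all().delete()
bucket.delete()

# Delete the AWS Glue crawler and database created for the export
glue = boto3.client("glue")
glue.delete_crawler(Name="qldb_s3_json_crawler")
glue.delete_database(Name="walletledger_glue_qldb_export_db")

# Delete the CloudFormation stack, which removes the ledger and its remaining resources
boto3.client("cloudformation").delete_stack(StackName="qldb-blog-resources")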

Conclusion

In this post, we showed you how to export critical transaction data from banking or wallet accounts stored in HAQM QLDB to HAQM S3, trigger an ETL workflow with AWS Glue, and query the data in HAQM S3 using Athena. This process makes your HAQM QLDB ledger data available for query and analysis using AWS data lake technologies.

This combination of technologies enables you to keep a complete and verifiable history of all of your transactions within HAQM QLDB, while making that data available to the rest of the business for query and analysis in a serverless, cost-effective manner using AWS Glue and Athena.

Leave your feedback, comments, and suggestions for improvement in the comments section.


About the Author

Anurag Jain is a Global Solutions Architect at AWS based out of Palo Alto, CA. He has two decades of broad technology experience in innovation and prototyping solutions, with core expertise in the AWS Cloud platform and microservices architecture. He primarily drives application modernization journeys on AWS, builds Cloud Center of Excellence practices, and serves as an advisory consultant to the Office of the CTO for worldwide high-tech customers and system integrators.