AWS Database Blog

How HAQM Finance Automation built an operational data store with AWS purpose-built databases to power critical finance applications

HAQM Finance Automation (FinAuto) is the tech organization of HAQM Finance Operations (FinOps). Its mission is to enable FinOps to support the growth and expansion of HAQM businesses. It works as a force multiplier through automation and self-service, while providing accurate and on-time payments and collections. FinAuto is uniquely positioned to look across FinOps and provide solutions that satisfy multiple use cases with accurate, consistent, and governed delivery of data and related services.

In this post, we discuss how the HAQM Finance Automation team used AWS purpose-built databases such as HAQM DynamoDB, HAQM OpenSearch Service, and HAQM Neptune, coupled with serverless compute such as AWS Lambda, to build an operational data store (ODS) that stores financial transactional data and supports FinOps applications with millisecond latency. This data is a key enabler for the FinOps business.

Requirements

HAQM’s financial data is generated in different systems and is stored in their respective sub-ledgers for accounting purposes, each with their own data model. A common model is needed to represent data across sub-ledgers and to provide a unified and seamless experience to the financial analysts working on this data. This integrated data should be accessible via APIs to support both backend and frontend workbench applications.

To decide which database service to use, we focused on the following key requirements:

  1. The database service should be able to scale quickly to handle a large volume of transactions. Our upstream systems generate hundreds of millions of financial transactional events in a day, so the service should be able to scale accordingly.
  2. The APIs that vend out this data should have millisecond level latency to support frontend user applications. Additionally, the APIs should allow key-value based lookups. For example, a financial analyst would need to get each transaction associated with a given account number.
  3. The database should allow a flexible schema to support different sub-ledgers as they would have their own set of data attributes.
  4. The data store should allow searching across financial entities such as accounts, invoices, credit notes, and receipts. For example, a financial analyst would want to find an account based on its name without knowing the account number.
  5. The data store should support data aggregation across various dimensions, such as by account or by business channel that would include thousands of accounts. For example, a financial analyst would need to know the total receivables associated with a given account.
  6. The database should be able to store account relations. This requirement is unique to FinOps because the same customer can have more than one HAQM account for a given HAQM service. To provide customers a seamless experience in the form of consolidated billing statements, these relations are used to group transactions across related accounts.

Solution overview

For our diverse set of requirements, we identified the need to use multiple AWS purpose-built databases, each tailored to address specific requirements.

We selected DynamoDB as our primary data store because it provides low-latency access to the underlying records via key-value read requests, for example, getting the invoices or receipts for a given account, or getting demographic details such as the name and status of an account. Additionally, DynamoDB supports a flexible schema, so each item can have different attributes. This allowed us to store custom attributes from the various receivable systems. We used DynamoDB on-demand capacity mode because our workload is unpredictable: the clients that read data are spread across different Regions, and our write workload depends on the upstream billing systems. On-demand capacity mode manages capacity for us, and we pay based on our consumption.

For data aggregation needs, where summarized data is required across thousands of transactions without added latency or a degraded experience, we used a combination of HAQM DynamoDB Streams, AWS Lambda, and HAQM Simple Queue Service (HAQM SQS) to build a precompute service that is triggered by incoming events and aggregates asynchronously. Because the aggregates are precomputed, users can access them with millisecond latency when needed.
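As a minimal sketch of the precompute step, the following Lambda handler consumes invoice events from SQS and atomically adds each amount to a per-account running total in DynamoDB. The table, message payload, and attribute names (AccountBalances, accountId, openAmount, totalReceivable) are hypothetical illustrations, not our production schema.

# Aggregation Lambda sketch; table and attribute names are hypothetical
import json
from decimal import Decimal

import boto3

table = boto3.resource("dynamodb").Table("AccountBalances")  # hypothetical precompute table

def handler(event, context):
    # Each SQS record carries one invoice event forwarded from DynamoDB Streams
    for record in event["Records"]:
        invoice = json.loads(record["body"])
        # ADD is atomic, so concurrent invocations cannot lose updates
        table.update_item(
            Key={"accountId": invoice["accountId"]},
            UpdateExpression="ADD totalReceivable :amt",
            ExpressionAttributeValues={":amt": Decimal(str(invoice["openAmount"]))},
        )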

Users needed the ability to search data across financial entities by multiple attributes. They also needed search-as-you-type functionality for account names and the ability to list the top accounts with the highest aging receivables balance, which allows analysts to plan activities for those accounts. We used HAQM OpenSearch Service for this. After the aging balance is calculated by the precompute service, we persist the data into OpenSearch Service, which lets us aggregate and sort the data across thousands of accounts.

Some customers have more than one account within a sub-ledger. They would, however, like to receive unified communication and billing statements to avoid being contacted separately for each individual account. We therefore needed the ability to store and query account relationships defined by financial analysts. Account relationships can generally be visualized as a tree structure; for example, accounts 'B' and 'C' are child accounts of account 'A'. HAQM Neptune is a fast, reliable, fully managed graph database optimized for storing and querying highly connected data. Neptune lets us create a graph representation that mirrors the real-world structure of these relationships, so we chose it because it is purpose built for storing and traversing connected data.

Solution architecture

The following diagram shows the high level solution architecture with the different AWS services.


Data from multiple upstream systems is ingested through different mechanisms, transformed and stored in DynamoDB. DynamoDB Streams are enabled on the tables, and a set of AWS Lambda functions process the stream messages into HAQM SQS queues. The queues have dead-letter queues (DLQs) configured for appropriate error handling. Another set of Lambda functions read messages from these SQS queues and index data into HAQM OpenSearch Service domains.
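The following is a minimal sketch of the stream-processing step under these assumptions; the queue URL environment variable and the pass-through message body are illustrative, and the production functions apply entity-specific routing and transformation.

# Stream-to-SQS forwarder sketch; queue configuration is illustrative
import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["ENTITY_QUEUE_URL"]  # hypothetical entity-specific queue

def handler(event, context):
    # Lambda is invoked with a batch of DynamoDB Streams records
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            # Forward the changed item; messages that fail downstream land in the DLQ
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps(record["dynamodb"]["NewImage"]),
            )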

Note: this solution was in production before the introduction of the HAQM DynamoDB zero-ETL integration with HAQM OpenSearch Service. Adopting zero-ETL would have simplified our architecture and operations, and, where applicable, we plan to move to this integration in the future.

A separate microservice identifies and processes account relations and stores the relationships in an HAQM Neptune database. Data is vended out of DynamoDB, OpenSearch Service, or Neptune through APIs based on the business use case.

HAQM DynamoDB

We created an identifier for the different entities to avoid the possibility of duplicate IDs, because the upstream systems are not connected. The new ID is unique within our system and is the one that data consumers use when making API calls. Our system uses UUID version 5, where we use a combination of attributes available for each transaction to create the ID.
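The following sketch shows how such a deterministic ID can be derived with UUID version 5; the namespace and the specific attributes combined here are assumptions for illustration, not the exact derivation we use in production.

# UUID v5 sketch; namespace and attribute choices are hypothetical
import uuid

# A fixed namespace makes the derivation deterministic across services
FINAUTO_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "finauto.example.internal")

def transaction_id(sub_ledger: str, source_txn_id: str, account_number: str) -> str:
    # The same upstream attributes always map to the same ID
    name = f"{sub_ledger}#{source_txn_id}#{account_number}"
    return str(uuid.uuid5(FINAUTO_NAMESPACE, name))

print(transaction_id("AR", "INV-0042", "1234567890"))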

The following diagram shows our DynamoDB base table data model.

The sort key attribute lets us use a KeyConditionExpression to retrieve all items whose sort key begins with a given prefix, for example, the account or invoice items for a given (transaction) ID.
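As a sketch, a prefix query against the base table could look like the following; the table name and key attribute names (pk, sk) are placeholders for the data model shown in the diagram.

# Sort-key prefix query sketch; table and key names are placeholders
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("FinOdsBase")

# Fetch all invoice items stored under a given transaction ID
response = table.query(
    KeyConditionExpression=Key("pk").eq("TXN#0001") & Key("sk").begins_with("INVOICE#")
)
items = response["Items"]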

For some of our API requirements, we used global secondary indexes (GSIs) because we need to query the data using an attribute other than the (transaction) ID. GSIs are sparse by default: if an item doesn't contain the GSI's partition and sort key attributes, the item is not written to the GSI. Sparse indexes are useful when we want to query only a subsection of the data in the table. In the example above, the marketplace attribute is only present on account items. Creating a GSI with amzChannel as the partition key and marketplace as the sort key allows us to quickly query all accounts belonging to a particular amzChannel and marketplace without reading transactional invoice items.

The following diagram illustrates a sample GSI created on our DynamoDB table:
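A query against such a GSI could look like the following sketch; the index name and key values are assumptions that mirror the example, not our actual table definition.

# GSI query sketch; index name and values are illustrative
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("FinOdsBase")

# Query the sparse GSI directly, so only account items are read
response = table.query(
    IndexName="amzChannel-marketplace-index",
    KeyConditionExpression=Key("amzChannel").eq("RETAIL") & Key("marketplace").eq("US"),
)
accounts = response["Items"]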

We also use local secondary indexes (LSIs) for some of our sorting use cases. An account usually has several hundred closed invoices (invoices created and paid over a period of time) and a small number of open invoices. To calculate the aging balance for a given account, we need to look at the open invoices within the account. We could look up all invoices for the account and then use a DynamoDB FilterExpression to keep only invoices with status=OP. However, filter expressions are applied after a query finishes but before the results are returned, so the query consumes the same amount of read capacity either way, and for an account with thousands of invoices the latency would be high. LSIs gave our application a choice of sort keys. An LSI contains a copy of some or all of the attributes from the base table, and its data is organized by the same partition key as the base table but with a different sort key. By creating an LSI with invoiceStatus as the sort key, we can quickly query the open invoices for a given account. This improved our query latency for open invoices by 10x for accounts with a large number of closed invoices.
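A lookup of open invoices through that LSI could look like the following sketch; the index name, key names, and status value are illustrative assumptions.

# LSI query sketch; index and attribute names are illustrative
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("FinOdsBase")

# The LSI shares the base table's partition key but sorts by invoiceStatus
response = table.query(
    IndexName="invoiceStatus-lsi",
    KeyConditionExpression=Key("pk").eq("ACCT-0001") & Key("invoiceStatus").eq("OP"),
)
open_invoices = response["Items"]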

Because LSIs can only be created at table creation time, we created placeholder attributes like sortKey1 and sortKeyN so that we could use them for future sorting use cases.

See a sample representation of attributes within the base DynamoDB table and its LSI:

One of the primary requirements for an ODS that ingests and vends financial data is to guarantee data accuracy. Because our service is used to build Statements of Account (SOAs) that are sent to external customers, a robust and reliable reconciliation service is critical. Our reconciliation solution needs the source-of-truth data (made available by our upstream systems) and the transformed data from our DynamoDB tables to be available in HAQM Simple Storage Service (HAQM S3). We used native DynamoDB exports to get the data into S3, which avoids full table scans on DynamoDB. DynamoDB also supports incremental exports, so we configured exports to include only data changed since the last export run. Additionally, exports don't affect the read capacity or the availability of our tables. We then use HAQM EMR to run a Spark application that reconciles the data exported to S3 by DynamoDB with an export from our upstream data stores (the source of truth). In addition to being an input into our reconciliation system, the exported DynamoDB data is used by other downstream data teams to build financial reporting on top of our data.
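As a sketch, an incremental export covering the window since the previous run could be triggered as follows; the table ARN, bucket, and time window are placeholders. Point-in-time recovery must be enabled on the table for exports to work.

# Incremental export sketch; ARN, bucket, and window are placeholders
from datetime import datetime, timedelta, timezone

import boto3

dynamodb = boto3.client("dynamodb")
now = datetime.now(timezone.utc)

dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:111122223333:table/FinOdsBase",
    S3Bucket="finods-reconciliation-exports",
    ExportFormat="DYNAMODB_JSON",
    ExportType="INCREMENTAL_EXPORT",
    IncrementalExportSpecification={
        "ExportFromTime": now - timedelta(days=1),  # since the last export run
        "ExportToTime": now,
        "ExportViewType": "NEW_AND_OLD_IMAGES",
    },
)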

The following diagram represents the high-level architecture of our reconciliation service:

HAQM OpenSearch Service

We use HAQM OpenSearch Service for our search-as-you-type and aggregation requirements. The transactional data in DynamoDB is further enriched and ingested into OpenSearch Service. DynamoDB natively integrates with AWS Lambda, which can be used to create triggers that respond to events in DynamoDB Streams. A Lambda function processes the stream events and sends the messages to entity-specific SQS queues. Another Lambda function processes the messages in SQS, enriches the data with additional information, and indexes the data into OpenSearch. One of the requirements is to identify the aging accounts receivable balance for all accounts assigned to a financial analyst, which enables analysts to prioritize the accounts assigned to them. Analyst account assignment is a separate dataset and is not stored with the transactional data. By querying this data and ingesting the enriched documents into OpenSearch Service, we are able to run aggregation queries in OpenSearch for the client's use case with millisecond latency.
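The following is a minimal sketch of the indexing step; the OpenSearch endpoint, index name, and the analyst-assignment lookup table are assumptions for illustration, and authentication is omitted for brevity.

# SQS-to-OpenSearch indexer sketch; endpoint, index, and lookup table are illustrative
import json

import boto3
from opensearchpy import OpenSearch

opensearch = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,  # authentication (for example, SigV4) omitted for brevity
)
assignments = boto3.resource("dynamodb").Table("AnalystAssignments")  # hypothetical dataset

def handler(event, context):
    for record in event["Records"]:
        doc = json.loads(record["body"])
        # Enrich the document with the analyst assigned to the account
        owner = assignments.get_item(Key={"accountId": doc["accountId"]}).get("Item", {})
        doc["analyst"] = owner.get("analystId")
        # Upsert the enriched document into the entity-specific index
        opensearch.index(index="accounts", id=doc["accountId"], body=doc)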

For search-as-you-type use cases, OpenSearch Service offers both query-time and index-time processing options. With query-time processing, we specify filters at query time that match against the prefix of stored fields. Data is indexed as usual, and the query looks for the prefix. This is an expensive operation because OpenSearch computes the prefixes of already-indexed data on the fly and then performs the search, similar to a substring "contains" scan. Index-time processing allows custom tokenizers to split words into smaller pieces at index time. We used the edge_ngram tokenizer while indexing data. The following screenshots show how the account name "HAQM" is split into multiple tokens at index time. This allowed for faster results for search-as-you-type use cases.
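The following sketch shows how an index with an edge_ngram analyzer for account names might be created; the index name, field name, and gram sizes are illustrative assumptions rather than our production mapping.

# edge_ngram index sketch; index, field, and analyzer settings are illustrative
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://my-domain.us-east-1.es.amazonaws.com"])  # auth omitted

client.indices.create(
    index="accounts",
    body={
        "settings": {
            "analysis": {
                "tokenizer": {
                    "autocomplete_tokenizer": {
                        "type": "edge_ngram",
                        "min_gram": 2,
                        "max_gram": 10,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "autocomplete": {
                        "type": "custom",
                        "tokenizer": "autocomplete_tokenizer",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # Tokens are generated at index time; queries use the standard analyzer
                "accountName": {
                    "type": "text",
                    "analyzer": "autocomplete",
                    "search_analyzer": "standard",
                }
            }
        },
    },
)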

HAQM Neptune 

As noted above, we have a requirement to track customer account relations. One customer can have multiple accounts within a sub-ledger, and customers request a single consolidated billing statement across all of their accounts. Additionally, financial analysts need the ability to view a consolidated aging balance across all accounts of a customer. This creates the need to identify and persist account relations. Each business defines its own rules and attributes to identify the related accounts for a customer. Our system ingests these attributes based on the defined rules and detects changes to them so that it can identify and maintain related accounts. HAQM Neptune, being a graph database, allows us to model relationships in a natural way and then run flexible traversal queries to gather use-case-specific data.

Alternatively, we looked into how we could use DynamoDB to achieve the same result. Relationships could be stored in DynamoDB using an adjacency list model, where each node contains references to its parent node and/or the edges associated with it. While this approach is feasible, queries to fetch data require repeated calls to traverse relationships, which degrades performance for nested relationships and relationships that fan out. With Neptune, the traversal happens inside the database and we can fetch relationships with a single request. For this reason, we chose HAQM Neptune for storing relationships.

For our initial use case, we needed the ability to group multiple individual accounts into a collective group to work on. The following diagram represents the node/edge structure. We can quickly get the related accounts by traversing edges of type HasParent from a node. The query can start at the Group level and follow incoming edges, or it can start at an account level, follow outgoing edges, and then reverse back along incoming edges from the Groups it finds.

// If starting from an account to find all related accounts, for example account 1
g.V(1)
 // Follow inbound/outbound edges of type "HasParent", deduplicating to avoid cycles
 .repeat(both("HasParent").dedup())
 // Emit nodes that are of type "Account"
 .emit(hasLabel("Account"))

> [
    { id: 1, label: "Account" },
    { id: 2, label: "Account" },
    { id: 3, label: "Account" }
]
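The following sketch shows how a traversal like this could be issued from application code with the gremlin_python client; the Neptune endpoint and vertex ID are placeholders.

# gremlin_python sketch; endpoint and vertex ID are placeholders
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __

conn = DriverRemoteConnection("wss://my-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Same traversal as above: find all accounts related to account 1
related_accounts = (
    g.V(1)
     .repeat(__.both("HasParent").dedup())
     .emit(__.hasLabel("Account"))
     .toList()
)
conn.close()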

For more complex use cases, we extend our schema to provide a new macro grouping. New node and edge types can be added to the existing data to define new relationship types and metadata. From there, we make minor modifications to our existing query to tell Neptune to traverse this new set of relationships and find accounts that are related within the system. This pattern can be repeated to provide different views of groupings over the same data. Additionally, results can be filtered by properties set on the nodes to further refine the items that are returned.

The following diagram shows a complex grouping where two business groups are related to the same organization.

// If starting from an account to find all related accounts, for example account 1
g.V(1)
 // Follow inbound/outbound edges of type "HasParent" and "SubsidiaryOf", deduplicating to avoid cycles
 .repeat(both("HasParent", "SubsidiaryOf").dedup())
 // Emit nodes that are of type "Account"
 .emit(hasLabel("Account"))

> [
    { id: 1, label: "Account" },
    { id: 2, label: "Account" },
    { id: 3, label: "Account" },
    { id: 4, label: "Account" }
]

An account can also move from one customer to another, which triggers the need to ungroup the account from the old customer and associate it with the new one.

The following diagram shows an example where Account 1, which belonged to Business Group 1 of Organization A, is moved to Business Group 3, which is a new grouping.

In conclusion, HAQM Neptune allowed us to build an account relationship service without the complexity of managing our own graph database management system. The model allowed simple traversal of account relations and solved our use case of identifying related accounts, providing a better experience to our users. The graph is updated daily to capture changes, and we are able to retrieve relationships with a p90 latency of less than 250 ms.

Operational excellence

While dealing with HAQM financial data, we need to guarantee 100% data quality. With a distributed system involving multiple components, it is important to maintain reliability and accuracy, so we designed our system with fault tolerance as a tenet. For example, each of our SQS queues is configured with a corresponding DLQ to capture messages we fail to process, with HAQM CloudWatch monitors and alarms configured on them. Additionally, we use the HAQM API Gateway metrics in CloudWatch to monitor 5XX errors and latency across our APIs. We also use the AWS SDK to publish custom CloudWatch metrics for our business use cases. For example, we publish latency metrics for ingestion events to measure the time from when a transaction event is generated in the source system to when it is ingested into our operational data store.
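As a sketch, a custom ingestion-latency metric could be published as follows; the namespace, metric, and dimension names are illustrative, not our actual metric definitions.

# Custom CloudWatch metric sketch; namespace and names are illustrative
import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_ingestion_latency(latency_ms: float, sub_ledger: str) -> None:
    # Time from event generation in the source system to landing in the ODS
    cloudwatch.put_metric_data(
        Namespace="FinOds/Ingestion",
        MetricData=[{
            "MetricName": "EndToEndIngestionLatency",
            "Dimensions": [{"Name": "SubLedger", "Value": sub_ledger}],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }],
    )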

Summary

In this post, we showed you how the Finance Automation team used the purpose-built AWS databases DynamoDB, OpenSearch Service, and Neptune to build a scalable, reliable, event-driven operational data store that satisfies unique finance requirements. With out-of-the-box DynamoDB features such as DynamoDB Streams, GSIs, LSIs, exports to S3, and schema flexibility, our team was able to reduce the overall development effort. With HAQM OpenSearch Service, our team was able to power search and aggregation requirements while meeting API latency targets for workbench clients. HAQM Neptune simplified the storage, traversal, and retrieval of account relations. We also simplified the solution by using serverless compute such as AWS Lambda, which natively integrates with DynamoDB Streams and HAQM SQS. SQS allowed us to decouple these data stores while maintaining high fault tolerance.

To learn more about AWS purpose-built databases and to get started, visit Purpose-built databases.


About the Authors

Nitin Arora is a Sr. Software Development Manager for Finance Automation at HAQM. He has over 18 years of experience building business-critical, scalable, high-performance software. Nitin leads several data and analytics initiatives within Finance, which includes building a data mesh. In his spare time, he enjoys listening to music and reading.

Pradeep Misra is a Principal Specialist Solutions Architect at AWS. He works across HAQM to architect and design modern distributed analytics and AI/ML platform solutions. He is passionate about solving customer challenges using data, analytics, and AI/ML. Outside of work, Pradeep likes exploring new places, trying new cuisines, and playing board and outdoor games with his family. He also likes doing science experiments with his daughters.

Kumar Satyen Gaurav is an experienced Software Development Manager at HAQM, with over 18 years of expertise in enterprise applications, big data solutions, and software development. He leads a team of engineers who build products and services using AWS technologies to provide key business insights for HAQM Finance Operations across diverse business verticals. Outside of his technical leadership, he finds joy in reading, travelling, and the pursuit of knowledge in financial investments and economic affairs.

Tarun Gupta is a Sr. System Development Engineer at HAQM. He has architected and implemented critical big data processing systems and operational data stores that power HAQM’s financial data infrastructure. His work spans multiple domains, including S3-based table formats for Spark like Apache Hudi, data modelling for operational data stores and supporting complex application access patterns through a diverse set of NoSQL and relational AWS datastores. Beyond his professional pursuits, Tarun is an avid board gamer and enjoys hands-on DIY projects, with a keen interest in carpentry and painting.

Javier Vega is a Sr. Software Development Engineer at HAQM with over 9 years of experience. Using AWS, Javier has led the design and implementation of multiple large-scale, reliable, event-driven data services that help power HAQM Finance operations. Outside of work, Javier develops projects that analyze and visualize online gaming statistics to help players improve their gameplay.