AWS Database Blog
Using the PostgreSQL extension tds_fdw to validate data migration from SQL Server to HAQM Aurora PostgreSQL
Data validation is an important process during data migrations, helping to verify that the migrated data matches the source data. Although AWS Database Migration Service (AWS DMS) offers robust validation features, it relies on primary keys to perform comparisons. However, not all tables have primary keys, which can present a significant challenge for data validation efforts.
In this post, we present alternatives you can use for data validation when dealing with tables that lack primary keys. We discuss alternative approaches, best practices, and potential solutions to make sure that your data migration process remains thorough and reliable, even in the absence of traditional primary key-based validation methods. Specifically, we demonstrate how to perform data validation after a full load migration from SQL Server to HAQM Aurora PostgreSQL-Compatible Edition using the PostgreSQL tds_fdw extension.
Solution overview
There are several effective strategies to validate data for tables lacking a primary key, such as:
- Count comparison – A simple yet effective method is to compare the total row counts between the source and target tables using a SQL query. Although this doesn’t guarantee data accuracy, it confirms that the same number of records was transferred.
- Temporary primary key – You can temporarily add a primary key to the source table before migration. This allows you to use AWS DMS validation features for comparison. You then remove the temporary key after the migration is complete.
- Database-specific solutions – For PostgreSQL users, you can use tds_fdw (SQL Server Foreign Data Wrapper) or oracle_fdw (Oracle Foreign Data Wrapper). This involves creating a foreign table that represents the source data, enabling direct queries and comparisons between the migrated table and the foreign table.
- Third-party tools – You can use specialized software like DBeaver Enterprise Edition, which offers data comparison features through a user-friendly GUI, even for tables without primary keys.
These methods offer a range of options to provide data integrity during migration, even in the absence of a primary key. The choice of method will depend on your specific requirements, database size, and available resources.
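For reference, the count-comparison approach amounts to running the same aggregate on both sides and comparing the results. The table name dbo.players is the example used later in this post:

```sql
-- Run on the source (SQL Server):
SELECT COUNT(*) AS source_rows FROM dbo.players;

-- Run on the target (Aurora PostgreSQL):
SELECT COUNT(*) AS target_rows FROM dbo.players;
```

Matching counts confirm that all rows arrived, but not that their contents are identical; the foreign table approach described next closes that gap.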
In the following sections, we demonstrate how to perform data validation after a full load migration from SQL Server to Aurora PostgreSQL-Compatible. We use the PostgreSQL tds_fdw extension to compare the data between the HAQM Relational Database Service (HAQM RDS) for SQL Server source and Aurora PostgreSQL-Compatible target. You can apply similar steps when using Oracle as a source. The following diagram illustrates the solution architecture.
This validation approach is effective for both full load and change data capture (CDC) scenarios. However, during CDC, if there are significant latency issues on either the source or target systems, you might observe discrepancies in the results when executing the validation query. These differences can be attributed to the time lag in data propagation between the systems.
Prerequisites
To follow along with this post, make sure you have the following prerequisites in place:
- An active AWS account
- Aurora PostgreSQL-Compatible version 13.6 or higher, which supports the tds_fdw extension
- An RDS for SQL Server source table (you can use an existing table or build a sample table as shown in the following code)
For this post, we use an RDS for SQL Server database and create a sample table called players without a primary key using the following DDL:
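The original DDL is not reproduced here; the following is a minimal sketch of a players table without a primary key. The columns sport_team_id and last_name come from the validation queries later in this post; the remaining columns and data types are assumptions for illustration.

```sql
-- SQL Server (T-SQL): sample table deliberately created without a primary key.
-- Only sport_team_id and last_name are referenced later in the post;
-- the other columns are illustrative.
CREATE TABLE dbo.players (
    id            INT NOT NULL,   -- no PRIMARY KEY constraint on purpose
    sport_team_id INT NOT NULL,
    last_name     VARCHAR(30),
    first_name    VARCHAR(30)
);

-- A few sample rows to migrate and compare
INSERT INTO dbo.players (id, sport_team_id, last_name, first_name)
VALUES (1, 62, 'Smith', 'John'),
       (2, 62, 'Jones', 'Mary'),
       (3, 63, 'Brown', 'Alex');
```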
Perform a full load migration
Now you’re ready to perform a full load data migration from SQL Server (source) to Aurora PostgreSQL-Compatible (target) using AWS DMS. For this post, we use the example table dbo.players.
Set up your Aurora PostgreSQL database to use the tds_fdw extension
Complete the following steps to set up and use the tds_fdw extension with an Aurora PostgreSQL database. In this post, we use an existing schema, such as public, to store the foreign tables that reference the SQL Server database directly. Alternatively, you can create a dedicated schema for foreign tables, keeping them separate from other objects in existing schemas.
- Connect to your Aurora PostgreSQL DB cluster’s primary instance using an account that has the rds_superuser role.
- Install the tds_fdw extension using the following command:
- Create a foreign server in the Aurora PostgreSQL DB cluster:
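The commands for these two steps might look like the following sketch. The server name sqlserver_svr, the SQL Server endpoint, and the database name dms_sample are placeholders; replace them with your own values.

```sql
-- Install the extension (run as a user with the rds_superuser role)
CREATE EXTENSION tds_fdw;

-- Create a foreign server pointing at the RDS for SQL Server instance
CREATE SERVER sqlserver_svr
    FOREIGN DATA WRAPPER tds_fdw
    OPTIONS (servername 'your-sqlserver-endpoint',  -- placeholder endpoint
             port '1433',
             database 'dms_sample');                -- placeholder database name
```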
- Verify that the foreign server has been created:
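One way to verify is to query the pg_foreign_server catalog (you can also use \des in psql):

```sql
SELECT srvname AS foreign_server, srvoptions
FROM pg_foreign_server;
```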
- Grant USAGE permissions if you are using a user who doesn’t have rds_superuser role privileges (for example, dmaadmin). Alternatively, you can create a new user and grant the required permissions.
- Connect as dmaadmin and create a mapping to a preexisting SQL Server user (for this example, dmaadmin):
- Create a foreign table linked to the SQL Server table, matching the table structure of the target:
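These steps can be sketched as follows, assuming the foreign server is named sqlserver_svr and the password placeholder is replaced with your own. The column list of mssql_players must match the structure of the source table; the columns shown are the illustrative ones used in this post.

```sql
-- As rds_superuser: allow dmaadmin to use the foreign server
GRANT USAGE ON FOREIGN SERVER sqlserver_svr TO dmaadmin;

-- As dmaadmin: map the PostgreSQL user to an existing SQL Server login
CREATE USER MAPPING FOR dmaadmin
    SERVER sqlserver_svr
    OPTIONS (username 'dmaadmin', password 'your_password');  -- placeholders

-- Create a foreign table mirroring the source table's structure
CREATE FOREIGN TABLE mssql_players (
    id            integer,
    sport_team_id integer,
    last_name     varchar(30),
    first_name    varchar(30)
)
SERVER sqlserver_svr
OPTIONS (table_name 'dbo.players');
```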
Perform a data comparison using native queries
Now we have established two tables for comparison: the AWS DMS migrated table, dbo.players, residing in our target database, and a foreign table, mssql_players, which references the source table in SQL Server through the tds_fdw extension. This configuration enables us to conduct data validation between the source and target databases through queries. Connect to your Aurora PostgreSQL cluster and run the following query to compare the records row by row and detect any discrepancies in the data:
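A query along these lines compares the two tables in both directions: the first branch returns rows present in the target but not the source, and the second the reverse. It assumes the migrated table lives in a dbo schema on the target, as described above.

```sql
-- Rows in the target that are missing from (or differ in) the source
(SELECT 'target_only' AS side, t.* FROM dbo.players t
 EXCEPT ALL
 SELECT 'target_only', s.* FROM mssql_players s)
UNION ALL
-- Rows in the source that are missing from (or differ in) the target
(SELECT 'source_only' AS side, s.* FROM mssql_players s
 EXCEPT ALL
 SELECT 'source_only', t.* FROM dbo.players t);
```

EXCEPT ALL (rather than EXCEPT) preserves duplicate rows, which matters for tables without a primary key because the same row can legitimately appear more than once.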
If the data matches, there will be no rows in the result set. Performing this operation on a large table can be resource-intensive and might lead to performance challenges due to high memory consumption. To mitigate this issue, you can limit the scope of the query by applying a WHERE clause, allowing you to focus on a specific subset of tables or data.

Let’s explore two scenarios where a mismatch of data between source and target can occur.
First, we identify discrepancies at the column level. We deliberately altered the last_name column in the source data to TEST to create a mismatch for demonstration purposes. Run the following query on the foreign table in PostgreSQL to see the changes:

select * from mssql_players where sport_team_id=62 and last_name='TEST';
Run the EXCEPT ALL query again to identify discrepancies between the two datasets.
Next, we identify discrepancies at the row level. The following example illustrates how the query, designed to compare rows between the source and target, highlights discrepancies when certain rows are missing:
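To reproduce this scenario, you could remove a few rows from the migrated target table; the statement and filter below are purely illustrative.

```sql
-- Run on the Aurora PostgreSQL target:
-- delete a few rows to simulate data missing from the target
DELETE FROM dbo.players
WHERE sport_team_id = 62;
```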
Now run the EXCEPT ALL query to see the differences.
A query using the EXCEPT clause will identify all discrepancies between the source and target, including records that have been intentionally removed or changed from the target.
Clean up
The resources created as part of this post incur costs as long as they are running. To clean up your resources, complete the following steps:
- Delete the Aurora PostgreSQL DB cluster.
- Delete the RDS for SQL Server instance.
- Clean up AWS DMS resources if a full load migration was performed for this post.
- Delete the foreign table setup from your existing cluster:
- Drop the foreign table:
- Drop the user mapping:
- Drop the foreign server:
- Drop the tds_fdw extension:
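Assuming the placeholder names used earlier in this post (sqlserver_svr for the foreign server, dmaadmin for the user), the cleanup statements might look like the following:

```sql
DROP FOREIGN TABLE mssql_players;
DROP USER MAPPING FOR dmaadmin SERVER sqlserver_svr;
DROP SERVER sqlserver_svr;
DROP EXTENSION tds_fdw;
```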
Conclusion
In this post, we showed how to use the PostgreSQL tds_fdw extension as an alternative strategy to validate data migrations from SQL Server to Aurora PostgreSQL-Compatible. By creating foreign tables and running comparison queries, you can validate data integrity for tables that don’t have primary keys.
If you have any questions or suggestions about these migration techniques, leave your feedback in the comments.
About the authors
Vanshika Nigam is a Solutions Architect on the Database Migration Accelerator team at AWS. With over 6 years of experience working with HAQM RDS and AWS DMS, she serves as an HAQM DMA Advisor helping AWS customers accelerate the migrations of their on-premises, commercial databases to AWS Cloud database solutions.
Amit Arora is a Solutions Architect with a focus on database and analytics at AWS. He works with AWS customers and certified partners to provide technical assistance and design customer solutions on cloud migration projects, helping customers migrate and modernize their existing databases to the AWS Cloud.