AWS Database Blog

Using the PostgreSQL extension tds_fdw to validate data migration from SQL Server to HAQM Aurora PostgreSQL

Data validation is an important process during data migrations, helping to verify that the migrated data matches the source data. Although AWS Database Migration Service (AWS DMS) offers robust validation features, it relies on primary keys to perform comparisons. However, not all tables have primary keys, which can present a significant challenge for data validation efforts.

In this post, we present approaches you can use to validate data in tables that lack primary keys. We discuss alternative approaches, best practices, and potential solutions to keep your data migration process thorough and reliable, even in the absence of traditional primary key-based validation. Specifically, we demonstrate how to perform data validation after a full load migration from SQL Server to HAQM Aurora PostgreSQL-Compatible Edition using the PostgreSQL tds_fdw extension.

Solution overview

There are several effective strategies to validate data for tables lacking a primary key, such as:

  • Count comparison – A simple yet effective method is to compare the total row counts between the source and target tables using a SQL query. Although this doesn’t guarantee data accuracy, it confirms that the expected number of records has been transferred (the first two methods are sketched in the example following this list).
  • Temporary primary key – You can temporarily add a primary key to the source table before migration. This allows you to use AWS DMS validation features for comparison. You then remove the temporary key after the migration is complete.
  • Database-specific solutions – For PostgreSQL targets, you can use tds_fdw (a foreign data wrapper for TDS-based databases such as SQL Server) or oracle_fdw (a foreign data wrapper for Oracle). This involves creating a foreign table that represents the source data, enabling direct queries and comparisons between the migrated table and the foreign table.
  • Third-party tools – You can use specialized software like DBeaver Enterprise Edition, which offers data comparison features through a user-friendly GUI, even for tables without primary keys.
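
As a quick illustration of the first two methods, the following is a minimal sketch based on the sample players table used later in this post (the temp_id column and pk_players_temp constraint names are our own):

-- Count comparison: run the same count on the source and on the target,
-- then compare the two numbers
SELECT COUNT(*) AS row_count FROM dbo.players;

-- Temporary primary key (SQL Server syntax): add a surrogate key before migration
ALTER TABLE dbo.players ADD temp_id INT IDENTITY(1,1)
    CONSTRAINT pk_players_temp PRIMARY KEY;

-- Remove the temporary key after validation is complete
ALTER TABLE dbo.players DROP CONSTRAINT pk_players_temp;
ALTER TABLE dbo.players DROP COLUMN temp_id;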

These methods offer a range of options to verify data integrity during migration, even in the absence of a primary key. The choice of method depends on your specific requirements, database size, and available resources.

In the following sections, we demonstrate how to perform data validation after a full load migration from SQL Server to Aurora PostgreSQL-Compatible. We use the PostgreSQL tds_fdw extension to compare the data between the HAQM Relational Database Service (HAQM RDS) for SQL Server source and Aurora PostgreSQL-Compatible target. You can apply similar steps when using Oracle as a source. The following diagram illustrates the solution architecture.

This validation approach is effective for both full load and change data capture (CDC) scenarios. However, during CDC, if there are significant latency issues on either the source or target systems, you might observe discrepancies in the results when executing the validation query. These differences can be attributed to the time lag in data propagation between the systems.

Prerequisites

To follow along with this post, make sure you have the following prerequisites in place:

  • An active AWS account
  • Aurora PostgreSQL-Compatible version 13.6 or higher, which supports the tds_fdw extension
  • An RDS for SQL Server source table (you can use an existing table or build a sample table as shown in the following code)

For this post, we use an RDS for SQL Server database and create a sample table called players without a primary key using the following DDL:

CREATE TABLE [dbo].[players](
[sport_team_id] [int] NOT NULL,
[last_name] [varchar](30) NULL,
[first_name] [varchar](30) NULL,
[full_name] [varchar](30) NULL
) ON [PRIMARY]

-- Insert data
INSERT INTO [dbo].[players] ([sport_team_id], [last_name], [first_name], [full_name]) VALUES
(1, 'James', 'LeBron', 'LeBron James'),
(1, 'Davis', 'Anthony', 'Anthony Davis'),
(1, 'Reaves', 'Austin', 'Austin Reaves'),
(2, 'Curry', 'Stephen', 'Stephen Curry'),
(2, 'Thompson', 'Klay', 'Klay Thompson'),
(2, 'Green', 'Draymond', 'Draymond Green'),
(3, 'Ronaldo', 'Cristiano', 'Cristiano Ronaldo'),
(3, 'Benzema', 'Karim', 'Karim Benzema'),
(3, 'Modric', 'Luka', 'Luka Modric'),
(4, 'Messi', 'Lionel', 'Lionel Messi'),
(4, 'Suarez', 'Luis', 'Luis Suarez'),
(4, 'Neymar', 'Jr', 'Neymar Jr'),
(5, 'Brady', 'Tom', 'Tom Brady'),
(5, 'Gronkowski', 'Rob', 'Rob Gronkowski'),
(5, 'Evans', 'Mike', 'Mike Evans'),
(62, 'Shead', 'DeShawn', 'DeShawn Shead'),
(62, 'Kelce', 'Travis', 'Travis Kelce'),
(7, 'Judge', 'Aaron', 'Aaron Judge'),
(7, 'Cole', 'Gerrit', 'Gerrit Cole'),
(8, 'Ohtani', 'Shohei', 'Shohei Ohtani');

-- Verify the data
SELECT * FROM [dbo].[players];

Perform a full load migration

Now you’re ready to perform a full load data migration from SQL Server (source) to Aurora PostgreSQL-Compatible (target) using AWS DMS. For this post, we use the example table dbo.players.
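
If you script this step, the following is a minimal AWS CLI sketch for a full load task covering dbo.players; the task identifier is our own, and the endpoint and replication instance ARNs are placeholders for your values:

aws dms create-replication-task \
    --replication-task-identifier players-full-load \
    --source-endpoint-arn <your-source-endpoint-arn> \
    --target-endpoint-arn <your-target-endpoint-arn> \
    --replication-instance-arn <your-replication-instance-arn> \
    --migration-type full-load \
    --table-mappings '{"rules":[{"rule-type":"selection","rule-id":"1","rule-name":"1","object-locator":{"schema-name":"dbo","table-name":"players"},"rule-action":"include"}]}'

aws dms start-replication-task \
    --replication-task-arn <your-replication-task-arn> \
    --start-replication-task-type start-replication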

Set up your Aurora PostgreSQL database to use the tds_fdw extension

Complete the following steps to set up and use the tds_fdw extension with an Aurora PostgreSQL database. In this post, we use an existing schema, such as public, to store the foreign tables that reference the SQL Server database directly. Alternatively, you can create a dedicated schema for foreign tables to keep them separate from other objects in existing schemas, as in the sketch that follows.
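
If you choose a dedicated schema instead, a minimal sketch might look like the following (the schema name mssql_fdw is our own):

CREATE SCHEMA mssql_fdw;
-- Create foreign tables in this schema, for example mssql_fdw.mssql_players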

  1. Connect to your Aurora PostgreSQL DB cluster’s primary instance using an account that has the rds_superuser role:
    psql --host=your-cluster-name-instance-1.aws-region.rds.amazonaws.com --port=5432 --username=test --password
  2. Install the tds_fdw extension using the following command:
    CREATE EXTENSION tds_fdw;
  3. Create a foreign server in the Aurora PostgreSQL DB cluster:
    CREATE SERVER sqlserverdb FOREIGN DATA WRAPPER tds_fdw OPTIONS (servername '<your_sqlserver_server_name>', port '1433', database '<your_database_name>');
  4. Verify that the foreign server has been created:
    select srvname as name, srvowner::regrole as owner, fdwname as wrapper, srvoptions as options from pg_foreign_server join pg_foreign_data_wrapper w on w.oid = srvfdw;

  5. Grant USAGE permissions if you are using a user who doesn’t have rds_superuser role privileges, for example, dmaadmin. Alternatively, you can create a new user and grant the required permissions.
    GRANT USAGE ON FOREIGN SERVER sqlserverdb TO dmaadmin;
  6. Connect as dmaadmin and create a mapping to a preexisting SQL Server user (for this example, dmaadmin):
    CREATE USER MAPPING FOR dmaadmin SERVER sqlserverdb OPTIONS (username 'dmaadmin', password '<password>');
  7. Create a foreign table linked to a SQL Server table matching the table structure of the target:
    CREATE FOREIGN TABLE mssql_players (sport_team_id int, last_name varchar(30), first_name varchar(30), full_name varchar(30)) SERVER sqlserverdb OPTIONS (table 'players');
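
To confirm that the foreign table can reach the source data, you can run a quick test query against it (this check is our own suggestion, not a required setup step):

SELECT * FROM mssql_players LIMIT 5;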

Perform a data comparison using native queries

Now we have established two tables for comparison: dbo.players, the table AWS DMS migrated into the target database, and mssql_players, a foreign table in the Aurora PostgreSQL cluster that references the source table in SQL Server through the tds_fdw extension. This configuration enables us to conduct data validation between the source and target databases through queries. Connect to your Aurora PostgreSQL cluster and run the following query to compare the records row by row and detect any discrepancies in the data:

select * from mssql_players
except all
select * from dbo.players;

If the data matches, the query returns no rows. Performing this operation on a large table can be resource-intensive and might lead to performance challenges due to high memory consumption. To mitigate this issue, you can limit the scope of the query by applying a WHERE clause, allowing you to focus on a specific subset of the data, as in the example that follows.
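
For example, the following minimal sketch validates one team at a time (the predicate is illustrative; choose one that fits your data distribution):

select * from mssql_players where sport_team_id = 1
except all
select * from dbo.players where sport_team_id = 1;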

Let’s explore two scenarios where a mismatch of data between the source and target can occur. First, we identify discrepancies at the column level. For demonstration purposes, we deliberately change the last_name value for sport_team_id 62 in the source table to TEST to create a mismatch, as in the following sketch.
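
The following is a minimal sketch of that change, run directly on the source SQL Server database (the exact statement is our illustration; the post only describes the change):

UPDATE [dbo].[players] SET [last_name] = 'TEST' WHERE [sport_team_id] = 62;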

Run the following query on the foreign table in PostgreSQL to see the change:

select * from mssql_players where sport_team_id=62 and last_name='TEST';

Run the except all query again to identify discrepancies between the two datasets. The two rows for sport_team_id 62 now appear in the result set because their last_name values no longer match the target.

Next, we identify discrepancies at the row level. The following example shows how the comparison query highlights discrepancies when rows are missing from the target. Run these statements on the target Aurora PostgreSQL database:

delete from dbo.players where sport_team_id = 62;
delete from dbo.players where sport_team_id = 1 and last_name = 'Davis';
commit;

Now run the except all query to see the differences. The deleted rows appear in the result because they still exist in the source but are missing from the target.

A query using EXCEPT ALL identifies rows that exist in the source but have been removed or changed in the target. The comparison is one-directional, so to catch rows that exist only in the target, run the query again with the table order reversed, as in the following sketch.
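
To complete the check, you can run the reverse comparison, which returns rows present in the target but absent from the source (for example, rows inserted or modified only on the target):

select * from dbo.players
except all
select * from mssql_players;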

Clean up

The resources created as part of this post incur costs as long as they are running. To clean up your resources, complete the following steps:

  1. Delete the Aurora PostgreSQL DB cluster.
  2. Delete the RDS for SQL Server instance.
  3. Clean up AWS DMS resources if a full load migration was performed for this post.
  4. Remove the foreign table setup from your existing cluster by dropping the foreign table:
    DROP FOREIGN TABLE mssql_players;
  5. Drop the user mapping:
    DROP USER MAPPING FOR dmaadmin SERVER sqlserverdb;
  6. Drop the foreign server:
    DROP SERVER sqlserverdb;
  7. Drop the tds_fdw extension:
    DROP EXTENSION tds_fdw;

Conclusion

In this post, we showed how to use the PostgreSQL tds_fdw extension as an alternative strategy for validating data migrations from SQL Server to Aurora PostgreSQL-Compatible. By creating foreign tables and running comparison queries, you can validate data integrity for tables that don’t have primary keys.

If you have any questions or suggestions about these migration techniques, leave your feedback in the comments.


About the authors

Vanshika Nigam is a Solutions Architect on the Database Migration Accelerator team at AWS. With over 6 years of experience working with HAQM RDS and AWS DMS, she serves as an HAQM DMA Advisor, helping AWS customers accelerate the migration of their on-premises commercial databases to AWS Cloud database solutions.

Amit Arora is a Solutions Architect with a focus on database and analytics at AWS. He works with AWS customers and certified partners to provide technical assistance and design customer solutions on cloud migration projects, helping customers migrate and modernize their existing databases to the AWS Cloud.