AWS Database Blog

Transition a pivot query that includes dynamic columns from SQL Server to PostgreSQL

When assisting customers with migrating their workloads from SQL Server to PostgreSQL, we often encounter a scenario where the PIVOT function is used extensively for generating dynamic reports.

An existing solution is documented in the Microsoft SQL Server 2019 to HAQM Aurora PostgreSQL Migration Playbook that involves using CASE WHEN statements for each pivoted column. However, this approach has limitations. It becomes difficult to maintain the code when there are many increasing pivoted columns. Additionally, adding a new column requires changes to the source code.

In this post, we show you how to use the crosstab function, provided by PostgreSQL’s tablefunc extension, to implement functionality similar to SQL Server’s PIVOT function, offering greater flexibility.

Solution overview

For this solution, we use crosstab as the foundation to demonstrate the implementation of SQL Server-like PIVOT functionality through the PostgreSQL function get_dynamic_pivot_data. The primary benefit of using crosstab is that it can dynamically generate columns based on the result of the provided queries, making it flexible for different data sets. You will learn how to manage multiple fixed and variable columns in a pivot table, along with the corresponding cursor in the PostgreSQL function. This approach includes handling a PostgreSQL refcursor. The function can be invoked using either psql or C#. psql, a terminal-based front-end to PostgreSQL, will be used to guide you on how to call it with a cursor parameter. Typically, C# accesses PostgreSQL through Npgsql. In this post, we provide sample C# code demonstrating how to use the PostgreSQL function using Npgsql, including the management of a cursor variable.

The following diagram illustrates the solution architecture.

Prerequisites

Configure the following settings in your AWS account and installations on your laptop, based on how you will test the function.

Test on the PostgreSQL client side

The following steps pertain to testing the function on the PostgreSQL client side:

  1. Provision HAQM Aurora PostgreSQL-Compatible Edition or HAQM Relational Database Service (HAQM RDS) for PostgreSQL.
  2. Install a PostgreSQL client tool, such as pgAdmin on HAQM EC2 for Microsoft Windows Server, or psql on HAQM Linux 2023.
    • AWS CloudShell includes the PostgreSQL client tool (psql) version 15 by default, as shown in the following screenshot. If you prefer not to provision an EC2 instance, you can use CloudShell to access Aurora PostgreSQL directly.

psql Command in CloudShell

Test with C#:

The following steps pertain to testing with C#. If you don’t want to use C#, these installations aren’t required.

  1. Download and install .NET SDK 8 on your EC2 instance. The following is an example of how to check the .NET version on Windows:
    C:\Users\Administrator>dotnet --list-sdks
    8.0.404 [C:\Program Files\dotnet\sdk]
    C:\Users\Administrator>dotnet --version
    8.0.404
  2. Download and install Visual Studio Code on your EC2 instance for Windows.

Use the PIVOT function on the SQL Server side

To transit a pivot query that includes dynamic columns, start by creating two tables and a stored procedure on the SQL Server side. You can find the corresponding code in our Github. Complete the following steps:

  1. Create two tables with some dummy data
    1. The following code creates the table QuarterTbl:
      CREATE TABLE QuarterTbl (
      QuarterID INT NOT NULL IDENTITY PRIMARY KEY,
      QuarterItem varchar(2)
      );
      
      INSERT INTO QuarterTbl([QuarterItem])
      VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4');

      The following output shows sample data from the table QuarterTbl:

      QuarterID    QuarterItem
      ------------ -----------
      1            Q1
      2            Q2
      3            Q3
      4            Q4
      
  2. The following code creates the table ProductSales:
    CREATE TABLE ProductSales (
    ProductID INT NOT NULL IDENTITY PRIMARY KEY,
    ProductName varchar(10),
    QuarterID int,
    Year varchar(5),
    Sales int
    FOREIGN KEY (QuarterID) REFERENCES QuarterTbl(QuarterID)
    );
    
    INSERT INTO ProductSales([ProductName],[QuarterID],[Year],[Sales])
    VALUES
    ('ProductA', 1, 'Y2017', 100),
    ('ProductA', 2, 'Y2018', 150),
    ('ProductA', 2, 'Y2018', 200),
    ('ProductA', 1, 'Y2019', 300),
    ('ProductA', 2, 'Y2020', 500),
    ('ProductA', 3, 'Y2021', 450),
    ('ProductA', 1, 'Y2022', 675),
    ('ProductB', 2, 'Y2017', 0),
    ('ProductB', 1, 'Y2018', 900),
    ('ProductB', 3, 'Y2019', 1120),
    ('ProductB', 4, 'Y2020', 750),
    ('ProductB', 3, 'Y2021', 1500),
    ('ProductB', 2, 'Y2022', 1980)
    ;

    The total sales figures for each quarter across each year are displayed as follows:

    SELECT
    PS.ProductName,
    Q.QuarterItem,
    PS.Year, SUM(PS.Sales) as QuarterSales
    FROM ProductSales AS PS
    INNER JOIN QuarterTbl AS Q
    ON PS.QuarterID = Q.QuarterID
    GROUP BY PS.ProductName, Q.QuarterItem, PS.Year
    ORDER BY 1, 2, 3

    The following is the query result for quarterly sales dummy data between 2017-2022.

    ProductName QuarterItem Year     QuarterSales
    ----------- ----------- -----    ------------
    ProductA          Q1    Y2017     100
    ProductA          Q1    Y2019     300
    ProductA          Q1    Y2022     675
    ProductA          Q2    Y2018     350
    ProductA          Q2    Y2020     500
    ProductA          Q3    Y2021     450
    ProductB          Q1    Y2018     900
    ProductB          Q2    Y2017     0
    ProductB          Q2    Y2022     1980
    ProductB          Q3    Y2019     1120
    ProductB          Q3    Y2021     1500
    ProductB          Q4    Y2020     750
    (12 rows affected)
  3. In addition to the tables, create a stored procedure called GetProductSalesReport using a PIVOT function and a dynamic query:
    CREATE OR ALTER PROCEDURE GetProductSalesReport
    @columns NVARCHAR(MAX)
    AS
    DECLARE  @sql NVARCHAR(MAX);
    
    SET @sql = 'SELECT * FROM
    (
    SELECT PS.ProductName, Q.QuarterItem, PS.Year, PS.Sales
    FROM ProductSales PS
    INNER JOIN QuarterTbl Q
    ON PS.QuarterID = Q.QuarterID
    ) t
    PIVOT (
    SUM(Sales)
    FOR [Year] IN ( ' + @columns + ')
    ) AS PV
    ORDER BY 1, 2;';
    
    -- execute the dynamic SQL
    EXECUTE sp_executesql @sql;
    GO
  4. Run the stored procedure with a parameter specifying the list of dynamic columns, to obtain the pivoted results:
    EXEC GetProductSalesReport N'[Y2017], [Y2018], [Y2019], [Y2020], [Y2021], [Y2022]'
    GO

    The following output shows the results of the pivot query:

    ProductName QuarterItem Y2017   Y2018   Y2019   Y2020   Y2021   Y2022
    ----------- ----------- ------- ------- ------- ------- ------- -------
    ProductA    Q1          100     NULL    300     NULL    NULL    675
    ProductA    Q2          NULL    350     NULL    500     NULL    NULL
    ProductA    Q3          NULL    NULL    NULL    NULL    450     NULL
    ProductB    Q1          NULL    900     NULL    NULL    NULL    NULL
    ProductB    Q2          0       NULL    NULL    NULL    NULL    1980
    ProductB    Q3          NULL    NULL    1120    NULL    1500    NULL
    ProductB    Q4          NULL    NULL    NULL    750     NULL    NULL
    (7 rows affected)

Up to this point, you have seen an example of using the PIVOT function to create a sales report in SQL Server. By specifying a dynamic column list, SQL Server can return a result set in a standard table format. Dynamic columns, such as Y20xx, are generated in the moment. In practical scenarios, we have observed customers generating several hundred dynamic columns by querying a primary table.

Transitioning similar logic to PostgreSQL presents a challenge because of its requirement for predefined declarations. In the following section, we show you a workaround for addressing this issue in PostgreSQL.

Implement pivot-like functionality on the PostgreSQL side:

In PostgreSQL, create sample tables identical to those in SQL Server and populate them with the same data. Then, create a function named get_dynamic_pivot_data to implement functionality similar to GetProductSalesReport on the SQL Server side. You can find the corresponding code in our Github.

  1. Create sample tables in PostgreSQL:
    CREATE TABLE quarter_tbl(
    quarter_id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
    quarter_item VARCHAR(2),
    PRIMARY KEY(quarter_id)
    );
    
    CREATE TABLE product_sales(
    product_id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
    product_name VARCHAR(10),
    quarter_id INTEGER,
    year VARCHAR(5),
    sales INTEGER,
    CONSTRAINT fk_quarter
    FOREIGN KEY(quarter_id)
    REFERENCES quarter_tbl(quarter_id)
    );
  2. Populate the two tables in PostgreSQL with the same data as in SQL Server:
    INSERT INTO quarter_tbl(quarter_item)
    VALUES ('q1'), ('q2'), ('q3'), ('q4');
    
    INSERT INTO product_sales(product_name, quarter_id, year, sales) VALUES
    ('ProductA', 1, 'y2017', 100),
    ('ProductA', 2, 'y2018', 150),
    ('ProductA', 2, 'y2018', 200),
    ('ProductA', 1, 'y2019', 300),
    ('ProductA', 2, 'y2020', 500),
    ('ProductA', 3, 'y2021', 450),
    ('ProductA', 1, 'y2022', 675),
    ('ProductB', 2, 'y2017', 0),
    ('ProductB', 1, 'y2018', 900),
    ('ProductB', 3, 'y2019', 1120),
    ('ProductB', 4, 'y2020', 750),
    ('ProductB', 3, 'y2021', 1500),
    ('ProductB', 2, 'y2022', 1980);
  3. To use the CROSSTAB function in PostgreSQL, you must first install and enable the tablefunc extension in your database. This can be done using the following SQL command:
    CREATE EXTENSION IF NOT EXISTS tablefunc;

    Unlike SQL Server’s PIVOT function, which supports multiple row_name columns, PostgreSQL’s crosstab function only supports a single row_name column, as follows:

    row_name      cat     value
    -------------+-------+--------
    row1          cat1    val1
    row1          cat2    val2
    row1          cat3    val3
    row1          cat4    val4
    row2          cat1    val5
    row2          cat2    val6
    row2          cat3    val7
    row2          cat4    val8

    Therefore, you must consolidate all row_name columns into a single composite column, separated by commas. Then, during the creation of the output table, you must unpack all columns from this composite column, as follows:

    fix_col_list | y2017| y2018| y2019| y2020| y2021| y2022
    -------------+------+------+------+------+------+------
    ProductA, q1 |   100|      |   300|      |      |   675
    ProductA, q2 |      |   350|      |   500|      |
    ProductA, q3 |      |      |      |      |   450|
    ProductB, q1 |      |   900|      |      |      |
    ProductB, q2 |     0|      |      |      |      |  1980
    ProductB, q3 |      |      |  1120|      |  1500|
    ProductB, q4 |      |      |      |   750|      |
  4. Use the function get_dynamic_pivot_data to create a pivot table containing multiple row_name columns (fixed columns). The function has two parameters: one to hold a cursor using a refcursor data type, and another to hold the dynamic column list using a text data type. To facilitate processing, the code creates a temporary table to contain the intermediate results of the crosstab This function returns the cursor containing the pivot-like query result.
    CREATE OR REPLACE FUNCTION get_dynamic_pivot_data(ref_cur refcursor, col_list text)
    
    RETURNS refcursor AS $func$
    
    DECLARE
    
    pivot_query text;
    
    year_query text;
    
    col_query text;
    
    var_col_list text;
    
    exec_query text;
    
    fix_col_tbl text;
    
    BEGIN
    
    fix_col_tbl := 'tmp_fix_col_tbl';
    
    pivot_query :=
    
    'SELECT (ps.product_name || '', '' || q.quarter_item) as fix_col_list, '
    
    || 'ps.year, SUM(ps.sales) ' ||
    
    'FROM product_sales ps ' ||
    
    'INNER JOIN quarter_tbl q ' ||
    
    'ON ps.quarter_id = q.quarter_id ' ||
    
    'WHERE year in (' ||  col_list || ') ' ||
    
    'GROUP BY product_name, quarter_item, year ORDER BY 1,2,3 ';
    
    -- RAISE NOTICE 'pivot_query is : %', pivot_query;
    
    year_query := 'SELECT DISTINCT year FROM product_sales WHERE year in ('
    
    ||  col_list || ') ' || 'ORDER BY 1 ';
    
    -- RAISE NOTICE 'year_query is: %', year_query;
    
    -- SELECT STRING_AGG (distinct year, ' int, ' ORDER BY year) || ' int'
    
    -- FROM product_sales;
    
    col_query := 'SELECT STRING_AGG (distinct year, '' int, '' ORDER BY year)
    
    || '' int'' FROM product_sales WHERE year in (' || col_list || ')';
    
    -- RAISE NOTICE 'col_query is: %', col_query;
    
    EXECUTE col_query into var_col_list;
    
    var_col_list := 'fix_col_list varchar, ' || var_col_list;
    
    -- RAISE NOTICE 'var_col_list is: %', var_col_list;
    
    exec_query := 'DROP TABLE IF EXISTS ' || fix_col_tbl;
    
    EXECUTE exec_query;
    
    exec_query := 'CREATE TEMP TABLE ' || fix_col_tbl
    
    || ' AS SELECT * FROM crosstab($$'
    
    || pivot_query || '$$, $$' || year_query
    
    || '$$) AS (' || var_col_list || ' )';
    
    -- RAISE NOTICE 'exec_query1 is %', exec_query;
    
    EXECUTE exec_query;
    
    col_query := 'SELECT STRING_AGG (distinct year, '', '' ORDER BY year) '
    
    || 'FROM product_sales WHERE year IN (' || col_list || ')';
    
    EXECUTE col_query into col_list;
    
    -- RAISE NOTICE 'col_list is: %', col_list;
    
    exec_query := ' SELECT '
    
    || 'SPLIT_PART(fix_col_list, '', '', 1) AS product_name, '
    
    || 'SPLIT_PART(fix_col_list, '', '', 2) AS quarter_item, '
    
    || col_list || ' FROM ' || fix_col_tbl || ' ORDER BY 1, 2'
    
    ;
    
    -- RAISE NOTICE 'exec_query2 is %', exec_query;
    
    OPEN ref_cur FOR EXECUTE exec_query;
    
    RETURN ref_cur;
    
    END;
    
    $func$ LANGUAGE plpgsql;

The following code shows the equivalent of the previous SQL Server function in PostgreSQL. Two columns are split from the fixed column, and the rest of the columns (y2017 through y2022) are generated dynamically. It’s important to specify the cursor name (pivot_cur) when invoking the function. You need to run the function within a transaction because data is fetched from a cursor.

postgres=> BEGIN;
BEGIN
postgres=*> SELECT get_dynamic_pivot_data('pivot_cur', $$'y2017','y2018','y2019','y2020','y2021','y2022'$$);
NOTICE:  table "tmp_fix_col_tbl" does not exist, skipping
get_dynamic_pivot_data
------------------------
pivot_cur
(1 row)
postgres=*> FETCH ALL IN "pivot_cur";
product_name | quarter_item | y2017 | y2018 | y2019 | y2020 | y2021 | y2022
--------------+--------------+-------+-------+-------+-------+-------+-------
ProductA     | q1           |   100 |       |   300 |       |       |   675
ProductA     | q2           |       |   350 |       |   500 |       |
ProductA     | q3           |       |       |       |       |   450 |
ProductB     | q1           |       |   900 |       |       |       |
ProductB     | q2           |     0 |       |       |       |       |  1980
ProductB     | q3           |       |       |  1120 |       |  1500 |
ProductB     | q4           |       |       |       |   750 |       |
(7 rows)
postgres=*> COMMIT;
COMMIT

Invoke the PostgreSQL pivot-like function from C#:

Complete the following steps to build a sample C# project on Windows. The Npgsql package is required to access PostgreSQL for C#.NET.

  1. Open a Command Prompt terminal and change to your work directory.
  2. Create a new .NET project:
    dotnet new console -n Sample-pivot-qry-4-pg
  3. Move into the newly created directory:
    cd Sample-pivot-qry-4-pg
  4. Add Npgsql to the .NET project:
    dotnet add package Npgsql
  5. Add Json support:
    dotnet add package Newtonsoft.Json
  6. Add AWS SDK:
    dotnet add package AWSSDK.SecretsManager
  7. Add Secrets Manager Caching:
    dotnet add package AWSSDK.SecretsManager.Caching --version 1.0.6
  8. Use Visual Studio Code to open the project folder and edit the cs file.
  9. Run the .NET code in the terminal of Visual Studio Code:
    dotnet run

You can review the Program.cs file in the Github repository for details. NpgsqlCommand is used to call the get_dynamic_pivot_data PostgreSQL function, and NpgsqlDataReader is used retrieve the contents of the refcursor.

From a security perspective, both the Aurora PostgreSQL and Windows EC2 instance (hosting the .NET application) are placed in private subnets, with security groups protecting against external attacks. Additionally, database credentials are stored in AWS Secrets Manager, which enhances internal security by managing sensitive information. The purpose of this blog is to demonstrate how to implement SQL Server-like PIVOT queries on Aurora PostgreSQL or HAQM RDS for PostgreSQL. SSL connection implementation is not covered in this sample code as it’s outside the scope of this demonstration.

The following screenshot shows an example of using the Visual Studio Code editor to edit the C# code and run the dotnet command in its terminal. We can see that it produces the same execution result as running the stored function through psql on the command line.

Sample C# Code in Visual Studio Code

Clean up

Complete the following steps after your testing to prevent unnecessary charges:

Conclusion

In this post, you learned how to use the crosstab function of PostgreSQL with a cursor to achieve similar results to SQL Server’s PIVOT function. Additionally, you learned how to invoke the PostgreSQL function that delivers the pivot functionality using C# code.

Have you used different approaches for implementing PIVOT functionality in PostgreSQL? Share your experiences in the comments section. For more information about migrating from SQL Server to PostgreSQL, see Migrate SQL Server to HAQM Aurora PostgreSQL using best practices and lessons learned from the field.


About the Author

Jian (Ken) Zhang is a Senior Database Migration Specialist at AWS. He works with AWS customers to provide guidance and technical assistance on migrating commercial databases to AWS open-source databases. In his spare time, he enjoys exploring good restaurants and playing Go, a strategic board game.