AWS Big Data Blog

Building Multi-AZ or Multi-Region HAQM Redshift Clusters

This blog post was last reviewed in July 2022.

This post explores customer options for building multi-region or multi-availability zone (AZ) clusters. By default, HAQM Redshift has excellent tools to back up your cluster via snapshot to HAQM Simple Storage Service (HAQM S3). These snapshots can be restored in any AZ in that region or transferred automatically to other regions for disaster recovery. HAQM Redshift can even prioritize data being restored from HAQM S3 based on the queries running against a cluster that is still being restored.
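For reference, cross-region snapshot copy can be turned on with a single API call. The sketch below uses boto3 with placeholder identifiers and a retention period chosen for illustration; it is separate from the parallel-cluster design that follows.

import boto3

# Placeholder cluster identifier and destination region -- replace with your own.
redshift = boto3.client("redshift", region_name="us-east-1")

redshift.enable_snapshot_copy(
    ClusterIdentifier="my-redshift-cluster",
    DestinationRegion="us-west-2",
    RetentionPeriod=7,  # days to keep copied automated snapshots in the destination region
)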

However, sometimes requirements demand a more aggressive Recovery Point Objective/Recovery Time Objective (RPO/RTO) for a full-scale operational HAQM Redshift cluster. Additionally, HAQM Redshift parallelizes queries across the different nodes of a cluster, but there may be circumstances when you want to allow more concurrent queries than one cluster can provide. You can achieve this by deploying two (or more) identical, independent parallel HAQM Redshift clusters. This design requires all database updates to be performed on all HAQM Redshift clusters.

There are many ways to ensure all writes occur to all clusters, but we will look at one possible way: using HAQM Kinesis. HAQM Kinesis is a durable ordered ingestion service where you can put messages and have one or more Kinesis-enabled applications retrieve and process those messages. In this case, all database modifications and updates would be sent to HAQM Kinesis and there would be an HAQM Kinesis-enabled subscriber for each identical cluster. The data flow diagram would look like this:

The source of each data load and modification query puts all update queries into HAQM Kinesis. In the image above there are three parallel clusters in two different regions, but you can choose any region/AZ combination. There will be one HAQM Kinesis subscriber for each parallel HAQM Redshift cluster. The subscriber will be in the same Availability Zone as the Redshift cluster it serves and will run in an Auto Scaling group with a minimum and maximum of one instance. That way, a failed subscriber is automatically replaced and update processing continues where it left off.

As each update SQL command is written to Kinesis, each subscriber reads it and applies it to the HAQM Redshift cluster it serves. Ensure that updates reach the HAQM Redshift clusters only through this process so that they all stay in sync. For analysis, we will use HAQM Route 53, a fully managed DNS service, to perform health checks and latency-based routing that determine which cluster users and tools access to run read-only queries on the system.
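The subscriber application deployed by the CloudFormation template later in this post is sample code. Purely to illustrate the pattern, here is a minimal Python sketch of the read-decode-apply loop; the stream name, connection details, and single-shard handling are simplifying assumptions, and the real subscriber uses checkpointing in DynamoDB rather than reading from the start of the stream.

import base64
import time

import boto3
import psycopg2  # HAQM Redshift accepts PostgreSQL-protocol connections

STREAM = "<your-kinesis-stream>"  # placeholder
kinesis = boto3.client("kinesis", region_name="us-east-1")
conn = psycopg2.connect(host="<cluster-endpoint>", port=5439, dbname="<dbname>",
                        user="<user>", password="<password>")
conn.autocommit = True

# Simplification: read a single shard from the oldest record; the real subscriber
# checkpoints its position in DynamoDB so a replacement instance can resume.
shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(StreamName=STREAM, ShardId=shard_id,
                                      ShardIteratorType="TRIM_HORIZON")["ShardIterator"]

while True:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in resp["Records"]:
        # The CLI producers in this post base64-encode the SQL payload, so decode it here.
        sql = base64.b64decode(record["Data"]).decode("utf-8")
        with conn.cursor() as cur:
            cur.execute(sql)  # apply the same statement to the cluster this subscriber serves
    iterator = resp["NextShardIterator"]
    time.sleep(1)  # avoid polling an empty stream too aggressively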

Configuring the HAQM Redshift Cluster

Let’s walk through the configuration. For demonstration purposes, we will use just two HAQM Redshift clusters in the same region, but you could modify these steps to add more regions and parallel clusters.

Assumptions:

  • To keep this demo simple, the AWS CloudFormation template will launch in a Default VPC or EC2-Classic. Best practice would be to have each HAQM Redshift DB and HAQM Kinesis reader in a private subnet.
  • We are doing multi-AZ; to do multi-region you’d need to modify the HAQM Kinesis endpoint used by the HAQM Kinesis-enabled subscribers to fit the specific setup required for your database.
  • In this example, we keep things as simple as possible. There are many optimizations that could be made based off your company’s requirements and database environment.

IMPORTANT:

  • If you are updating a parallel or single HAQM Redshift cluster more frequently than once per minute, read the AWS Big Data Blog post that explains best practices for micro-batch loading on HAQM Redshift.

Directions:

  1. First, create two identical HAQM Redshift clusters in different AZs with the same database name, administrative username, and password. You can start with a small, empty database to try out the process. However, if you were taking an existing cluster and enhancing it to span multiple AZs or regions, you would set the existing database to read-only, take a snapshot, and restore it to a second cluster so that your two identical parallel clusters share the same starting point.
  2. Get the HAQM Redshift endpoint, database name, and Availability Zone of each cluster, which you'll need as input in future steps. To get this, go to the AWS console > Redshift > Clusters, and select both clusters. The dashboard displays this information (or retrieve it programmatically; see the sketch after this list).

  3. Launch this AWS CloudFormation template. It creates the following:
  • HAQM Kinesis Stream that ingests all modification queries.
  • HAQM DynamoDB table that tracks each subscriber's position in the stream, so that if a processor fails and must be re-created it can resume where it left off.
  • IAM role that can read from the HAQM Kinesis stream and read and write to the DynamoDB table created above.
  • Two HAQM Kinesis subscriber instances, deployed in Auto Scaling groups, that read from Kinesis and apply update queries to the Redshift clusters. The HAQM Kinesis subscriber application code can be found here. These instances use the IAM role to read from the HAQM Kinesis stream, and they use the HAQM Redshift username, password, and endpoints to connect to the clusters and apply the updates as they arrive via HAQM Kinesis. This subscriber code is just an example and could easily be modified to accommodate a more complex database environment.
  • Security group around the subscriber instances that allows only SSH (port 22) access for administration from an IP address you provide as a parameter.
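For step 2 above, the same endpoint details can be retrieved programmatically. A sketch, assuming boto3 and the us-east-1 region used in this example:

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Print the endpoint, port, database name, and Availability Zone of each cluster.
for cluster in redshift.describe_clusters()["Clusters"]:
    print(cluster["ClusterIdentifier"],
          cluster["Endpoint"]["Address"],
          cluster["Endpoint"]["Port"],
          cluster["DBName"],
          cluster["AvailabilityZone"])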

Now that you have a system for keeping the HAQM Redshift clusters in sync, it's time to modify the database. We start by creating a table. Then we'll load data into that table from a public data set in HAQM S3. HAQM Redshift can load data directly from HAQM S3 in parallel, so a best practice for loading data into HAQM Redshift is to write to HAQM S3 first. With this configuration, we are putting only the load SQL statements and updates through HAQM Kinesis, not the actual data.

To write into the HAQM Kinesis stream, we use the AWS Command Line Interface (AWS CLI). With the CLI, you can run the following command to create a customer table with eight columns:

aws kinesis put-record --stream-name <YOUR-KINESIS-STREAM-NAME> --data "$(echo "CREATE TABLE customer (c_custkey int8 NOT NULL, c_name varchar(25) NOT NULL, c_address varchar(40) NOT NULL, c_nationkey int4 NOT NULL, c_phone char(15) NOT NULL, c_acctbal numeric(12,2) NOT NULL, c_mktsegment char(10) NOT NULL, c_comment varchar(117) NOT NULL);"|base64 --wrap=0)" --partition-key "Source1"

You must modify this command to use the HAQM Kinesis stream name created by the AWS CloudFormation template. You can find it in the console by selecting the stack we just created and then going to the Outputs tab.
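If you prefer to script this step, a sketch like the following reads the stream name from the stack outputs and puts the same CREATE TABLE statement. The stack name and the output key "KinesisStreamName" are placeholders, so match them to your stack; the payload is base64-encoded to mirror what the CLI command above produces.

import base64
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")
kinesis = boto3.client("kinesis", region_name="us-east-1")

# Look up the stream name from the stack's Outputs tab programmatically.
outputs = cfn.describe_stacks(StackName="<your-stack-name>")["Stacks"][0]["Outputs"]
stream_name = next(o["OutputValue"] for o in outputs if o["OutputKey"] == "KinesisStreamName")

sql = ("CREATE TABLE customer (c_custkey int8 NOT NULL, c_name varchar(25) NOT NULL, "
       "c_address varchar(40) NOT NULL, c_nationkey int4 NOT NULL, c_phone char(15) NOT NULL, "
       "c_acctbal numeric(12,2) NOT NULL, c_mktsegment char(10) NOT NULL, "
       "c_comment varchar(117) NOT NULL);")

# Base64-encode the SQL so the stored payload matches the CLI examples in this post.
kinesis.put_record(StreamName=stream_name,
                   Data=base64.b64encode(sql.encode("utf-8")),
                   PartitionKey="Source1")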

Next, we load data into the HAQM Redshift clusters using the COPY command, which pulls the data from HAQM S3 into the database. For the purposes of this demo, each cluster has an IAM role attached that has access to the S3 bucket. For more information on allowing a Redshift cluster to access other AWS services, see this documentation.

COPY statement:

aws kinesis put-record --stream-name <YOUR-KINESIS-STREAM-NAME> --data "$(echo "copy customer from 's3://redshift-demo/tpc-h/100lzo/customer/customer.tbl.' IAM_Role '<RedshiftClusterRoleArn>' lzop delimiter '|' COMPUPDATE ON;"|base64 --wrap=0)" --partition-key "Source1"

If you are using the default IAM role with your cluster, you can replace the role ARN with the keyword default:

aws kinesis put-record --stream-name <YOUR-KINESIS-STREAM-NAME> --data "$(echo "copy customer from 's3://redshift-demo/tpc-h/100lzo/customer/customer.tbl.' IAM_Role default lzop delimiter '|' COMPUPDATE ON;"|base64 --wrap=0)" --partition-key "Source1"

For the purposes of demonstration and to keep this simple, the credentials travel with the COPY statement all the way from the source to HAQM Redshift. In a real-world production scenario, you would inject the credentials by modifying the HAQM Kinesis subscriber application so that they are not stored at each source. As always, scope the access key and secret access key to least privilege: grant only read access to the required HAQM S3 locations and nothing else. In this case, the data is stored in a public HAQM S3 bucket for simplicity.

These commands put queries into the HAQM Kinesis stream, where they are picked up by the HAQM Kinesis-enabled subscriber applications and applied to each HAQM Redshift cluster. To validate this process, log into both cluster endpoints directly with a SQL client; AWS provides step-by-step instructions for doing this. Once connected, run 'SELECT COUNT(*) FROM CUSTOMER;' to confirm that the data is populated in both clusters.
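As a quick consistency check, a sketch like the following compares the row counts across both endpoints; the hostnames, database name, and credentials are placeholders.

import psycopg2

endpoints = ["<az1-cluster-endpoint>", "<az2-cluster-endpoint>"]  # placeholders

counts = {}
for host in endpoints:
    conn = psycopg2.connect(host=host, port=5439, dbname="<dbname>",
                            user="<user>", password="<password>")
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM customer;")
        counts[host] = cur.fetchone()[0]
    conn.close()

print(counts)
assert len(set(counts.values())) == 1, "Clusters are out of sync"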

Accessing your Redshift Clusters for Analysis

Now that we are confident the clusters are being updated and kept synchronized, it is time to give our data analysts and business intelligence tools the ability to query the system. To do this, point the tools to a DNS record stored in Route 53 that is aware of both clusters. If the clusters are in the same region, you'd probably want an evenly weighted round robin similar to the following AWS CloudFormation template.

(Note: This AWS CloudFormation template will not apply as written because 'example.com' is not a domain you control. You'd need to modify it to use your own domain and hosted zone.)

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "dnsaz1clusterexamplecom": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneName": "example.com.",
        "RecordSets": [
          {
            "Name": "az1cluster.example.com.",
            "Type": "CNAME",
            "TTL": "300",
            "ResourceRecords": [
              "az1cluster.cbjhdm0le2vh.us-east-1.redshift.amazonaws.com"
            ]
          }
        ]
      }
    },
    "dnsaz2clusterexamplecom": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneName": "example.com.",
        "RecordSets": [
          {
            "Name": "az2cluster.example.com.",
            "Type": "CNAME",
            "TTL": "300",
            "ResourceRecords": [
              "az2cluster.cbjhdm0le2vh.us-east-1.redshift.amazonaws.com"
            ]
          }
        ]
      }
    },
    "dnsmyparallelclusterexamplecomt": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneName": "example.com.",
        "RecordSets": [
          {
            "Name": "myparallelcluster.example.com.",
            "Type": "CNAME",
            "SetIdentifier": "t",
            "Weight": "50",
            "ResourceRecords": [],
            "AliasTarget": {
              "HostedZoneId": "Z1HN6T29675QII",
              "DNSName": "az2cluster.example.com"
            }
          }
        ]
      }
    },
    "dnsmyparallelclusterexamplecomy": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneName": "example.com.",
        "RecordSets": [
          {
            "Name": "myparallelcluster.example.com.",
            "Type": "CNAME",
            "SetIdentifier": "y",
            "Weight": "50",
            "ResourceRecords": [ ],
            "AliasTarget": {
              "HostedZoneId": "Z1HN6T29675QII",
              "DNSName": "az1cluster.example.com"
            }
          }
        ]
      }
    }
  },
  "Description": "Evenly Weighted DNS entry myparallelcluster.example.com to two identical Redshift Clusters"
}

In this example, the DNS record myparallelcluster.example.com is evenly weighted between az1cluster.example.com and az2cluster.example.com, which resolve to our HAQM Redshift cluster endpoints.

In addition to this weighting, create a Route 53 health check on each of these clusters so that, if something happens to one of them, it is removed from the DNS entry and all future requests are funneled to the healthy cluster. AWS provides simple instructions for creating health checks in Route 53. The check can be as simple as validating that the TCP port is responding, or you could set up a more sophisticated health check against a status web page. That page could designate the cluster as unhealthy if it falls out of sync for any reason, in which case Route 53 would stop sending requests to it until it is healthy again. If your clusters do fall out of sync, you must decide the best way to get them back in sync: if it's just a failed query, you might be able to re-run that query; if it's a more complicated problem, you could re-import a table. How you re-sync depends on the circumstances and the size of the data.
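As a sketch, the simple TCP variant of such a health check could be created like this; the endpoint is the example cluster endpoint from the template above, and the interval and threshold values are illustrative. The returned health check ID would then be referenced from the weighted record sets via the HealthCheckId property.

import uuid

import boto3

route53 = boto3.client("route53")

response = route53.create_health_check(
    CallerReference=str(uuid.uuid4()),  # any unique string; prevents accidental duplicates
    HealthCheckConfig={
        "Type": "TCP",
        "FullyQualifiedDomainName": "az1cluster.cbjhdm0le2vh.us-east-1.redshift.amazonaws.com",
        "Port": 5439,            # default HAQM Redshift port
        "RequestInterval": 30,   # seconds between checks
        "FailureThreshold": 3,   # consecutive failures before the endpoint is marked unhealthy
    },
)
print(response["HealthCheck"]["Id"])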

If you are building a multi-region deployment, employ latency-based routing so that each user is routed to the best cluster available. Some BI tools can point at only one database and won't fail over automatically the way this article describes; check the requirements of the tool you're using.
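For reference, a latency-based record for one regional cluster could be upserted as sketched below; the hosted zone ID, set identifier, and endpoint are placeholders, and you would create one such record per region.

import boto3

route53 = boto3.client("route53")

# One UPSERT per regional cluster; the Region field enables latency-based routing.
route53.change_resource_record_sets(
    HostedZoneId="<your-hosted-zone-id>",
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "myparallelcluster.example.com.",
                "Type": "CNAME",
                "SetIdentifier": "us-east-1-cluster",
                "Region": "us-east-1",
                "TTL": 300,
                "ResourceRecords": [{"Value": "az1cluster.cbjhdm0le2vh.us-east-1.redshift.amazonaws.com"}],
            },
        }],
    },
)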

In all cases, you want to make sure that only analysis queries are done through this method. Any modification to the database must go through HAQM Kinesis to ensure that it is applied to all parallel clusters.

Conclusion

By making HAQM Kinesis the ingestion point for all modification queries and using HAQM Route 53 to deliver users to the closest healthy database, we have implemented an architecture that keeps multiple Redshift clusters synchronized. This gives you the power to create a cost-effective, highly resilient, durable, and geographically dispersed petabyte-scale data warehouse.

If you have questions or suggestions, please comment below.



About the authors

Erik Swensson is an AWS Solutions Architect.

Patrick Shumate is an AWS Solutions Architect.