HAQM Redshift

HAQM Redshift FAQs

Page topics

General
14
HAQM SageMaker SQL analytics
7
Serverless
5
Data ingestion and loading
8
Data sharing
3
Scalability and concurrency
4
Security
5
Availability and durability
5
Querying and analytics
11
Zero-ETL integrations
8
Backup and restore
4
Monitoring and maintenance
2

General

Open all

What is HAQM Redshift?

Tens of thousands of customers use HAQM Redshift every day to run SQL analytics in the cloud, processing exabytes of data for business insights. Whether your growing data is stored in operational data stores, data lakes, streaming data services or third-party datasets, HAQM Redshift helps you securely access, combine, and share data with minimal movement or copying. HAQM Redshift is deeply integrated with AWS database, analytics, and machine learning services to employ Zero-ETL approaches or help you access data in place for near real-time analytics, build machine learning models in SQL, and enable Apache Spark analytics using data in Redshift. HAQM Redshift Serverless enables your engineers, developers, data scientists, and analysts to get started easily and scale analytics quickly in a zero-administration environment. With its Massively Parallel Processing (MPP) engine and architecture that separates compute and storage for efficient scaling, and machine learning driven performance innovations (for example: Automated Materialized Views), HAQM Redshift is built for scale and delivers up to 5x better price performance than other cloud data warehouses.

What are the top reasons customers choose HAQM Redshift?

Thousands of customers choose HAQM Redshift to accelerate their time to insights because it is a powerful analytics system that integrates well with database and machine learning services, is streamlined to use, and can become a central service to deliver on all their analytics needs. HAQM Redshift Serverless automatically provisions and scales data warehouse capacity to deliver high performance for demanding and unpredictable workloads. HAQM Redshift offers leading price performance for diverse analytics workloads, whether it is dashboarding, application development, data sharing, ETL (Extract, Transform, Load) jobs or several others. With tens of thousands of customers running analytics on terabytes to petabytes of data, HAQM Redshift optimizes real-world customer workload performance, based on fleet performance telemetry, and delivers performance that scales linearly to the workload, while keeping costs low. Performance innovations are available to customers at no additional cost. HAQM Redshift lets you get insights from running real-time and predictive analytics on all your data across your operational databases, data lake, data warehouse, streaming data, and third-party datasets. HAQM Redshift supports industry-leading security with built-in identity management and federation for single sign-on (SSO), multi-factor authentication, column-level access control, row-level security, role-based access control, HAQM Virtual Private Cloud (HAQM VPC), and faster cluster resize.

How does HAQM Redshift simplify data warehouse and analytics management?

HAQM Redshift is fully managed by AWS so you no longer need to worry about data warehouse management tasks such as hardware provisioning, software patching, setup, configuration, monitoring nodes and drives to recover from failures, or backups. AWS manages the work needed to set up, operate, and scale a data warehouse on your behalf, freeing you to focus on building your applications. HAQM Redshift Serverless automatically provisions and scales the data warehouse capacity to deliver high performance for demanding and unpredictable workloads, and you pay only for the resources you use. HAQM Redshift also has automatic tuning capabilities, and surfaces recommendations for managing your warehouse in Redshift Advisor. With Redshift Spectrum, HAQM Redshift manages all the computing infrastructure, load balancing, planning, scheduling, and execution of your queries on data stored in HAQM S3. HAQM Redshift enables analytics on all your data with deep integration into database services with features like HAQM Aurora Zero-ETL to HAQM Redshift and federated querying to access data in place from operational databases like HAQM RDS and your HAQM S3 data lake. Redshift enables streamlined data ingestion with no-code, automated data pipelines that ingest streaming data or HAQM S3 files automatically. Redshift is also integrated with AWS Data Exchange enabling users to find, subscribe to, and query third party datasets and combine with their data for comprehensive insights. With native integration into HAQM SageMaker, customers can stay right within their data warehouse and create, train, and build machine learning models in SQL. HAQM Redshift delivers on all your SQL analytics needs with up to 5x better price performance than other cloud data warehouses.

What are the deployment options for HAQM Redshift?

HAQM Redshift is a fully managed service and offers both provisioned and serverless options, making it more efficient for you to run and scale analytics without having to manage your data warehouse. You can spin up a new HAQM Redshift Serverless endpoint to automatically provision the data warehouse in seconds or you can choose the provisioned option for predictable workloads.

How do I get started with HAQM Redshift?

With just a few steps in the AWS Management Console, you can start querying data. You can take advantage of pre-loaded sample datasets, including benchmark datasets TPC-H, TPC-DS, and other sample queries to kick start analytics immediately. To get started with HAQM Redshift Serverless, choose “Try HAQM Redshift Serverless” and start querying data. Get started here.

How does the performance of HAQM Redshift compare to that of other data warehouses?

TPC-DS benchmark results show that HAQM Redshift provides the best price performance out of the box, even for a comparatively small 3 TB dataset. HAQM Redshift delivers up to 5x better price performance than other cloud data warehouses. This means that you can benefit from HAQM Redshift’s leading price performance from the start without manual tuning. Based on our performance fleet telemetry, we also know that most workloads are short query workloads (workloads that run in less than 1 second). For these workloads, the latest benchmarks demonstrate that HAQM Redshift offers up to 7x better price performance on high concurrency, low latency workloads than other cloud data warehouses. Learn more here.

Can I get help to learn more about and onboard to HAQM Redshift?

Yes, HAQM Redshift specialists are available to answer questions and provide support. Contact us and you’ll hear back from us in one business day to discuss how AWS can help your organization.

What is HAQM Redshift managed storage?

HAQM Redshift managed storage is available with serverless and RA3 node types and lets you scale and pay for compute and storage independently so you can size your cluster based only on your compute needs. It automatically uses high-performance SSD-based local storage as tier-1 cache and takes advantage of optimizations such as data block temperature, data block age, and workload patterns to deliver high performance while scaling storage automatically to HAQM S3 when needed without requiring any action.

How do I use HAQM Redshift’s managed storage?

If you are already using HAQM Redshift Dense Storage or Dense Compute nodes, you can use Elastic Resize to upgrade your existing clusters to the new compute instance RA3. HAQM Redshift Serverless and clusters using the RA3 instance automatically use Redshift-managed storage to store data. No other action outside of using HAQM Redshift Serverless or RA3 instances is required to use this capability.

How can I run queries from Redshift for the data stored in the AWS Data Lake?

HAQM Redshift Spectrum is a feature of HAQM Redshift that lets you run queries against your data lake in HAQM S3, with no data loading or ETL required. When you issue an SQL query, it goes to the HAQM Redshift endpoint, which generates and optimizes a query plan. HAQM Redshift determines what data is local and what is in HAQM S3, generates a plan to minimize the amount of S3 data that must be read, and requests HAQM Redshift Spectrum workers out of a shared resource pool to read and process data from HAQM S3.

When should I consider using RA3 instances?

Consider choosing RA3 node types in these cases:

You need the flexibility to scale and pay for compute separate from storage.
You query a fraction of your total data.
Your data volume is growing rapidly or is expected to grow rapidly.
You want the flexibility to size the cluster based only on your performance needs.

As the scale of data continues to grow, reaching petabytes, the amount of data you ingest into your HAQM Redshift data warehouse is also growing. You might be looking for ways to cost-effectively analyze all your data. With new HAQM Redshift RA3 instances with managed storage, you can choose the number of nodes based on your performance requirements, and pay only for the managed storage that you use. This gives you the flexibility to size your RA3 cluster based on the amount of data you process daily without increasing your storage costs. Built on the AWS Nitro System, RA3 instances with managed storage use high performance SSDs for your hot data and HAQM S3 for your cold data, providing ease of use, cost-effective storage, and fast query performance.

What feature can I use for location-based analytics?

HAQM Redshift spatial provides location-based analytics for rich insights into your data. It seamlessly integrates spatial and business data to provide analytics for decision making. HAQM Redshift launched native spatial data processing support in November 2019, with a polymorphic data type GEOMETRY and several key SQL spatial functions. We now support GEOGRAPHY data type, and our library of SQL spatial functions has grown to 80. We support all the common spatial data types and standards, including Shapefiles, GeoJSON, WKT, WKB, eWKT, and eWKB. To learn more, visit the documentation page or the HAQM Redshift spatial tutorial page.

How does Athena’s SQL support compare to Redshift, and how do I choose between the two services?

HAQM Athena and HAQM Redshift Serverless address different needs and use cases even if both services are serverless and enable SQL users.

With its Massively Parallel Processing (MPP) architecture that separates storage and compute and machine learning led automatic optimization capabilities, a data warehouse like HAQM Redshift, whether it's serverless or provisioned, is a great choice for customers that need the best price performance at any scale for complex BI and analytics workloads. Customers can use HAQM Redshift as a central component of their data architecture with deep integrations available to access data in place or ingest or move data easily into the warehouse for high performance analytics, through ZeroETL and no-code methods. Customers can access data stored in HAQM S3, operational databases like Aurora and HAQM RDS, third party data warehouses through the integration with AWS Data Exchange, and combine with data stored in the HAQM Redshift data warehouse for analytics. They can get data warehousing started easily and conduct machine learning on top of all this data.

HAQM Athena is well suited for interactive analytics and data exploration of data in your data lake or any data source through an extensible connector framework (includes 30-plus out-of-box connectors for applications and on premises or other cloud analytics systems) without worrying about ingesting or processing data. HAQM Athena is built on open-source engines and frameworks such as Spark, Presto, and Apache Iceberg, giving customers the flexibility to use Python or SQL or work on open data formats. If customers want to do interactive analytics using open-source frameworks and data formats, HAQM Athena is a great place to start.

Does size flexibility apply to Redshift Reserved node?

No, Redshift Reserved instance are not flexible; they only apply to the exact node type that you reserve.

HAQM SageMaker SQL analytics

Open all

What are the benefits of using HAQM Redshift in SageMaker for SQL analytics?

SageMaker simplifies SQL analytics by providing a comprehensive, user-friendly platform that connects multiple data sources and streamlines data exploration. With a flexible notebook-style interface, you can access data from HAQM S3, HAQM Redshift, and other data sources, write and run queries across different engines, and directly create visualizations within the tool. The platform automatically manages your data's metadata, making it easier to understand and discover information. By integrating seamlessly with other AWS services, the platform allows you to go beyond traditional SQL analysis, turning your data into actionable insights with minimal technical complexity.

Do I have to migrate my data from HAQM S3 or existing HAQM Redshift data warehouse to use SageMaker for SQL analytics?

No, you don't need to migrate your data in order to use SageMaker for SQL analytics. You can directly discover and query data from multiple sources, including HAQM S3 (AWS Glue Data Catalog and HAQM S3 table buckets), HAQM Redshift (serverless and provisioned), and 13 additional federated data sources compatible with SQL engineering workflows. HAQM SageMaker Lakehouse seamlessly connects to your current data, so you can focus on insights instead of spending time moving information around. In just a few quick steps, you'll be able to explore your data, run queries, and uncover valuable business information without technical hassles.

How do I load data and get started using SageMaker for SQL analytics?

To get started, SageMaker offers two ways to bring your data into the platform for SQL analytics. If you store your information in HAQM S3, SageMaker SQL allows you to run queries directly on those data with the data lake. Alternatively, you can upload load data into your data warehouse by running COPY commands. If you have local data on your desktop, SageMaker allows you to upload your data files straight from your own computer by dragging and dropping data files into the SageMaker platform. Additionally, you can use zero-ETL to bring data from your operational data warehouse. The entire process is designed to remove technical barriers, allowing you to focus on discovering insights rather than wrestling with complex data-loading processes.

What is the experience of SageMaker query books?

HAQM SageMaker unified studio (preview), offers a powerful, user-friendly notebook-style interface for comprehensive SQL analytics. You can write and run SQL code in separate cells, create charts and visualizations, and explore unified data from different sources such as HAQM S3, HAQM Redshift, and various federated sources through SageMaker Lakehouse. The platform also provides helpful features like auto-complete and syntax checking to aid in your SQL authoring. You can also use generative AI functionality with HAQM Q generative SQL, which provides SQL code recommendations using natural language. SageMaker is designed to make SQL analytics more intuitive, flexible, and accessible for all data users.

What is the pricing model for SQL analytics in SageMaker?

There are no additional costs to use the SQL editor in SageMaker. You pay only for your usage of available compute engines such as Athena and HAQM Redshift.

What is the SLA for SQL Analytics in HAQM SageMaker?

SQL Analytics Service Level Agreements (SLAs) in HAQM SageMaker are directly tied to the SLAs of the underlying SQL engines: HAQM Redshift and Athena. Customers can find detailed service commitment information on the respective service level agreement pages for HAQM Redshift and Athena.

Serverless

Open all

What is HAQM Redshift Serverless?

HAQM Redshift Serverless is a serverless option of HAQM Redshift that makes it more efficient to run and scale analytics in seconds without the need to set up and manage data warehouse infrastructure. With Redshift Serverless, any user—including data analysts, developers, business professionals, and data scientists—can get insights from data by simply loading and querying data in the data warehouse.

How do I get started with HAQM Redshift Serverless

With just a few steps in the AWS Management Console, you can choose "configure HAQM Redshift Serverless" and begin querying data. You can take advantage of preloaded sample datasets, such as weather data, census data, and benchmark datasets, along with sample queries to kick start analytics immediately. You can create databases, schemas, tables, and load data from HAQM S3, HAQM Redshift data shares, or restore from an existing Redshift provisioned cluster snapshot. You can also directly query data in open formats (such as Parquet or ORC) in the HAQM S3 data lake, or query data in operational databases, such as HAQM Aurora and HAQM RDS PostgreSQL and MySQL. See the Getting Started Guide.

What are the benefits of using HAQM Redshift Serverless?

If you don't have data warehouse management experience, you don’t have to worry about setting up, configuring, managing clusters or tuning the warehouse. You can focus on deriving meaningful insights from your data or delivering on your core business outcomes through data. You pay only for what you use, keeping costs manageable. You continue to benefit from all of HAQM Redshift’s top-notch performance, rich SQL features, seamless integration with data lakes and operational data warehouses, and built-in predictive analytics and data sharing capabilities. If you need fine-grained control of your data warehouse, you can provision Redshift clusters.

How does HAQM Redshift Serverless work with other AWS services?

You can continue to use all the rich analytics functionality of HAQM Redshift, such as complex joins, direct queries to data in the HAQM S3 data lake and operational databases, materialized views, stored procedures, semistructured data support, and ML, as well as high performance at scale. All the related services that HAQM Redshift integrates with (such as HAQM Kinesis, AWS Lambda, HAQM QuickSight, HAQM SageMaker, HAQM EMR, AWS Lake Formation, and AWS Glue) continue to work with HAQM Redshift Serverless.

What use cases can I handle with HAQM Redshift Serverless?

You can continue to run all analytics use cases. With a simple getting started workflow, automatic scaling, and the ability to pay for use, the HAQM Redshift Serverless experience now makes it even more efficient and more cost-effective to run development and test environments that must get started quickly, ad-hoc business analytics, workloads with varying and unpredictable compute needs, and intermittent or sporadic workloads.

Data ingestion and loading

Open all

How do I load data into my HAQM Redshift data warehouse?

You can load data into HAQM Redshift from a range of data sources including HAQM S3, HAQM RDS, HAQM DynamoDB, HAQM EMR, AWS Glue, AWS Data Pipeline and or any SSH-enabled host on HAQM EC2 or on-premises. HAQM Redshift attempts to load your data in parallel into each compute node to maximize the rate at which you can ingest data into your data warehouse cluster. Clients can connect to HAQM Redshift using ODBC or JDBC and issue 'insert' SQL commands to insert the data. Please note this is slower than using S3 or DynamoDB since those methods load data in parallel to each compute node while SQL insert statements load through the single leader node. For more details on loading data into HAQM Redshift, please view our Getting Started Guide.

How is Redshift auto-copy different than the copy command?

Redshift auto-copy provides the ability to automate copy statements by tracking HAQM S3 folders and ingesting new files without customer intervention. Without auto-copy, a copy statement immediately starts the file ingestion process for existing files. Auto-copy extends the existing copy command and provides the ability to

Automate file ingestion process by monitoring specified HAQM S3 paths for new files
Re-use copy configurations, reducing the need to create and run new copy statements for repetitive ingestion tasks
Keep track of loaded files to avoid data duplication.

How do I get started with Redshift auto-copy?

To get started, customers should have an HAQM S3 folder, which can be accessed by their Redshift cluster/serverless endpoint using associated IAM roles, and create a Redshift table to be used as a target. Once an HAQM S3 path and the Redshift table are ready, customers can create a copy job by using the copy command. Once the copy job is created, Redshift starts tracking the specified HAQM S3 path behind the scenes and initiates the user defined copy statements to automatically copy new files into the target table.

What are the use cases for HAQM Redshift integration for Apache Spark?

The key use cases include:

Customers using HAQM EMR and AWS Glue to run Apache Spark jobs that access and load data into HAQM Redshift as part of the data ingestion and transformation pipelines (batch and streaming)
Customers using HAQM SageMaker to perform machine learning using Apache Spark and must access data stored in HAQM Redshift for feature engineering and transformation.
HAQM Athena customers using Apache Spark to perform interactive analysis on data in HAQM Redshift.

What are the benefits of HAQM Redshift integration for Apache Spark?

The benefits of this integration are

Ease of use for getting started and running Apache Spark applications on data in HAQM Redshift without having to worry about manual steps involved to setup and maintain uncertified versions of the Spark;
Convenience of using Apache Spark from various AWS services such as HAQM EMR, AWS Glue, HAQM Athena, and HAQM SageMaker with HAQM Redshift with minimal configuration;
Improved performance while running Apache Spark applications on HAQM Redshift.

When should I use HAQM Aurora Zero-ETL to HAQM Redshift instead of Federated Querying?

HAQM Aurora Zero-ETL to HAQM Redshift enables HAQM Aurora and HAQM Redshift customers to run near real-time analytics and machine learning on petabytes of transactional data by offering a fully managed solution for making transactional data from HAQM Aurora available in HAQM Redshift within seconds of being written. With HAQM Aurora Zero-ETL to HAQM Redshift, customers simply choose the HAQM Aurora tables containing the data they want to analyze with HAQM Redshift, and the feature seamlessly replicates the schema and data into HAQM Redshift. It reduces the need for customers to build and manage complex data pipelines, so they can instead focus on improving their applications. With HAQM Aurora Zero-ETL to HAQM Redshift, customers can replicate data from multiple HAQM Aurora database clusters into the same HAQM Redshift instance to derive comprehensive insights across several applications, while also consolidating their core analytics assets, gaining significant cost savings and operational efficiencies. With HAQM Aurora Zero-ETL to HAQM Redshift, customers can also access the core analytics and machine learning capabilities of HAQM Redshift such as materialized views, data sharing, and federated access to multiple data stores and data lakes. This enables customers to combine near real-time and core analytics to effectively derive time sensitive insights that inform business decisions. Furthermore, customers use HAQM Aurora for transactions and HAQM Redshift for analytics, so there are no shared compute resources, yielding a performant and operationally stable solution.

How does HAQM Aurora Zero-ETL to HAQM Redshift relate to/work with other AWS services?

HAQM Aurora Zero-ETL Integration with HAQM Redshift offers seamless integration between the two services for transactional analytics.

How does Streaming Ingestion work?

Streaming data are different from traditional database tables in that when you query a stream, you are capturing the evolution of a time-varying relation. Tables, on the other hand, capture a point-in-time snapshot of this time-varying relation. HAQM Redshift’s customers are accustomed to operating on regular tables and perform downstream processing (i.e. transformations) of data using a traditional batch model, for example “ELT”. We provide a method to use Redshift Materialized Views (MVs) so that customers can easily materialize a point-in-time view of the stream, as accumulated up to the time it is queried, as fast as possible to support ELT workflows.

Data sharing

Open all

What are cross-database queries in HAQM Redshift?

With cross-database queries, you can seamlessly query and join data from any Redshift database that you have access to, regardless of which database you are connected to. This can include databases local on the cluster and also shared datasets made available from remote clusters. Cross-database queries give you flexibility to organize data as separate databases to support multi-tenant configurations.

Who are the primary users of AWS Data Exchange?

AWS Data Exchange makes it more efficient for AWS customers to securely exchange and use third-party data in AWS. Data analysts, product managers, portfolio managers, data scientists, quants, clinical trial technicians, and developers in nearly every industry would like access to more data to drive analytics, train ML models, and make data-driven decisions. But there is no one place to find data from multiple providers and no consistency in how providers deliver data, leaving them to deal with a mix of shipped physical media, FTP credentials, and bespoke API calls. Conversely, many organizations would like to make their data available for research or commercial purposes, but it’s too hard and expensive to build and maintain data delivery, entitlement, and billing technology, which further depresses the supply of valuable data.

Scalability and concurrency

Open all

How do I scale the size and performance of my HAQM Redshift data warehouse cluster?

HAQM Redshift Serverless automatically provisions data warehouse capacity and intelligently scales the underlying resources. HAQM Redshift Serverless adjusts capacity in seconds to deliver consistently high performance and simplified operations for even the most demanding and volatile workloads. With the Concurrency Scaling feature, you can support unlimited concurrent users and concurrent queries, with consistently fast query performance. When concurrency scaling is enabled, HAQM Redshift automatically adds cluster capacity when your cluster experiences increase in query queueing.

For manual scaling, If you would like to increase query performance or respond to CPU, memory, or I/O overutilization, you can increase the number of nodes within your data warehouse cluster using Elastic Resize through the AWS Management Console or the ModifyCluster API. When you modify your data warehouse cluster, your requested changes will be applied immediately. Metrics for compute utilization, storage utilization, and read/write traffic to your Redshift data warehouse cluster are available free of charge through the AWS Management Console or HAQM CloudWatch APIs. You can also add user-defined metrics through HAQM CloudWatch custom metric functionality.

With HAQM Redshift Spectrum, you can run multiple Redshift clusters accessing the same data in HAQM S3. You can use different clusters for different use cases. For example, you can use one cluster for standard reporting and another for data science queries. Your marketing team can use their own clusters different from your operations team. Redshift Spectrum automatically distributes the execution of your query to several Redshift Spectrum workers out of a shared resource pool to read and process data from HAQM S3, and pulls results back into your Redshift cluster for any remaining processing.

Will my data warehouse cluster remain available during scaling?

It depends. When you using the Concurrency Scaling feature, the cluster is fully available for read and write during concurrency scaling. With Elastic resize, the cluster is unavailable for four to eight minutes of the resize period. With the Redshift RA3 storage elasticity in managed storage, the cluster is fully available and data is automatically moved between managed storage and compute nodes.

What is Elastic Resize and how is it different from Concurrency Scaling?

Elastic Resize adds or removes nodes from a single Redshift cluster within minutes to manage its query throughput. For example, an ETL workload for certain hours in a day or month-end reporting might need additional HAQM Redshift resources to complete on time. Concurrency Scaling adds additional cluster resources to increase the overall query concurrency.

Can I access the Concurrency Scaling clusters directly?

No. Concurrency Scaling is a massively scalable pool of HAQM Redshift resources and customers do not have direct access.

Security

Open all

How does HAQM Redshift keep my data secure?

HAQM Redshift supports industry-leading security with built-in identity management and federation for single sign-on (SSO), multi-factor authentication, column-level access control, row-level security, role-based access control, and HAQM Virtual Private Cloud (HAQM VPC). With HAQM Redshift, your data is encrypted in transit and at rest. All HAQM Redshift security features are offered out-of-the-box at no additional cost to satisfy the most demanding security, privacy, and compliance requirements. You get the benefit of AWS supporting more security standards and compliance certifications than any other provider, including ISO 27001, SOC, HIPAA/HITECH, and FedRAMP.

Does Redshift support granular access controls?

Yes, HAQM Redshift provides support for role-based access control. Row-level access control allows you to assign one or more roles to a user, and assign system and object permissions by role. You can use out-of-the-box system roles–root user, dba, operator, and security admins, or create your own roles.

Does HAQM Redshift support data masking or data tokenization?

AWS Lambda user-defined functions (UDFs) enable you to use an AWS Lambda function as a UDF in HAQM Redshift and invoke it from Redshift SQL queries. This functionality enables you to write custom extensions for your SQL query to achieve tighter integration with other services or third-party products. You can write Lambda UDFs to enable external tokenization, data masking, identification or de-identification of data by integrating with vendors like Protegrity, and protect or unprotect sensitive data based on a user’s permissions and groups, in query time.

With support for dynamic data masking, customers can easily protect their sensitive data and control granular access by managing Data Masking policies. Suppose you have applications that have multiple users and objects with sensitive data that cannot be exposed to all the users. You have requirements to provide a different granular security level that you want to give different groups of users. Redshift Dynamic Data Masking is configurable to allow customers to define consistent, format-preserving, and irreversible masked data values. Once the feature is GA, you begin using it immediately. The security admins can create and apply policies with just a few commands.

Does HAQM Redshift support single sign-on?

Yes. Customers who want to use their corporate identity providers such as Microsoft Azure Active Directory, Active Directory Federation Services, Okta, Ping Federate, or other SAML compliant identity providers can configure HAQM Redshift to provide single sign-on. You can sign on to HAQM Redshift cluster with Microsoft Azure Active Directory (AD) identities. This allows you to be able to sign on to Redshift without duplicating Azure Active Directory identities in Redshift.

Does HAQM Redshift support multi-factor authentication (MFA)?

Yes. You can use multi-factor authentication (MFA) for additional security when authenticating to your HAQM Redshift cluster.

Availability and durability

Open all

What happens to my data warehouse cluster availability and data durability in the event of individual node failure?

HAQM Redshift will automatically detect and replace a failed node in your data warehouse cluster. On Dense Compute (DC) and Dense Storage (DS2) clusters, the data is stored on the compute nodes to ensure high data durability. When a node is replaced, the data is refreshed from the mirror copy on the other node. RA3 clusters and Redshift serverless are not impacted the same way since the data is stored in HAQM S3 and the local drive is just used as a data cache. The data warehouse cluster will be unavailable for queries and updates until a replacement node is provisioned and added to the DB. HAQM Redshift makes your replacement node available immediately and loads your most frequently accessed data from HAQM S3 first to allow you to resume querying your data as quickly as possible. Single node clusters do not support data replication. In the event of a drive failure, you must restore the cluster from snapshot on S3. We recommend using at least two nodes for production.

What happens to my data warehouse cluster availability and data durability if my data warehouse cluster's Availability Zone (AZ) has an outage?

If your HAQM Redshift data warehouse is a single-AZ deployment and the cluster's Availability Zone becomes unavailable, then HAQM Redshift will automatically move your cluster to another AWS Availability Zone (AZ) without any data loss or application changes. To activate this, you must enable the relocation capability in your cluster configuration settings.

Why should I use a Redshift Multi-AZ deployment?

Unlike single-AZ deployments, customers can now improve availability of Redshift by running their data warehouse in a multi-AZ deployment. A multi-AZ deployment allows you to run your data warehouse in multiple AWS Availability Zones (AZ) simultaneously and continue operating in unforeseen failure scenarios. No application changes are required to maintain business continuity since the Multi-AZ deployment is managed as a single data warehouse with one endpoint. Multi-AZ deployments reduce recovery time by guaranteeing capacity to automatically recover and are intended for customers with business-critical analytics applications that require the highest levels of availability and resiliency to AZ failures. This also allows customers to implement a solution that is more compliant with the recommendations of the Reliability Pillar of the AWS Well-Architected Framework. To learn more about HAQM Redshift Multi-AZ refer here.

What is RPO and RTO? What RPO and RTO are supported with a Multi-AZ deployment?

RPO is an acronym for Recovery Point Objective and is a term to describe the data recency guarantee in the event of failures. RPO is the maximum acceptable amount of time since the last data recovery point. This determines what is considered an acceptable loss of data between the last recovery point and the interruption of service. Redshift Multi-AZ supports RPO = 0 meaning data is guaranteed to be current and up to date in the event of a failure. Our pre-launch tests found that RTO with HAQM Redshift Multi-AZ deployments is under 60 seconds or less in the unlikely case of an AZ failure.

How does Redshift Multi-AZ compare to the existing Redshift Relocation feature?

Redshift Relocation is enabled by default on all new RA3 clusters and serverless endpoints, which allows a data warehouse to be re-started in another AZ in the event of a large-scale outage, without any data loss or additional cost. While using Relocate is free, the limitations are that it is a best-effort approach subject to resource availability in the AZ being recovered in and Recovery Time Objective (RTO) can be impacted by other issues related to starting up a new cluster. This can result in recovery times between 10 and 60 minutes. Redshift Multi-AZ supports high availability requirements by delivering an RTO measured in tens of seconds and offers guaranteed continued operation since it will not be subject to capacity limitations or other potential issues creating a new cluster.

Querying and analytics

Open all

Are HAQM Redshift and Redshift Spectrum compatible with my preferred business intelligence software package and ETL tools?

Yes, HAQM Redshift uses industry-standard SQL and is accessed using standard JDBC and ODBC drivers. You can download HAQM Redshift custom JDBC and ODBC drivers from the Connect Client tab of the Redshift Console. We have validated integrations with popular BI and ETL vendors, a number of which are offering free trials to help you get started loading and analyzing your data. You can also go to the AWS Marketplace to deploy and configure solutions designed to work with HAQM Redshift in minutes.

HAQM Redshift Spectrum supports all HAQM Redshift client tools. The client tools can continue to connect to the HAQM Redshift cluster endpoint using ODBC or JDBC connections. No changes are required.

You use exactly the same query syntax and have the same query capabilities to access tables in Redshift Spectrum as you have for tables in the local storage of your Redshift cluster. External tables are referenced using the schema name defined in the CREATE EXTERNAL SCHEMA command where they were registered.

What data formats and compression formats does HAQM Redshift Spectrum support?

HAQM Redshift Spectrum currently supports many open-source data formats, including Avro, CSV, Grok, HAQM Ion, JSON, ORC, Parquet, RCFile, RegexSerDe, Sequence, Text, and TSV.

HAQM Redshift Spectrum currently supports Gzip and Snappy compression.

What happens if a table in my local storage has the same name as an external table?

Just like with local tables, you can use the schema name to pick exactly which one you mean by using schema_name.table_name in your query.

I use a Hive Metastore to store metadata about my S3 data lake. Can I use Redshift Spectrum?

Yes. The CREATE EXTERNAL SCHEMA command supports Hive Metastores. We do not currently support DDL against the Hive Metastore.

How do I get a list of all external database tables created in my cluster?

You can query the system table SVV_EXTERNAL_TABLES to get that information.

Does Redshift support the ability to use Machine Learning with SQL?

Yes, the HAQM Redshift ML feature makes it easy for SQL users to create, train, and deploy machine learning (ML) models using familiar SQL commands. HAQM Redshift ML allows you to leverage your data in HAQM Redshift with HAQM SageMaker, a fully managed ML service. HAQM Redshift supports both unsupervised learning (K-Means) and supervised learning (Autopilot, XGBoost, MLP algorithms). You can also use AWS Language AI services to translate, redact, and analyze text fields in SQL queries with prebuilt Lambda UDF functions - see blog post.

Does HAQM Redshift provide an API to query data?

HAQM Redshift provides a Data API that you can use to painlessly access data from HAQM Redshift with all types of traditional, cloud-native, and containerized, serverless web services-based and event-driven applications. The Data API simplifies access to HAQM Redshift because you don’t need to configure drivers and manage database connections. Instead, you can run SQL commands to an HAQM Redshift cluster by simply calling a secured API endpoint provided by the Data API. The Data API takes care of managing database connections and buffering data. The Data API is asynchronous, so you can retrieve your results later. Your query results are stored for 24 hours.

What types of credentials can I use with HAQM Redshift Data API?

The Data API supports both IAM credentials and using a secret key from AWS Secrets Manager. The Data API federates AWS Identity and Access Management (IAM) credentials so you can use identity providers like Okta or Azure Active Directory or database credentials stored in Secrets Manager without passing database credentials in API calls.

Can I use HAQM Redshift Data API from AWS CLI?

Yes, you can use the Data API from AWS CLI using the aws redshift-data command line option.

Is the Redshift Data API integrated with other AWS services?

You can use the Data API from other services such as AWS Lambda, AWS Cloud9, AWS AppSync and HAQM EventBridge.

Do I have to pay separately for using the HAQM Redshift Data API?

No, there is no separate charge for using the Data API.

Zero-ETL integrations

Open all

What is zero-ETL?

Zero-ETL is a set of fully managed integrations by AWS that removes or minimizes the need to build extract, transform, and load (ETL) data pipelines. Zero-ETL makes data available in SageMaker Lakehouse and HAQM Redshift from multiple operational sources, transactional sources, and enterprise applications. ETL is the process of combining, cleaning, and normalizing data from different sources to get it ready for analytics, AI, and ML workloads. Traditional ETL processes are time-consuming and complex to develop, maintain, and scale. Instead, zero-ETL integrations facilitate point-to-point data movement without the need to create and operate ETL data pipelines.

Visit What is zero-ETL? to learn more.

What ETL challenges does zero-ETL integration solve?

The zero-ETL integrations solve many of the existing data movement challenges in traditional ETL processes, including:

Increased system complexity due to intricate data-mapping rules, error handling, and security requirements
Additional costs from growing data volumes, infrastructure upgrades, and maintenance
Delayed time to analytics, AI, and ML due to custom code development and deployment, causing missed opportunities for real-time use cases.

What are the benefits of zero-ETL?

Increased agility: Zero-ETL simplifies data architecture and reduces data-engineering efforts. It allows for the inclusion of new data sources without the need to reprocess large amounts of data. This flexibility enhances agility, supporting data-driven decision-making and rapid innovation.
Cost-efficiency: Zero-ETL uses data integration technologies that are cloud-native and scalable, allowing businesses to optimize costs based on actual usage and data-processing needs. Organizations reduce infrastructure costs, development efforts, and maintenance overheads.
Fast time to insights: Traditional ETL processes often involve periodic batch updates, resulting in delayed data availability. Zero-ETL integrations, on the other hand, provides near real-time data access, to help provide fresher data for analytics, AI/ML, and reporting. You get more accurate and timely insights for use cases like real-time dashboards, optimized gaming experience, data-quality monitoring, and customer behavior analysis. Organizations can make data-driven predictions with more confidence, improve customer experiences, and promote data-driven insights across the business.

What zero-ETL integrations are available from AWS today?

At re:Invent 2024, we announced the following four zero-ETL integrations:

HAQM SageMaker Lakehouse and HAQM Redshift support for zero-ETL integrations from applications
HAQM DynamoDB zero-ETL integration with HAQM SageMaker Lakehouse
HAQM OpenSearch Service zero-ETL integration with HAQM CloudWatch Logs
HAQM OpenSearch Service zero-ETL integration with HAQM Security Lake

Since the launch of zero-ETL integrations, we have introduced seven integrations:

What is the pricing model for zero-ETL?

To learn more about pricing, visit the HAQM Redshift, AWS Glue, and SageMaker Lakehouse pricing pages.

Where can I learn more about zero-ETL and this new feature?

To learn more about zero-ETL, visit What is zero-ETL?

How are schema changes handled with zero-ETL integration?

Here are some key points on how schema changes are handled:

DDL statements, such as CREATE TABLE, ALTER TABLE, DROP TABLE and so on are automatically replicated from Aurora to HAQM Redshift.
The integration makes the necessary checks and adjustments in HAQM Redshift tables for replicated schema changes. For example, adding a column in Aurora will add the column in HAQM Redshift.
The replication and schema changes automatically happen in real time with minimal lag between source and target databases.
Schema consistency is maintained even as DML changes occur in parallel to DDL changes.

How do I run transformations on my data using zero-ETL integration?

You can create materialized views in your local HAQM Redshift database to transform data replicated through zero-ETL integration. Connect to your local database and use cross-database queries to access the destination databases. You can either use fully qualified object names with three-part notation (destination-database-name.schema-name.table-name) or create an external schema referencing the destination database and schema pair and use two-part notation (external-schema-name.table-name).

Backup and restore

Open all

How does HAQM Redshift backup my data? How do I restore my cluster from a backup?

HAQM Redshift RA3 clusters and HAQM Redshift Serverless use Redshift Managed Storage, which always has the latest copy of the data available. DS2 and DC2 clusters mirror the data on the cluster to ensure the latest copy is available in the event of a failure. Backups are automatically created on all Redshift cluster types and retained for 24 hours, and on serverless recovery points are provided for the past 24 hours.

You can also create your own backups that can be retained indefinitely. These backups can be created at any time, and the HAQM Redshift automated backups or HAQM Redshift Serverless recovery points can be converted into a user backup for longer retention.

HAQM Redshift can also asynchronously replicate your snapshots or recovery points to HAQM S3 in another Region for disaster recovery.

On a DS2 or DC2 cluster, free backup storage is limited to the total size of storage on the nodes in the data warehouse cluster and only applies to active data warehouse clusters. For example, if you have total data warehouse storage of 8 TB, we will provide at most 8 TB of backup storage at no additional charge. If you would like to extend your backup retention period beyond one day, you can do so using the AWS Management Console or the HAQM Redshift APIs. For more information on automated snapshots, please refer to the HAQM Redshift Management Guide.

HAQM Redshift only backs up data that has changed, so most snapshots use only a small amount of your free backup storage. When you need to restore a backup, you have access to all the automated backups within your backup retention window. Once you choose a backup from which to restore, we will provision a new data warehouse cluster and restore your data to it.

How do I manage the retention of my automated backups and snapshots?

You can use the AWS Management Console or ModifyCluster API to manage the period of time your automated backups are retained by modifying the RetentionPeriod parameter. If you wish to turn off automated backups altogether, you can set up the retention period to 0 (not recommended).

What happens to my backups if I delete my data warehouse cluster?

When you delete a data warehouse cluster, you have the ability to specify whether a final snapshot is created upon deletion. This enables a restore of the deleted data warehouse cluster at a later date. All previously created manual snapshots of your data warehouse cluster will be retained and billed at standard HAQM S3 rates, unless you choose to delete them.

How do I back up and restore my HAQM Redshift data using AWS Backup?

You can use AWS Backup to automate backup and restore of both HAQM Redshift Serverless and HAQM Redshift provisioned cluster snapshots, along with other AWS services for compute, storage, and database. Using AWS Backup’s integration with AWS Organizations, you can centrally create and manage immutable backups across all your accounts, standardizing data protection across your organization.

You can protect your HAQM Redshift clusters and Serverless data warehouses using the AWS Backup console or programmatically using API or CLI. These clusters can be backed up on a regular schedule as part of a backup plan, or they can be backed up as needed via on-demand backup. You can restore a single table (also known as item-level restore) or an entire cluster/namespace.

To learn more about AWS Backup support for HAQM Redshift, visit the AWS Backup product page and documentation.

Monitoring and maintenance

Open all

How do I monitor the performance of my HAQM Redshift data warehouse cluster?

Metrics for compute utilization, storage utilization, and read/write traffic to your HAQM Redshift data warehouse cluster are available free of charge through the AWS Management Console or HAQM CloudWatch APIs. You can also add additional, user-defined metrics through HAQM CloudWatch’s custom metric functionality. The AWS Management Console provides a monitoring dashboard that helps you monitor the health and performance of all your clusters. HAQM Redshift also provides information on query and cluster performance through the AWS Management Console. This information enables you to see which users and queries are consuming the most system resources to diagnose performance issues by viewing query plans and execution statistics. In addition, you can see the resource utilization on each of your compute nodes to ensure that you have data and queries that are well-balanced across all nodes.

What is a maintenance window? Will my data warehouse cluster be available during software maintenance?

HAQM Redshift periodically performs maintenance to apply fixes, enhancements and new features to your cluster. You can change the scheduled maintenance windows by modifying the cluster, either programmatically or by using the Redshift Console. During these maintenance windows, your HAQM Redshift cluster is not available for normal operations. For more information about maintenance windows and schedules by Region, see Maintenance Windows in the HAQM Redshift Management Guide.

HAQM Redshift FAQs

Page topics

General

What is HAQM Redshift?

What are the top reasons customers choose HAQM Redshift?

How does HAQM Redshift simplify data warehouse and analytics management?

What are the deployment options for HAQM Redshift?

How do I get started with HAQM Redshift?

How does the performance of HAQM Redshift compare to that of other data warehouses?

Can I get help to learn more about and onboard to HAQM Redshift?

What is HAQM Redshift managed storage?

How do I use HAQM Redshift’s managed storage?

How can I run queries from Redshift for the data stored in the AWS Data Lake?

When should I consider using RA3 instances?

What feature can I use for location-based analytics?

How does Athena’s SQL support compare to Redshift, and how do I choose between the two services?

Does size flexibility apply to Redshift Reserved node?

HAQM SageMaker SQL analytics

What are the benefits of using HAQM Redshift in SageMaker for SQL analytics?

Do I have to migrate my data from HAQM S3 or existing HAQM Redshift data warehouse to use SageMaker for SQL analytics?

How do I load data and get started using SageMaker for SQL analytics?

What is the experience of SageMaker query books?

How can I share my SQL queries or data models in SageMaker?

What is the pricing model for SQL analytics in SageMaker?

What is the SLA for SQL Analytics in HAQM SageMaker?

Serverless

What is HAQM Redshift Serverless?

How do I get started with HAQM Redshift Serverless

What are the benefits of using HAQM Redshift Serverless?

How does HAQM Redshift Serverless work with other AWS services?

What use cases can I handle with HAQM Redshift Serverless?

Data ingestion and loading

How do I load data into my HAQM Redshift data warehouse?

How is Redshift auto-copy different than the copy command?

How do I get started with Redshift auto-copy?

What are the use cases for HAQM Redshift integration for Apache Spark?

What are the benefits of HAQM Redshift integration for Apache Spark?

When should I use HAQM Aurora Zero-ETL to HAQM Redshift instead of Federated Querying?

How does HAQM Aurora Zero-ETL to HAQM Redshift relate to/work with other AWS services?

How does Streaming Ingestion work?

Data sharing

What are the use cases for data sharing?

What are cross-database queries in HAQM Redshift?

Who are the primary users of AWS Data Exchange?

Scalability and concurrency

How do I scale the size and performance of my HAQM Redshift data warehouse cluster?

Will my data warehouse cluster remain available during scaling?

What is Elastic Resize and how is it different from Concurrency Scaling?

Can I access the Concurrency Scaling clusters directly?

Security

How does HAQM Redshift keep my data secure?

Does Redshift support granular access controls?

Does HAQM Redshift support data masking or data tokenization?

Does HAQM Redshift support single sign-on?

Does HAQM Redshift support multi-factor authentication (MFA)?

Availability and durability

What happens to my data warehouse cluster availability and data durability in the event of individual node failure?

What happens to my data warehouse cluster availability and data durability if my data warehouse cluster's Availability Zone (AZ) has an outage?

Why should I use a Redshift Multi-AZ deployment?

What is RPO and RTO? What RPO and RTO are supported with a Multi-AZ deployment?

How does Redshift Multi-AZ compare to the existing Redshift Relocation feature?

Querying and analytics

Are HAQM Redshift and Redshift Spectrum compatible with my preferred business intelligence software package and ETL tools?

What data formats and compression formats does HAQM Redshift Spectrum support?

What happens if a table in my local storage has the same name as an external table?

I use a Hive Metastore to store metadata about my S3 data lake. Can I use Redshift Spectrum?

How do I get a list of all external database tables created in my cluster?

Does Redshift support the ability to use Machine Learning with SQL?

Does HAQM Redshift provide an API to query data?

What types of credentials can I use with HAQM Redshift Data API?

Can I use HAQM Redshift Data API from AWS CLI?

Is the Redshift Data API integrated with other AWS services?

Do I have to pay separately for using the HAQM Redshift Data API?

Zero-ETL integrations

What is zero-ETL?

What ETL challenges does zero-ETL integration solve?

What are the benefits of zero-ETL?

What zero-ETL integrations are available from AWS today?

What is the pricing model for zero-ETL?

Where can I learn more about zero-ETL and this new feature?