AWS News Blog

New for HAQM Redshift – Simplify Data Ingestion and Make Your Data Warehouse More Secure and Reliable

When we talk with customers, we hear that they want to be able to harness insights from data in order to make timely, impactful, and actionable business decisions. A common pattern with data-driven organizations is that they have many different data sources they need to ingest into their analytics systems. This requires them to manually build data pipelines spanning their operational databases, data lakes, streaming data, and data within their warehouse. As a consequence of this complex setup, it can take data engineers weeks or even months to build data ingestion pipelines. These data pipelines are costly, and the delays can lead to missed business opportunities. Additionally, data warehouses are increasingly becoming mission-critical systems that require high availability, reliability, and security.

HAQM Redshift is a fully managed, petabyte-scale data warehouse used by tens of thousands of customers to easily, quickly, securely, and cost-effectively analyze all their data at any scale. This year at re:Invent, we announced a number of HAQM Redshift features to help you simplify data ingestion and get to insights easily and quickly, within a secure, reliable environment.

In this blog post, I introduce some of these new features, which fit into two main categories:

  • Simplify data ingestion
    • HAQM Redshift now supports auto-copy from HAQM S3 (available in preview). With this new capability, HAQM Redshift automatically loads the files that arrive in an HAQM Simple Storage Service (HAQM S3) location that you specify into your data warehouse. The files can use any of the formats supported by the HAQM Redshift copy command, such as CSV, JSON, Parquet, and Avro. In this way, you don’t need to manually or repeatedly run copy procedures. HAQM Redshift automates file ingestion and takes care of data-loading steps under the hood.
    • With HAQM Aurora zero-ETL integration with HAQM Redshift, you can use HAQM Redshift for near real-time analytics and machine learning on petabytes of transactional data stored in HAQM Aurora MySQL databases (available in limited preview). With this capability, you can choose the HAQM Aurora databases containing the data you want to analyze with HAQM Redshift. Data is then replicated into your data warehouse within seconds after transactional data is written into HAQM Aurora, eliminating the need to build and maintain complex data pipelines. You can replicate data from multiple HAQM Aurora databases into the same HAQM Redshift instance to run analytics across multiple applications. With near real-time access to transactional data, you can leverage HAQM Redshift’s analytics capabilities, such as built-in machine learning (ML), materialized views, data sharing, and federated access to multiple data stores and data lakes, to derive insights from transactional and other data.
    • With the general availability of HAQM Redshift Streaming Ingestion, you can now natively ingest hundreds of megabytes of data per second from HAQM Kinesis Data Streams and HAQM Managed Streaming for Apache Kafka (HAQM MSK) into an HAQM Redshift materialized view and query it in seconds. Learn more in this post.
  • Make your data warehouse more secure and reliable
    • You can now improve the availability of your data warehouse by choosing a multiple Availability Zone (AZ) deployment. Multi-AZ deployments for your HAQM Redshift clusters are available in preview and reduce recovery times to seconds through automatic recovery. In this way, you can build solutions that are better aligned with the recommendations of the Reliability Pillar of the AWS Well-Architected Framework.
    • With dynamic data masking (available in preview), you can protect sensitive information stored in your data warehouse and ensure that only the relevant data is accessible by users based on their roles. You can limit how much identifiable data is visible to users using multiple levels of policies so different users and groups can have different levels of data access without having to create multiple copies of data. Dynamic data masking complements other granular access control capabilities in HAQM Redshift, including row-level and column-level security and role-based access controls. In this way, dynamic data masking helps you meet requirements for GDPR, CCPA, and other privacy regulations.
    • HAQM Redshift now supports central access controls for data sharing with AWS Lake Formation (available in public preview). You can now use Lake Formation to simplify governance of data shared from HAQM Redshift and centrally manage granular access across all data-sharing consumers.
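To make the ingestion and masking features above more concrete, here is a minimal sketch that assembles the kind of SQL statements they introduce. This is illustrative only: the table, stream, job, policy, and role names are hypothetical, and the exact syntax of these preview features may change, so check the HAQM Redshift documentation before using them. You would run the resulting statements through your SQL client or the Redshift Data API.

```python
# Illustrative sketch: build the SQL payloads for three of the new
# capabilities. All identifiers and ARNs below are hypothetical.

def auto_copy_job(table, s3_uri, iam_role, job_name):
    # Auto-copy from HAQM S3 (preview): a COPY statement registered as a
    # job that runs automatically as new files land under the S3 prefix.
    return (
        f"COPY {table} FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV\n"
        f"JOB CREATE {job_name} AUTO ON;"
    )

def streaming_materialized_view(view, schema, stream):
    # Streaming ingestion: a materialized view over a Kinesis data stream
    # exposed through an external schema created with
    # CREATE EXTERNAL SCHEMA ... FROM KINESIS (not shown here).
    return (
        f"CREATE MATERIALIZED VIEW {view} AS\n"
        f"SELECT approximate_arrival_timestamp,\n"
        f"       json_parse(kinesis_data) AS payload\n"
        f'FROM {schema}."{stream}";'
    )

def masking_policy(policy, column_type):
    # Dynamic data masking (preview): reveal only the last four characters
    # of a credit card number to roles the policy is attached to.
    return (
        f"CREATE MASKING POLICY {policy}\n"
        f"WITH (credit_card {column_type})\n"
        f"USING ('XXXX-XXXX-XXXX-' || SUBSTRING(credit_card, 16, 4));"
    )

if __name__ == "__main__":
    print(auto_copy_job(
        "orders", "s3://my-bucket/orders/",
        "arn:aws:iam::123456789012:role/MyLoadRole", "orders_copy_job"))
    print(streaming_materialized_view("orders_stream_mv", "kds", "orders-stream"))
    print(masking_policy("mask_credit_card", "varchar(256)"))
```

Once a masking policy exists, you attach it to a column for a specific role (for example, `ATTACH MASKING POLICY mask_credit_card ON customers(credit_card) TO ROLE analyst;`), which is how different roles see different views of the same data without copies.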

There was other interesting news for HAQM Redshift at re:Invent that you might have already heard about:

  • The general availability of HAQM Redshift integration for Apache Spark makes it easy to build and run Spark applications on HAQM Redshift and Redshift Serverless, opening up the data warehouse for a broader set of AWS analytics and machine learning solutions.
  • AWS Backup now supports HAQM Redshift. AWS Backup allows you to define a central backup policy to manage data protection of your applications and can also protect your HAQM Redshift clusters. In this way, you have a consistent experience when managing data protection across all supported services.
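As a sketch of what that central backup policy looks like in practice, the snippet below builds the resource-selection document you would pass to AWS Backup's CreateBackupSelection API (for example, via boto3) to assign HAQM Redshift clusters to a backup plan. The field names follow the AWS Backup API; the role and cluster ARNs are hypothetical placeholders.

```python
def redshift_backup_selection(selection_name, iam_role_arn, cluster_arns):
    # Resource selection for an AWS Backup plan (CreateBackupSelection).
    # "Resources" lists the ARNs to protect; here, Redshift cluster ARNs.
    return {
        "SelectionName": selection_name,
        "IamRoleArn": iam_role_arn,
        "Resources": list(cluster_arns),
    }

# Hypothetical account, role, and cluster identifiers for illustration.
selection = redshift_backup_selection(
    "redshift-clusters",
    "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
    ["arn:aws:redshift:us-east-1:123456789012:cluster:my-cluster"],
)
```

With this selection attached to a backup plan, the same policy that already protects your other supported resources also covers your HAQM Redshift clusters.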

Availability and Pricing
Multi-AZ deployments, central access control for data sharing with AWS Lake Formation, auto-copy from HAQM S3, and dynamic data masking are available in preview in US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), Europe (Ireland), and Europe (Stockholm).

There is no additional cost for auto-copy from HAQM S3, near real-time analytics on transactional data, dynamic data masking, or central access control for data sharing. For more information, see HAQM Redshift pricing.

These new capabilities take you one step further in analyzing all your data across data sources with simple data ingestion capabilities, while improving the security and reliability of your data warehouse.

Danilo

Danilo Poccia

Danilo works with startups and companies of any size to support their innovation. In his role as Chief Evangelist (EMEA) at HAQM Web Services, he leverages his experience to help people bring their ideas to life, focusing on serverless architectures and event-driven programming, and on the technical and business impact of machine learning and edge computing. He is the author of AWS Lambda in Action from Manning.