AWS Big Data Blog
HAQM Redshift – 2017 Recap
We have been busy adding new features and capabilities to HAQM Redshift, and we wanted to give you a glimpse of what we’ve been doing over the past year. In this article, we recap a few of our enhancements and provide a set of resources that you can use to learn more and get the most out of your HAQM Redshift implementation.
In 2017, we made more than 30 announcements about HAQM Redshift. We listened to you, our customers, and delivered HAQM Redshift Spectrum, a feature of HAQM Redshift that extends analytics to your data lake without moving data. We launched new DC2 nodes, doubling performance at the same price. We also announced many new features that provide greater scalability, better performance, more automation, and easier ways to manage your analytics workloads.
To see a full list of our launches, visit our what’s new page—and be sure to subscribe to our RSS feed.
Major launches in 2017
HAQM Redshift Spectrum—extend analytics to your data lake, without moving data
We launched HAQM Redshift Spectrum to give you the freedom to store data in HAQM S3, in open file formats, and have it available for analytics without the need to load it into your HAQM Redshift cluster. It enables you to easily join datasets across Redshift clusters and S3 to provide unique insights that you would not be able to obtain by querying independent data silos.
With Redshift Spectrum, you can run SQL queries against data in an HAQM S3 data lake as easily as you analyze data stored in HAQM Redshift. And you can do it without loading data or resizing your HAQM Redshift cluster as your data volumes grow. Redshift Spectrum separates compute and storage to meet workload demands for data size, concurrency, and performance. Redshift Spectrum scales processing across thousands of nodes, so results are fast, even with massive datasets and complex queries. You can query open file formats that you already use—such as Apache Avro, CSV, Grok, ORC, Apache Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV—directly in HAQM S3, without any data movement.
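To make that workflow concrete, here is a minimal sketch of querying Parquet files in S3 through Redshift Spectrum from Python. The cluster endpoint, credentials, IAM role ARN, S3 path, and the spectrum.sales and items tables are placeholders; adapt them to your own environment.

```python
# Minimal sketch: query Parquet data in HAQM S3 through Redshift Spectrum.
# Endpoint, credentials, IAM role, S3 path, and table names are placeholders.
import os

import psycopg2  # HAQM Redshift speaks the PostgreSQL wire protocol

conn = psycopg2.connect(
    host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439,
    dbname="dev",
    user="awsuser",
    password=os.environ["REDSHIFT_PASSWORD"],
)
conn.autocommit = True
cur = conn.cursor()

# Register an external schema backed by the AWS Glue Data Catalog.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
    FROM DATA CATALOG DATABASE 'spectrumdb'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")

# Define an external table over Parquet files that stay in S3; nothing is
# loaded into the cluster.
cur.execute("""
    CREATE EXTERNAL TABLE spectrum.sales (
        sale_id BIGINT,
        item_id INT,
        amount  DECIMAL(10,2),
        sold_at TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 's3://my-data-lake/sales/';
""")

# Join the S3-resident data with a table stored locally in the cluster
# (items is assumed to already exist in HAQM Redshift).
cur.execute("""
    SELECT i.category, SUM(s.amount) AS revenue
    FROM spectrum.sales s
    JOIN items i ON s.item_id = i.item_id
    GROUP BY i.category
    ORDER BY revenue DESC;
""")
for category, revenue in cur.fetchall():
    print(category, revenue)
```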
“For complex queries, Redshift Spectrum provided a 67 percent performance gain,” said Rafi Ton, CEO, NUVIAD. “Using the Parquet data format, Redshift Spectrum delivered an 80 percent performance improvement. For us, this was substantial.”
To learn more about Redshift Spectrum, watch our AWS Summit session Intro to HAQM Redshift Spectrum: Now Query Exabytes of Data in S3, and read our announcement blog post HAQM Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data.
DC2 nodes—twice the performance of DC1 at the same price
We launched second-generation Dense Compute (DC2) nodes to provide low latency and high throughput for demanding data warehousing workloads. DC2 nodes feature powerful Intel E5-2686 v4 (Broadwell) CPUs, fast DDR4 memory, and NVMe-based solid state disks (SSDs). We’ve tuned HAQM Redshift to take advantage of the better CPU, network, and disk on DC2 nodes, providing up to twice the performance of DC1 at the same price. Our DC2.8xlarge instances now provide twice the memory per slice of data and an optimized storage layout with 30 percent better storage utilization.
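As a quick illustration, the following sketch uses boto3 to launch a cluster on DC2 nodes, or to move an existing cluster to DC2 with a resize. The cluster identifiers, node counts, and credentials are placeholders.

```python
# Minimal sketch (boto3): launch a new cluster on DC2 nodes, or move an
# existing DC1 cluster to DC2 via a resize. Identifiers, node counts, and
# credentials are placeholders.
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Launch a new 4-node dc2.large cluster.
redshift.create_cluster(
    ClusterIdentifier="analytics-dc2",    # placeholder name
    ClusterType="multi-node",
    NodeType="dc2.large",
    NumberOfNodes=4,
    MasterUsername="awsuser",
    MasterUserPassword="ChangeMe123!",    # store real credentials securely
    DBName="dev",
)

# Or change an existing cluster's node type to DC2; this triggers a resize.
redshift.modify_cluster(
    ClusterIdentifier="analytics-dc1",    # placeholder name
    NodeType="dc2.large",
    NumberOfNodes=4,
)
```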
“Redshift allows us to quickly spin up clusters and provide our data scientists with a fast and easy method to access data and generate insights,” said Bradley Todd, technology architect at Liberty Mutual. “We saw a 9x reduction in month-end reporting time with Redshift DC2 nodes as compared to DC1.”
Read our customer testimonials to see the performance gains our customers are experiencing with DC2 nodes. To learn more, read our blog post HAQM Redshift Dense Compute (DC2) Nodes Deliver Twice the Performance as DC1 at the Same Price.
Performance enhancements—3x to 5x faster queries
On average, our customers are seeing 3x to 5x performance gains for most of their critical workloads.
We introduced short query acceleration to speed up the queries behind reports, dashboards, and interactive analysis. Short query acceleration uses machine learning to predict a query's execution time and routes short-running queries to a dedicated express queue for faster processing.
We launched results caching to deliver sub-second response times for repeated queries, such as those behind dashboards, visualizations, and BI tools. Results caching also frees up resources, which improves the performance of all other queries.
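If you want to confirm that your repeated queries are being served from the cache, the following sketch runs the same query twice and then inspects the SVL_QLOG system view, where the source_query column references the original query when a cached result was returned. The endpoint, credentials, and sales table are placeholders.

```python
# Minimal sketch: verify that repeated queries hit the result cache.
# Endpoint, credentials, and the sales table are placeholders.
import os

import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439,
    dbname="dev",
    user="awsuser",
    password=os.environ["REDSHIFT_PASSWORD"],
)
cur = conn.cursor()

# Run the same aggregate twice; if the underlying tables have not changed,
# the second run can be answered from the result cache.
for _ in range(2):
    cur.execute("SELECT COUNT(*) FROM sales;")
    print(cur.fetchone())

# Queries answered from the cache reference the original query in the
# source_query column of SVL_QLOG.
cur.execute("""
    SELECT query, source_query, substring
    FROM svl_qlog
    WHERE source_query IS NOT NULL
    ORDER BY starttime DESC
    LIMIT 10;
""")
for row in cur.fetchall():
    print(row)
```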
We also introduced late materialization, which reduces the amount of data scanned for queries with predicate filters by applying those filters in batches before fetching data blocks for subsequent columns. For example, if only 10 percent of a table's rows satisfy the predicate filters, HAQM Redshift can potentially skip 90 percent of the I/O for the remaining columns, improving query performance.
We launched query monitoring rules and pre-defined rule templates. These features make it easier for you to set metrics-based performance boundaries for workload management (WLM) queries, and specify what action to take when a query goes beyond those boundaries. For example, for a queue that’s dedicated to short-running queries, you might create a rule that aborts queries that run for more than 60 seconds. To track poorly designed queries, you might have another rule that logs queries that contain nested loops.
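As a sketch of what such rules can look like, the following example builds a WLM configuration containing the two rules described above and applies it with boto3. The parameter group name, queue layout, and thresholds are placeholders to adapt for your own workload.

```python
# Minimal sketch: define two query monitoring rules in the WLM JSON
# configuration and apply it with boto3. The parameter group name, queue
# layout, and thresholds are placeholders.
import json

import boto3

wlm_config = [
    {
        # Queue dedicated to short-running queries.
        "query_concurrency": 5,
        "rules": [
            {
                # Abort anything that runs longer than 60 seconds.
                "rule_name": "abort_long_running",
                "predicate": [
                    {"metric_name": "query_execution_time",
                     "operator": ">",
                     "value": 60}
                ],
                "action": "abort",
            },
            {
                # Log queries whose plans produce large nested-loop joins.
                "rule_name": "log_nested_loops",
                "predicate": [
                    {"metric_name": "nested_loop_join_row_count",
                     "operator": ">",
                     "value": 100}
                ],
                "action": "log",
            },
        ],
    }
]

redshift = boto3.client("redshift", region_name="us-east-1")
redshift.modify_cluster_parameter_group(
    ParameterGroupName="my-redshift-params",   # placeholder parameter group
    Parameters=[{
        "ParameterName": "wlm_json_configuration",
        "ParameterValue": json.dumps(wlm_config),
    }],
)
```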
Customer insights
HAQM Redshift and Redshift Spectrum serve customers across a variety of industries and sizes, from startups to large enterprises. Visit our customer page to see the success that customers are having with our recent enhancements. Learn how companies like Liberty Mutual Insurance saw a 9x reduction in month-end reporting time using DC2 nodes. On this page, you can find case studies, videos, and other content that show how our customers are using HAQM Redshift to drive innovation and business results.
In addition, check out these resources to learn about the success our customers are having building out a data warehouse and data lake integration solution with HAQM Redshift:
- Sysco: Developing an Insights Platform – Sysco’s Journey from Disparate Systems to a Data Lake and Beyond (re:Invent session recording)
- 21st Century Fox: Migrating Your Traditional Data Warehouse to a Modern Data Lake (re:Invent session recording)
- Cerberus Technologies: How I built a data warehouse using HAQM Redshift and AWS services in record time (blog post)
- NUVIAD: Using HAQM Redshift Spectrum, HAQM Athena, and AWS Glue with Node.js in Production (blog post)
- Periscope Data: Making Every Redshift Query Valuable with Periscope Data (This is My Architecture episode)
- Lyft Case Study
- Boingo Wireless Case Study
Partner solutions
You can enhance your HAQM Redshift data warehouse by working with industry-leading experts. Our AWS Partner Network (APN) Partners have certified their solutions to work with HAQM Redshift. They offer software, tools, integration, and consulting services to help you at every step. Visit our HAQM Redshift Partner page and choose an APN Partner. Or, use AWS Marketplace to find and immediately start using third-party software.
To see what our Partners are saying about HAQM Redshift Spectrum and our DC2 nodes mentioned earlier, read these blog posts:
- Looker: Using HAQM Redshift’s new Spectrum Feature
- Matillion: Accessing your Data Lake Assets from HAQM Redshift Spectrum
- Periscope Data: HAQM Redshift’s Hardware Upgrade Improves Query Speed by up to 5x
- Reflect: The Implications of Redshift Spectrum
- SnapLogic: Integrate through the big data insights gap
- Tableau: Tableau 10.4 Supports HAQM Redshift Spectrum with External HAQM S3 Tables
Resources
Blog posts
Visit the AWS Big Data Blog for a list of all HAQM Redshift articles.
- HAQM Redshift Spectrum Extends Data Warehousing Out to Exabytes—No Loading Required
- 10 Best Practices for HAQM Redshift Spectrum
- Top 8 Best Practices for High-Performance ETL Processing Using HAQM Redshift
- Analyze Database Audit Logs for Security and Compliance Using HAQM Redshift Spectrum
- From Data Lake to Data Warehouse: Enhancing Customer 360 with HAQM Redshift Spectrum
YouTube videos
- re:Invent session recording: Best Practices for Data Warehousing with HAQM Redshift
- AWS Online Tech Talk: Analyze your Data Lake, Fast @ Any Scale
- AWS Online Tech Talk: HAQM Redshift Spectrum: Quickly Query Exabytes of Data in S3
GitHub
Our community of experts contribute on GitHub to provide tips and hints that can help you get the most out of your deployment. Visit GitHub frequently to get the latest technical guidance, code samples, administrative task automation utilities, the analyze & vacuum schema utility, and more.
Customer support
If you are evaluating or considering a proof of concept with HAQM Redshift, or you need assistance migrating your on-premises or other cloud-based data warehouse to HAQM Redshift, our team of product experts and solutions architects can help you with architecting, sizing, and optimizing your data warehouse. Contact us using this support request form, and let us know how we can assist you.
If you are an HAQM Redshift customer, we offer a no-cost health check program. Our team of database engineers and solutions architects give you recommendations for optimizing HAQM Redshift and HAQM Redshift Spectrum for your specific workloads. To learn more, email us at redshift-feedback@haqm.com.
If you have any questions, email us at redshift-feedback@haqm.com.
Additional Reading
If you found this post useful, be sure to check out HAQM Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data, Using HAQM Redshift for Fast Analytical Reports, and How to Migrate Your Oracle Data Warehouse to HAQM Redshift Using AWS SCT and AWS DMS.
About the Author
Larry Heathcote is a Principal Product Marketing Manager at HAQM Web Services for data warehousing and analytics. Larry is passionate about seeing the results of data-driven insights on business outcomes. He enjoys family time, home projects, grilling out and the taste of classic barbeque.