AWS Big Data Blog

Tag: Amazon Redshift Spectrum

Accelerate Amazon Redshift Data Lake queries with AWS Glue Data Catalog Column Statistics

Over the last year, Amazon Redshift added several performance optimizations for data lake queries across multiple areas of the query engine, such as rewrite, planning, scan execution, and consumption of AWS Glue Data Catalog column statistics. In this post, we highlight the performance improvements we observed using the industry-standard TPC-DS benchmark. The overall execution time of the TPC-DS 3 TB benchmark improved by 3x, and some queries in our benchmark sped up by as much as 12x.

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. A data mesh framework empowers business units with data ownership and facilitates seamless sharing. However, integrating datasets from different business units can present several […]

Create, train, and deploy Amazon Redshift ML model integrating features from Amazon SageMaker Feature Store

Amazon Redshift is a fast, petabyte-scale, cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. Data analysts and database developers want to use this data to train machine learning (ML) models, which can then be used to generate insights on new data for use cases such as forecasting […]

Automate data archival for Amazon Redshift time series tables

Amazon Redshift is a fast, petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all of your data using standard SQL. Tens of thousands of customers today rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it the most widely used cloud data warehouse. You can […]

Accelerate self-service analytics with Amazon Redshift Query Editor V2

August 2023: This post was reviewed and updated with new features. Amazon Redshift is a fast, fully managed cloud data warehouse. Tens of thousands of customers use Amazon Redshift as their analytics platform. Users such as data analysts, database developers, and data scientists use SQL to analyze their data in Amazon Redshift data warehouses. Amazon […]

Speed up data ingestion on Amazon Redshift with BryteFlow

This is a guest post by Pradnya Bhandary, Co-Founder and CEO at Bryte Systems. Data can be transformative for an organization. How and where you store your data for analysis and business intelligence is therefore an especially important decision that each organization needs to make. Should you choose an on-premises data warehouse solution or embrace […]

Manage and control your cost with Amazon Redshift Concurrency Scaling and Spectrum

This post shares the simple steps you can take to use the new Amazon Redshift usage controls feature to monitor and control your usage and associated cost for the Amazon Redshift Spectrum and Concurrency Scaling features. Redshift Spectrum lets you power a lake house architecture that directly queries and joins data across your data warehouse and data lake, and Concurrency Scaling lets you support thousands of concurrent users and queries with consistently fast query performance.

Develop an application migration methodology to modernize your data warehouse with Amazon Redshift

This post demonstrates how to develop a comprehensive, wave-based application migration methodology for a complex project to modernize a traditional MPP data warehouse with Amazon Redshift. It provides best practices and lessons learned, taking into account business priority, data dependency, workload profiles, and existing service level agreements (SLAs).

Restrict Amazon Redshift Spectrum external table access to Amazon Redshift IAM users and groups using role chaining

With Amazon Redshift Spectrum, you can query the data in your Amazon Simple Storage Service (Amazon S3) data lake using a central AWS Glue metastore from your Amazon Redshift cluster. This capability extends your petabyte-scale Amazon Redshift data warehouse to unbounded data storage limits, which allows you to scale to exabytes of data cost-effectively. As with Amazon EMR, you get the benefits of open data formats and inexpensive storage, and you can scale out to thousands of Redshift Spectrum nodes to pull data, filter, project, aggregate, group, and sort. Like Amazon Athena, Redshift Spectrum is serverless and there’s nothing to provision or manage. You pay only $5 for every 1 TB of data scanned. This post discusses how to configure Amazon Redshift security to enable fine-grained access control using role chaining to achieve high-fidelity user-based permission management.
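
To make the per-terabyte pricing in the excerpt above concrete, here is a minimal Python sketch of the arithmetic. It assumes the $5 per TB rate quoted in the post, treats 1 TB as 1024^4 bytes, and ignores any per-query minimums or rounding that actual billing may apply. The function name `estimate_scan_cost` and the constants are hypothetical, chosen only for illustration.

```python
# Rate quoted in the excerpt above; check current pricing before relying on it.
PRICE_PER_TB_USD = 5.0
# Assumption: a terabyte is counted as 1024**4 bytes.
BYTES_PER_TB = 1024 ** 4


def estimate_scan_cost(bytes_scanned: int) -> float:
    """Return a rough cost in USD for scanning the given number of bytes.

    Simplified illustration: linear pricing only, no minimum charge
    and no rounding of the scanned byte count.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    half_tb = 512 * 1024 ** 3  # 512 GiB
    print(f"Estimated cost: ${estimate_scan_cost(half_tb):.2f}")
```

Because the charge scales with the bytes scanned, anything that reduces the scan (for example, columnar formats or partition pruning) lowers the estimate proportionally.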

Working with nested data types using Amazon Redshift Spectrum

Redshift Spectrum is a feature of Amazon Redshift that allows you to query data stored on Amazon S3 directly and supports nested data types. This post discusses which use cases can benefit from nested data types, how to use Amazon Redshift Spectrum with nested data types to achieve excellent performance and storage efficiency, and some […]