AWS Big Data Blog

Tag: Amazon Redshift

Scale your cloud data warehouse and reduce costs with the new Amazon Redshift RA3 nodes with managed storage

One of our favorite things about working on Amazon Redshift, the cloud data warehouse service at AWS, is the inspiring stories from customers about how they’re using data to gain business insights. Many of our recent engagements have been with customers upgrading to the new instance type, Amazon Redshift RA3 with managed storage. In this […]

Optimize Python ETL by extending Pandas with AWS Data Wrangler

April 2024: This post was reviewed for accuracy. Developing extract, transform, and load (ETL) data pipelines is one of the most time-consuming steps to keep data lakes, data warehouses, and databases up to date and ready to provide business insights. You can categorize these pipelines into distributed and non-distributed, and the choice of one or […]

Stream Twitter data into Amazon Redshift using Amazon MSK and AWS Glue streaming ETL

This post demonstrates how customers, system integrator (SI) partners, and developers can use the serverless streaming ETL capabilities of AWS Glue with Amazon Managed Streaming for Kafka (Amazon MSK) to stream data to a data warehouse such as Amazon Redshift. We also show you how to view Twitter streaming data on Amazon QuickSight via Amazon Redshift.

Manage and control your cost with Amazon Redshift Concurrency Scaling and Spectrum

This post shares the simple steps you can take to use the new Amazon Redshift usage controls feature to monitor and control your usage and the associated cost of the Amazon Redshift Spectrum and Concurrency Scaling features. Redshift Spectrum enables you to power a lake house architecture to directly query and join data across your data warehouse and data lake, and Concurrency Scaling enables you to support thousands of concurrent users and queries with consistently fast query performance.

Federate access to your Amazon Redshift cluster with Active Directory Federation Services (AD FS): Part 2

In the first post of this series, Federating access to your Amazon Redshift cluster with Active Directory: Part 1, you set up Microsoft Active Directory Federation Services (AD FS) and Security Assertion Markup Language (SAML) based authentication and tested the SAML federation using a web browser. In Part 2, you learn to set up an […]

Federate access to your Amazon Redshift cluster with Active Directory Federation Services (AD FS): Part 1

This blog post was reviewed and updated in May 2022 to include and comply with the recently published Part 3 of this series. Many customers request detailed steps to set up federated single sign-on (SSO) using Microsoft Active Directory Federation Services (AD FS) for Amazon Redshift. In this two-part series, you will find detailed steps to achieve […]

Develop an application migration methodology to modernize your data warehouse with Amazon Redshift

This post demonstrates how to develop a comprehensive, wave-based application migration methodology for a complex project to modernize a traditional MPP data warehouse with Amazon Redshift. It provides best practices and lessons learned by considering business priority, data dependency, workload profiles, and existing service level agreements (SLAs).

Restrict Amazon Redshift Spectrum external table access to Amazon Redshift IAM users and groups using role chaining

With Amazon Redshift Spectrum, you can query the data in your Amazon Simple Storage Service (Amazon S3) data lake using a central AWS Glue metastore from your Amazon Redshift cluster. This capability extends your petabyte-scale Amazon Redshift data warehouse to unbounded data storage limits, which allows you to scale to exabytes of data cost-effectively. Like Amazon EMR, you get the benefits of open data formats and inexpensive storage, and you can scale out to thousands of Redshift Spectrum nodes to pull data, filter, project, aggregate, group, and sort. Like Amazon Athena, Redshift Spectrum is serverless and there’s nothing to provision or manage. You pay only $5 for every 1 TB of data scanned. This post discusses how to configure Amazon Redshift security to enable fine-grained access control using role chaining to achieve high-fidelity user-based permission management.
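To make the per-terabyte rate quoted in this excerpt concrete, here is a minimal Python sketch of the arithmetic. The function name, the constants, and the choice of a binary terabyte (1024^4 bytes) are assumptions made for illustration; the sketch ignores any per-query minimums or rounding, so treat it as a rough estimate rather than a billing calculation.

```python
# Rough cost estimate for data scanned, using the $5-per-TB rate quoted above.
# Assumption: 1 TB is treated as 1024**4 bytes; minimums and rounding are ignored.
BYTES_PER_TB = 1024 ** 4
PRICE_PER_TB_USD = 5.0


def estimate_scan_cost_usd(bytes_scanned: int,
                           price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated cost in USD for scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # Hypothetical query that scans 250 GiB of data.
    scanned = 250 * 1024 ** 3
    print(f"Estimated cost: ${estimate_scan_cost_usd(scanned):.2f}")
```

Because the cost scales linearly with bytes scanned, anything that reduces the scanned volume (such as columnar formats or partition pruning) lowers the estimate proportionally.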

How Wind Mobility built a serverless data architecture

We parse through millions of scooter and user events generated daily (over 300 events per second) to extract actionable insight. We selected AWS Glue to perform this task. Our primary ETL job reads the newly added raw event data from Amazon S3, processes it using Apache Spark, and writes the results to our Amazon Redshift data warehouse. AWS Glue plays a critical role in our ability to scale on demand. After careful evaluation and testing, we concluded that AWS Glue ETL jobs meet all our needs and free us from procuring and managing infrastructure.

Extend your Amazon Redshift Data Warehouse to your Data Lake

Amazon Redshift is a fast, fully managed, cloud-native data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence tools. Many companies today are using Amazon Redshift to analyze data and perform various transformations on the data. However, as data continues to grow and become […]