Why HAQM Redshift Integration for Apache Spark?
HAQM Redshift Integration for Apache Spark simplifies and accelerates Apache Spark applications that access HAQM Redshift data from AWS analytics services such as HAQM EMR, AWS Glue, and HAQM SageMaker. With these services, you can quickly build Apache Spark applications that read from and write to your HAQM Redshift data warehouse without compromising performance or transactional consistency. The integration uses AWS Identity and Access Management (IAM)-based credentials to enhance security, and it removes the need to manually set up and maintain uncertified versions of third-party connectors. You can start running Apache Spark jobs on data in HAQM Redshift in seconds.
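As a rough illustration of reading from and writing to the data warehouse with IAM-based credentials, a PySpark job might look like the sketch below. It is not taken from this page. It assumes the data source name and option keys (`url`, `dbtable`, `tempdir`, `aws_iam_role`) of the open source spark-redshift community connector, and the helper names `redshift_options`, `read_table`, and `write_table` are hypothetical. Check the documentation for your EMR or Glue release before relying on any of these values.

```python
# Assumed data source name (spark-redshift community connector); verify it
# against the documentation for your HAQM EMR or AWS Glue release.
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_options(jdbc_url, table, temp_dir, iam_role_arn):
    """Collect the connector options for one Redshift table (hypothetical helper)."""
    return {
        "url": jdbc_url,               # JDBC URL of the Redshift endpoint
        "dbtable": table,              # table to read from or write to
        "tempdir": temp_dir,           # S3 staging location used for data transfer
        "aws_iam_role": iam_role_arn,  # IAM role Redshift assumes to access S3
    }


def read_table(spark, options):
    """Load a Redshift table as a DataFrame using an existing SparkSession."""
    return spark.read.format(REDSHIFT_FORMAT).options(**options).load()


def write_table(df, options, mode="append"):
    """Write a DataFrame to the Redshift table named in the options."""
    df.write.format(REDSHIFT_FORMAT).options(**options).mode(mode).save()
```

In a job, you would call `redshift_options(...)` with your own endpoint, table, S3 staging path, and role ARN, then pass the result to `read_table(spark, opts)` or `write_table(df, opts)`. Keeping the options in one dictionary means the same IAM role and staging location are reused for both reads and writes.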
Customers

Huron is a global professional services firm that collaborates with clients to put possible into practice by creating sound strategies, optimizing operations, accelerating digital transformation, and empowering businesses and their people to own their future.
"We empower our engineers to build their data pipelines and applications with Apache Spark using Python and Scala. We wanted a tailored solution that simplified operations and delivered faster and more efficiently for our clients and that’s what we get with the new HAQM Redshift Integration for Apache Spark."
Corey Johnson, Data Architect Manager - Huron Consulting

GE Aerospace is a global provider of jet engines, components, and systems for commercial and military aircraft. The company has been designing, developing, and manufacturing jet engines since World War I.
“GE Aerospace uses AWS analytics and HAQM Redshift to enable critical business insights that drive important business decisions. With the support for auto-copy from HAQM S3, we can build simpler data pipelines to move data from HAQM S3 to HAQM Redshift. This accelerates our data product teams’ ability to access data and deliver insights to end users. We spend more time adding value through data and less time on integrations.”
Alcuin Weidus, Sr Principal Data Architect - GE Aerospace

The Goldman Sachs Group, Inc. is a leading global financial institution that delivers a broad range of financial services across investment banking, securities, investment management and consumer banking to a large and diversified client base that includes corporations, financial institutions, governments, and individuals.
“Our focus is on providing self-service access to data for all of our users at Goldman Sachs. Through Legend, our open source data management and governance platform, we enable users to develop data-centric applications and derive data-driven insights as we collaborate across the financial services industry. With HAQM Redshift integration for Apache Spark, our data platform team will be able to access HAQM Redshift data with minimal manual steps, allowing for zero-code ETL that will increase our ability to make it easier for engineers to focus on perfecting their workflow as they collect complete and timely information. We expect to see a performance improvement of applications and improved security as our users can now easily access the latest data in HAQM Redshift.”
Neema Raphael, Chief Data Officer - Goldman Sachs