AWS Big Data Blog
Category: Analytics
Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue
Organizations are increasingly using a multi-cloud strategy to run their production workloads. We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure, to extend access to the data to AWS services. Customers want to use a variety of AWS analytics, data, AI, and machine learning (ML) […]
Evaluating sample Amazon Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis
In this post, we walk you through the process of testing workload isolation architecture using Amazon Redshift Data Sharing and Test Drive utility. We demonstrate how you can use SQL for advanced price performance analysis and compare different workloads on different target Redshift cluster configurations.
Publish and enrich real-time financial data feeds using Amazon MSK and Amazon Managed Service for Apache Flink
In this post, we demonstrate how you can publish an enriched real-time data feed on AWS using Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon Managed Service for Apache Flink. You can apply this architecture pattern to various use cases within the capital markets industry; we discuss some of those use cases in this post.
Amazon Redshift data ingestion options
Amazon Redshift, a warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. Whether your data resides in operational databases, data lakes, on-premises systems, Amazon Elastic Compute Cloud (Amazon EC2), or other AWS services, Amazon Redshift provides multiple ingestion methods to meet your specific needs. The currently […]
Integrate sparse and dense vectors to enhance knowledge retrieval in RAG using Amazon OpenSearch Service
In this post, instead of using the BM25 algorithm, we introduce sparse vector retrieval. This approach offers improved term expansion while maintaining interpretability. We walk through the steps of integrating sparse and dense vectors for knowledge retrieval using Amazon OpenSearch Service and run some experiments on some public datasets to show its advantages.
Use the AWS CDK with the Data Solutions Framework to provision and manage Amazon Redshift Serverless
In this post, we demonstrate how to use the AWS CDK and DSF to create a multi-data warehouse platform based on Amazon Redshift Serverless. DSF simplifies the provisioning of Redshift Serverless, initialization and cataloging of data, and data sharing between different data warehouse deployments.
Accelerate data integration with Salesforce and AWS using AWS Glue
To meet the demands of diverse data integration use cases, AWS Glue now supports SaaS connectivity for Salesforce. This enables users to quickly preview and transfer their customer relationship management (CRM) data, fetch the schema dynamically on request, and query the data. This post explores the new Salesforce connector for AWS Glue and demonstrates how to build a modern extract, transform, and load (ETL) pipeline with AWS Glue ETL scripts.
Integrate Tableau and Microsoft Entra ID with Amazon Redshift using AWS IAM Identity Center
This blog post provides a step-by-step guide to integrating IAM Identity Center with Microsoft Entra ID as the IdP and configuring Amazon Redshift as an AWS managed application. Additionally, you’ll learn how to set up the Amazon Redshift driver in Tableau, enabling SSO directly within Tableau Desktop.
Introducing job queuing to scale your AWS Glue workloads
Today, we are pleased to announce the general availability of AWS Glue job queuing. Job queuing increases scalability and improves the customer experience of managing AWS Glue jobs. With this new capability, you no longer need to manage concurrency of your AWS Glue job runs and attempt retries just to avoid job failures due to high concurrency. This post demonstrates how job queuing helps you scale your Glue workloads and how job queuing works.
Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 1
In a previous post, we showed how Zero Copy data federation empowers businesses to access Amazon Redshift data within the Salesforce Data Cloud to enrich customer 360 data with operational data. This two-part series explores how analytics teams can access customer 360 data from Salesforce Data Cloud within Amazon Redshift to generate insights on unified data without the overhead of extract, transform, and load (ETL) pipelines. In this post, we cover data sharing between Salesforce Data Cloud and customers’ AWS accounts in the same AWS Region. Part 2 covers cross-Region data sharing between Salesforce Data Cloud and customers’ AWS accounts.