AWS Big Data Blog
Category: Database
How Kaplan, Inc. implemented modern data pipelines using Amazon MWAA and Amazon AppFlow with Amazon Redshift as a data warehouse
Kaplan, Inc. provides individuals, educational institutions, and businesses with a broad array of services, supporting our students and partners to meet their diverse and evolving needs throughout their educational and professional journeys. In this post, we discuss how the Kaplan data engineering team implemented data integration from the Salesforce application to Amazon Redshift. The solution uses Amazon Simple Storage Service as a data lake, Amazon Redshift as a data warehouse, Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as an orchestrator, and Tableau as the presentation layer.
Stream data to Amazon S3 for real-time analytics using the Oracle GoldenGate S3 handler
Modern business applications rely on timely and accurate data with increasing demand for real-time analytics. There is a growing need for efficient and scalable data storage solutions. Data at times is stored in different datasets and needs to be consolidated before meaningful and complete insights can be drawn from the datasets. This is where replication […]
Automate data loading from your database into Amazon Redshift using AWS Database Migration Service (DMS), AWS Step Functions, and the Redshift Data API
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. Tens of thousands of customers use Amazon Redshift to process exabytes of data per […]
Amazon DocumentDB zero-ETL integration with Amazon OpenSearch Service is now available
Today, we are announcing the general availability of Amazon DocumentDB (with MongoDB compatibility) zero-ETL integration with Amazon OpenSearch Service. Amazon DocumentDB provides native text search and vector search capabilities. With Amazon OpenSearch Service, you can perform advanced search analytics, such as fuzzy search, synonym search, cross-collection search, and multilingual search, on Amazon DocumentDB data. Zero-ETL […]
Achieve near real-time operational analytics using Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift
Our zero-ETL integration with Amazon Redshift facilitates point-to-point data movement to get data ready for analytics, artificial intelligence (AI), and machine learning (ML) using Amazon Redshift on petabytes of data. In this post, we provide step-by-step guidance on how to get started with near real-time operational analytics using the Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift.
Unlock insights on Amazon RDS for MySQL data with zero-ETL integration to Amazon Redshift
Amazon Relational Database Service (Amazon RDS) for MySQL zero-ETL integration with Amazon Redshift was announced in preview at AWS re:Invent 2023 for Amazon RDS for MySQL version 8.0.28 or higher. In this post, we provide step-by-step guidance on how to get started with near real-time operational analytics using this feature. This post is a continuation […]
Announcing data filtering for Amazon Aurora MySQL zero-ETL integration with Amazon Redshift
AWS is announcing data filtering on zero-ETL integrations between Amazon Aurora MySQL and Amazon Redshift, enabling you to bring in only selected data from the database instance. This feature allows you to select individual databases and tables to be replicated to your Redshift data warehouse for analytics use cases. In this post, we provide an overview of use cases where you can use this feature, and provide step-by-step guidance on how to get started with near real-time operational analytics using this feature.
Build a RAG data ingestion pipeline for large-scale ML workloads
For building any generative AI application, enriching the large language models (LLMs) with new data is imperative. This is where the Retrieval Augmented Generation (RAG) technique comes in. RAG is a machine learning (ML) architecture that uses external documents (like Wikipedia) to augment its knowledge and achieve state-of-the-art results on knowledge-intensive tasks. For ingesting these […]
Enable advanced search capabilities for Amazon Keyspaces data by integrating with Amazon OpenSearch Service
In this post, we explore the process of integrating Amazon Keyspaces and Amazon OpenSearch Service using AWS Lambda and Amazon OpenSearch Ingestion to enable advanced search capabilities. The content includes a reference architecture, a step-by-step guide on infrastructure setup, sample code for implementing the solution within a use case, and an AWS Cloud Development Kit (AWS CDK) application for deployment.
Simplify data streaming ingestion for analytics using Amazon MSK and Amazon Redshift
Towards the end of 2022, AWS announced the general availability of real-time streaming ingestion to Amazon Redshift for Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), eliminating the need to stage streaming data in Amazon Simple Storage Service (Amazon S3) before ingesting it into Amazon Redshift. Streaming ingestion from Amazon […]