AWS Big Data Blog
Build an end-to-end serverless streaming pipeline with Apache Kafka on Amazon MSK using Python
The volume of data generated globally continues to surge, from gaming, retail, and finance, to manufacturing, healthcare, and travel. Organizations are looking for more ways to quickly use the constant inflow of data to innovate for their businesses and customers. They have to reliably capture, process, analyze, and load the data into a myriad of […]
Unlock insights on Amazon RDS for MySQL data with zero-ETL integration to Amazon Redshift
Amazon Relational Database Service (Amazon RDS) for MySQL zero-ETL integration with Amazon Redshift was announced in preview at AWS re:Invent 2023 for Amazon RDS for MySQL version 8.0.28 or higher. In this post, we provide step-by-step guidance on how to get started with near real-time operational analytics using this feature. This post is a continuation […]
Announcing data filtering for Amazon Aurora MySQL zero-ETL integration with Amazon Redshift
AWS is now announcing data filtering on zero-ETL integrations, which lets you bring in selected data from the database instance on zero-ETL integrations between Amazon Aurora MySQL and Amazon Redshift. This feature allows you to select individual databases and tables to be replicated to your Redshift data warehouse for analytics use cases. In this post, we give an overview of use cases for this feature and provide step-by-step guidance on how to get started with near real-time operational analytics.
Invoke AWS Lambda functions from cross-account Amazon Kinesis Data Streams
A multi-account architecture on AWS is essential for enhancing security, compliance, and resource management by isolating workloads, enabling granular cost allocation, and facilitating collaboration across distinct environments. It also mitigates risks, improves scalability, and allows for advanced networking configurations. In a streaming architecture, you may have event producers, stream storage, and event consumers in a […]
Hybrid Search with Amazon OpenSearch Service
This post explains the internals of hybrid search and how to build a hybrid search solution using OpenSearch Service. We experiment with sample queries to explore and compare lexical, semantic, and hybrid search. All the code used in this post is publicly available in the GitHub repository.
Scale AWS Glue jobs by optimizing IP address consumption and expanding network capacity using a private NAT gateway
As businesses expand, the demand for IP addresses within the corporate network often exceeds the supply. An organization’s network is often designed with some anticipation of future requirements, but as enterprises evolve, their information technology (IT) needs surpass the previously designed network. Companies may find themselves challenged to manage the limited pool of IP addresses. […]
Amazon Managed Service for Apache Flink now supports Apache Flink version 1.18
Apache Flink is an open source distributed processing engine, offering powerful programming interfaces for both stream and batch processing, with first-class support for stateful processing and event time semantics. Apache Flink supports multiple programming languages (Java, Python, Scala, SQL) and multiple APIs with different levels of abstraction, which can be used interchangeably in the same […]
Enrich your customer data with geospatial insights using Amazon Redshift, AWS Data Exchange, and Amazon QuickSight
It always pays to know more about your customers, and AWS Data Exchange makes it straightforward to use publicly available census data to enrich your customer dataset. The United States Census Bureau conducts the US census every 10 years and gathers household survey data. This data is anonymized, aggregated, and made available for public use. […]
Multicloud data lake analytics with Amazon Athena
Many organizations operate data lakes spanning multiple cloud data stores. This could be for various reasons, such as business expansions, mergers, or specific cloud provider preferences for different business units. In these cases, you may want an integrated query layer to seamlessly run analytical queries across these diverse cloud stores and streamline your data analytics […]
Amazon OpenSearch H2 2023 in review
2023 was a busy year for Amazon OpenSearch Service! Learn more about the releases that OpenSearch Service launched in the first half of 2023. In the second half of 2023, OpenSearch Service added support for two new OpenSearch versions: 2.9 and 2.11. These two versions introduce new features in the search space, machine […]