AWS Big Data Blog

Category: Advanced (300)

Governing data in relational databases using HAQM DataZone

Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. HAQM DataZone is a fully managed data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across HAQM Web Services (AWS), on premises, and on third-party […]

Dive deep into security management: The Data on EKS Platform

Building big data applications on open source software has become increasingly straightforward since the advent of projects like Data on EKS, an open source project from AWS that provides blueprints for building data and machine learning (ML) applications on HAQM Elastic Kubernetes Service (HAQM EKS). In the realm of big data, securing […]

Use your corporate identities for analytics with HAQM EMR and AWS IAM Identity Center

To enable your workforce users for analytics with fine-grained data access controls and audit data access, you might have to create multiple AWS Identity and Access Management (IAM) roles with different data permissions and map the workforce users to one of those roles. Multiple users are often mapped to the same role where they need […]

Dynamic DAG generation with YAML and DAG Factory in HAQM MWAA

HAQM Managed Workflows for Apache Airflow (HAQM MWAA) is a managed service that lets you use a familiar Apache Airflow environment with improved scalability, availability, and security to enhance and scale your business workflows without the operational burden of managing the underlying infrastructure. In Airflow, Directed Acyclic Graphs (DAGs) are defined as Python code. […]
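The idea behind generating DAGs from YAML is that a declarative file describes tasks and their upstream dependencies, and a small amount of Python turns that description into a validated, acyclic task graph. The following is a minimal pure-Python sketch of that idea only. It does not use Airflow or the DAG Factory library, and the config keys (`tasks`, `dependencies`) and the function `build_execution_order` are hypothetical names chosen for illustration, not the DAG Factory schema or API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical parsed config: roughly what a YAML DAG definition might look like
# after loading it into a dict. The "tasks"/"dependencies" keys are assumptions
# for illustration and are NOT the DAG Factory schema.
config = {
    "dag_id": "example_etl",
    "tasks": {
        "extract": {"dependencies": []},
        "transform": {"dependencies": ["extract"]},
        "validate": {"dependencies": ["transform"]},
        "load": {"dependencies": ["transform"]},
    },
}


def build_execution_order(cfg: dict) -> list[str]:
    """Derive a valid task execution order from a declarative config.

    Raises ValueError for references to undefined tasks and
    graphlib.CycleError if the dependencies contain a cycle.
    """
    # Map each task to the set of upstream tasks it depends on.
    graph = {
        name: set(spec.get("dependencies", []))
        for name, spec in cfg["tasks"].items()
    }
    # Every upstream task must itself be declared in the config.
    declared = set(graph)
    unknown = {dep for deps in graph.values() for dep in deps} - declared
    if unknown:
        raise ValueError(f"undefined upstream tasks: {sorted(unknown)}")
    # static_order() yields each task only after all of its upstream tasks,
    # and raises CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


order = build_execution_order(config)
print(order)
```

A real DAG generator would create operator objects and wire them together rather than return task names; the sketch only shows how a declarative description can be checked for undefined references and cycles before it becomes a DAG. The relative order of `validate` and `load` is not fixed, because both depend only on `transform`.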

HAQM OpenSearch Service Under the Hood: OpenSearch Optimized Instances (OR1)

HAQM OpenSearch Service recently introduced the OpenSearch Optimized Instance family (OR1), which delivers up to 30% better price-performance than existing memory-optimized instances in internal benchmarks, and uses HAQM Simple Storage Service (HAQM S3) to provide eleven nines (99.999999999%) of durability. With this new instance family, OpenSearch Service uses OpenSearch innovation and AWS technologies to reimagine […]
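A durability figure of 99.999999999% (eleven nines) is easier to grasp as an expected-loss number. Assuming it is an annual, per-object design target (the way HAQM S3 usually states it) and taking a hypothetical collection of 10,000,000 objects, the arithmetic works out as follows:

```latex
P_{\text{loss}} = 1 - 0.99999999999 = 10^{-11} \quad \text{per object per year}

\mathbb{E}[\text{objects lost per year}] = 10^{7} \times 10^{-11} = 10^{-4}

\text{mean time to lose one object} = \frac{1}{10^{-4}} = 10^{4}\ \text{years}
```

In other words, storing ten million objects, you would on average expect to lose a single object once every 10,000 years under this assumption.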


Achieve near real-time operational analytics using HAQM Aurora PostgreSQL zero-ETL integration with HAQM Redshift

Our zero-ETL integration with HAQM Redshift facilitates point-to-point data movement to get it ready for analytics, artificial intelligence (AI) and machine learning (ML) using HAQM Redshift on petabytes of data. In this post, we provide step-by-step guidance on how to get started with near real-time operational analytics using the HAQM Aurora PostgreSQL zero-ETL integration with HAQM Redshift.

HAQM DataZone now integrates with AWS Glue Data Quality and external data quality solutions

Today, we are pleased to announce that HAQM DataZone can now present data quality information for data assets. This information empowers end users to make informed decisions about whether to use specific assets. In this post, we discuss the latest data quality features of HAQM DataZone, the integration between HAQM DataZone and AWS Glue Data Quality, and how you can import data quality scores produced by external systems into HAQM DataZone using the API.

Introducing enhanced functionality for worker configuration management in HAQM MSK Connect

HAQM MSK Connect is a fully managed service for Apache Kafka Connect. With a few clicks, MSK Connect allows you to deploy connectors that move data between Apache Kafka and external systems. MSK Connect now supports the ability to delete MSK Connect worker configurations, tag resources, and manage worker configurations and custom plugins using AWS […]

Build an end-to-end serverless streaming pipeline with Apache Kafka on HAQM MSK using Python

The volume of data generated globally continues to surge, from gaming, retail, and finance, to manufacturing, healthcare, and travel. Organizations are looking for more ways to quickly use the constant inflow of data to innovate for their businesses and customers. They have to reliably capture, process, analyze, and load the data into a myriad of […]


Invoke AWS Lambda functions from cross-account HAQM Kinesis Data Streams

A multi-account architecture on AWS is essential for enhancing security, compliance, and resource management by isolating workloads, enabling granular cost allocation, and facilitating collaboration across distinct environments. It also mitigates risks, improves scalability, and allows for advanced networking configurations. In a streaming architecture, you may have event producers, stream storage, and event consumers in a […]