AWS Big Data Blog

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

To enable your workforce users for analytics with fine-grained data access controls and audit data access, you might have to create multiple AWS Identity and Access Management (IAM) roles with different data permissions and map the workforce users to one of those roles. Multiple users are often mapped to the same role where they need […]

Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open source tool used to programmatically author, schedule, and monitor sequences of processes and tasks, referred to as workflows. […]

Run interactive workloads on Amazon EMR Serverless from Amazon EMR Studio

Starting from release 6.14, Amazon EMR Studio supports interactive analytics on Amazon EMR Serverless. You can now use EMR Serverless applications as the compute, in addition to Amazon EMR on EC2 clusters and Amazon EMR on EKS virtual clusters, to run JupyterLab notebooks from EMR Studio Workspaces. EMR Studio is an integrated development environment (IDE) […]

Dynamic DAG generation with YAML and DAG Factory in Amazon MWAA

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service that allows you to use a familiar Apache Airflow environment with improved scalability, availability, and security to enhance and scale your business workflows without the operational burden of managing the underlying infrastructure. In Airflow, Directed Acyclic Graphs (DAGs) are defined as Python code. […]

Amazon OpenSearch Service Under the Hood: OpenSearch Optimized Instances (OR1)

Amazon OpenSearch Service recently introduced the OpenSearch Optimized Instance family (OR1), which delivers up to 30% price-performance improvement over existing memory optimized instances in internal benchmarks, and uses Amazon Simple Storage Service (Amazon S3) to provide 11 9s of durability. With this new instance family, OpenSearch Service uses OpenSearch innovation and AWS technologies to reimagine […]

Power analytics as a service capabilities using Amazon Redshift

Analytics as a service (AaaS) is a business model that uses the cloud to deliver analytic capabilities on a subscription basis. This model provides organizations with a cost-effective, scalable, and flexible solution for building analytics. The AaaS model accelerates data-driven decision-making through advanced analytics, enabling organizations to swiftly adapt to changing market trends and make […]

Introducing Amazon MWAA larger environment sizes

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service for Apache Airflow that streamlines the setup and operation of the infrastructure to orchestrate data pipelines in the cloud. Customers use Amazon MWAA to manage the scalability, availability, and security of their Apache Airflow environments. As they design more intensive, complex, and ever-growing […]

Uplevel your data architecture with real-time streaming using Amazon Data Firehose and Snowflake

Today’s fast-paced world demands timely insights and decisions, which is driving the importance of streaming data. Streaming data refers to data that is continuously generated from a variety of sources. The sources of this data, such as clickstream events, change data capture (CDC), application and service logs, and Internet of Things (IoT) data streams are […]

Achieve near real time operational analytics using Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift

Our zero-ETL integration with Amazon Redshift facilitates point-to-point data movement to get it ready for analytics, artificial intelligence (AI) and machine learning (ML) using Amazon Redshift on petabytes of data. In this post, we provide step-by-step guidance on how to get started with near real time operational analytics using the Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift.