AWS Big Data Blog
OpenSearch UI: Six months in review
OpenSearch UI has been adopted by thousands of customers for a variety of use cases since its launch in November 2024, and exciting customer stories and feedback have helped shape our feature improvements. Six months after its general availability, this post shares the major enhancements that have expanded OpenSearch UI's capabilities, especially in observability and security analytics.
How LaunchDarkly migrated to HAQM MWAA to achieve efficiency and scale
In this post, we explore how LaunchDarkly scaled their internal analytics platform to 14,000 tasks per day, with minimal increase in costs, after migrating from another vendor-managed Apache Airflow solution to AWS, using HAQM Managed Workflows for Apache Airflow (HAQM MWAA) and HAQM Elastic Container Service (HAQM ECS).
Access HAQM Redshift Managed Storage tables through Apache Spark on AWS Glue and HAQM EMR using HAQM SageMaker Lakehouse
With SageMaker Lakehouse, you can access tables stored in HAQM Redshift managed storage (RMS) through Iceberg APIs, using the Iceberg REST catalog backed by AWS Glue Data Catalog. This post describes how to integrate data on RMS tables through Apache Spark using SageMaker Unified Studio, HAQM EMR 7.5.0 and higher, and AWS Glue 5.0.
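The post above centers on pointing Spark at the Iceberg REST catalog that fronts RMS tables. As a rough sketch of what that Spark configuration can look like, the settings below use placeholder values throughout: the region, account ID, and catalog name `rmscatalog` are illustrative assumptions, not values from the post.

```python
# Hypothetical Spark settings for reading RMS tables through the Iceberg
# REST catalog backed by AWS Glue Data Catalog. Region, account ID, and the
# catalog name "rmscatalog" are placeholders, not values from the post.
REGION = "us-east-1"          # assumption: example region
ACCOUNT_ID = "111122223333"   # assumption: example AWS account ID

spark_conf = {
    # Enable Iceberg SQL extensions in Spark
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    # Register an Iceberg catalog named "rmscatalog"
    "spark.sql.catalog.rmscatalog": "org.apache.iceberg.spark.SparkCatalog",
    # Use the REST catalog type, pointed at the Glue Iceberg REST endpoint
    "spark.sql.catalog.rmscatalog.type": "rest",
    "spark.sql.catalog.rmscatalog.uri": f"https://glue.{REGION}.amazonaws.com/iceberg",
    "spark.sql.catalog.rmscatalog.warehouse": ACCOUNT_ID,
    # Sign REST requests with SigV4 credentials for the Glue service
    "spark.sql.catalog.rmscatalog.rest.sigv4-enabled": "true",
    "spark.sql.catalog.rmscatalog.rest.signing-name": "glue",
    "spark.sql.catalog.rmscatalog.rest.signing-region": REGION,
}

for key, value in spark_conf.items():
    print(f"{key}={value}")
```

In an AWS Glue 5.0 job or HAQM EMR 7.5.0+ cluster, these key-value pairs would be passed as Spark session configuration before querying tables through the `rmscatalog` namespace.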
Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with HAQM EMR Serverless
In this post, we share how AppsFlyer successfully migrated their massive data infrastructure from self-managed Hadoop clusters to HAQM EMR Serverless, detailing their best practices, the challenges they overcame, and the lessons learned that can help guide other organizations through similar transformations.
Configure cross-account access of HAQM SageMaker Lakehouse multi-catalog tables using AWS Glue 5.0 Spark
In this post, we show you how to share an HAQM Redshift table and an HAQM S3-based Iceberg table from the account that owns the data to another account that consumes the data. In the recipient account, we run a join query on the shared data lake and data warehouse tables using Spark in AWS Glue 5.0. We walk you through the complete cross-account setup and provide the Spark configuration in a Python notebook.
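The cross-account join described above boils down to one SQL statement over two shared catalogs. As a minimal sketch, the query below uses hypothetical catalog, schema, and table names (`rmscatalog`, `s3catalog`, `customers`, `orders`), which are assumptions for illustration, not identifiers from the post.

```python
# Hypothetical join across a shared Redshift (RMS) table and a shared
# S3-based Iceberg table, run from the recipient account. All catalog,
# schema, and table names below are placeholders, not values from the post.
join_query = """
SELECT c.customer_id, c.segment, o.order_total
FROM rmscatalog.sales.customers c
JOIN s3catalog.lake.orders o
  ON c.customer_id = o.customer_id
""".strip()

print(join_query)
```

In a Glue 5.0 Python notebook, this string would be executed with `spark.sql(join_query)` once both Iceberg catalogs are registered in the Spark session configuration.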
Introducing HAQM Q Developer in HAQM OpenSearch Service
Today, we introduced HAQM Q Developer support in OpenSearch Service. With this AI-assisted analysis, both new and experienced users can navigate complex operational data without training, analyze issues, and gain insights in a fraction of the time. In this post, we share how to get started with HAQM Q Developer in OpenSearch Service and explore some of its key capabilities.
How Flutter UKI optimizes data pipelines with HAQM Managed Workflows for Apache Airflow
In this post, we share how Flutter UKI transitioned from a monolithic HAQM Elastic Compute Cloud (HAQM EC2)-based Airflow setup to a scalable and optimized HAQM Managed Workflows for Apache Airflow (HAQM MWAA) architecture using features like Kubernetes Pod Operator, continuous integration and delivery (CI/CD) integration, and performance optimization techniques.
How BMW Group built a serverless terabyte-scale data transformation architecture with dbt and HAQM Athena
At the BMW Group, our Cloud Efficiency Analytics (CLEA) team has developed a FinOps solution to optimize costs across over 10,000 cloud accounts. This post explores our journey, from the initial challenges to our current architecture, and details the steps we took to achieve a highly efficient, serverless data transformation setup.
Access your existing data and resources through HAQM SageMaker Unified Studio, Part 1: AWS Glue Data Catalog and HAQM Redshift
This series of posts demonstrates how you can onboard and access existing AWS data sources using SageMaker Unified Studio. This post focuses on onboarding existing AWS Glue Data Catalog tables and database tables available in HAQM Redshift.
Access your existing data and resources through HAQM SageMaker Unified Studio, Part 2: HAQM S3, HAQM RDS, HAQM DynamoDB, and HAQM EMR
In this post, we discuss integrating additional vital data sources such as HAQM Simple Storage Service (HAQM S3) buckets, HAQM Relational Database Service (HAQM RDS), HAQM DynamoDB, and HAQM EMR clusters. We demonstrate how to configure the necessary permissions, establish connections, and effectively use these resources within SageMaker Unified Studio. Whether you're working with object storage, relational databases, NoSQL databases, or big data processing, this post can help you seamlessly incorporate your existing data infrastructure into your SageMaker Unified Studio workflows.