AWS Big Data Blog
Category: Intermediate (200)
Embracing event-driven architecture to enhance resilience of data solutions built on Amazon SageMaker
This post provides guidance on how you can use event-driven architecture to enhance the resiliency of data solutions built on the next generation of Amazon SageMaker, a unified platform for data, analytics, and AI. SageMaker is a managed service with high availability and durability.
Powering global payout intelligence: How MassPay uses Amazon Redshift Serverless and zero-ETL to drive deeper analytics
In this post, we cover how understanding real-time payout performance, identifying customer behavior patterns across regions, and optimizing internal operations required more than traditional business intelligence and analytics tools. Since implementing Amazon Redshift and zero-ETL, MassPay has seen a 90% reduction in data availability latency and payments data available for analytics 1.5x faster, leading to a 45% reduction in time-to-insight and 37% fewer support tickets related to transaction visibility and payment inquiries.
Scalable analytics and centralized governance for Apache Iceberg tables using Amazon S3 Tables and Amazon Redshift
In this post, we’ll build on the first post in this series to show you how to set up an Apache Iceberg data lake catalog using Amazon S3 Tables and provide different levels of access control to your data. Through this example, you’ll set up fine-grained access controls for multiple users and see how this works using Amazon Redshift. We’ll also review an example that simultaneously uses data residing in both Amazon Redshift and Amazon S3 Tables, enabling a unified analytics experience.
How LaunchDarkly migrated to Amazon MWAA to achieve efficiency and scale
In this post, we explore how LaunchDarkly scaled the internal analytics platform up to 14,000 tasks per day, with minimal increase in costs, after migrating from another vendor-managed Apache Airflow solution to AWS, using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) and Amazon Elastic Container Service (Amazon ECS).
Zero-copy, Coordination-free approach to OpenSearch Snapshots
In this post, we describe how we improved snapshot efficiency in Amazon OpenSearch Service while carefully maintaining these critical operational aspects. These snapshot optimizations are enabled for all OpenSearch optimized instance family (OR1, OR2, OM2) domains from version 2.17 onwards.
Automate replication of row-level security from AWS Lake Formation to Amazon QuickSight
This post outlines a solution to automatically replicate the entitlements for readers from the source (AWS Lake Formation) to Amazon QuickSight. This solution can be used even when the authentication method in Amazon QuickSight is not using IAM Identity Center and can work with both direct query and SPICE datasets in Amazon QuickSight.
Build end-to-end Apache Spark pipelines with Amazon MWAA, Batch Processing Gateway, and Amazon EMR on EKS clusters
This post shows how to enhance the multi-cluster solution by integrating Amazon Managed Workflows for Apache Airflow (Amazon MWAA) with BPG. By using Amazon MWAA, we add job scheduling and orchestration capabilities, enabling you to build a comprehensive end-to-end Spark-based data processing pipeline.
Read and write Apache Iceberg tables using AWS Lake Formation hybrid access mode
In this post, we demonstrate how to use Lake Formation for read access while continuing to use AWS Identity and Access Management (IAM) policy-based permissions for write workloads that update the schema and upsert (insert and update combined) data records into the Iceberg tables.
Integrate ThoughtSpot with Amazon Redshift using AWS IAM Identity Center
In this post, we walk you through the process of setting up ThoughtSpot integration with Amazon Redshift using IAM Identity Center authentication. The solution provides a secure, streamlined analytics environment that empowers your team to focus on what matters most: discovering and sharing valuable business insights.
Correlate telemetry data with Amazon OpenSearch Service and Amazon Managed Grafana
In this post, we show you how to use Amazon OpenSearch Service and Amazon Managed Grafana to correlate the various observability signals, which improves root cause analysis and reduces mean time to resolution (MTTR). We also provide a reference solution that can be used at scale for proactive monitoring of enterprise applications to avoid problems before they occur.