AWS Big Data Blog
How to track HAQM OpenSearch Service domain-level cost
HAQM OpenSearch Service pricing is based on three dimensions: instances, storage, and data transfer. Storage pricing depends on both the storage type and the storage tier you choose. Visibility into domain-level charges enables accurate budgeting, efficient resource allocation, fair cost attribution across projects, and overall cost transparency. In this post, we show you how to view OpenSearch Service domain-level cost using AWS Cost Explorer.
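For a quick sense of what this looks like in practice, here is a minimal sketch using the Cost Explorer API from Python. It assumes each domain carries a cost allocation tag (the key opensearch-domain below is a placeholder) that has been activated in the Billing console; the post itself walks through the console-based setup.

```python
import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-06-01", "End": "2024-07-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    # Restrict the results to OpenSearch Service charges
    Filter={"Dimensions": {"Key": "SERVICE",
                           "Values": ["HAQM OpenSearch Service"]}},
    # Break the cost down per domain via a cost allocation tag
    GroupBy=[{"Type": "TAG", "Key": "opensearch-domain"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```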
Migrate Delta tables from Azure Data Lake Storage to HAQM S3 using AWS Glue
Organizations are increasingly adopting a multi-cloud strategy to run their production workloads. We often see requests from customers who began their data journey by building data lakes on Microsoft Azure and now want to extend access to that data to AWS services. Customers want to use a variety of AWS analytics, data, AI, and machine learning (ML) […]
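As a rough illustration of the core copy step such a migration involves, here is a hedged PySpark sketch for an AWS Glue job. It assumes the job is configured for Delta Lake (for example, via the --datalake-formats delta job parameter) and has credentials for the Azure storage account; the paths and bucket names are placeholders.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

spark = GlueContext(SparkContext.getOrCreate()).spark_session

# Placeholder locations: a Delta table on ADLS and a target S3 prefix
source_path = "abfss://container@account.dfs.core.windows.net/delta/sales"
target_path = "s3://my-datalake-bucket/delta/sales"

# Read the Delta table from Azure Data Lake Storage ...
df = spark.read.format("delta").load(source_path)

# ... and write it back out as a Delta table on HAQM S3
df.write.format("delta").mode("overwrite").save(target_path)
```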
Evaluating sample HAQM Redshift data sharing architecture using Redshift Test Drive and advanced SQL analysis
In this post, we walk you through the process of testing a workload isolation architecture using HAQM Redshift data sharing and the Redshift Test Drive utility. We demonstrate how you can use SQL for advanced price performance analysis and compare different workloads across different target Redshift cluster configurations.
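As a hedged illustration of this kind of analysis, the sketch below submits an example query through the Redshift Data API against one target configuration. The cluster name is a placeholder, and the query against SYS_QUERY_HISTORY is illustrative rather than the exact SQL used in the post.

```python
import boto3

rsd = boto3.client("redshift-data")

# Submit an analysis query to one hypothetical replay target cluster
result = rsd.execute_statement(
    ClusterIdentifier="target-cluster-a",
    Database="dev",
    DbUser="awsuser",
    Sql="""
        SELECT query_type,
               COUNT(*)                AS queries,
               AVG(elapsed_time) / 1e6 AS avg_elapsed_seconds
        FROM sys_query_history
        GROUP BY query_type
        ORDER BY avg_elapsed_seconds DESC;
    """,
)
print("Statement submitted:", result["Id"])
```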
Accelerate data integration with Salesforce and AWS using AWS Glue
To meet the demands of diverse data integration use cases, AWS Glue now supports SaaS connectivity for Salesforce. This enables users to quickly preview and transfer their customer relationship management (CRM) data, fetch the schema dynamically on request, and query the data. This post explores the new Salesforce connector for AWS Glue and demonstrates how to build a modern extract, transform, and load (ETL) pipeline with AWS Glue ETL scripts.
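As a minimal sketch of what reading a Salesforce object from a Glue ETL script might look like, assuming a Salesforce connection has already been created in AWS Glue (the connection name is hypothetical, and the option keys may differ slightly by Glue version):

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the Salesforce Account object through the native connector
accounts = glue_context.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "connectionName": "my-salesforce-connection",  # placeholder
        "entityName": "Account",
        "apiVersion": "v60.0",
    },
)
print("Fetched", accounts.count(), "Account records")
```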
How Kaplan, Inc. implemented modern data pipelines using HAQM MWAA and HAQM AppFlow with HAQM Redshift as a data warehouse
Kaplan, Inc. provides individuals, educational institutions, and businesses with a broad array of services, supporting its students and partners to meet their diverse and evolving needs throughout their educational and professional journeys. In this post, we discuss how the Kaplan data engineering team implemented data integration from the Salesforce application to HAQM Redshift. The solution uses HAQM Simple Storage Service (HAQM S3) as a data lake, HAQM Redshift as a data warehouse, HAQM Managed Workflows for Apache Airflow (HAQM MWAA) as an orchestrator, and Tableau as the presentation layer.
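As a hedged sketch of this orchestration pattern (not Kaplan's actual configuration), an HAQM MWAA DAG might chain an AppFlow run to a Redshift load using the HAQM provider operators; every name below is a placeholder, and a recent HAQM provider package is assumed.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.appflow import AppflowRunOperator
from airflow.providers.amazon.aws.transfers.s3_to_redshift import (
    S3ToRedshiftOperator,
)

with DAG(
    dag_id="salesforce_to_redshift",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Trigger a pre-built AppFlow flow that lands Salesforce data in S3
    run_flow = AppflowRunOperator(
        task_id="run_salesforce_flow",
        flow_name="salesforce-to-s3",
    )

    # COPY the landed files into a Redshift staging table
    load_redshift = S3ToRedshiftOperator(
        task_id="load_to_redshift",
        schema="staging",
        table="salesforce_accounts",
        s3_bucket="my-data-lake",
        s3_key="salesforce/accounts/",
        copy_options=["FORMAT AS PARQUET"],
        redshift_conn_id="redshift_default",
    )

    run_flow >> load_redshift
```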
Optimize cost and performance for HAQM MWAA
HAQM Managed Workflows for Apache Airflow (HAQM MWAA) is a managed service for Apache Airflow that allows you to orchestrate data pipelines and workflows at scale. With HAQM MWAA, you can design Directed Acyclic Graphs (DAGs) that describe your workflows without taking on the operational burden of managing and scaling the infrastructure. In this post, we provide guidance […]
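One lever such guidance typically covers is right-sizing the environment. As a hedged sketch (the environment name and sizing values below are placeholders, not recommendations), the HAQM MWAA API lets you adjust the environment class and the worker autoscaling range:

```python
import boto3

mwaa = boto3.client("mwaa")

# Right-size an existing environment: a smaller environment class and a
# tighter worker autoscaling range both reduce cost if utilization allows
mwaa.update_environment(
    Name="my-mwaa-environment",   # hypothetical environment
    EnvironmentClass="mw1.small",
    MinWorkers=1,
    MaxWorkers=5,
    Schedulers=2,
)
```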
Embed HAQM OpenSearch Service dashboards in your application
Customers across diverse industries rely on HAQM OpenSearch Service for interactive log analytics, real-time application monitoring, website search, and vector database workloads, deriving meaningful insights from their data and visualizing those insights using OpenSearch Dashboards. Additionally, customers often seek capabilities that enable effortless sharing of visual dashboards and seamless embedding of these dashboards within their applications, further […]
Implement data quality checks on HAQM Redshift data assets and integrate with HAQM DataZone
In this post, we show how to capture data quality metrics for data assets produced in HAQM Redshift. With HAQM DataZone, the data owner can directly import the technical metadata of Redshift database tables and views into the HAQM DataZone project's inventory. Because these data assets are imported directly, they bypass the AWS Glue Data Catalog, creating a gap in data quality integration. This post proposes a solution to enrich the HAQM Redshift data asset with data quality scores and KPI metrics.
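As a rough sketch of the enrichment step, assuming externally computed quality results, the DataZone PostTimeSeriesDataPoints API can attach time series data quality forms to an asset. All identifiers and the form content below are placeholders, not the exact schema from the post.

```python
import json
from datetime import datetime, timezone

import boto3

datazone = boto3.client("datazone")

# Push an externally computed quality result onto the imported asset;
# domain and asset IDs are hypothetical placeholders
datazone.post_time_series_data_points(
    domainIdentifier="dzd_exampledomain",
    entityIdentifier="asset-id-placeholder",
    entityType="ASSET",
    forms=[{
        "formName": "RedshiftDataQuality",
        "typeIdentifier": "amazon.datazone.DataQualityResultFormType",
        "timestamp": datetime.now(timezone.utc),
        "content": json.dumps({
            "evaluations": [{
                "types": ["Completeness"],
                "description": "order_id is complete",
                "status": "PASS",
            }],
        }),
    }],
)
```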
Build a serverless data quality pipeline using Deequ on AWS Lambda
Poor data quality can lead to a variety of problems, including pipeline failures, incorrect reporting, and poor business decisions. For example, if data ingested from one of the systems contains a high number of duplicates, it can result in skewed data in the reporting system. To prevent such issues, data quality checks are integrated into […]
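To give a flavor of what such checks look like, here is a minimal PyDeequ sketch; in a Lambda-based setup, Spark would run inside the function's container image. Column names and paths are placeholders.

```python
import os
os.environ["SPARK_VERSION"] = "3.3"  # PyDeequ uses this to pick the Deequ jar

import pydeequ
from pyspark.sql import SparkSession
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationResult, VerificationSuite

spark = (SparkSession.builder
         .config("spark.jars.packages", pydeequ.deequ_maven_coord)
         .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
         .getOrCreate())

# Hypothetical input: freshly ingested order records
df = spark.read.parquet("s3://my-bucket/ingest/orders/")

# Fail the pipeline if order_id has missing values or duplicates
check = (Check(spark, CheckLevel.Error, "orders checks")
         .isComplete("order_id")
         .isUnique("order_id"))

result = VerificationSuite(spark).onData(df).addCheck(check).run()
VerificationResult.checkResultsAsDataFrame(spark, result).show()
```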
Improve the resilience of HAQM Managed Service for Apache Flink applications with the system-rollback feature
This post explores how to use the system-rollback feature in HAQM Managed Service for Apache Flink. We discuss how this functionality improves your application's resilience by keeping your Flink application highly available. Through an example, you will also learn how to use the APIs to gain more visibility into the application's operations.
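As a brief, hedged illustration of that visibility, the sketch below lists recent operations on an application with the ListApplicationOperations API; the application name is a placeholder.

```python
import boto3

kda = boto3.client("kinesisanalyticsv2")

# List recent operations (updates, rollbacks, and so on) on the
# application; "my-flink-app" is a placeholder name
resp = kda.list_application_operations(ApplicationName="my-flink-app")

for op in resp["ApplicationOperationInfoList"]:
    print(op["Operation"], op["OperationStatus"], op["OperationId"])
```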