AWS Big Data Blog
Category: Enterprise Strategy
Accelerate your migration to Amazon OpenSearch Service with Reindexing-from-Snapshot
In this post, we introduce a new mechanism called Reindexing-from-Snapshot (RFS), and explain how it can address your concerns and simplify migrating to OpenSearch.
Run high-availability long-running clusters with Amazon EMR instance fleets
In this post, we demonstrate how to launch a high availability instance fleet cluster using the newly redesigned Amazon EMR console, as well as using an AWS CloudFormation template. We also go over the basic concepts of Hadoop high availability, EMR instance fleets, the benefits and trade-offs of high availability, and best practices for running resilient EMR clusters.
How Volkswagen Autoeuropa built a data solution with a robust governance framework, simplifying access to quality data using Amazon DataZone
This is the second post of a two-part series that details how Volkswagen Autoeuropa, a Volkswagen Group plant, together with AWS, built a data solution with a robust governance framework using Amazon DataZone to become a data-driven factory. Part 1 of this series focused on the customer challenges, overall solution architecture and solution features, and how they helped Volkswagen Autoeuropa overcome their challenges. This post dives into the technical details, highlighting the robust data governance framework that enables ease of access to quality data using Amazon DataZone.
How Volkswagen Autoeuropa built a data mesh to accelerate digital transformation using Amazon DataZone
In this post, we discuss how Volkswagen Autoeuropa used Amazon DataZone to build a data marketplace based on data mesh architecture to accelerate their digital transformation. The data mesh, built on Amazon DataZone, simplified data access, improved data quality, and established governance at scale to power analytics, reporting, AI, and machine learning (ML) use cases. As a result, the data solution offers benefits such as faster access to data, expeditious decision making, accelerated time to value for use cases, and enhanced data governance.
Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT
In this post, we show how to migrate a data warehouse from Microsoft Azure Synapse to Redshift Serverless using AWS Schema Conversion Tool (AWS SCT) and AWS SCT data extraction agents. AWS SCT makes heterogeneous database migrations predictable by automatically converting the source database code and storage objects to a format compatible with the target database.
Migrate Google BigQuery to Amazon Redshift using AWS Schema Conversion tool (SCT)
Amazon Redshift is a fast, fully-managed, petabyte scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. Using Amazon Redshift Serverless and Query Editor v2, you can load and query large datasets in just a few clicks and pay only for what you use. The decoupled compute and […]
Migrate from Snowflake to Amazon Redshift using AWS Glue Python shell
Amazon Redshift is a fast, petabyte-scale cloud data warehouse delivering the best price-performance. Tens of thousands of customers use Amazon Redshift to analyze exabytes of data per day and power analytics workloads such as BI, predictive analytics, and real-time streaming analytics without having to manage the data warehouse infrastructure. It natively integrates with other AWS […]
Copy large datasets from Google Cloud Storage to Amazon S3 using Amazon EMR
Data migration between GCS and Amazon S3 is possible by utilizing Hadoop’s native support for S3 object storage and using a Google-provided Hadoop connector for GCS. This post demonstrates how to configure an EMR cluster for DistCp and S3DistCP, goes over the settings and parameters for both tools, performs a copy of a test 9.4 TB dataset, and compares the performance of the copy.
Unified serverless streaming ETL architecture with Amazon Kinesis Data Analytics
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Businesses across the world […]
How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink
August 2023: Amazon MSK now offers a managed feature called multi-VPC private connectivity to simplify connectivity of your Kafka clients to your brokers. Refer to this blog to learn more. This guest post presents patterns for accessing an Amazon Managed Streaming for Apache Kafka cluster across your AWS account or Amazon Virtual Private Cloud (Amazon VPC) […]