AWS Big Data Blog
Category: Python
Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg
This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. We show how to build data pipelines using AWS Glue jobs, optimize them for both cost and performance, and implement schema evolution to automate manual tasks. To review the first part of the series, where we load SQL Server data into HAQM Simple Storage Service (HAQM S3) using AWS Database Migration Service (AWS DMS), see Modernize your legacy databases with AWS data lakes, Part 1: Migrate SQL Server using AWS DMS.
Enrich VPC Flow Logs with resource tags and deliver data to HAQM S3 using HAQM Kinesis Data Firehose
February 9, 2024: HAQM Kinesis Data Firehose has been renamed to HAQM Data Firehose. Read the AWS What’s New post to learn more. VPC Flow Logs is an AWS feature that captures information about the network traffic flows going to and from network interfaces in HAQM Virtual Private Cloud (HAQM VPC). Visibility to the network […]
Create a serverless event-driven workflow to ingest and process Microsoft data with AWS Glue and HAQM EventBridge
Microsoft SharePoint is a document management system for storing files, organizing documents, and sharing and editing documents in collaboration with others. Your organization may want to ingest SharePoint data into your data lake, combine the SharePoint data with other data that’s available in the data lake, and use it for reporting and analytics purposes. AWS […]