AWS Big Data Blog
Tag: AWS Lambda
Modernize your legacy databases with AWS data lakes, Part 2: Build a data lake using AWS DMS data on Apache Iceberg
This is part two of a three-part series where we show how to build a data lake on AWS using a modern data architecture. This post shows how to load data from a legacy database (SQL Server) into a transactional data lake (Apache Iceberg) using AWS Glue. We show how to build data pipelines using AWS Glue jobs, optimize them for both cost and performance, and implement schema evolution to automate manual tasks. To review the first part of the series, where we load SQL Server data into Amazon Simple Storage Service (Amazon S3) using AWS Database Migration Service (AWS DMS), see Modernize your legacy databases with AWS data lakes, Part 1: Migrate SQL Server using AWS DMS.
Level up your React app with Amazon QuickSight: How to embed your dashboard for anonymous access
Using embedded analytics from Amazon QuickSight can simplify the process of equipping your application with functional visualizations without any complex development. There are multiple ways to embed QuickSight dashboards into an application. In this post, we look at how it can be done using React and the Amazon QuickSight Embedding SDK. Dashboard consumers often don’t have […]
Optimize Federated Query Performance using EXPLAIN and EXPLAIN ANALYZE in Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. In 2019, Athena added support for federated queries to run SQL […]
Synchronize and control your Amazon Redshift clusters maintenance windows
Amazon Redshift is a data warehouse that can expand to exabyte-scale. Today, tens of thousands of AWS customers (including NTT DOCOMO, Finra, and Johnson & Johnson) use Amazon Redshift to run mission-critical business intelligence dashboards, analyze real-time streaming data, and run predictive analytics jobs. Amazon Redshift powers analytical workloads for Fortune 500 companies, startups, and […]
Configure and optimize performance of Amazon Athena federation with Amazon Redshift
This post provides guidance on how to configure Amazon Athena federation with AWS Lambda and Amazon Redshift, while addressing performance considerations to ensure proper use.
Stream, transform, and analyze XML data in real time with Amazon Kinesis, AWS Lambda, and Amazon Redshift
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. When we look at […]
How Wind Mobility built a serverless data architecture
We parse through millions of scooter and user events generated daily (over 300 events per second) to extract actionable insight. We selected AWS Glue to perform this task. Our primary ETL job reads the newly added raw event data from Amazon S3, processes it using Apache Spark, and writes the results to our Amazon Redshift data warehouse. AWS Glue plays a critical role in our ability to scale on demand. After careful evaluation and testing, we concluded that AWS Glue ETL jobs meet all our needs and free us from procuring and managing infrastructure.
Ingest Excel data automatically into Amazon QuickSight
Amazon QuickSight is a fast, cloud-powered, business intelligence (BI) service that makes it easy to deliver insights to everyone in your organization. This post demonstrates how to build a serverless data ingestion pipeline to automatically import frequently changed data into a SPICE (Super-fast, Parallel, In-memory Calculation Engine) dataset of Amazon QuickSight dashboards. It is sometimes […]
How Siemens built a fully managed scheduling mechanism for updates on Amazon S3 data lakes
Siemens is a global technology leader with more than 370,000 employees and 170 years of experience. To protect Siemens from cybercrime, the Siemens Cyber Defense Center (CDC) continuously monitors Siemens’ networks and assets. To handle the resulting enormous data load, the CDC built a next-generation threat detection and analysis platform called ARGOS. ARGOS is a […]
Collect and distribute high-resolution crypto market data with ECS, S3, Athena, Lambda, and AWS Data Exchange
This is a guest post by Floating Point Group. In their own words, “Floating Point Group is on a mission to bring institutional-grade trading services to the world of cryptocurrency.” The need and demand for financial infrastructure designed specifically for trading digital assets may not be obvious. There’s a rather pervasive narrative that these coins […]