AWS Big Data Blog
Category: Compute
Event-driven refresh of SPICE datasets in Amazon QuickSight
Businesses are increasingly harnessing data to improve their business outcomes. To enable this transformation to a data-driven business, customers are bringing together data from structured and unstructured sources into a data lake. Then they use business intelligence (BI) tools, such as Amazon QuickSight, to unlock insights from this data. To provide fast access to datasets, […]
Unified serverless streaming ETL architecture with Amazon Kinesis Data Analytics
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Businesses across the world […]
Automating bucketing of streaming data using Amazon Athena and AWS Lambda
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. In today’s world, data plays a vital role in helping businesses understand and improve their processes and services to reduce cost. You can use several tools to […]
How to delete user data in an AWS data lake
General Data Protection Regulation (GDPR) is an important aspect of today’s technology world, and processing data in compliance with GDPR is a necessity for those who implement solutions within the AWS public cloud. One article of GDPR is the “right to erasure” or “right to be forgotten” which may require you to implement a solution […]
Configure and optimize performance of Amazon Athena federation with Amazon Redshift
This post provides guidance on how to configure Amazon Athena federation with AWS Lambda and Amazon Redshift, while addressing performance considerations to ensure proper use.
Stream, transform, and analyze XML data in real time with Amazon Kinesis, AWS Lambda, and Amazon Redshift
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. When we look at […]
How Wind Mobility built a serverless data architecture
We parse through millions of scooter and user events generated daily (over 300 events per second) to extract actionable insight. We selected AWS Glue to perform this task. Our primary ETL job reads the newly added raw event data from Amazon S3, processes it using Apache Spark, and writes the results to our Amazon Redshift data warehouse. AWS Glue plays a critical role in our ability to scale on demand. After careful evaluation and testing, we concluded that AWS Glue ETL jobs meet all our needs and free us from procuring and managing infrastructure.
Running a high-performance SAS Grid Manager cluster on AWS with Amazon FSx for Lustre
SAS® is a software provider of data science and analytics used by enterprises and government organizations. SAS Grid is a highly available, fast processing analytics platform that offers centralized management that balances workloads across different compute nodes. This application suite is capable of data management, visual analytics, governance and security, forecasting and text mining, statistical […]
Ingest Excel data automatically into Amazon QuickSight
Amazon QuickSight is a fast, cloud-powered, business intelligence (BI) service that makes it easy to deliver insights to everyone in your organization. This post demonstrates how to build a serverless data ingestion pipeline to automatically import frequently changed data into a SPICE (Super-fast, Parallel, In-memory Calculation Engine) dataset of Amazon QuickSight dashboards. It is sometimes […]
How Siemens built a fully managed scheduling mechanism for updates on Amazon S3 data lakes
Siemens is a global technology leader with more than 370,000 employees and 170 years of experience. To protect Siemens from cybercrime, the Siemens Cyber Defense Center (CDC) continuously monitors Siemens’ networks and assets. To handle the resulting enormous data load, the CDC built a next-generation threat detection and analysis platform called ARGOS. ARGOS is a […]