AWS Big Data Blog
Category: Compute
Unify log aggregation and analytics across compute platforms
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Our customers want to make sure their users have the best experience running their application on AWS. To make this happen, you need to monitor and fix software problems as quickly as […]
Optimize performance and reduce costs for network analytics with VPC Flow Logs in Apache Parquet format
VPC Flow Logs help you understand network traffic patterns, identify security issues, audit usage, and diagnose network connectivity on AWS. Customers often route their VPC flow logs directly to Amazon Simple Storage Service (Amazon S3) for long-term retention. You can then use a custom format conversion application to convert these text files into an Apache […]
Amazon QuickSight deployment models for cross-account and cross-Region access to Amazon Redshift and Amazon RDS
Many AWS customers use multiple AWS accounts and Regions across different departments and applications within the same company. However, you might deploy services like Amazon QuickSight using a single-account approach to centralize users, data source access, and dashboard management. This post explores how you can use different Amazon Virtual Private Cloud (Amazon VPC) private connectivity features to connect QuickSight […]
How NortonLifelock built a serverless architecture for real-time analysis of their VPN usage metrics
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. This post presents a reference architecture and optimization strategies for building serverless data analytics solutions on AWS using Amazon Kinesis Data Analytics. In addition, this post shows […]
Configure Amazon EMR Studio and Amazon EKS to run notebooks with Amazon EMR on EKS
Amazon EMR on Amazon EKS provides a deployment option for Amazon EMR that allows you to run analytics workloads on Amazon Elastic Kubernetes Service (Amazon EKS). This is an attractive option because it allows you to run applications on a common pool of resources without having to provision infrastructure. In addition, you can use Amazon […]
Reduce costs and increase resource utilization of Apache Spark jobs on Kubernetes with Amazon EMR on Amazon EKS
Amazon EMR on Amazon EKS is a deployment option for Amazon EMR that allows you to run Apache Spark on Amazon Elastic Kubernetes Service (Amazon EKS). If you run open-source Apache Spark on Amazon EKS, you can now use Amazon EMR to automate provisioning and management, and run Apache Spark up to three times faster. […]
Run and debug Apache Spark applications on AWS with Amazon EMR on Amazon EKS
Customers today want to focus more on their core business model and less on the underlying infrastructure and operational burden. As customers migrate to the AWS Cloud, they’re realizing the benefits of being able to innovate faster on their own applications by relying on AWS to handle big data platforms, operations, and automation. Many of […]
Run a Spark SQL-based ETL pipeline with Amazon EMR on Amazon EKS
Increasingly, a business’s success depends on its agility in transforming data into actionable insights, which requires efficient and automated data processes. In the previous post, Build a SQL-based ETL pipeline with Apache Spark on Amazon EKS, we described a common productivity issue in a modern data architecture. To address the challenge, we demonstrated how to use a declarative approach as the key enabler to improve efficiency, which resulted in a faster time to value for businesses. Generally speaking, managing applications declaratively in Kubernetes is a widely adopted best practice. You can use the same approach to build and deploy Spark applications with open-source or in-house-built frameworks to achieve the same productivity goal.
Build a SQL-based ETL pipeline with Apache Spark on Amazon EKS
Today, the most successful and fastest growing companies are generally data-driven organizations. Taking advantage of data is pivotal to answering many pressing business problems; however, this can prove to be overwhelming and difficult to manage due to data’s increasing diversity, scale, and complexity. One of the most popular technologies that businesses use to overcome these […]
Query SAP HANA using Athena Federated Query and join with data in your Amazon S3 data lake
This post was last reviewed and updated in July 2022 with updates to the Athena federation connector. If you use data lakes in Amazon Simple Storage Service (Amazon S3) and use SAP HANA as your transactional data store, you may need to join the data in your data lake with SAP HANA in the cloud, SAP HANA […]