AWS Big Data Blog
Category: Serverless
Add your own libraries and application dependencies to Spark and Hive on HAQM EMR Serverless with custom images
HAQM EMR Serverless allows you to run open-source big data frameworks such as Apache Spark and Apache Hive without managing clusters and servers. Many customers who run Spark and Hive applications want to add their own libraries and dependencies to the application runtime. For example, you may want to add popular open-source extensions to Spark, […]
HAQM EMR Serverless cost estimator
HAQM EMR Serverless is a serverless option in HAQM EMR that makes it easy for data analysts and engineers to run applications using open-source big data analytics frameworks such as Apache Spark and Hive without configuring, managing, and scaling clusters or servers. You get all the features of the latest open-source frameworks with the performance-optimized […]
Introducing the Cloud Shuffle Storage Plugin for Apache Spark
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. In AWS Glue, you can use Apache Spark, an open-source, distributed processing system for your data integration tasks and big data workloads. Apache Spark utilizes in-memory caching and optimized […]
Log analytics the easy way with HAQM OpenSearch Serverless
We recently announced the preview release of HAQM OpenSearch Serverless, a new serverless option for HAQM OpenSearch Service, which makes it easy for you to run large-scale search and analytics workloads without having to configure, manage, or scale OpenSearch clusters. It automatically provisions and scales the underlying resources to deliver fast data ingestion and query […]
Get started with data integration from HAQM S3 to HAQM Redshift using AWS Glue interactive sessions
Organizations are placing a high priority on data integration, especially to support analytics, machine learning (ML), business intelligence (BI), and application development initiatives. Data is growing exponentially and is generated by increasingly diverse data sources. Data integration becomes challenging when processing data at scale and the inherent heavy lifting associated with infrastructure required to manage […]
Share and publish your Snowflake data to AWS Data Exchange using HAQM Redshift data sharing
HAQM Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. Today, tens of thousands of AWS customers—from Fortune 500 companies, startups, and everything in between—use HAQM Redshift to run mission-critical business intelligence (BI) dashboards, […]
Use an event-driven architecture to build a data mesh on AWS
In this post, we take the data mesh design discussed in Design a data mesh architecture using AWS Lake Formation and AWS Glue, and demonstrate how to initialize data domain accounts to enable managed sharing; we also go through how we can use an event-driven approach to automate processes between the central governance account and […]
How Hudl built a cost-optimized AWS Glue pipeline with Apache Hudi datasets
This is a guest blog post co-written with Addison Higley and Ramzi Yassine from Hudl. Hudl Agile Sports Technologies, Inc. is a Lincoln, Nebraska based company that provides tools for coaches and athletes to review game footage and improve individual and team play. Its initial product line served college and professional American football teams. Today, […]
How SOCAR built a streaming data pipeline to process IoT data for real-time analytics and control
August 30, 2023: HAQM Kinesis Data Analytics has been renamed to HAQM Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. SOCAR is the leading Korean mobility company with strong competitiveness in car-sharing. SOCAR has become a comprehensive mobility platform in collaboration with Nine2One, an e-bike sharing service, […]
California State University Chancellor’s Office reduces cost and improves efficiency using HAQM QuickSight for streamlined HR reporting in higher education
The California State University Chancellor’s Office (CSUCO) sits at the center of America’s most significant and diverse 4-year universities. The California State University (CSU) serves approximately 477,000 students and employs more than 55,000 staff and faculty members across 23 universities and 7 off-campus centers. The CSU provides students with opportunities to develop intellectually and personally, […]