AWS Big Data Blog
Category: Serverless
Analyze more demanding as well as larger time series workloads with Amazon OpenSearch Serverless
In today’s data-driven landscape, managing and analyzing vast amounts of data, especially logs, is crucial for organizations to derive insights and make informed decisions. However, handling this data efficiently presents a significant challenge, prompting organizations to seek scalable solutions without the complexity of infrastructure management. Amazon OpenSearch Serverless lets you run OpenSearch in the AWS […]
Run interactive workloads on Amazon EMR Serverless from Amazon EMR Studio
Starting from release 6.14, Amazon EMR Studio supports interactive analytics on Amazon EMR Serverless. You can now use EMR Serverless applications as the compute, in addition to Amazon EMR on EC2 clusters and Amazon EMR on EKS virtual clusters, to run JupyterLab notebooks from EMR Studio Workspaces. EMR Studio is an integrated development environment (IDE) […]
How the GoDaddy data platform achieved over 60% cost reduction and 50% performance boost by adopting Amazon EMR Serverless
This is a guest post co-written with Brandon Abear, Dinesh Sharma, John Bush, and Ozcan IIikhan from GoDaddy. GoDaddy empowers everyday entrepreneurs by providing all the help and tools to succeed online. With more than 20 million customers worldwide, GoDaddy is the place people come to name their ideas, build a professional website, attract customers, […]
In-stream anomaly detection with Amazon OpenSearch Ingestion and Amazon OpenSearch Serverless
Unsupervised machine learning analytics has emerged as a powerful tool for anomaly detection in today’s data-rich landscape, especially with the growing volume of machine-generated data. In-stream anomaly detection offers real-time insights into data anomalies, enabling proactive response. Amazon OpenSearch Serverless focuses on delivering seamless scalability and management of search workloads; Amazon OpenSearch Ingestion complements this […]
Use Amazon OpenSearch Ingestion to migrate to Amazon OpenSearch Serverless
Amazon OpenSearch Serverless is an on-demand auto scaling configuration for Amazon OpenSearch Service. Since its release, interest in OpenSearch Serverless has been steadily growing. Customers prefer to let the service manage its capacity automatically rather than having to manually provision capacity. Until now, customers have had to rely on using custom code or third-party […]
Use Amazon Athena with Spark SQL for your open-source transactional table formats
In this post, we show you how to use Spark SQL in Amazon Athena notebooks and work with Iceberg, Hudi, and Delta Lake table formats. We demonstrate common operations such as creating databases and tables, inserting data into the tables, querying data, and looking at snapshots of the tables in Amazon S3 using Spark SQL in Athena.
How FanDuel adopted a modern Amazon Redshift architecture to serve critical business workloads
This post is co-written with Sreenivasa Mungala and Matt Grimm from FanDuel. In this post, we share how FanDuel moved from a DC2 nodes architecture to a modern Amazon Redshift architecture, which includes Redshift provisioned clusters using RA3 instances, Amazon Redshift data sharing, and Amazon Redshift Serverless. About FanDuel Part of Flutter Entertainment, FanDuel Group […]
Introducing persistent buffering for Amazon OpenSearch Ingestion
Amazon OpenSearch Ingestion is a fully managed, serverless pipeline that delivers real-time log, metric, and trace data to Amazon OpenSearch Service domains and OpenSearch Serverless collections. Customers use Amazon OpenSearch Ingestion pipelines to ingest data from a variety of data sources, both pull-based and push-based. When ingesting data from pull-based sources, such as Amazon Simple […]
Introducing AWS Glue serverless Spark UI for better monitoring and troubleshooting
Today, we are pleased to announce serverless Spark UI built into the AWS Glue console. You can now use Spark UI easily as it’s a built-in component of the AWS Glue console, enabling you to access it with a single click when examining the details of any given job run. There’s no infrastructure setup or teardown required. AWS Glue serverless Spark UI is a fully managed serverless offering and generally starts up in a matter of seconds. Serverless Spark UI makes it significantly faster and easier to get jobs working in production because you have ready access to low-level details for your job runs.
Clean up your Excel and CSV files without writing code using AWS Glue DataBrew
Managing data within an organization is complex. Handling data from outside the organization adds even more complexity. As the organization receives data from multiple external vendors, it often arrives in different formats, typically Excel or CSV files, with each vendor using their own unique data layout and structure. In this blog post, we’ll explore a […]