AWS Big Data Blog
Category: Serverless
Accelerate data pipeline creation with the new visual interface in Amazon OpenSearch Ingestion
Today, we’re launching a new visual interface for OpenSearch Ingestion that makes it simple to create and manage your data pipelines from the AWS Management Console. With this new feature, you can build pipelines in minutes without writing complex configurations manually. In this post, we walk through how these new features work and how you can use them to accelerate your data ingestion projects.
Build a data lakehouse in a hybrid environment using Amazon EMR Serverless, Apache DolphinScheduler, and TiDB
This post discusses a decoupled approach to building a serverless data lakehouse using AWS Cloud-centered services, including Amazon EMR Serverless, Amazon Athena, Amazon Simple Storage Service (Amazon S3), and Apache DolphinScheduler (an open source data job scheduler), as well as PingCAP TiDB, a third-party data warehouse product that can be deployed on premises, in the cloud, or through software as a service (SaaS).
Amazon Redshift Serverless adds higher base capacity of up to 1024 RPUs
In this post, we explore the new higher base capacity of 1024 RPUs in Redshift Serverless, which doubles the previous maximum of 512 RPUs. This enhancement empowers you to get high performance with Redshift Serverless for workloads that contain highly complex queries and for write-intensive workloads with concurrent data ingestion and transformation tasks that require high throughput and low latency.
Jumia builds a next-generation data platform with metadata-driven specification frameworks
Jumia is a technology company born in 2012, present in 14 African countries, with its main headquarters in Lagos, Nigeria. In this post, we share part of the journey that Jumia took with AWS Professional Services to modernize its data platform, moving from a Hadoop distribution to AWS serverless-based solutions.
Introducing Point in Time queries and SQL/PPL support in Amazon OpenSearch Serverless
Today we announced support for three new features for Amazon OpenSearch Serverless: Point in Time (PIT) search, which enables you to maintain stable sorting for deep pagination in the presence of updates, and PPL and SQL, which give you new ways to query your data. In this post, we discuss the benefits of these new features and how to get started.
Enhance Amazon EMR scaling capabilities with Application Master Placement
Starting with the Amazon EMR 7.2 release, Amazon EMR on EC2 introduced a new feature called Application Master (AM) label awareness, which allows users to enable YARN node labels to allocate the AM containers within On-Demand nodes only. In this post, we explore the key features and use cases where this new functionality can provide significant benefits, enabling cluster administrators to achieve optimal resource utilization, improved application reliability, and cost-efficiency in their EMR on EC2 clusters.
Extract insights in a 30TB time series workload with Amazon OpenSearch Serverless
We recently announced a new capacity level of 30TB for time series data per account per AWS Region. The OpenSearch Serverless compute capacity for data ingestion and search/query is measured in OpenSearch Compute Units (OCUs), which are shared among various collections with the same AWS Key Management Service (AWS KMS) key. This post discusses how you can analyze 30TB time series datasets with OpenSearch Serverless.
Unlock scalable analytics with a secure connectivity pattern in AWS Glue to read from or write to Snowflake
In today’s data-driven world, the ability to seamlessly integrate and utilize diverse data sources is critical for gaining actionable insights and driving innovation. As organizations increasingly rely on data stored across various platforms, such as Snowflake, Amazon Simple Storage Service (Amazon S3), and various software as a service (SaaS) applications, the challenge of bringing these […]
Deliver Amazon CloudWatch logs to Amazon OpenSearch Serverless
In this blog post, we will show how to use Amazon OpenSearch Ingestion to deliver CloudWatch logs to OpenSearch Serverless in near real-time. We outline a mechanism to connect a Lambda subscription filter with OpenSearch Ingestion and deliver logs to OpenSearch Serverless without explicitly needing a separate subscription filter for it.
Perform reindexing in Amazon OpenSearch Serverless using Amazon OpenSearch Ingestion
In this post, we outline the steps to copy data between two indexes in the same OpenSearch Serverless collection using the new OpenSearch source feature of OpenSearch Ingestion. This is particularly useful for reindexing operations where you want to change your data schema. OpenSearch Serverless and OpenSearch Ingestion are both serverless services that enable you to seamlessly handle your data workflows, providing optimal performance and scalability.