AWS Big Data Blog
Category: Intermediate (200)
Amazon DataZone enhances data discovery with advanced search filtering
Amazon DataZone, a fully managed data management service, helps organizations catalog, discover, analyze, share, and govern data between data producers and consumers. We are excited to announce the introduction of advanced search filtering capabilities in the Amazon DataZone business data catalog. With the improved rendering of glossary terms, you can now navigate large sets of […]
Implement disaster recovery with Amazon Redshift
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers. The objective of a disaster recovery plan is […]
Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation
This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights. This data is primarily used for analytical and machine learning purposes, […]
Run Apache Spark 3.5.1 workloads 4.5 times faster with Amazon EMR runtime for Apache Spark
The Amazon EMR runtime for Apache Spark is a performance-optimized runtime that is 100% API compatible with open source Apache Spark. It offers faster out-of-the-box performance than Apache Spark through improved query plans, faster queries, and tuned defaults. Amazon EMR on EC2, Amazon EMR Serverless, Amazon EMR on Amazon EKS, and Amazon EMR on AWS […]
Stream multi-tenant data with Amazon MSK
AWS helps SaaS vendors by providing the building blocks needed to implement a streaming application with Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), and real-time processing applications with Amazon Managed Service for Apache Flink. In this post, we look at implementation patterns a SaaS vendor can adopt when using a streaming platform as a means of integration between internal components, where streaming data is not directly exposed to third parties. In particular, we focus on Amazon MSK.
Apply fine-grained access and transformation on the SUPER data type in Amazon Redshift
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. Tens of thousands of customers use Amazon Redshift to process exabytes of data per […]
Ingest and analyze your data using Amazon OpenSearch Service with Amazon OpenSearch Ingestion
In today’s data-driven world, organizations are continually confronted with the task of managing extensive volumes of data securely and efficiently. Whether it’s customer information, sales records, or sensor data from Internet of Things (IoT) devices, the importance of handling and storing data at scale with ease of use is paramount. A common use case that […]
Optimize storage costs in Amazon OpenSearch Service using Zstandard compression
As part of an indexing operation, the ingested documents are stored as immutable segments. Each segment is a collection of data structures, such as the inverted index, block k-dimensional (BKD) tree, term dictionary, and stored fields, which make document retrieval faster during search. Of these, stored fields are the largest fields in the segment and are compressed when written to disk; the compression speed and the index storage size vary with the compression strategy used. In this post, we discuss the performance of the Zstandard algorithm, which was introduced in OpenSearch v2.9, relative to the other compression algorithms available in OpenSearch.
Modernize your data observability with Amazon OpenSearch Service zero-ETL integration with Amazon S3
We are excited to announce the general availability of Amazon OpenSearch Service zero-ETL integration with Amazon Simple Storage Service (Amazon S3) for domains running 2.13 and above. The integration is a new way for customers to query operational logs in Amazon S3 and Amazon S3-based data lakes without needing to switch between tools to analyze operational data. By querying across OpenSearch Service and S3 datasets, you can evaluate multiple data sources to perform forensic analysis of operational and security events. The new integration with OpenSearch Service supports AWS’s zero-ETL vision to reduce the operational complexity of duplicating data or managing multiple analytics tools by enabling you to directly query your operational data, reducing costs and time to action.
Optimize write throughput for Amazon Kinesis Data Streams
Amazon Kinesis Data Streams is used by many customers to capture, process, and store data streams at any scale. This level of scale is enabled by dividing each data stream into multiple shards. Each shard in a stream has a write throughput limit of 1 MB/s or 1,000 records per second. Whether your data streaming […]
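As a rough illustration of the per-shard write limit mentioned in this excerpt, the following sketch estimates the minimum number of shards for a given workload. It is not taken from the post: the function name `min_shards` and the constants are hypothetical, the byte limit assumes 1 MB per second read as 1,000,000 bytes, and the estimate assumes records are spread evenly across shards.

```python
import math

# Per-shard write limits as described in the excerpt. The byte value is an
# assumption: 1 MB per second, read conservatively as 1,000,000 bytes.
SHARD_BYTES_PER_SEC = 1_000_000
SHARD_RECORDS_PER_SEC = 1_000


def min_shards(records_per_sec: float, avg_record_bytes: float) -> int:
    """Estimate the fewest shards that satisfy both write limits.

    Hypothetical helper: assumes partition keys distribute records
    evenly, so no single shard is hotter than the average.
    """
    # Shards needed to stay under the records-per-second limit.
    by_records = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    # Shards needed to stay under the bytes-per-second limit.
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_records, by_bytes)


if __name__ == "__main__":
    # Many small records: the record-count limit dominates.
    print(min_shards(5_000, 100))
    # Fewer, larger records: the byte limit dominates.
    print(min_shards(500, 4_000))
```

Whichever limit is reached first determines the shard count, which is why small, frequent records and large, infrequent records can need different numbers of shards at the same total data volume.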