AWS Big Data Blog

Category: Analytics

AWS Glue mutual TLS authentication for Amazon MSK

In today’s landscape, data streams continuously from countless sources, from social media interactions to Internet of Things (IoT) device readings. This torrent of real-time information presents both a challenge and an opportunity for businesses. To harness the power of this data effectively, organizations need robust systems for ingesting, processing, and analyzing streaming data at […]

Enrich, standardize, and translate streaming data in Amazon Redshift with generative AI

Amazon Redshift ML is a feature of Amazon Redshift that enables you to build, train, and deploy machine learning (ML) models directly within the Redshift environment. Now, you can use pretrained publicly available large language models (LLMs) in Amazon SageMaker JumpStart as part of Redshift ML, allowing you to bring the power of LLMs to analytics. You can use pretrained publicly available LLMs from leading providers such as Meta, AI21 Labs, LightOn, Hugging Face, Amazon Alexa, and Cohere as part of your Redshift ML workflows. By integrating with LLMs, Redshift ML can support a wide variety of natural language processing (NLP) use cases on your analytical data, such as text summarization, sentiment analysis, named entity recognition, text generation, language translation, data standardization, data enrichment, and more. Through this feature, the power of generative artificial intelligence (AI) and LLMs is made available to you as simple SQL functions that you can apply on your datasets. The integration is designed to be simple to use and flexible to configure, allowing you to take advantage of the capabilities of advanced ML models within your Redshift data warehouse environment.

Build a real-time analytics solution with Apache Pinot on AWS

In this post, we provide a step-by-step guide to building a real-time OLAP datastore on Amazon Web Services (AWS) using Apache Pinot on Amazon Elastic Compute Cloud (Amazon EC2), with near real-time visualization in Tableau. You can also use Apache Pinot for batch processing use cases, but this post focuses on a near real-time analytics use case.

Introducing data products in Amazon DataZone: Simplify discovery and subscription with business use case based grouping

We are excited to announce a new feature in Amazon DataZone that allows data producers to group data assets into well-defined, self-contained packages (data products) tailored for specific business use cases. For example, a marketing analysis data product can bundle various data assets such as marketing campaign data, pipeline data, and customer data. This simplifies […]

Set up cross-account AWS Glue Data Catalog access using AWS Lake Formation and AWS IAM Identity Center with Amazon Redshift and Amazon QuickSight

In this post, we cover how to enable trusted identity propagation with AWS IAM Identity Center, Amazon Redshift, and AWS Lake Formation residing in separate AWS accounts, and how to set up cross-account sharing of an S3 data lake for enterprise identities using AWS Lake Formation to enable analytics with Amazon Redshift. Then we use Amazon QuickSight to build insights using Redshift tables as our data source.

Amazon OpenSearch Serverless cost-effective search capabilities, at any scale

We’re excited to announce the new lower entry cost for Amazon OpenSearch Serverless. With support for half (0.5) OpenSearch Compute Units (OCUs) for indexing and search workloads, the entry cost is cut in half. Amazon OpenSearch Serverless is a serverless deployment option for Amazon OpenSearch Service that you can use to run search and analytics workloads without the complexities […]

Improve Apache Kafka scalability and resiliency using Amazon MSK tiered storage

Since the launch of tiered storage for Amazon Managed Streaming for Apache Kafka (Amazon MSK), customers have embraced this feature for its ability to optimize storage costs and improve performance. In previous posts, we explored the inner workings of Kafka, maximized the potential of Amazon MSK, and delved into the intricacies of Amazon MSK tiered […]

Create a customizable cross-company log lake for compliance, Part I: Business Background

As builders, sometimes you want to dissect a customer experience, find problems, and figure out ways to make it better. That means going a layer down to mix and match primitives together to get more comprehensive features and more customization, flexibility, and freedom. In this post, we introduce Log Lake, a do-it-yourself data lake based on logs from CloudWatch and AWS CloudTrail.

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities. As data volumes continue to grow exponentially, traditional data warehousing solutions may struggle to keep up with the increasing demands for scalability, performance, and […]

Deliver Amazon CloudWatch logs to Amazon OpenSearch Serverless

In this blog post, we show how to use Amazon OpenSearch Ingestion to deliver CloudWatch logs to OpenSearch Serverless in near real time. We outline a mechanism to connect a Lambda subscription filter with OpenSearch Ingestion and deliver logs to OpenSearch Serverless without explicitly needing a separate subscription filter for it.