AWS Big Data Blog
Developer guidance on how to do local testing with HAQM MSK Serverless
In this post, I provide guidance on how developers can connect to HAQM MSK Serverless from local environments. The connection uses an HAQM MSK endpoint through an SSH tunnel and a bastion host, enabling developers to experiment and test locally without needing to set up a separate Kafka cluster.
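As a minimal sketch of the tunneling idea (the bastion address, key path, and MSK endpoint below are hypothetical placeholders; MSK Serverless accepts IAM-authenticated clients on port 9098), a local port forward through the bastion might look like this:

```python
import subprocess

# Hypothetical placeholders -- substitute your bastion host and MSK Serverless endpoint
BASTION = "ec2-user@bastion.example.com"
SSH_KEY = "~/.ssh/bastion-key.pem"
MSK_ENDPOINT = "boot-abc123.c1.kafka-serverless.us-east-1.amazonaws.com"

# Forward local port 9098 to the MSK Serverless bootstrap endpoint via the bastion.
# Kafka clients on the developer machine can then target 127.0.0.1:9098.
tunnel = subprocess.Popen([
    "ssh", "-i", SSH_KEY, "-N",
    "-L", f"127.0.0.1:9098:{MSK_ENDPOINT}:9098",
    BASTION,
])
```

The post itself covers the remaining client-side details, such as authenticating with IAM and pointing your Kafka client's bootstrap servers at the forwarded local port.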
Publish and enrich real-time financial data feeds using HAQM MSK and HAQM Managed Service for Apache Flink
In this post, we demonstrate how you can publish an enriched real-time data feed on AWS using HAQM Managed Streaming for Apache Kafka (HAQM MSK) and HAQM Managed Service for Apache Flink. You can apply this architecture pattern to various use cases within the capital markets industry; we discuss some of those use cases in this post.
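As a hedged illustration of the enrichment pattern (topic names, fields, endpoints, and the reference-data source are all hypothetical, and the post's actual pipeline may differ), a PyFlink Table API job could join a raw quotes stream against company reference data and publish the result to a sink topic:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Requires the Flink Kafka connector JAR on the classpath; all names below are illustrative
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Raw quotes arriving on an MSK topic, JSON-encoded
t_env.execute_sql("""
    CREATE TABLE quotes (
        ticker STRING, price DOUBLE, event_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka', 'topic' = 'quotes',
        'properties.bootstrap.servers' = 'boot-abc.kafka.us-east-1.amazonaws.com:9098',
        'format' = 'json', 'scan.startup.mode' = 'latest-offset'
    )
""")

# Reference data used to enrich each quote
t_env.execute_sql("""
    CREATE TABLE company_ref (
        ticker STRING, company_name STRING, sector STRING
    ) WITH (
        'connector' = 'filesystem', 'path' = 's3://my-bucket/reference/', 'format' = 'csv'
    )
""")

# Enriched output topic
t_env.execute_sql("""
    CREATE TABLE enriched_quotes (
        ticker STRING, price DOUBLE, event_time TIMESTAMP(3),
        company_name STRING, sector STRING
    ) WITH (
        'connector' = 'kafka', 'topic' = 'enriched-quotes',
        'properties.bootstrap.servers' = 'boot-abc.kafka.us-east-1.amazonaws.com:9098',
        'format' = 'json'
    )
""")

# Join each quote to its company metadata and publish the enriched record
t_env.execute_sql("""
    INSERT INTO enriched_quotes
    SELECT q.ticker, q.price, q.event_time, r.company_name, r.sector
    FROM quotes AS q JOIN company_ref AS r ON q.ticker = r.ticker
""").wait()
```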
HAQM Redshift data ingestion options
HAQM Redshift, a data warehousing service, offers a variety of options for ingesting data from diverse sources into its high-performance, scalable environment. Whether your data resides in operational databases, data lakes, on-premises systems, HAQM Elastic Compute Cloud (HAQM EC2), or other AWS services, HAQM Redshift provides multiple ingestion methods to meet your specific needs. The currently […]
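For example, one of the most common ingestion paths is loading files from HAQM S3 with a COPY command, which can be submitted through the Redshift Data API; the workgroup, bucket, table, and IAM role below are hypothetical placeholders:

```python
import boto3

client = boto3.client("redshift-data")

# Placeholder names -- substitute your own workgroup, database, bucket, and role
resp = client.execute_statement(
    WorkgroupName="my-workgroup",
    Database="dev",
    Sql="""
        COPY sales
        FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """,
)
print(resp["Id"])  # statement ID; poll describe_statement() to check completion
```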
Use the AWS CDK with the Data Solutions Framework to provision and manage HAQM Redshift Serverless
In this post, we demonstrate how to use the AWS CDK and DSF to create a multi-data warehouse platform based on HAQM Redshift Serverless. DSF simplifies the provisioning of Redshift Serverless, initialization and cataloging of data, and data sharing between different data warehouse deployments.
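A minimal sketch, assuming DSF's Python distribution (cdklabs.aws_data_solutions_framework) and the Redshift Serverless construct names shown in its documentation; the namespace and workgroup names here are illustrative:

```python
from aws_cdk import App, Stack
from constructs import Construct
# Assumption: DSF's Python package and construct names, per its documentation
from cdklabs import aws_data_solutions_framework as dsf


class WarehouseStack(Stack):
    def __init__(self, scope: Construct, id: str) -> None:
        super().__init__(scope, id)

        # A Redshift Serverless namespace holds databases, users, and encryption settings
        namespace = dsf.consumption.RedshiftServerlessNamespace(
            self, "Namespace", name="analytics", db_name="sales"
        )

        # The workgroup provides the compute endpoint bound to that namespace
        dsf.consumption.RedshiftServerlessWorkgroup(
            self, "Workgroup", name="analytics-wg", namespace=namespace
        )


app = App()
WarehouseStack(app, "WarehouseStack")
app.synth()
```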
Integrate Tableau and Microsoft Entra ID with HAQM Redshift using AWS IAM Identity Center
This blog post provides a step-by-step guide to integrating IAM Identity Center with Microsoft Entra ID as the IdP and configuring HAQM Redshift as an AWS managed application. Additionally, you’ll learn how to set up the HAQM Redshift driver in Tableau, enabling SSO directly within Tableau Desktop.
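Outside Tableau, the same Identity Center sign-in flow can be exercised from code for testing; HAQM's redshift_connector Python driver documents a browser-based Identity Center plugin, and assuming its parameter names, a connection might look like this (host and issuer URL are placeholders):

```python
import redshift_connector

# Placeholders -- use your Redshift endpoint and the issuer URL from IAM Identity Center
conn = redshift_connector.connect(
    host="my-workgroup.123456789012.us-east-1.redshift-serverless.amazonaws.com",
    database="dev",
    credentials_provider="BrowserIdcAuthPlugin",  # opens a browser for the Entra ID SSO flow
    issuer_url="https://identitycenter.amazonaws.com/ssoins-1234567890abcdef",
    idc_region="us-east-1",
    idc_client_display_name="Redshift SSO test",
)

cursor = conn.cursor()
cursor.execute("SELECT current_user;")
print(cursor.fetchone())
```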
Attribute HAQM EMR on EC2 costs to your end-users
In this post, we share a chargeback model that you can use to track and allocate the costs of Spark workloads running on HAQM EMR on EC2 clusters. We describe an approach that assigns HAQM EMR costs to different jobs, teams, or lines of business, so you can distribute costs across business units and monitor the return on investment for your Spark-based workloads.
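The core of any such model is proportional attribution. As a hedged sketch (the metric, job names, and figures are made up for illustration), the cluster's cost for a billing window can be split by each job's share of consumed resources, such as YARN memory-seconds:

```python
# Hypothetical per-job resource usage, e.g. YARN memory-seconds gathered from the ResourceManager
job_usage = {
    "team-a/etl-daily": 1_800_000,
    "team-b/ml-features": 600_000,
    "team-b/adhoc": 100_000,
}

CLUSTER_COST_USD = 250.0  # total EMR + EC2 cost for the billing window (illustrative)

total = sum(job_usage.values())
for job, usage in job_usage.items():
    share = usage / total
    print(f"{job}: {share:6.1%} -> ${CLUSTER_COST_USD * share:,.2f}")
```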
Copy and mask PII between HAQM RDS databases using visual ETL jobs in AWS Glue Studio
In this post, I’ll walk you through how to copy data from one HAQM Relational Database Service (HAQM RDS) for PostgreSQL database to another, while scrubbing personally identifiable information (PII) along the way using AWS Glue. You will learn how to prepare a multi-account environment to access the databases from AWS Glue, and how to model an ETL data flow that automatically masks PII as part of the transfer process, so that no sensitive information is copied to the target database in its original form.
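Although the post builds the flow visually in AWS Glue Studio, the masking step is conceptually similar to this plain PySpark sketch (the column names are hypothetical, and SHA-256 hashing stands in for whichever masking function you choose):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import sha2, col

spark = SparkSession.builder.getOrCreate()

# Hypothetical customer records; a real Glue job would read these from the source RDS database
df = spark.createDataFrame(
    [("Jane Doe", "jane@example.com", 3)],
    ["full_name", "email", "order_count"],
)

# Replace PII columns with one-way hashes so the target never receives the original values
masked = df.withColumn("full_name", sha2(col("full_name"), 256)) \
           .withColumn("email", sha2(col("email"), 256))

masked.show(truncate=False)
```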
Use AWS Glue to streamline SFTP data processing
In this blog post, we explore how to use the SFTP Connector for AWS Glue from the AWS Marketplace to efficiently process data from Secure File Transfer Protocol (SFTP) servers into HAQM Simple Storage Service (HAQM S3), further empowering your data analytics and insights.
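Inside the Glue job, reading through a Marketplace connector follows Glue's generic connection pattern. A hedged sketch follows; the connection name is a placeholder, and the exact option keys depend on the SFTP connector's own documentation:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Assumption: "my-sftp-connection" is a Glue connection created from the
# Marketplace-subscribed SFTP connector; option keys follow its documentation
sftp_dyf = glue_context.create_dynamic_frame_from_options(
    connection_type="marketplace.spark",
    connection_options={
        "connectionName": "my-sftp-connection",
        "path": "/outbound/trades.csv",
    },
)

# Land the retrieved data in HAQM S3 for downstream analytics
glue_context.write_dynamic_frame.from_options(
    frame=sftp_dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/sftp-landing/"},
    format="parquet",
)
```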
Automate HAQM Redshift Advisor recommendations with email alerts using an API
HAQM Redshift Advisor offers recommendations for optimizing your Redshift cluster's performance and helps you save on operating costs. In this post, we show you how to use the ListRecommendations API to set up email notifications for Advisor recommendations on your Redshift cluster. These recommendations, such as identifying tables that should be vacuumed to sort the data or finding table columns that are candidates for compression, can help improve performance and save costs.
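A minimal sketch of the polling side (the cluster identifier and topic ARN are placeholders): boto3 exposes the API as list_recommendations on the Redshift client, and HAQM SNS can deliver the email notification:

```python
import boto3

redshift = boto3.client("redshift")
sns = boto3.client("sns")

# Placeholder identifiers -- substitute your cluster and SNS topic
recs = redshift.list_recommendations(ClusterIdentifier="my-cluster")["Recommendations"]

if recs:
    body = "\n\n".join(
        f"{r['RecommendationType']}: {r['Description']}" for r in recs
    )
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:redshift-advisor",
        Subject="HAQM Redshift Advisor recommendations",
        Message=body,
    )
```

Scheduling this function, for example with HAQM EventBridge and AWS Lambda as the post describes, turns it into a recurring alert.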
Migrate HAQM Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands
As businesses strive to make informed decisions, the amount of data being generated and required for analysis is growing exponentially, and Dafiti, an ecommerce company, is no exception. Dafiti recognizes the importance of using data to drive strategic decision-making, and with the ever-increasing volume of data available, it faces the challenge of effectively managing and extracting valuable insights from this vast pool of information to gain a competitive edge and make data-driven decisions that align with the company's business objectives. The growing need for storage space to maintain data from over 90 sources, together with the functionality available on the new HAQM Redshift node types, including managed storage, data sharing, and zero-ETL integrations, led us to migrate from DC2 to RA3 nodes. In this post, we share how we handled the migration process and our impressions of the experience.
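For reference, one supported migration path is an elastic resize directly to RA3 node types. A hedged sketch with placeholder values follows; the right target node type and count depend on your DC2 footprint and AWS's published DC2-to-RA3 sizing guidance:

```python
import boto3

redshift = boto3.client("redshift")

# Placeholder values -- take the target node type and count from AWS's upgrade guidance
redshift.resize_cluster(
    ClusterIdentifier="my-dc2-cluster",
    NodeType="ra3.4xlarge",
    NumberOfNodes=4,
    Classic=False,  # elastic resize; set True to request a classic resize instead
)
```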