AWS Big Data Blog

Category: Analytics

How to track HAQM OpenSearch Service domain-level cost

HAQM OpenSearch Service Pricing is based on three dimensions: instances, storage, and data transfer. Storage pricing depends on the chosen storage type and also the storage tier. Visibility into domain-level charges enables accurate budgeting, efficient resource allocation, fair cost attribution across projects, and overall cost transparency. In this post, we show you how to view the OpenSearch Service domain-level cost using AWS Cost Explorer.

HAQM OpenSearch Service: Managed and community driven

Today the Linux Foundation announced the OpenSearch Software Foundation. As part of the creation of the OpenSearch Foundation, AWS has transferred ownership of OpenSearch to the Linux Foundation. At the launch of the project in April of 2021, in introducing OpenSearch, we spoke of our desire to “ensure users continue to have a secure, high-quality, fully open source search and analytics suite with a rich roadmap of new and innovative functionality.” We’ve maintained that desire and commitment, and with this transfer, are deepening that commitment, and bringing in the broader community with open governance to help with that goal.

Use Batch Processing Gateway to automate job management in multi-cluster HAQM EMR on EKS environments

AWS customers often process petabytes of data using HAQM EMR on EKS. In enterprise environments with diverse workloads or varying operational requirements, customers frequently choose a multi-cluster setup due to the following advantages: Better resiliency and no single point of failure – If one cluster fails, other clusters can continue processing critical workloads, maintaining business […]

How CFM built a well-governed and scalable data-engineering platform using HAQM EMR for financial features generation

Capital Fund Management (CFM) is an alternative investment management company based in Paris with staff in New York City and London. CFM takes a scientific approach to finance, using quantitative and systematic techniques to develop the best investment strategies. In this post, we share how we built a well-governed and scalable data engineering platform using HAQM EMR for financial features generation.

Harness Zero Copy data sharing from Salesforce Data Cloud to HAQM Redshift for Unified Analytics – Part 2

Salesforce and HAQM have collaborated to help customers unlock value from unified data and accelerate time to insights with bidirectional Zero Copy data sharing between Salesforce Data Cloud and HAQM Redshift. In the Part 1 of this series, we discussed how to configure data sharing between Salesforce Data Cloud and customers’ AWS accounts in the same AWS Region. In this post, we discuss the architecture and implementation details of cross-Region data sharing between Salesforce Data Cloud and customers’ AWS accounts.

architecture

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

The AWS Glue Data Catalog now enhances managed table optimization of Apache Iceberg tables by automatically removing data files that are no longer needed. Along with the Glue Data Catalog’s automated compaction feature, these storage optimizations can help you reduce metadata overhead, control storage costs, and improve query performance. Iceberg creates a new version called […]

Differentiate generative AI applications with your data using AWS analytics and managed databases

While the potential of generative artificial intelligence (AI) is increasingly under evaluation, organizations are at different stages in defining their generative AI vision. In many organizations, the focus is on large language models (LLMs), and foundation models (FMs) more broadly. This is just the tip of the iceberg, because what enables you to obtain differential […]

How ZS built a clinical knowledge repository for semantic search using HAQM OpenSearch Service and HAQM Neptune

In this blog post, we will highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant, clinical document search platform. This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, […]

Developer guidance on how to do local testing with HAQM MSK Serverless

In this post, I present you with guidance on how developers can connect to HAQM MSK Serverless from local environments. The connection is done using an HAQM MSK endpoint through an SSH tunnel and a bastion host. This enables developers to experiment and test locally, without needing to setup a separate Kafka cluster.