AWS Big Data Blog
Category: Intermediate (200)
Configure SAML federation for HAQM OpenSearch Serverless with AWS IAM Identity Center
HAQM OpenSearch Serverless is a serverless option of HAQM OpenSearch Service that makes it easy for you to run large-scale search and analytics workloads without having to configure, manage, or scale OpenSearch clusters. It automatically provisions and scales the underlying resources to deliver fast data ingestion and query responses for even the most demanding and […]
Push HAQM EMR step logs from HAQM EC2 instances to HAQM CloudWatch logs
HAQM EMR is a big data service offered by AWS to run Apache Spark and other open-source applications on AWS to build scalable data pipelines in a cost-effective manner. Monitoring the logs generated from the jobs deployed on EMR clusters is essential to help detect critical issues in real time and identify root causes quickly. […]
Perform accent-insensitive search using OpenSearch
We often need our text search to be agnostic of accent marks. Accent-insensitive search, also called diacritics-agnostic search, is where search results are the same for queries that may or may not contain Latin characters such as à, è, Ê, ñ, and ç. Diacritics are English letters with an accent to mark a difference in […]
Visualize Confluent data in HAQM QuickSight using HAQM Athena
This is a guest post written by Ahmed Saef Zamzam and Geetha Anne from Confluent. Businesses are using real-time data streams to gain insights into their company’s performance and make informed, data-driven decisions faster. As real-time data has become essential for businesses, a growing number of companies are adapting their data strategy to focus on […]
Manage your data warehouse cost allocations with HAQM Redshift Serverless tagging
HAQM Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Developers, data scientists, and analysts can work across databases, data warehouses, and data lakes to build reporting and dashboarding applications, perform real-time analytics, share and collaborate on data, and even build and train machine learning (ML) […]
How AWS Payments migrated from Redash to HAQM Redshift Query Editor v2
AWS Payments is part of the AWS Commerce Platform (CP) organization that owns the customer experience of paying AWS invoices. It helps AWS customers manage their payment methods and payment preferences, and helps customers make self-service payments to AWS. The Machine Learning, Data and Analytics (MLDA) team at AWS Payments enables data-driven decision-making across payments […]
Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor
In the first post of this series, we described how AWS Glue for Apache Spark works with Apache Hudi, Linux Foundation Delta Lake, and Apache Iceberg datasets tables using the native support of those data lake formats. This native support simplifies reading and writing your data for these data lake frameworks so you can more […]
Extend geospatial queries in HAQM Athena with UDFs and AWS Lambda
HAQM Athena is a serverless and interactive query service that allows you to easily analyze data in HAQM Simple Storage Service (HAQM S3) and 25-plus data sources, including on-premises data sources or other cloud systems using SQL or Python. Athena built-in capabilities include querying for geospatial data; for example, you can count the number of […]
How gaming companies can use HAQM Redshift Serverless to build scalable analytical applications faster and easier
This post provides guidance on how to build scalable analytical solutions for gaming industry use cases using HAQM Redshift Serverless. It covers how to use a conceptual, logical architecture for some of the most popular gaming industry use cases like event analysis, in-game purchase recommendations, measuring player satisfaction, telemetry data analysis, and more. This post […]
Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and HAQM EMR Serverless
Building data lakes from continuously changing transactional data of databases and keeping data lakes up to date is a complex task and can be an operational challenge. A solution to this problem is to use AWS Database Migration Service (AWS DMS) for migrating historical and real-time transactional data into the data lake. You can then […]