AWS Big Data Blog
Category: Intermediate (200)
Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB
Customers have been using data warehousing solutions to perform their traditional analytics tasks. Recently, data lakes have gained a lot of traction as the foundation for analytical solutions, because they offer benefits such as scalability, fault tolerance, and support for structured, semi-structured, and unstructured datasets. Data lakes are not transactional by default; however, there […]
Real-time time series anomaly detection for streaming applications on Amazon Managed Service for Apache Flink
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Detecting anomalies in real time from high-throughput streams is key to informing timely decisions in order to adapt and respond to unexpected scenarios. Stream processing frameworks […]
Simplify AWS Glue job orchestration and monitoring with Amazon MWAA
Organizations across all industries have complex data processing requirements for their analytical use cases across different analytics systems, such as data lakes on AWS, data warehouses (Amazon Redshift), search (Amazon OpenSearch Service), NoSQL (Amazon DynamoDB), machine learning (Amazon SageMaker), and more. Analytics professionals are tasked with deriving value from data stored in these distributed systems […]
What’s new with Amazon MWAA support for startup scripts
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service for Apache Airflow that lets you use the same familiar Apache Airflow environment to orchestrate your workflows and enjoy improved scalability, availability, and security without the operational burden of having to manage the underlying infrastructure. In April 2023, Amazon MWAA added support for […]
Stream data with Amazon MSK Connect using an open-source JDBC connector
Customers are adopting Amazon Managed Service for Apache Kafka (Amazon MSK) as a fast and reliable streaming platform to build their enterprise data hub. In addition to streaming capabilities, setting up Amazon MSK enables organizations to use a pub/sub model for data distribution with loosely coupled and independent components. To publish and distribute the data […]
Improve power utility operational efficiency using smart sensor data and Amazon QuickSight
This blog post is co-written with Steve Alexander at PG&E. In today’s rapidly changing energy landscape, power disturbances cost businesses millions of dollars due to service interruptions and power quality issues. Large utility territories make it difficult to detect and locate faults when power outages occur, leading to longer restoration times, recurring outages, and unhappy […]
Ten new visual transforms in AWS Glue Studio
AWS Glue Studio is a graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs in AWS Glue. It allows you to visually compose data transformation workflows using nodes that represent different data handling steps, which are later converted automatically into runnable code. AWS Glue Studio recently […]
Use SAML Identities for programmatic access to Amazon OpenSearch Service
Customers of Amazon OpenSearch Service can already use Security Assertion Markup Language (SAML) to access OpenSearch Dashboards. This post outlines two methods by which programmatic users can now access OpenSearch using SAML identities. This applies to all identity providers (IdPs) that support SAML 2.0, including prevalent ones like Active Directory Federation Service (ADFS), Okta, AWS […]
Scale your AWS Glue for Apache Spark jobs with larger worker types G.4X and G.8X
Hundreds of thousands of customers use AWS Glue, a serverless data integration service, to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue for Apache Spark jobs work with your code and configuration of the number of data processing units (DPU). Each DPU provides 4 vCPU, 16 GB memory, […]
New scatter plot options in Amazon QuickSight to visualize your data
Are you looking to understand the relationships between two numerical variables? Scatter plots are a powerful visual type that allows you to identify patterns, outliers, and the strength of relationships between variables. In this post, we walk you through the newly launched scatter plot features in Amazon QuickSight, which will help you take your correlation analysis […]