AWS Big Data Blog
Category: Serverless
Build a serverless pipeline to analyze streaming data using AWS Glue, Apache Hudi, and Amazon S3
Organizations typically accumulate massive volumes of data and continue to generate ever-increasing volumes, ranging from terabytes to petabytes and at times exabytes. Such data is usually generated in disparate systems and must be aggregated into a single location for analysis and insight generation. A data lake architecture allows you to aggregate […]
How the Georgia Data Analytics Center built a cloud analytics solution from scratch with the AWS Data Lab
This is a guest post by Kanti Chalasani, Division Director at Georgia Data Analytics Center (GDAC). GDAC is housed within the Georgia Office of Planning and Budget to facilitate governed data sharing between various state agencies and departments. The Office of Planning and Budget (OPB) established the Georgia Data Analytics Center (GDAC) with the intent […]
Audit AWS service events with Amazon EventBridge and Amazon Kinesis Data Firehose
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Amazon EventBridge is a serverless event bus that makes it easy to build event-driven applications at scale using events generated from your applications, integrated software as a service (SaaS) applications, and AWS […]
Extract ServiceNow data using AWS Glue Studio in an Amazon S3 data lake and analyze using Amazon Athena
Many different cloud-based software as a service (SaaS) offerings are available in AWS. ServiceNow is one of the common cloud-based workflow automation platforms widely used by AWS customers. In the past few years, we saw a lot of customers who wanted to extract and integrate data from IT service management (ITSM) tools like ServiceNow for […]
How GE Aviation automated engine wash analytics with AWS Glue using a serverless architecture
This post is authored by Giridhar G Jorapur, GE Aviation Digital Technology. Maintenance and overhauling of aircraft engines are essential for GE Aviation to increase time on wing gains and reduce shop visit costs. Engine wash analytics provide visibility into the significant time on wing gains that can be achieved through effective water wash, foam […]
Validate streaming data over Amazon MSK using schemas in cross-account AWS Glue Schema Registry
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Today’s businesses face an unprecedented growth in the volume of data. A growing portion of the data is generated in real time by IoT devices, websites, business […]
Evolve JSON Schemas in Amazon MSK and Amazon Kinesis Data Streams with the AWS Glue Schema Registry
Data is being produced, streamed, and consumed at an immense rate, and that rate is projected to grow exponentially in the future. In particular, JSON is the most widely used data format across streaming technologies and workloads. As applications, websites, and machines increasingly adopt data streaming technologies such as Apache Kafka and Amazon Kinesis Data […]
Handle fast-changing reference data in an AWS Glue streaming ETL job
Streaming ETL jobs in AWS Glue can consume data from streaming sources such as Amazon Kinesis and Apache Kafka, clean and transform those data streams in flight, and continuously load the results into Amazon Simple Storage Service (Amazon S3) data lakes, data warehouses, or other data stores. The always-on nature of streaming jobs poses […]
Securely share your data across AWS accounts using AWS Lake Formation
Data lakes have become very popular with organizations that want a centralized repository for storing all their structured and unstructured data at any scale. Because data is stored as is, there is no need to convert it to a predefined schema in advance. When you have new business use cases, you […]
Enrich datasets for descriptive analytics with AWS Glue DataBrew
Data analytics remains a constantly hot topic. More and more businesses are beginning to understand the potential their data has to allow them to serve customers more effectively and give them a competitive advantage. However, for many small to medium businesses, gaining insight from their data can be challenging because they often lack in-house data […]