AWS Big Data Blog

Category: AWS Glue DataBrew

Data preparation using an Amazon RDS for MySQL database with AWS Glue DataBrew

With AWS Glue DataBrew, data analysts and data scientists can easily access and visually explore any amount of data across their organization directly from their Amazon Simple Storage Service (Amazon S3) data lake, Amazon Redshift data warehouse, or Amazon Aurora and Amazon Relational Database Service (Amazon RDS) databases. You can choose from over 250 built-in […]

Data preparation using Amazon Redshift with AWS Glue DataBrew

July 2023: This post was reviewed for accuracy. With AWS Glue DataBrew, data analysts and data scientists can easily access and visually explore any amount of data across their organization directly from their Amazon Simple Storage Service (Amazon S3) data lake, Amazon Redshift data warehouse, Amazon Aurora, and other Amazon Relational Database Service (Amazon RDS) databases. You can choose from over […]

Build a data quality score card using AWS Glue DataBrew, Amazon Athena, and Amazon QuickSight

Data quality plays an important role while building an extract, transform, and load (ETL) pipeline for sending data to downstream analytical applications and machine learning (ML) models. The analogy “garbage in, garbage out” aptly describes why it’s important to filter out bad data before further processing. Continuously monitoring data quality and comparing it […]

Simplify incoming data ingestion with dynamic parameterized datasets in AWS Glue DataBrew

When data analysts and data scientists prepare data for analysis, they often rely on periodically generated data produced by upstream services, such as labeling datasets from Amazon SageMaker Ground Truth or Cost and Usage Reports from AWS Billing and Cost Management. Alternatively, they can regularly upload such data to Amazon Simple Storage Service (Amazon S3) […]

Set up CI/CD pipelines for AWS Glue DataBrew using AWS Developer Tools

An integral part of DevOps is adopting the culture of continuous integration and continuous delivery (CI/CD). This enables teams to securely store and version code, maintain parity between development and production environments, and achieve end-to-end automation of the release cycle, including building, testing, and deploying to production. In essence, development teams follow CI/CD processes to […]

Orchestrate AWS Glue DataBrew jobs using Amazon Managed Workflows for Apache Airflow

As the industry grows with more data volume, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Analysts are building complex data transformation pipelines that include multiple steps for data preparation and cleansing. However, analysts may want a simpler orchestration mechanism with a graphical user interface that […]

Estimate Amazon EC2 Spot Instance cost savings with AWS Glue DataBrew, AWS Glue, and Amazon QuickSight

AWS provides many ways to optimize your workloads and save on costs. For example, services like AWS Cost Explorer and AWS Trusted Advisor provide cost savings recommendations to help you optimize your AWS environments. However, you may also want to estimate cost savings when comparing Amazon Elastic Compute Cloud (Amazon EC2) Spot to On-Demand Instances. […]

Doing data preparation using on-premises PostgreSQL databases with AWS Glue DataBrew

Today, with AWS Glue DataBrew, data analysts and data scientists can easily access and visually explore any amount of data across their organization directly from their Amazon Simple Storage Service (Amazon S3) data lake, Amazon Redshift data warehouse, and Amazon Aurora and Amazon Relational Database Service (Amazon RDS) databases. Customers can choose from over 250 […]

How 1Strategy simplified their spreadsheet ETL process using AWS Glue DataBrew

This is a guest blog post by Pat Reilly and Gary Houk at 1Strategy. In their own words, “1Strategy is an APN Premier Consulting Partner focusing exclusively on AWS solutions. 1Strategy consultants help businesses architect, migrate, and optimize their workloads on AWS, creating scalable, cost-effective, secure, and reliable solutions. 1Strategy holds the AWS DevOps, Migration, […]

Estimating scoring probabilities by preparing soccer matches data with AWS Glue DataBrew

In soccer (or football outside of the US), players decide to take shots when they think they can score. But how do they make that determination vs. when to pass or dribble? In a fraction of a second, in motion, while chased from multiple directions by other professional athletes, they think about their distance from […]