AWS Big Data Blog

Category: Serverless

Orchestrate AWS Glue DataBrew jobs using Amazon Managed Workflows for Apache Airflow

As data volumes grow across the industry, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Analysts are building complex data transformation pipelines that include multiple steps for data preparation and cleansing. However, analysts may want a simpler orchestration mechanism with a graphical user interface that […]

Estimate Amazon EC2 Spot Instance cost savings with AWS Glue DataBrew, AWS Glue, and Amazon QuickSight

AWS provides many ways to optimize your workloads and save on costs. For example, services like AWS Cost Explorer and AWS Trusted Advisor provide cost savings recommendations to help you optimize your AWS environments. However, you may also want to estimate cost savings when comparing Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances to On-Demand Instances. […]

Extract multidimensional data from Microsoft SQL Server Analysis Services using AWS Glue

AWS Glue is a fully managed service that makes it easier for you to extract, transform, and load (ETL) data for analytics. You can easily create ETL jobs to connect to backend data sources. There are several natively supported data sources, but what if you need to extract data from an unsupported data source? What if […]

Migrate terabytes of data quickly from Google Cloud to Amazon S3 with AWS Glue Connector for Google BigQuery

This blog post was last updated in July 2022 to cover the new version of the connector and to add details on how to push down queries to Google BigQuery. The cloud is often seen as advantageous for data lakes because of better security, faster time to deployment, better availability, more frequent feature and functionality updates, more elasticity, […]

Doing data preparation using on-premises PostgreSQL databases with AWS Glue DataBrew

Today, with AWS Glue DataBrew, data analysts and data scientists can easily access and visually explore any amount of data across their organization directly from their Amazon Simple Storage Service (Amazon S3) data lake, Amazon Redshift data warehouse, and Amazon Aurora and Amazon Relational Database Service (Amazon RDS) databases. Customers can choose from over 250 […]

Automate dynamic mapping and renaming of column names in data files using AWS Glue: Part 1

A common challenge ETL and big data developers face is working with data files that don’t have proper header records. They’re tasked with renaming the columns of the data files appropriately so that downstream applications and data load mappings can work seamlessly. One example use case is working with ORC files and […]

How 1Strategy simplified their spreadsheet ETL process using AWS Glue DataBrew

This is a guest blog post by Pat Reilly and Gary Houk at 1Strategy. In their own words, “1Strategy is an APN Premier Consulting Partner focusing exclusively on AWS solutions. 1Strategy consultants help businesses architect, migrate, and optimize their workloads on AWS, creating scalable, cost-effective, secure, and reliable solutions. 1Strategy holds the AWS DevOps, Migration, […]

Ingest Salesforce data into Amazon S3 using the CData JDBC custom connector with AWS Glue

Organizations that successfully generate business value from their data will outperform their peers. Many AWS customers require a data storage and analytics solution that combines the prospect information stored in Salesforce, a popular and widely used customer relationship management (CRM) platform, with other structured and unstructured data in their data lake to innovate and build […]

Integrating Datadog data with AWS using Amazon AppFlow for intelligent monitoring

Infrastructure and operations teams are often challenged with getting a full view into their IT environments to do monitoring and troubleshooting. New monitoring technologies are needed to provide an integrated view of all components of an IT infrastructure and application system. Datadog provides intelligent application and service monitoring by bringing together data from servers, databases, […]