AWS Big Data Blog
Category: Intermediate (200)
Monitor data pipelines in a serverless data lake
AWS serverless services, including but not limited to AWS Lambda, AWS Glue, AWS Fargate, Amazon EventBridge, Amazon Athena, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), and Amazon Simple Storage Service (Amazon S3), have become the building blocks for any serverless data lake, providing key mechanisms to ingest and transform data […]
Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue
In the world of software engineering and development, organizations use project management tools like Atlassian Jira Cloud. Managing projects with Jira leads to rich datasets, which can provide historical and predictive insights about project and development efforts. Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other […]
A side-by-side comparison of Apache Spark and Apache Flink for common streaming use cases
Apache Flink and Apache Spark are both open-source, distributed data processing frameworks used widely for big data processing and analytics. Spark is known for its ease of use, high-level APIs, and the ability to process large amounts of data. Flink shines in its ability to handle processing of data streams in real-time and low-latency stateful […]
Simplify external object access in Amazon Redshift using automatic mounting of the AWS Glue Data Catalog
November 2024: This post was reviewed and updated for accuracy. Amazon Redshift is a petabyte-scale, enterprise-grade cloud data warehouse service delivering the best price-performance. Today, tens of thousands of customers run business-critical workloads on Amazon Redshift to cost-effectively and quickly analyze their data using standard SQL and existing business intelligence (BI) tools. Amazon Redshift now […]
Five actionable steps to GDPR compliance (Right to be forgotten) with Amazon Redshift
The GDPR (General Data Protection Regulation) right to be forgotten, also known as the right to erasure, gives individuals the right to request the deletion of their personally identifiable information (PII) data held by organizations. This means that individuals can ask companies to erase their personal data from their systems and any third parties with […]
Near-real-time analytics using Amazon Redshift streaming ingestion with Amazon Kinesis Data Streams and Amazon DynamoDB
Amazon Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, easy, and secure analytics at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it a widely used cloud data warehouse. You can run and […]
Improved scalability and resiliency for Amazon EMR on EC2 clusters
Amazon EMR is the cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto. Customers asked us for features that would further improve the resiliency and scalability of their Amazon EMR on EC2 clusters, including their large, long-running clusters. We have […]
End-to-end development lifecycle for data engineers to build a data integration pipeline using AWS Glue
Data is a key enabler for your business. Many AWS customers have integrated their data across multiple data sources using AWS Glue, a serverless data integration service, in order to make data-driven business decisions. To grow the power of data at scale for the long term, it’s highly recommended to design an end-to-end development lifecycle […]
Build data integration jobs with AI companion on AWS Glue Studio notebook powered by Amazon CodeWhisperer
Data is essential for businesses to make informed decisions, improve operations, and innovate. Integrating data from different sources can be a complex and time-consuming process. AWS offers AWS Glue to help you integrate your data from multiple sources on serverless infrastructure for analysis, machine learning (ML), and application development. AWS Glue provides different authoring experiences […]
Configure monitoring, limits, and alarms in Amazon Redshift Serverless to keep costs predictable
Amazon Redshift Serverless makes it simple to run and scale analytics in seconds. It automatically provisions and intelligently scales data warehouse compute capacity to deliver fast performance, and you pay only for what you use. Just load your data and start querying right away in the Amazon Redshift Query Editor or in your favorite business […]