AWS Big Data Blog
Amazon DataZone introduces OpenLineage-compatible data lineage visualization in preview
We are excited to announce the preview of API-driven, OpenLineage-compatible data lineage in Amazon DataZone to help you capture, store, and visualize lineage of data movement and transformations of data assets on Amazon DataZone. With the Amazon DataZone OpenLineage-compatible API, domain administrators and data producers can capture and store lineage events beyond what is available […]
Amazon Managed Service for Apache Flink now supports Apache Flink version 1.19
Apache Flink is an open source distributed processing engine, offering powerful programming interfaces for both stream and batch processing, with first-class support for stateful processing and event time semantics. Apache Flink supports multiple programming languages (Java, Python, Scala, and SQL) and multiple APIs with different levels of abstraction, which can be used interchangeably in the same […]
Enhance data security with fine-grained access controls in Amazon DataZone
Fine-grained access control is a crucial aspect of data security for modern data lakes and data warehouses. As organizations handle vast amounts of data across multiple data sources, the need to manage sensitive information has become increasingly important. Making sure the right people have access to the right data, without exposing sensitive information to unauthorized […]
Automate data loading from your database into Amazon Redshift using AWS Database Migration Service (DMS), AWS Step Functions, and the Redshift Data API
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. Tens of thousands of customers use Amazon Redshift to process exabytes of data per […]
Introducing self-managed data sources for Amazon OpenSearch Ingestion
Enterprise customers increasingly adopt Amazon OpenSearch Ingestion (OSI) to bring data into Amazon OpenSearch Service for various use cases. These include petabyte-scale log analytics, real-time streaming, security analytics, and searching semi-structured key-value or document data. OSI makes it simple, with straightforward integrations, to ingest data from many AWS services, including Amazon DynamoDB, Amazon Simple Storage […]
Amazon MWAA best practices for managing Python dependencies
Customers with data engineers and data scientists are using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as a central orchestration platform for running data pipelines and machine learning (ML) workloads. To support these pipelines, they often require additional Python packages, such as Apache Airflow Providers. For example, a pipeline may require the Snowflake provider […]
Amazon DataZone enhances data discovery with advanced search filtering
Amazon DataZone, a fully managed data management service, helps organizations catalog, discover, analyze, share, and govern data between data producers and consumers. We are excited to announce the introduction of advanced search filtering capabilities in the Amazon DataZone business data catalog. With the improved rendering of glossary terms, you can now navigate large sets of […]
Implement disaster recovery with Amazon Redshift
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers. The objective of a disaster recovery plan is […]
Build a real-time streaming generative AI application using Amazon Bedrock, Amazon Managed Service for Apache Flink, and Amazon Kinesis Data Streams
Data streaming enables generative AI to take advantage of real-time data and provide businesses with rapid insights. This post looks at how to integrate generative AI capabilities when implementing a streaming architecture on AWS. It uses managed services such as Managed Service for Apache Flink and Amazon Kinesis Data Streams to process streaming data, and Amazon Bedrock to provide generative AI capabilities. We include a reference architecture, a step-by-step guide to infrastructure setup, and sample code for implementing the solution with the AWS Cloud Development Kit (AWS CDK). You can find the code to try it out yourself on the GitHub repo.
Amazon DataZone announces custom blueprints for AWS services
Last week, we announced the general availability of custom AWS service blueprints, a new feature in Amazon DataZone allowing you to customize your Amazon DataZone project environments to use existing AWS Identity and Access Management (IAM) roles and AWS services to embed the service into your existing processes. In this post, we share how this […]