AWS Big Data Blog
Amazon DataZone announces integration with AWS Lake Formation hybrid access mode for the AWS Glue Data Catalog
Last week, we announced the general availability of the integration between Amazon DataZone and AWS Lake Formation hybrid access mode. In this post, we share how this new feature helps you simplify the way you use Amazon DataZone to enable secure and governed sharing of your data in the AWS Glue Data Catalog. We also […]
How Aura from Unity revolutionized their big data pipeline with Amazon Redshift Serverless
Aura from Unity (formerly known as ironSource) is the market standard for creating rich device experiences that engage and retain customers. In this post, we describe Aura’s successful and swift adoption of Redshift Serverless, which allowed them to optimize their overall bidding advertisement campaigns’ time to market from 24 hours to 2 hours. We explore why Aura chose this solution and what technological challenges it helped solve.
Automate large-scale data validation using Amazon EMR and Apache Griffin
Many enterprises are migrating their on-premises data stores to the AWS Cloud. During data migration, a key requirement is to validate all the data that has been moved from source to target. This data validation is a critical step, and if not done correctly, may result in the failure of the entire project. However, developing […]
Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions
Today, we are pleased to announce that Amazon DataZone can now present data quality information for data assets. This information empowers end-users to make informed decisions about whether to use specific assets. In this post, we discuss the latest features of Amazon DataZone for data quality, the integration between Amazon DataZone and AWS Glue Data Quality, and how you can import data quality scores produced by external systems into Amazon DataZone via API.
Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake
Customers are using AWS and Snowflake to develop purpose-built data architectures that provide the performance required for modern analytics and artificial intelligence (AI) use cases. Implementing these solutions requires data sharing between purpose-built data stores. This is why Snowflake and AWS are delivering enhanced support for Apache Iceberg to enable and facilitate data interoperability between data services. Apache Iceberg is an open-source table format that provides reliability, simplicity, and high performance for large datasets with transactional integrity between various processing engines.
Simplify your query management with search templates in Amazon OpenSearch Service
Amazon OpenSearch Service is an Apache-2.0-licensed distributed search and analytics suite offered by AWS. This fully managed service allows organizations to secure data, perform keyword and semantic search, analyze logs, alert on anomalies, explore interactive log analytics, implement real-time application monitoring, and gain a more profound understanding of their information landscape. OpenSearch Service provides the […]
AI recommendations for descriptions in Amazon DataZone for enhanced business data cataloging and discovery is now generally available
In March 2024, we announced the general availability of generative artificial intelligence (AI)-generated data descriptions in Amazon DataZone. In this post, we share what we heard from our customers that led us to add the AI-generated data descriptions and discuss specific customer use cases addressed by this capability. We also detail how the […]
Deliver decompressed Amazon CloudWatch Logs to Amazon S3 and Splunk using Amazon Data Firehose
You can use Amazon Data Firehose to aggregate and deliver log events from your applications and services captured in Amazon CloudWatch Logs to your Amazon Simple Storage Service (Amazon S3) bucket and Splunk destinations, for use cases such as data analytics, security analysis, and application troubleshooting. By default, CloudWatch Logs are delivered as gzip-compressed objects. […]
Nexthink scales to trillions of events per day with Amazon MSK
Real-time data streaming and event processing present scalability and management challenges. AWS offers a broad selection of managed real-time data streaming services to effortlessly run these workloads at any scale. In this post, Nexthink shares how Amazon Managed Streaming for Apache Kafka (Amazon MSK) empowered them to achieve massive scale in event processing. Experiencing business […]
Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight
In Part 2 of this series, we discussed how to enable AWS Glue job observability metrics and integrate them with Grafana for real-time monitoring. Grafana provides powerful customizable dashboards to view pipeline health. However, to analyze trends over time, aggregate from different dimensions, and share insights across the organization, a purpose-built business intelligence (BI) tool […]