AWS Big Data Blog
Category: Amazon Simple Storage Service (S3)
Amazon EMR 6.2.0 adds persistent HFile tracking to improve performance with HBase on Amazon S3
Apache HBase is an open-source, NoSQL database that you can use to achieve low latency random access to billions of rows. Starting with Amazon EMR 5.2.0, you can enable HBase on Amazon Simple Storage Service (Amazon S3). With HBase on Amazon S3, the HBase data files (HFiles) are written to Amazon S3, enabling data lake […]
Ingest Salesforce data into Amazon S3 using the CData JDBC custom connector with AWS Glue
Organizations that successfully generate business value from their data will outperform their peers. Many AWS customers require a data storage and analytics solution that combines the prospect information stored in Salesforce, a popular and widely used customer relationship management (CRM) platform, with other structured and unstructured data in their data lake to innovate and build […]
Integrating Datadog data with AWS using Amazon AppFlow for intelligent monitoring
Infrastructure and operation teams are often challenged with getting a full view into their IT environments to do monitoring and troubleshooting. New monitoring technologies are needed to provide an integrated view of all components of an IT infrastructure and application system. Datadog provides intelligent application and service monitoring by bringing together data from servers, databases, […]
Querying a Vertica data source in Amazon Athena using the Athena Federated Query SDK
The ability to query data and perform ad hoc analysis across multiple platforms and data stores with a single tool brings immense value to the big data analytical arena. As organizations build out data lakes with increasing volumes of data, there is a growing need to combine that data with large amounts of data in […]
Automating AWS service logs table creation and querying them with Amazon Athena
I was working with a customer who was just getting started using AWS, and they wanted to understand how to query their AWS service logs that were being delivered to Amazon Simple Storage Service (Amazon S3). I introduced them to Amazon Athena, a serverless, interactive query service that allows you to easily analyze data in […]
Building a cost efficient, petabyte-scale lake house with Amazon S3 lifecycle rules and Amazon Redshift Spectrum: Part 2
In part 1 of this series, we demonstrated building an end-to-end data lifecycle management system integrated with a data lake house implemented on Amazon Simple Storage Service (Amazon S3) with Amazon Redshift and Amazon Redshift Spectrum. In this post, we address the ongoing operation of the solution we built. Data ageing process after a month […]
Building a cost efficient, petabyte-scale lake house with Amazon S3 lifecycle rules and Amazon Redshift Spectrum: Part 1
The continuous growth of data volumes combined with requirements to implement long-term retention (typically due to specific industry regulations) puts pressure on the storage costs of data warehouse solutions, even for cloud native data warehouse services such as Amazon Redshift. The introduction of the new Amazon Redshift RA3 node types helped in decoupling compute from […]
Dream11’s journey to building their Data Highway on AWS
This is a guest post co-authored by Pradip Thoke of Dream11. In their own words, “Dream11, the flagship brand of Dream Sports, is India’s biggest fantasy sports platform, with more than 100 million users. We have infused the latest technologies of analytics, machine learning, social networks, and media technologies to enhance our users’ experience. Dream11 […]
How FanDuel Group secures personally identifiable information in a data lake using AWS Lake Formation
This post is co-written with Damian Grech from FanDuel. FanDuel Group is an innovative sports-tech entertainment company that is changing the way consumers engage with their favorite sports, teams, and leagues. The premier gaming destination in the US, FanDuel Group consists of a portfolio of leading brands across gaming, sports betting, daily fantasy sports, advance-deposit […]
Ingesting Jira data into Amazon S3
Consolidating data from a work management tool like Jira and integrating this data with other data sources like ServiceNow, GitHub, Jenkins, and Time Entry Systems enables end-to-end visibility of different aspects of the software development lifecycle and helps keep your projects on schedule and within budget. Amazon Simple Storage Service (Amazon S3) is an object […]