Posted On: Sep 8, 2017
This Quick Start deploys a data lake foundation that integrates Amazon Web Services (AWS) Cloud services such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Kinesis, Amazon Athena, Amazon Elasticsearch Service (Amazon ES), and Amazon QuickSight.
The data lake foundation provides these features:
- Data submission, including batch submissions to Amazon S3 and streaming submissions via Amazon Kinesis Firehose
- Ingest processing, including data validation, metadata extraction, and indexing via Amazon S3 events, Amazon Simple Notification Service (Amazon SNS), AWS Lambda, Amazon Kinesis Analytics, and Amazon ES
- Dataset management through Amazon Redshift transformations and Kinesis Analytics
- Data transformation, aggregation, and analysis through Amazon Athena and Amazon Redshift Spectrum
- Search, by indexing metadata in Amazon ES and exposing it through Kibana dashboards
- Publishing into an S3 bucket for use by visualization tools, and visualization with Amazon QuickSight
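As a rough illustration of the data submission feature, the two submission paths could be sketched as follows. This is a minimal sketch, not the Quick Start's own code: the function names, the date-partitioned key layout, and the newline-delimited JSON record format are all assumptions made for the example. The client arguments are expected to behave like the boto3 S3 and Firehose clients, whose `put_object` and `put_record` calls are used here.

```python
import json
from datetime import datetime, timezone


def batch_key(dataset: str, filename: str, when: datetime) -> str:
    """Build a date-partitioned S3 object key (this layout is an assumption)."""
    return f"submissions/{dataset}/{when:%Y/%m/%d}/{filename}"


def submit_batch(s3_client, bucket: str, dataset: str,
                 filename: str, body: bytes, when: datetime) -> str:
    """Batch path: upload one file to S3 and return the key that was written."""
    key = batch_key(dataset, filename, when)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


def submit_stream_record(firehose_client, stream_name: str, record: dict) -> None:
    """Streaming path: send one JSON record to a Firehose delivery stream."""
    # Newline-delimited JSON is an assumed record format for this sketch.
    data = (json.dumps(record) + "\n").encode("utf-8")
    firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": data},
    )
```

With boto3 installed and credentials configured, the clients would typically come from `boto3.client("s3")` and `boto3.client("firehose")`; the bucket and delivery stream names depend on your own deployment.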
Once this foundation is in place, you can augment the data lake with independent software vendor (ISV) and software as a service (SaaS) tools.
The deployment also includes an optional wizard and a sample dataset that is loaded into the Amazon Redshift cluster and Kinesis streams. The data lake wizard uses the dataset to demonstrate data lake capabilities such as search, transforms, queries, analytics, and visualization.
AWS CloudFormation templates automate the deployment and provide customization options for network resources and AWS services. You can choose to build a new virtual private cloud (VPC) infrastructure that’s configured for security, scalability, and high availability, or use your existing VPC infrastructure for the data lake foundation.
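The choice between a new and an existing VPC could be expressed when launching the stack programmatically. The sketch below only assembles the request for the CloudFormation `create_stack` call. The helper name, the `VPCID` parameter key, and the inclusion of `CAPABILITY_IAM` are assumptions for illustration; the real parameter names and template URLs are listed in the deployment guide.

```python
from typing import Dict, Optional


def build_stack_request(stack_name: str,
                        template_url: str,
                        existing_vpc_id: Optional[str] = None,
                        extra_parameters: Optional[Dict[str, str]] = None) -> dict:
    """Assemble keyword arguments for CloudFormation's create_stack call."""
    params = dict(extra_parameters or {})
    if existing_vpc_id is not None:
        # "VPCID" is a hypothetical parameter key, used only in this sketch.
        params["VPCID"] = existing_vpc_id
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(params.items())
        ],
        # Assumed here: the templates create IAM resources and need this capability.
        "Capabilities": ["CAPABILITY_IAM"],
    }
```

A request built this way could then be passed to boto3 as `boto3.client("cloudformation").create_stack(**request)`, using the template URL for either the new-VPC or the existing-VPC variant.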
To get started, use the following resources:
- Learn more about the data lake foundation architecture
- View the deployment guide
- Browse and launch other AWS Quick Start reference deployments
About Quick Starts
Quick Starts are automated reference deployments for key workloads on the AWS Cloud. Each Quick Start launches, configures, and runs the AWS compute, network, storage, and other services required to deploy a specific workload on AWS, using AWS best practices for security and availability. This Quick Start is the latest in a set of AWS customer-ready solutions, which are ready-to-deploy reference architectures and best practices that address specific use cases or business processes.