[SEO Subhead]
This Guidance demonstrates how to implement dual-write capabilities from Apache Cassandra to HAQM Keyspaces (for Apache Cassandra) with minimal downtime during the data migration. Included are AWS CloudFormation templates that significantly reduce the complexity of setting up the key components, such as the HAQM Managed Streaming for Apache Kafka (HAQM MSK) cluster, HAQM Keyspaces, and the Apache Cassandra cluster, minimizing manual configuration effort. These templates, along with additional scripts for Apache Kafka Sink connectors, allow data to be inserted simultaneously into the HAQM Keyspaces and Apache Cassandra databases to facilitate a seamless migration process.
Note: [Disclaimer]
Architecture Diagram

[Architecture diagram description]
Step 1
The Apache Kafka console producer command line interface (CLI), hosted on an HAQM Elastic Compute Cloud (HAQM EC2) instance and functioning as a Kafka client, produces and publishes messages directly to a Kafka topic in HAQM Managed Streaming for Apache Kafka (HAQM MSK).
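The Guidance uses the Kafka console producer CLI for this step; the following Python sketch is an equivalent way to publish a test record from the client instance, assuming the kafka-python library, a plaintext listener, and placeholder broker, topic, and payload names.

```python
# Minimal sketch of publishing a record to the MSK topic, assuming the
# kafka-python library and a PLAINTEXT bootstrap listener. Broker address,
# topic name, and payload fields are placeholders for illustration only.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["b-1.example-msk-cluster.kafka.us-east-1.amazonaws.com:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# One record per row to be dual-written to Apache Cassandra and HAQM Keyspaces.
producer.send("orders-topic", {"order_id": "1001", "customer": "alice", "amount": 25.0})
producer.flush()
producer.close()
```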
Step 2
The Kafka client sends the messages to HAQM MSK, which distributes and replicates them across its brokers for reliable processing and fault tolerance.
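As a sketch of the fault-tolerance side of this step, the topic can be created with a replication factor greater than one so that HAQM MSK keeps copies of each partition on multiple brokers; the admin client, topic name, and partition and replica counts below are illustrative assumptions.

```python
# Sketch: create the topic with replication so HAQM MSK keeps copies of each
# partition on multiple brokers. Names and counts are illustrative assumptions.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(
    bootstrap_servers=["b-1.example-msk-cluster.kafka.us-east-1.amazonaws.com:9092"]
)
admin.create_topics([
    NewTopic(name="orders-topic", num_partitions=3, replication_factor=3)
])
admin.close()
```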
Step 3
The HAQM MSK Connect feature, which is a component of the HAQM MSK service, ingests the data from the Kafka topic and writes it to both the Apache Cassandra and HAQM Keyspaces data stores.
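A sink connector is registered with MSK Connect through its CreateConnector API. The boto3 sketch below outlines that call under assumed placeholder ARNs, network IDs, and capacity settings; the full sink settings are covered in Steps 4a and 4b, and the field names should be verified against the kafkaconnect client version you use.

```python
# Sketch of registering a sink connector with HAQM MSK Connect through boto3.
# ARNs, subnet and security group IDs, and capacity values are placeholders;
# verify field names against the current kafkaconnect client.
import boto3

kafkaconnect = boto3.client("kafkaconnect", region_name="us-east-1")

response = kafkaconnect.create_connector(
    connectorName="cassandra-sink-connector",
    kafkaConnectVersion="2.7.1",
    serviceExecutionRoleArn="arn:aws:iam::111122223333:role/msk-connect-role",
    plugins=[{
        "customPlugin": {
            "customPluginArn": "arn:aws:kafkaconnect:us-east-1:111122223333:custom-plugin/cassandra-sink/example",
            "revision": 1,
        }
    }],
    capacity={"provisionedCapacity": {"mcuCount": 1, "workerCount": 1}},
    kafkaCluster={
        "apacheKafkaCluster": {
            "bootstrapServers": "b-1.example-msk-cluster.kafka.us-east-1.amazonaws.com:9092",
            "vpc": {
                "subnets": ["subnet-0123456789abcdef0"],
                "securityGroups": ["sg-0123456789abcdef0"],
            },
        }
    },
    kafkaClusterClientAuthentication={"authenticationType": "NONE"},
    kafkaClusterEncryptionInTransit={"encryptionType": "PLAINTEXT"},
    # Connector settings; fuller sink configurations are sketched in Steps 4a and 4b.
    connectorConfiguration={
        "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
        "topics": "orders-topic",
        "tasks.max": "1",
    },
)
print(response["connectorArn"])
```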
Step 4a
The Apache Cassandra Sink connector, which runs within the HAQM MSK Connect feature, picks up the routed messages and inserts them into the designated Apache Cassandra table.
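The Cassandra side of the connector configuration might look like the following sketch, which uses key names in the style of the DataStax Apache Kafka Connector; the keyspace, table, column mapping, and contact point are placeholder assumptions, and the exact keys should be confirmed against the connector version bundled with this Guidance.

```python
# Sketch of the connector configuration for the Apache Cassandra sink,
# using DataStax Apache Kafka Connector key names. Keyspace, table, column
# mapping, and contact point are placeholders.
cassandra_sink_config = {
    "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
    "tasks.max": "1",
    "topics": "orders-topic",
    "contactPoints": "10.0.1.25",          # private IP of a Cassandra seed node
    "port": "9042",
    "loadBalancing.localDc": "datacenter1",
    # Map message fields to columns of <keyspace>.<table>.
    "topic.orders-topic.orders_ks.orders.mapping":
        "order_id=value.order_id, customer=value.customer, amount=value.amount",
}
```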
Step 4b
Simultaneously, the HAQM MSK Connect feature routes the messages and inserts them into the HAQM Keyspaces data store. This is done through an HAQM Virtual Private Cloud (HAQM VPC) endpoint, which is powered by the AWS PrivateLink service.
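To confirm that the same rows arrive in HAQM Keyspaces through the VPC endpoint, a read along the following lines can be run from inside the VPC, using the standard cassandra-driver pattern for HAQM Keyspaces (service-specific credentials, TLS with the Starfield root certificate, and port 9142); the credentials, certificate path, keyspace, and table names are assumptions.

```python
# Sketch: verify dual-written rows in HAQM Keyspaces from inside the VPC,
# where the service endpoint resolves through the interface VPC endpoint.
# Credentials, certificate path, keyspace, and table are placeholders.
from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

ssl_context = SSLContext(PROTOCOL_TLSv1_2)
ssl_context.load_verify_locations("sf-class2-root.crt")  # Starfield root CA
ssl_context.verify_mode = CERT_REQUIRED

auth_provider = PlainTextAuthProvider(
    username="keyspaces-user-at-111122223333",  # service-specific credential
    password="EXAMPLE-PASSWORD",
)
cluster = Cluster(
    ["cassandra.us-east-1.amazonaws.com"],
    ssl_context=ssl_context,
    auth_provider=auth_provider,
    port=9142,
)
session = cluster.connect()
for row in session.execute("SELECT * FROM orders_ks.orders LIMIT 10"):
    print(row)
cluster.shutdown()
```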
Get Started

Deploy this Guidance
Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
HAQM MSK is a fully managed Apache Kafka service that automates complex administrative tasks like setup, scaling, and patching. Managing Kafka Connect connectors directly within HAQM MSK not only automates operations but also optimizes them for handling high-volume data streams with minimal downtime, supporting continuous improvement and operational resilience. Moreover, HAQM CloudWatch can monitor the metrics published by HAQM MSK, enabling quick detection of anomalies and performance bottlenecks so you can troubleshoot issues, maintain system reliability, and meet your service level agreements.
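As a sketch of that monitoring, HAQM MSK broker metrics in the AWS/Kafka CloudWatch namespace can be queried as shown below; the cluster name, metric, and dimension values are assumptions to check against the metrics your cluster actually publishes.

```python
# Sketch: read an HAQM MSK broker metric from CloudWatch (AWS/Kafka namespace).
# Cluster name, metric name, and dimension values are illustrative assumptions.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kafka",
    MetricName="BytesInPerSec",
    Dimensions=[
        {"Name": "Cluster Name", "Value": "example-msk-cluster"},
        {"Name": "Broker ID", "Value": "1"},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```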
-
Security
By configuring a combination of PrivateLink, HAQM VPC, and HAQM VPC endpoints, a set of services is established that work in tandem to help ensure that all data transfers occur within the private AWS network. This setup minimizes potential attack vectors by keeping critical infrastructure off the public internet and restricting access to trusted entities only. Specifically, PrivateLink facilitates secure data transmission within AWS, while HAQM VPC keeps both the HAQM MSK cluster and the HAQM EC2 instances hosting Apache Cassandra in a secure, isolated network environment, accessible only through specific, controlled points. In addition, HAQM VPC endpoints for HAQM Keyspaces allow secure, private connectivity to that service, removing the need to use public endpoints. Lastly, AWS Identity and Access Management (IAM) roles provide fine-grained access control so that only authorized users and systems can access specific AWS resources.
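As a sketch of the HAQM Keyspaces endpoint piece of this setup, an interface VPC endpoint can be created with the EC2 API using the com.amazonaws.<region>.cassandra service name; the VPC, subnet, and security group IDs below are placeholders.

```python
# Sketch: create an interface VPC endpoint for HAQM Keyspaces so connector
# traffic stays on the private AWS network. IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.cassandra",  # HAQM Keyspaces endpoint service
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,  # resolve cassandra.us-east-1.amazonaws.com privately
)
print(endpoint["VpcEndpoint"]["VpcEndpointId"])
```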
-
Reliability
HAQM MSK is a resilient streaming service that automatically manages data replication and failover across its Kafka brokers in multiple Availability Zones (AZs). Additionally, HAQM Keyspaces enhances data availability through automatic three-way replication across three AZs within an AWS Region. The HAQM EC2 instances hosting Apache Cassandra are deployed in private subnets across different AZs through HAQM VPC, distributing resources to mitigate the risk of a single point of failure. Lastly, PrivateLink secures data transfers to HAQM Keyspaces for reliable and protected data flow without exposure to the public internet.
-
Performance Efficiency
HAQM Keyspaces is a managed, serverless database service that automatically provides the capacity to match the demand of incoming writes from HAQM MSK, so writes are processed efficiently without added latency. This automation supports consistent performance even during high write volumes. HAQM Keyspaces also offers workload isolation at the table level so that the performance of one table is not affected by the workload of another. This feature supports predictable performance across tables by maintaining dedicated resources for each one.
-
Cost Optimization
HAQM MSK is a fully managed Kafka service that removes the need for manual provisioning and management of Kafka clusters, thus minimizing operational overhead and reducing resource waste. HAQM Keyspaces eliminates the need for you to invest in hardware upfront. You can offload essential operational tasks such as provisioning, patching, and managing servers, as well as installing, maintaining, and operating database software, to AWS.
-
Sustainability
With HAQM Keyspaces, you can choose on-demand or provisioned capacity mode so you can optimize the use of reads and writes based on your traffic patterns, preventing the over-provisioning of your resources. This efficient use of infrastructure conserves resources and reduces energy waste.
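As a sketch of switching capacity modes, a Keyspaces table can be moved to on-demand throughput with the Keyspaces API; the keyspace and table names are placeholders, and the parameter shape should be verified against the current boto3 keyspaces client.

```python
# Sketch: switch a Keyspaces table to on-demand capacity mode with boto3.
# Keyspace and table names are placeholders; verify the parameter shape
# against the current boto3 keyspaces client.
import boto3

keyspaces = boto3.client("keyspaces", region_name="us-east-1")
keyspaces.update_table(
    keyspaceName="orders_ks",
    tableName="orders",
    capacitySpecification={"throughputMode": "PAY_PER_REQUEST"},
)
```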
Related Content

[Title]
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running HAQM EC2 instances or using HAQM S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between HAQM or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.