AWS Cloud Operations Blog
Tag: Resilience
Scaling AWS Fault Injection Service Across Your Organization And Regions
In the first two parts of our series, we explored how to scale AWS Fault Injection Service (FIS) across AWS Organizations. Part one focused on implementing FIS in a single AWS account environment, introducing the concept of standardized IAM roles and Service Control Policies (SCPs) as guardrails for controlled chaos engineering experiments, particularly in centralized […]
Scaling AWS Fault Injection Service Across Your Organization And Accounts
Welcome to part two of our series, where we focus on scaling AWS Fault Injection Service (FIS) within your organization. In part one, we learned how customers can enable individual accounts within organizations by introducing a Service Control Policy (SCP) rule to run network experiments when operating with a centralized networking infrastructure. In this blog, […]
Scaling AWS Fault Injection Service Across Your Organization Using Account Controls
AWS Fault Injection Service (FIS) empowers you to adopt chaos engineering at scale within your AWS environment. Chaos engineering injects real-world, controlled failures into a system to verify resilience and reliability, ultimately improving the customer experience. This proactive, resilience-focused approach increases your confidence in a system’s ability to respond to adverse conditions in production. You […]
New AWS Fault Injection Service recovery action for zonal autoshift
We’re excited to announce that AWS Fault Injection Service (FIS) now supports a recovery action for Amazon Application Recovery Controller (ARC) zonal autoshift. With this integration, you can now perform more comprehensive testing by creating disruptive events and triggering a zonal autoshift as part of the same experiment. That way, you can observe how your application […]
Introducing AWS Fault Injection Service Actions to Inject Chaos in Lambda functions
Usage of serverless technology in regulated industries like financial services is growing. This growth demands robust resilience validation. Chaos engineering for serverless has become crucial for ensuring reliable and available serverless applications. By purposefully injecting failures and stresses into serverless components, teams can uncover hidden weaknesses and validate the fault tolerance of their systems. Previously, […]
Strengthen application resilience with myApplications and AWS Resilience Hub
Introduction Today, organizations prioritize managing their applications over infrastructure, focusing on business outcomes while leveraging automation and cloud services to handle the underlying infrastructure. They seek to consolidate key application metrics like health, security, cost, and performance from AWS services such as AWS Security Hub or Amazon CloudWatch. These organizations also need to ensure their […]
Bootstrap your chaos engineering journey with AWS Fault Injection Service Scenarios Library
Ensuring the reliability and resilience of applications is crucial for maintaining business continuity, delivering a superior customer experience, and staying compliant with industry regulations. As defined in the AWS Well-Architected Framework Reliability Pillar, testing reliability plays an important role in ensuring reliability. Chaos engineering is a powerful way to not only test how your systems […]
How to perform Failover and Failback using AWS Elastic Disaster Recovery (AWS DRS) between VMware and AWS environments
Enterprises face a variety of threats such as natural disasters, cyber-attacks and technology failures that could severely disrupt operations. A comprehensive disaster recovery plan is crucial to quickly respond and recover from these events. In this blog post, we’ll show how to plan and implement a comprehensive disaster recovery solution between your VMware on-premises environment […]
Using Permissions to Unlock Resilience with AWS Resilience Hub
AWS customers come to AWS Resilience Hub for the ability to assess their application against their Recovery Time Objectives (RTO), the maximum acceptable time an application can be in a disrupted state, and Recovery Point Objectives (RPO), the maximum amount of data that can be lost due to disruption. Although customers come for the assessment […]
Resiliency Journey: exploring how AWS Resilience Hub and Migration Acceleration Program come together
In today’s rapidly evolving digital landscape, the cloud has become the backbone of innovation, scalability, and efficiency for businesses worldwide. As customers embark on their cloud migration journeys, whether the migration has been motivated by the intention of accelerating innovation, reducing operational and infrastructure costs, or exiting your on-prem datacenter, migrating to the cloud presents […]