AWS Cloud Operations Blog
Scaling AWS Fault Injection Service Across Your Organization And Accounts
Welcome to part two of our series on scaling AWS Fault Injection Service (AWS FIS) within your organization. In part one, we showed how customers can let individual accounts within their organization run network experiments against a centralized networking infrastructure by introducing a service control policy (SCP) rule. In this post, we dive deeper into how organizations can use SCPs and AWS Identity and Access Management (IAM) to let application teams run chaos experiments while adhering to security policies. The approach is a centralized strategy for controlled multi-account FIS experiments. It allows teams to systematically validate workload dependencies and resilience across different accounts and compliance domains.
Understanding AWS FIS Multi-Account Strategies
Multi-account support for AWS FIS lets you create and run experiments from an orchestrator account that injects faults into AWS resources in one or more target accounts. You can configure multi-account experiment templates and control their scope using IAM roles with fine-grained permissions, and you can use resource tags to specify the targets in each account. FIS provides multi-account visibility and safety. You can review actions across all accounts from the AWS FIS console and audit API calls in each account with AWS CloudTrail. When you run a multi-account experiment, target accounts with affected resources are notified through their AWS Health Dashboard.
Multi-Account Experiment Strategies
Organizations can use two strategies for designing and conducting multi-account experiments:
- Centralized Management Strategy: In this strategy, you create an orchestrator account. A dedicated chaos engineering or SRE team typically owns this account; we will call them the FIS Admins. This team configures and manages experiments in the AWS FIS console and ensures that experiments are logged centrally. The orchestrator account owns the AWS FIS experiment templates and experiments. This lets the FIS Admins collaborate with developer teams distributed across multiple accounts.
- Decentralized Management Strategy: Each AWS account owner designs and runs their own experiments. This approach gives application owners the freedom to adopt chaos engineering within their own teams without the overhead of working with a centralized team. With this model, organizations can add guardrails, for example to prevent role modification or to ensure that FIS safety levers are used to prevent unwanted disruptions.
Note: The decentralized approach requires a trust relationship between accounts. See part three for more guidance.
In the following example, we walk through the IAM policies and roles needed in the orchestrator and target accounts so that your teams can run network experiments independently through the centralized console. The experiment's objective is to disrupt network connectivity to and from a specific subnet. For this scenario, we create the AWS-FIS-Experiment-Executor role as described in part one. This role has the AWS managed policy AWSFaultInjectionSimulatorNetworkAccess attached, which allows it to perform the required network actions. See the AWSFaultInjectionSimulatorNetworkAccess policy reference for details on all permissions.
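To make the scenario concrete, here is a minimal Python sketch of the template request that such an experiment could use. It assumes the field names of the FIS `CreateExperimentTemplate` API: `experimentOptions.accountTargeting`, `targets`, `actions` and `stopConditions`. The function name `build_network_template`, the action name `DisruptSubnet` and the tag `FIS-Target=true` are hypothetical choices for illustration. The sketch only builds a dictionary and makes no AWS calls.

```python
import json


def build_network_template(role_arn: str, duration: str = "PT5M") -> dict:
    """Return a hypothetical CreateExperimentTemplate request body that
    disrupts traffic to and from tagged subnets in the target accounts."""
    return {
        "description": "Disrupt connectivity to and from tagged subnets",
        # Role FIS uses to run the experiment (placeholder supplied by the caller).
        "roleArn": role_arn,
        # Resolve targets in the accounts registered on the template.
        "experimentOptions": {"accountTargeting": "multi-account"},
        "targets": {
            "Subnets": {
                "resourceType": "aws:ec2:subnet",
                # Hypothetical tag that marks subnets as in scope for chaos testing.
                "resourceTags": {"FIS-Target": "true"},
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "DisruptSubnet": {
                "actionId": "aws:network:disrupt-connectivity",
                # ISO 8601 duration; scope "all" targets traffic entering and leaving the subnet.
                "parameters": {"duration": duration, "scope": "all"},
                "targets": {"Subnets": "Subnets"},
            }
        },
        # "none" means no alarm-based stop condition; use a CloudWatch alarm in practice.
        "stopConditions": [{"source": "none"}],
    }


if __name__ == "__main__":
    template = build_network_template(
        "arn:aws:iam::111122223333:role/AWS-FIS-Experiment-Orchestrator"
    )
    print(json.dumps(template, indent=2))
```

After you replace the placeholders, you could pass a dictionary like this as keyword arguments to the Boto3 FIS client's `create_experiment_template` method. Verify the field names against the current API reference first.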
Multi-Account Scenario Preparation
For this scenario, we use two types of accounts:
- AWS-FIS-Experiment-Orchestrator-Account: This is the centralized account used to create, delete, or update FIS experiment templates and run experiments across all associated application accounts.
- Workload Account (Target Account): The account where the workload resources reside and where faults are injected.
FIS Roles and Permissions
To ensure secure and controlled execution of fault injection experiments, this setup relies on role-based access control through IAM. We define two standardized roles: AWS-FIS-Experiment-Orchestrator and AWS-FIS-Experiment-Target. Together, these roles provide a framework for running controlled chaos engineering experiments. They also keep the safeguards needed to prevent unintended disruptions across your development and production environments. Let’s take a closer look:
- AWS-FIS-Experiment-Orchestrator: A role in the centralized orchestrator account. It has permissions to create, update, or delete FIS experiment templates and to run experiments that inject faults into your target workloads. The target role in each target account trusts this role.
- AWS-FIS-Experiment-Target: A role in the target account with the permissions required to act on resources. For example, the aws:network:disrupt-connectivity action requires ec2:CreateNetworkAcl and nine other permissions, plus the mandatory tags. See the AWS FIS actions reference for details.
Using dedicated roles helps organizations maintain strong security, govern experiments efficiently, and adapt as their cloud environment evolves. It also supports compliance and collaboration across multiple accounts. The following diagram shows a sample account structure:

Diagram A: AWS Account structure for FIS multi-account targets
AWS-FIS-Experiment-Orchestrator Permissions
Add the following permissions to allow this role to create, update, and delete experiment templates and experiments in the orchestrator account:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"fis:ListExperimentTemplates",
"fis:ListActions",
"fis:ListTargetResourceTypes",
"fis:ListExperiments",
"fis:GetTargetResourceType"
],
"Resource": "*"
},
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": "fis:*",
"Resource": [
"arn:aws:fis:<REGION>:<ORCHESTRATOR_ACCOUNT_ID>:action/*",
"arn:aws:fis:<REGION>:<ORCHESTRATOR_ACCOUNT_ID>:experiment/*",
"arn:aws:fis:<REGION>:<ORCHESTRATOR_ACCOUNT_ID>:experiment-template/*"
]
}
]
}
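To avoid hand-editing placeholders, you could generate this policy for each Region. The following Python sketch scopes the FIS resource ARNs to one Region and to the orchestrator account, because that account owns the templates and experiments. It assumes the standard `aws` partition. The function name `orchestrator_policy` and the example account ID are hypothetical.

```python
import json
import re

# AWS account IDs are 12 digits.
ACCOUNT_ID = re.compile(r"^\d{12}$")


def orchestrator_policy(region: str, orchestrator_account_id: str) -> dict:
    """Fill in the orchestrator-role policy for one account and Region."""
    if not ACCOUNT_ID.match(orchestrator_account_id):
        raise ValueError(f"not a 12-digit AWS account ID: {orchestrator_account_id!r}")
    arn_prefix = f"arn:aws:fis:{region}:{orchestrator_account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read-only listing calls, granted on all resources.
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "fis:ListExperimentTemplates",
                    "fis:ListActions",
                    "fis:ListTargetResourceTypes",
                    "fis:ListExperiments",
                    "fis:GetTargetResourceType",
                ],
                "Resource": "*",
            },
            {
                # All FIS actions, limited to FIS resources in this account and Region.
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": "fis:*",
                "Resource": [
                    f"{arn_prefix}:{kind}/*"
                    for kind in ("action", "experiment", "experiment-template")
                ],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(orchestrator_policy("us-east-1", "111122223333"), indent=2))
```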
To run experiments in target accounts, you also need to grant the orchestrator role permission to assume each target account role (known as role chaining). For example:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": [
"arn:aws:iam::<<targetAccountID>>:role/AWS-FIS-Experiment-Target"
]
}
]
}
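With many target accounts, the `Resource` list grows with every account you onboard. The following Python sketch generates it from a list of account IDs. It assumes every target account uses the same role name, AWS-FIS-Experiment-Target. The function name `assume_target_roles_policy` and the example account IDs are hypothetical.

```python
import json
import re

# AWS account IDs are 12 digits.
ACCOUNT_ID = re.compile(r"^\d{12}$")


def assume_target_roles_policy(target_account_ids, role_name="AWS-FIS-Experiment-Target"):
    """Allow sts:AssumeRole on the same-named target role in every listed account."""
    resources = []
    # De-duplicate and sort so the generated policy is stable across runs.
    for account_id in sorted(set(target_account_ids)):
        if not ACCOUNT_ID.match(account_id):
            raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
        resources.append(f"arn:aws:iam::{account_id}:role/{role_name}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": resources}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(assume_target_roles_policy(["111122223333", "444455556666"]), indent=2))
```

Listing accounts explicitly, rather than using a wildcard, keeps the set of accounts the orchestrator can reach easy to review.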
AWS-FIS-Experiment-Target Permissions
Add the appropriate AWS managed policy to the AWS-FIS-Experiment-Target role based on the experiment type. For example, use the AWSFaultInjectionSimulatorNetworkAccess policy for network disruption experiments. Then add the following trust policy to the role. It references the AWS-FIS-Experiment-Orchestrator role in the orchestrator account to allow cross-account access:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<<AWS-FIS-Experiment-Orchestrator-Account-ID>>:root",
"Service": "fis.amazonaws.com"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringLike": {
"sts:ExternalId": "arn:aws:fis:<<region>>:<<AWS-FIS-Experiment-Orchestrator-Account-ID>>:experiment/*"
},
"ArnEquals": {
"aws:PrincipalArn": "arn:aws:iam::<<AWS-FIS-Experiment-Orchestrator-Account-ID>>:role/AWS-FIS-Experiment-Orchestrator"
}
}
}
]
}
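Before deploying the trust policy, you can generate it with well-formed ARNs and check the `sts:ExternalId` pattern locally. The condition assumes that the external ID presented on role assumption is an experiment ARN from the orchestrator account. In this Python sketch, `target_trust_policy` and `external_id_allowed` are hypothetical helpers. The second one only approximates IAM `StringLike` matching with `fnmatch.fnmatchcase`. IAM supports `*` and `?`, while `fnmatch` also interprets `[...]`, so treat the helper as a quick sanity check, not an IAM policy evaluator.

```python
import json
from fnmatch import fnmatchcase


def target_trust_policy(region: str, orchestrator_account_id: str,
                        orchestrator_role: str = "AWS-FIS-Experiment-Orchestrator") -> dict:
    """Trust policy for AWS-FIS-Experiment-Target with well-formed ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{orchestrator_account_id}:root",
                    "Service": "fis.amazonaws.com",
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    # Only experiments owned by the orchestrator account in this Region.
                    "StringLike": {
                        "sts:ExternalId": f"arn:aws:fis:{region}:{orchestrator_account_id}:experiment/*"
                    },
                    # Only the named orchestrator role, not every principal in that account.
                    "ArnEquals": {
                        "aws:PrincipalArn": f"arn:aws:iam::{orchestrator_account_id}:role/{orchestrator_role}"
                    },
                },
            }
        ],
    }


def external_id_allowed(policy: dict, external_id: str) -> bool:
    """Approximate the StringLike check on sts:ExternalId (wildcards * and ?)."""
    pattern = policy["Statement"][0]["Condition"]["StringLike"]["sts:ExternalId"]
    return fnmatchcase(external_id, pattern)


if __name__ == "__main__":
    print(json.dumps(target_trust_policy("us-east-1", "111122223333"), indent=2))
```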
Creating a Multi-Account Experiment
To create a multi-account FIS experiment template in the console, open AWS FIS. In the left navigation pane, choose Experiment templates under Resilience testing. Then create the experiment template using the following steps:
1. Choose the “Multiple accounts” option when creating the template.
2. Specify actions and targets.
3. Configure service access and specify the target role(s) for cross-account access.
Note: In the Target account configurations section, add the target account role that has permissions to inject the network disruption.
4. Define logging, stop conditions, safety levers, and report configuration to ensure safe experiment execution.
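In a multi-account template, each target account is registered together with the role to assume in that account. The following Python sketch builds one such request per target account. It assumes the field names of the FIS `CreateTargetAccountConfiguration` API: `experimentTemplateId`, `accountId`, `roleArn`, `description` and `clientToken`. The helper name, template ID and account IDs are hypothetical, and the sketch makes no AWS calls.

```python
import json
import uuid


def target_account_configurations(template_id: str, target_account_ids,
                                  role_name: str = "AWS-FIS-Experiment-Target") -> list:
    """Build one target-account configuration request per target account."""
    requests = []
    for account_id in target_account_ids:
        requests.append({
            # Unique token so a retried request is not applied twice.
            "clientToken": str(uuid.uuid4()),
            "experimentTemplateId": template_id,
            "accountId": account_id,
            # The role FIS assumes in this target account to inject faults.
            "roleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
            "description": f"FIS target role in account {account_id}",
        })
    return requests


if __name__ == "__main__":
    for request in target_account_configurations("EXTabc123", ["111122223333", "444455556666"]):
        print(json.dumps(request, indent=2))
```

With Boto3, you could pass each dictionary as keyword arguments to the FIS client's `create_target_account_configuration` method. Confirm the parameter names in the current API reference first.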
CloudWatch Cross-Account Configuration
To enable cross-account CloudWatch monitoring:
- Create the AWSServiceRoleForCloudWatchCrossAccount service-linked role in the orchestrator account.
- Create the CloudWatch-CrossAccountSharingRole in each target account.
- Ensure that the CloudWatch-CrossAccountSharingRole in each target account trusts the orchestrator account.
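The sharing role in each target account mainly needs a trust policy that names the orchestrator (monitoring) account. The following Python sketch describes that role using the property names of a CloudFormation `AWS::IAM::Role` resource. The attached read-only managed policies are an assumption, so adjust them to the data you want to share. The helper name and account ID are hypothetical.

```python
import json


def cloudwatch_sharing_role(monitoring_account_ids) -> dict:
    """Describe CloudWatch-CrossAccountSharingRole for one target account."""
    return {
        "RoleName": "CloudWatch-CrossAccountSharingRole",
        # Trust the orchestrator (monitoring) account(s) so they can read CloudWatch data.
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": [f"arn:aws:iam::{a}:root" for a in monitoring_account_ids]
                    },
                    "Action": "sts:AssumeRole",
                }
            ],
        },
        # Assumed read-only policies; review them against your sharing needs.
        "ManagedPolicyArns": [
            "arn:aws:iam::aws:policy/CloudWatchReadOnlyAccess",
            "arn:aws:iam::aws:policy/CloudWatchAutomaticDashboardsAccess",
        ],
    }


if __name__ == "__main__":
    print(json.dumps(cloudwatch_sharing_role(["111122223333"]), indent=2))
```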
Stop Conditions, Safety Levers, and Experiment Reports
AWS Fault Injection Service (AWS FIS) provides controls and guardrails that help you run experiments on AWS workloads in a controlled manner. A stop condition stops an experiment if it reaches a threshold that you define as an Amazon CloudWatch alarm. Safety levers stop all running experiments and prevent new experiments from starting. You may want to use the safety lever to prevent FIS experiments during certain time periods or in response to application health alarms. Every AWS account has one safety lever per AWS Region. See Safety Levers for AWS FIS for details. Experiment reports are PDF summaries of the actions executed during an experiment. You can download the reports from the FIS console or have them sent to an S3 bucket specified in the experiment template.
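Stop conditions are set per experiment template as a list of CloudWatch alarms. The following Python sketch builds that list. It assumes the `stopConditions` shape of the FIS `CreateExperimentTemplate` API: a `source` of `aws:cloudwatch:alarm` with the alarm ARN as `value`, or a `source` of `none`. It rejects strings that do not look like alarm ARNs in the standard `aws` partition. The helper name and example alarm are hypothetical.

```python
import re

# CloudWatch alarm ARN in the standard partition: arn:aws:cloudwatch:<region>:<account>:alarm:<name>
ALARM_ARN = re.compile(r"^arn:aws:cloudwatch:[a-z0-9-]+:\d{12}:alarm:.+$")


def stop_conditions(alarm_arns) -> list:
    """Build the stopConditions list for an experiment template."""
    conditions = []
    for arn in alarm_arns:
        if not ALARM_ARN.match(arn):
            raise ValueError(f"not a CloudWatch alarm ARN: {arn!r}")
        conditions.append({"source": "aws:cloudwatch:alarm", "value": arn})
    # Fall back to "none" when no alarm is supplied.
    return conditions or [{"source": "none"}]


if __name__ == "__main__":
    print(stop_conditions(["arn:aws:cloudwatch:us-east-1:111122223333:alarm:HighErrorRate"]))
```

Falling back to `none` keeps the template valid. For production workloads, at least one alarm-based stop condition is the safer choice.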
Conclusion
Implementing a centralized multi-account strategy using the practices outlined in this post offers:
- Enhanced security through role-based access control
- Improved governance with centralized experiment management
- Increased scalability for growing organizations
- Better compliance and audit capabilities as chaos engineering adoption grows across the organization
By adopting these best practices, you can create a robust framework for chaos engineering across your AWS environment. This approach allows you to systematically improve the resilience of your distributed systems, leading to more reliable and fault-tolerant applications. As you implement these strategies, regularly review and update your roles and permissions to align with your evolving organizational needs and AWS’s latest security recommendations. Effective chaos engineering is an ongoing process, and these multi-account practices provide a solid foundation for continuously improving your systems’ reliability. Join us in part three, where we dive into using a multi-account strategy with the AWS FIS Cross-Region: Connectivity scenario.