Lab complete!
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.
Click here to access the Well-Architected Tool
This failure injection simulates a critical problem with one of the three AWS Availability Zones (AZs) used by your service. Availability Zones are a powerful tool for building highly available applications: an application partitioned across AZs is better isolated from, and protected against, issues such as lightning strikes, tornadoes, earthquakes, and more.
In Chaos Engineering we always start with a hypothesis. For this experiment the hypothesis is:
Hypothesis: If an entire Availability Zone dies, then availability will not be impacted
Go to the RDS Dashboard in the AWS Console at http://console.aws.amazon.com/rds and note which Availability Zone the Amazon RDS primary DB instance is in.
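To simulate failure of an AZ, select one of the Availability Zones used by your service (us-east-2a, us-east-2b, or us-east-2c) as <az>:
For Scenario 1, select an AZ that the RDS primary DB instance is not in.
For Scenario 2, select the AZ that the RDS primary DB instance is in.
Use your VPC ID as <vpc-id>.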
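The choice of <az> for each scenario can be sketched as a small helper. This is a hypothetical illustration, not part of the lab tooling: the function name `pick_az` is made up, and the scenario rule (Scenario 1 avoids the AZ of the RDS primary DB instance, Scenario 2 targets it) is inferred from how the two scenarios are compared later in this lab. The primary's AZ is the value you noted in the RDS console.

```python
# Hypothetical helper: choose which AZ to pass as <az> for each scenario.
AZS = ["us-east-2a", "us-east-2b", "us-east-2c"]


def pick_az(scenario: int, primary_az: str, azs: list[str] = AZS) -> str:
    """Return the AZ to fail.

    Scenario 1: an AZ that does NOT host the RDS primary DB instance.
    Scenario 2: the AZ that hosts the RDS primary DB instance.
    """
    if primary_az not in azs:
        raise ValueError(f"{primary_az} is not one of {azs}")
    if scenario == 1:
        # First AZ in the list that is not the primary's AZ.
        return next(az for az in azs if az != primary_az)
    if scenario == 2:
        return primary_az
    raise ValueError("scenario must be 1 or 2")


if __name__ == "__main__":
    primary = "us-east-2a"  # example value; use the AZ shown in the RDS console
    print(pick_az(1, primary))  # us-east-2b
    print(pick_az(2, primary))  # us-east-2a
```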
Run one (and only one) of the scripts/programs below, choosing the language you set up your environment for.
Language | Command |
---|---|
Bash | ./fail_az.sh <az> <vpc-id> |
Python | python3 fail_az.py <vpc-id> <az> |
Java | java -jar app-resiliency-1.0.jar AZ <vpc-id> <az> |
C# | .\AppResiliency AZ <vpc-id> <az> |
PowerShell | .\fail_az.ps1 <az> <vpc-id> |
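Note that the argument order differs between tools: Bash and PowerShell take <az> first, while Python, Java, and C# take <vpc-id> first. The sketch below renders the command from the table for a chosen language so the two values are not swapped by mistake. The `fail_az_command` helper and its language keys are hypothetical, and it only builds the argument list; it does not run anything.

```python
# Command templates copied from the table above; note the differing argument order.
COMMANDS = {
    "bash": ["./fail_az.sh", "{az}", "{vpc_id}"],
    "python": ["python3", "fail_az.py", "{vpc_id}", "{az}"],
    "java": ["java", "-jar", "app-resiliency-1.0.jar", "AZ", "{vpc_id}", "{az}"],
    "csharp": [".\\AppResiliency", "AZ", "{vpc_id}", "{az}"],
    "powershell": [".\\fail_az.ps1", "{az}", "{vpc_id}"],
}


def fail_az_command(language: str, az: str, vpc_id: str) -> list[str]:
    """Return the argument list for the chosen language's fail-AZ tool."""
    template = COMMANDS[language.lower()]
    return [part.format(az=az, vpc_id=vpc_id) for part in template]


if __name__ == "__main__":
    # Placeholder VPC ID; substitute your own.
    print(" ".join(fail_az_command("python", "us-east-2b", "vpc-0123456789abcdef0")))
    # python3 fail_az.py vpc-0123456789abcdef0 us-east-2b
```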
The specific output will vary based on the command used.
Watch how the service responds. Note how AWS systems help maintain service availability. Test whether there is any loss of availability and, if so, for how long.
Refresh the service website several times.
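To put a number on "how long", one option is to probe the website on a fixed interval and measure the longest run of failed probes. This is a minimal sketch under stated assumptions, not part of the lab tooling: `probe`, `longest_outage`, and `monitor` are hypothetical names, the URL you pass is your service's website address, and `longest_outage` also works on `(timestamp, ok)` samples you record by hand while refreshing the page.

```python
import http.client
import time
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (http.client.HTTPException, OSError):
        # URLError/HTTPError (4xx/5xx), refused connections and timeouts are
        # all OSError subclasses; treat any of them as "unavailable".
        return False


def longest_outage(samples: list[tuple[float, bool]]) -> float:
    """Longest gap from the first failed sample to the next successful one.

    Resolution is limited by the sampling interval. If the last sample
    still failed, the outage is measured up to that last sample.
    """
    longest = 0.0
    outage_start = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts  # a new outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None  # service recovered
    if outage_start is not None:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


def monitor(url: str, duration_s: float = 300.0, interval_s: float = 1.0) -> float:
    """Probe `url` every `interval_s` seconds for `duration_s` seconds and
    return the longest observed outage in seconds."""
    samples: list[tuple[float, bool]] = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(url)))
        time.sleep(interval_s)
    return longest_outage(samples)
```

For example, start `monitor` with your website URL and a `duration_s` long enough to cover the experiment just before running the fail-AZ command. A result of `0.0` means no failed probe was observed, which is consistent with the hypothesis but, given the sampling interval, does not rule out very short interruptions.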
Verify your observations by going through the canary run data for the WebServersforResiliencyTesting stack.
Scenario 1 is similar to the EC2 failure injection test, because there is only one EC2 server per AZ in our architecture. Look at the same screens you examined for that test.
One difference you will observe from the EC2 failure test is that Auto Scaling does not launch the replacement EC2 instance in the AZ where the terminated instance ran. Instead, it attempts to balance the three requested EC2 instances across the remaining two AZs.
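The rebalancing described above can be illustrated with a simplified model: spread the desired capacity as evenly as possible over the AZs that are still usable. This is only an approximation of Auto Scaling's behavior (the real service also accounts for existing instances, launch failures, and other factors), and `balance` is a hypothetical name.

```python
def balance(desired: int, azs: list[str]) -> dict[str, int]:
    """Spread `desired` instances over `azs`, differing by at most one per AZ."""
    if not azs:
        raise ValueError("at least one AZ is required")
    base, extra = divmod(desired, len(azs))
    # The first `extra` AZs in the list each get one additional instance.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


if __name__ == "__main__":
    all_azs = ["us-east-2a", "us-east-2b", "us-east-2c"]
    print(balance(3, all_azs))  # one instance per AZ
    failed = "us-east-2b"
    remaining = [az for az in all_azs if az != failed]
    print(balance(3, remaining))  # {'us-east-2a': 2, 'us-east-2c': 1}
```

Which surviving AZ receives the second instance is arbitrary in this model; the point is that three instances cannot be split evenly over two AZs, so one AZ ends up with two.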
Scenario 2 is similar to a combination of the RDS failure injection and the EC2 failure injection. In addition to the EC2-related screens, open the Amazon RDS console, navigate to your DB instance, and observe the information on its tabs.
The similarity between Scenario 1 and the EC2 failure test, and between Scenario 2 and the RDS failure test, illustrates how an AZ failure impacts your system. Resources in the failed AZ have no or limited availability. Because Availability Zones are strongly partitioned and isolated from each other, however, resources in the other AZs continue to provide your service with the functionality it needs. Scenario 1 results in the loss of load balancer and web server capacity in one AZ, while Scenario 2 additionally loses the data tier in that AZ. By ensuring that every tier of your system spans multiple AZs, you create a partitioned architecture that is resilient to failure.
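This reasoning can be made concrete with a small model: the service stays available only if every tier still has at least one healthy AZ. The tier names and placement below are assumptions for illustration (load balancer and one web server in each of the three AZs, and a Multi-AZ database with the primary in us-east-2a and the standby in us-east-2b). The model also ignores the brief interruption during an RDS failover.

```python
# Hypothetical placement of each tier across AZs.
TIERS = {
    "load_balancer": {"us-east-2a", "us-east-2b", "us-east-2c"},
    "web_servers": {"us-east-2a", "us-east-2b", "us-east-2c"},
    # Multi-AZ database: primary in us-east-2a, standby in us-east-2b (assumed).
    "database": {"us-east-2a", "us-east-2b"},
}


def surviving_azs(tiers: dict[str, set[str]], failed: set[str]) -> dict[str, set[str]]:
    """AZs each tier can still use after the AZs in `failed` are lost."""
    return {name: azs - failed for name, azs in tiers.items()}


def available(tiers: dict[str, set[str]], failed: set[str]) -> bool:
    """The service is up only if no tier has lost all of its AZs."""
    return all(surviving_azs(tiers, failed).values())


if __name__ == "__main__":
    print(available(TIERS, {"us-east-2c"}))  # Scenario 1-style: True
    print(available(TIERS, {"us-east-2a"}))  # Scenario 2-style: True (standby takes over)
    single_az_db = dict(TIERS, database={"us-east-2a"})
    print(available(single_az_db, {"us-east-2a"}))  # False: data tier was in one AZ only
```

The last case shows why every tier, not just the web tier, needs to span multiple AZs: a single-AZ database turns an AZ failure into a full outage.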
Our hypothesis is confirmed:
Hypothesis: If an entire Availability Zone dies, then availability will not be impacted
This step is optional. To simulate the AZ returning to health, do the following: