Understanding the health of your workload is an essential component of Operational Excellence. Defining metrics and thresholds, together with appropriate alerts, ensures that issues can be acknowledged and remediated within an appropriate timeframe.
In this section of the lab, you will simulate a performance issue within the API. Using Amazon CloudWatch Synthetics, a canary monitor continuously checks the API response time to detect issues.
In this example, if the API takes longer than 6 seconds to respond, an alarm is triggered, sending a notification email.
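For reference, an alarm with this behavior could be created with the AWS CLI roughly as follows. This is an illustrative sketch only: the alarm and canary names match the ones you will see later in this lab, but the SNS topic ARN is a placeholder, and the lab's CloudFormation stack deploys the real alarm for you.

```shell
# Hypothetical sketch only -- the lab's CloudFormation stack already deploys this alarm.
# The alarm fires when the canary's average Duration exceeds 6000 ms over a 1-minute period.
aws cloudwatch put-metric-alarm \
  --alarm-name mysecretword-canary-duration-alarm \
  --namespace CloudWatchSynthetics \
  --metric-name Duration \
  --dimensions Name=CanaryName,Value=mysecretword-canary \
  --statistic Average \
  --period 60 \
  --evaluation-periods 1 \
  --threshold 6000 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:ops-notifications  # placeholder ARN
```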
The following resources have been deployed to perform these actions.
In this section, you will send multiple concurrent requests to the application, simulating a large surge of incoming traffic. This will overwhelm the API, gradually increasing the response time of the application. As a result, the duration measured by the canary monitor exceeds the set threshold, triggering the CloudWatch alarm to send a notification.
Follow the steps below to continue:
From the Cloud9 terminal, run the command shown below to change directory to the working script folder:
cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/scripts/
Confirm that you have the test.json file in the folder and that it contains the following text:
{"Name":"Test User","Text":"This Message is a Test!"}
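If you want to double-check the payload before using it, you can validate that it parses as JSON from the same terminal (python3 is available in Cloud9; jq would work equally well):

```shell
# Validate that the test payload parses as JSON before sending it to the API.
payload='{"Name":"Test User","Text":"This Message is a Test!"}'
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload is valid JSON"
```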
Go to the CloudFormation console and take note of the OutputApplicationEndpoint value under the Outputs tab of the walab-ops-sample-application stack. This is the DNS endpoint of the Application Load Balancer.
Execute the command below, replacing the ‘OutputApplicationEndpoint’ with the DNS endpoint value you recorded previously:
bash simulate_request.sh OutputApplicationEndpoint
This script uses Apache Bench (ab) to send 60,000,000 requests, 3,000 concurrent requests at a time.
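The exact contents of simulate_request.sh are in the lab repository; as a rough sketch, the load it generates is equivalent to an Apache Bench invocation along these lines. The URL path and the default endpoint below are illustrative assumptions, not the script's literal contents.

```shell
# Illustrative sketch of the load simulate_request.sh generates (not the literal script).
ENDPOINT="${1:-example-alb.us-east-1.elb.amazonaws.com}"  # placeholder default endpoint
# 60,000,000 total POST requests, 3,000 concurrent, sending test.json as the request body.
CMD=(ab -n 60000000 -c 3000 -p test.json -T application/json "http://${ENDPOINT}/")
echo "${CMD[@]}"   # print the command; run it with: "${CMD[@]}"
```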
When you run the command, you will see the output gradually change from consistently successful 200 responses to include 504 time-out responses.
The requests generated by the script are overwhelming the application API and result in occasional timeouts by your load balancer.
Keep the command running in the background as you proceed through the lab.
After approximately 6 minutes, an alarm will be triggered in response to the generated activity. This sends an email indicating that the CloudWatch alarm has fired.
Check and confirm the alarm by going to the CloudWatch console.
Click on the Alarms section in the left menu.
Click on the alarm named mysecretword-canary-duration-alarm, which should be in an alarm state.
Click on the alarm to display the CloudWatch metrics that the alarm is based on.
The alarm is based on the Duration metric emitted by the mysecretword-canary CloudWatch Synthetics canary monitor. The Duration metric measures how long it takes for a canary request to receive a response from the application.
The alarm is triggered whenever the value of the Duration metric is above 6 seconds within a 1-minute period.
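Console aside, you can also confirm the alarm state from the Cloud9 terminal with the AWS CLI (assuming the terminal has credentials for the lab account):

```shell
# Query the current state of the canary duration alarm (OK, ALARM, or INSUFFICIENT_DATA).
aws cloudwatch describe-alarms \
  --alarm-names mysecretword-canary-duration-alarm \
  --query 'MetricAlarms[0].StateValue' \
  --output text
```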
On the left menu, click on Synthetics and locate the canary monitor named mysecretword-canary.
Click on the canary, then select the Configuration tab.
From here you will see the canary configuration and a snippet of the canary script.
In the canary script section, scroll down to the section that contains let requestOptionStep1, as shown in the screenshot below. This is the configuration that controls the destination of the request (hostname, path, and payload body).
Click on the Monitoring tab.
From here you will see the visualization of the metrics that the canary monitor generates.
Locate the ‘Duration’ metric that is being used to trigger the CloudWatch alarm.
You will see the average duration value of the canary requests, representing the time taken to complete. A value above 6000 ms means a request took more than 6 seconds to receive a response from the application, indicating a performance issue in the API.
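The same Duration data can be pulled from the terminal. For example, the command below fetches the per-minute average over the last 30 minutes; it assumes AWS CLI credentials and GNU date, both of which are present in Cloud9.

```shell
# Fetch the canary's average Duration (in ms) per minute for the last 30 minutes.
aws cloudwatch get-metric-statistics \
  --namespace CloudWatchSynthetics \
  --metric-name Duration \
  --dimensions Name=CanaryName,Value=mysecretword-canary \
  --start-time "$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Average \
  --output table
```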
You have now completed the second section of the lab.
You should still have simulate_request.sh running in the background, simulating a large influx of traffic to your API. This causes the application to respond slowly and time out periodically. The CloudWatch alarm will continue to trigger, and performance issue notifications will be sent to your System Operator to prompt them into action.
This concludes Section 2 of this lab. Click ‘Next step’ to continue to the next section of the lab where we will build an automated playbook to assist investigation of the issue.
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.
Click here to access the Well-Architected Tool