Lab complete!
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.
Click here to access the Well-Architected Tool
Now that we have a readiness check for the application, we’re going to build the routing controls that steer traffic away from cells that are not ready. To do this, we’ll need to create a cluster. A cluster is a highly available set of five redundant Regional endpoints that host your routing controls. All the routing controls for the application will be hosted on one cluster. For disaster recovery, you can use a retry mechanism to cycle through each available Regional endpoint to update the routing controls for your application.
Get started by creating a cluster and giving it a meaningful name (e.g. UnicornCluster). Accept the “Confirm pricing changes” prompt, then click Create cluster:
Once the cluster is created, you can see the five Regional API endpoints (see the image below). In a DR scenario, you would use the Routing Control API, rather than the console, and ensure that your recovery logic retries all available endpoints. You should also ensure that your actions are restricted to the Route 53 Application Recovery data plane only.
Read more about Control planes and data planes in the documentation and this blog: Building highly resilient applications using Amazon Route 53 Application Recovery Controller
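The retry logic described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the helper names (cluster_endpoints, make_boto3_client, update_with_retry) are hypothetical, the cluster and routing control ARNs are yours to supply, and it assumes the boto3 clients route53-recovery-control-config (control plane, used only to list the endpoints ahead of time) and route53-recovery-cluster (data plane, used for the update itself).

```python
def cluster_endpoints(config_client, cluster_arn):
    """Return (endpoint_url, region) pairs for the cluster.

    config_client is assumed to be boto3.client("route53-recovery-control-config").
    This is a control plane call, so look the endpoints up in advance and store
    them with your recovery tooling rather than calling this during a failover.
    """
    cluster = config_client.describe_cluster(ClusterArn=cluster_arn)["Cluster"]
    return [(e["Endpoint"], e["Region"]) for e in cluster["ClusterEndpoints"]]


def make_boto3_client(endpoint_url, region):
    """Build a data plane client bound to one Regional cluster endpoint."""
    import boto3  # imported lazily so the retry logic can be tested without AWS

    return boto3.client(
        "route53-recovery-cluster", region_name=region, endpoint_url=endpoint_url
    )


def update_with_retry(endpoints, make_client, routing_control_arn, state):
    """Try each endpoint in turn until one accepts the state change.

    Returns the endpoint URL that succeeded, or raises if all of them fail.
    """
    last_error = None
    for endpoint_url, region in endpoints:
        try:
            client = make_client(endpoint_url, region)
            client.update_routing_control_state(
                RoutingControlArn=routing_control_arn,
                RoutingControlState=state,  # "On" or "Off"
            )
            return endpoint_url
        except Exception as error:  # assumption: narrow this in real recovery code
            last_error = error
    raise RuntimeError("all cluster endpoints failed") from last_error
```

In use, you would call update_with_retry(saved_endpoints, make_boto3_client, routing_control_arn, "On"). Passing the client factory as an argument keeps the retry loop independent of AWS, which also makes it easy to exercise with a stub.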
Routing controls will allow us to easily configure how traffic flows, either to our application or to a maintenance page, and then within the application, whether we’re routing traffic to both cells or diverting traffic away from a cell that’s not ready.
Using routing controls means we won’t be updating DNS records directly, which reduces the likelihood of error, and ensures we’re using data plane rather than control plane functionality in a DR scenario.
Navigate to the DefaultControlPanel in your cluster, and click Add routing control:
Create 4 routing controls with meaningful names (e.g. Maintenance, Application, CellEast, and CellWest), adding them to the existing control panel:
You should have 4 routing controls deployed before proceeding:
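If you would rather script this step than click through the console, the same four routing controls could be created with something like the sketch below. It assumes config_client is boto3.client("route53-recovery-control-config") and that you already know the cluster and control panel ARNs; the function name create_routing_controls is hypothetical.

```python
import uuid

# The example names used in this lab.
CONTROL_NAMES = ["Maintenance", "Application", "CellEast", "CellWest"]


def create_routing_controls(config_client, cluster_arn, control_panel_arn,
                            names=CONTROL_NAMES):
    """Create one routing control per name and return a name -> ARN mapping."""
    arns = {}
    for name in names:
        response = config_client.create_routing_control(
            ClientToken=str(uuid.uuid4()),  # idempotency token for this request
            ClusterArn=cluster_arn,
            ControlPanelArn=control_panel_arn,
            RoutingControlName=name,
        )
        arns[name] = response["RoutingControl"]["RoutingControlArn"]
    return arns
```

The returned mapping is convenient to keep, because later steps (health checks and safety rules) refer to routing controls by ARN.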
Route 53 Application Recovery Controller routes traffic using routing control health checks. For each routing control, create a health check by clicking on each routing control and clicking Create health check:
Then give each a meaningful name (e.g. MaintenanceRoutingControl, ApplicationRoutingControl, CellEastRoutingControl, and CellWestRoutingControl). Leave the Invert health check tickbox unchecked, and click Create:
Repeat the process for all remaining routing controls.
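As a scripted alternative to repeating the console steps, a routing control health check can be created through the Route 53 API with the RECOVERY_CONTROL health check type. The sketch below assumes route53_client is boto3.client("route53") and that control_arns maps each routing control name to its ARN; the function name and the idea of setting the display name through a Name tag are illustrative choices, not steps from the lab.

```python
import uuid


def create_routing_control_health_checks(route53_client, control_arns,
                                         suffix="RoutingControl"):
    """Create one health check per routing control; return name -> health check ID."""
    health_check_ids = {}
    for name, arn in control_arns.items():
        response = route53_client.create_health_check(
            CallerReference=str(uuid.uuid4()),  # must be unique per request
            HealthCheckConfig={
                "Type": "RECOVERY_CONTROL",
                "RoutingControlArn": arn,
                # "Inverted" is left out, matching the un-checked tickbox.
            },
        )
        health_check_id = response["HealthCheck"]["Id"]
        # Route 53 shows the "Name" tag as the health check's name in the console.
        route53_client.change_tags_for_resource(
            ResourceType="healthcheck",
            ResourceId=health_check_id,
            AddTags=[{"Key": "Name", "Value": name + suffix}],
        )
        health_check_ids[name] = health_check_id
    return health_check_ids
```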
If you click the Route 53 Health checks page in the navigation pane, you’ll see the states of these health checks. The ones we have just created will be in the Unhealthy state, as they have not been enabled:
When you work with several routing controls at the same time, you might want some safeguards in place when you enable and disable them. These help you to avoid initiating a failover when a replica is not ready, or unintended consequences like turning both routing controls off and stopping all traffic flow.
To create these safeguards, you create safety rules. For more information about safety rules, including usage examples, take a look at the documentation for Creating safety rules in Route 53 ARC.
For our application, we’re going to set up two simple safety rules. For the first, we’re going to assert that the ApplicationRoutingControl or the MaintenanceRoutingControl is active. This will ensure that we don’t inadvertently turn both off, leaving no route for our traffic.
Secondly, we’re going to assert that at least one of the MaintenanceRoutingControl, the CellEastRoutingControl, or the CellWestRoutingControl is active. This means that there will always be an endpoint active to receive the traffic.
(You will have noticed that this is not an exhaustive set of controls, because we have no assertion that the CellEastRoutingControl or CellWestRoutingControl is active when the ApplicationRoutingControl is active. You can extend the safety rules to cover these scenarios as a further exercise.)
In the Safety rules section of the DefaultControlPanel, click Add safety rule:
Click the Assertion rule radio button. An assertion rule enforces the criteria you set, or else does not allow the routing control states to be changed. A typical use for this type of rule is to prevent a fail-open scenario, which is what we’ll be configuring.
The first safety rule we will create asserts that either the Maintenance routing control or the Application routing control must be enabled. This is the first part of the fail-open prevention, ensuring that users will either be routed to the unicorn shop application, or otherwise the maintenance page, preventing both controls from being disabled at the same time.
Give your safety rule a meaningful name (e.g. MaintenanceORApplication), and leave the Wait period at the default to prevent overly frequent state changes. Then, in the Routing control configuration section, select the Maintenance and Application routing controls. This sets the scope of our safety rule to just these two routing controls:
Next, we’re going to configure the rule itself. In the Rule configuration section, click the Type pulldown and select Or. Enter 1 in the Threshold field, and click Create:
Then, we’re going to create a second safety rule to ensure that at least one endpoint is enabled. This will supplement the safety rule above to prevent a fail-open scenario where the East and West cells are both offline and the Maintenance routing control is off as well.
We’ll create another assertion safety rule, but this time we’ll configure the rule as an At least assertion for the three endpoint controls. Go ahead and create another safety rule, ensure the Assertion rule radio button is selected, give it a meaningful name (e.g. AtLeastOneEndpoint), and select the Maintenance, CellEast and CellWest routing controls. Select the At least option in the Type pulldown for the Rule configuration, set the Threshold to 1, and click Create:
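For reference, the two safety rules configured above could also be expressed as API requests. The following is a sketch under stated assumptions: config_client is boto3.client("route53-recovery-control-config"), arns maps routing control names to their ARNs, the helper names are hypothetical, and the 5000 ms wait period is a placeholder value chosen for illustration rather than a value taken from this lab.

```python
import uuid


def assertion_rule(name, control_panel_arn, asserted_arns, rule_type, threshold,
                   wait_period_ms=5000):  # assumption: placeholder wait period
    """Build the AssertionRule payload for create_safety_rule."""
    return {
        "Name": name,
        "ControlPanelArn": control_panel_arn,
        "AssertedControls": list(asserted_arns),
        "WaitPeriodMs": wait_period_ms,
        "RuleConfig": {
            "Type": rule_type,  # "OR", "AND", or "ATLEAST"
            "Threshold": threshold,
            "Inverted": False,
        },
    }


def lab_safety_rules(control_panel_arn, arns):
    """Return payloads for the two assertion rules used in this lab."""
    return [
        assertion_rule(
            "MaintenanceORApplication", control_panel_arn,
            [arns["Maintenance"], arns["Application"]], "OR", 1,
        ),
        assertion_rule(
            "AtLeastOneEndpoint", control_panel_arn,
            [arns["Maintenance"], arns["CellEast"], arns["CellWest"]], "ATLEAST", 1,
        ),
    ]


def create_lab_safety_rules(config_client, control_panel_arn, arns):
    """Send one create_safety_rule request per rule payload."""
    for rule in lab_safety_rules(control_panel_arn, arns):
        config_client.create_safety_rule(
            ClientToken=str(uuid.uuid4()), AssertionRule=rule
        )
```

Keeping the payload construction separate from the API call makes it possible to review the rules before anything is created.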
You should now have 4 routing controls and 2 safety rules set up for the DefaultControlPanel. The routing control state for all routing controls will be Off, which on its face violates the safety rule assertions. However, safety rules are evaluated on the future state of the routing controls when you attempt to update them, so the rules take effect the next time you try to change a routing control.
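To see what evaluating rules against the future state means in practice, here is a small, self-contained model of the two assertion rules from this lab. It is a simplified illustration only: the function names are hypothetical, it ignores wait periods and inverted rules, and it is not how the service itself is implemented.

```python
def rule_holds(rule_type, threshold, states):
    """Evaluate one assertion rule against a list of On/Off (True/False) states."""
    on_count = sum(1 for state in states if state)
    if rule_type == "ATLEAST":
        return on_count >= threshold
    if rule_type == "OR":
        return on_count >= 1
    if rule_type == "AND":
        return on_count == len(states)
    raise ValueError("unknown rule type: " + rule_type)


def update_allowed(rules, current, changes):
    """Apply the proposed changes to a copy of the state, then check every rule."""
    future = {**current, **changes}
    return all(
        rule_holds(rule["type"], rule["threshold"],
                   [future[name] for name in rule["controls"]])
        for rule in rules
    )


RULES = [
    {"type": "OR", "threshold": 1, "controls": ["Maintenance", "Application"]},
    {"type": "ATLEAST", "threshold": 1,
     "controls": ["Maintenance", "CellEast", "CellWest"]},
]

# Every routing control starts in the Off state.
ALL_OFF = {"Maintenance": False, "Application": False,
           "CellEast": False, "CellWest": False}

# Turning on only Maintenance satisfies both rules.
maintenance_only = update_allowed(RULES, ALL_OFF, {"Maintenance": True})
# Turning on only Application leaves no endpoint control on, so it is rejected.
application_only = update_allowed(RULES, ALL_OFF, {"Application": True})
# Turning on Application together with a cell satisfies both rules.
application_and_cell = update_allowed(
    RULES, ALL_OFF, {"Application": True, "CellEast": True}
)
```

In this model, maintenance_only and application_and_cell are True while application_only is False, which matches the intent of the two rules: the starting all-Off state is tolerated, but any update must leave the controls in a state that satisfies both assertions.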