200 - Build AWS Health Aware Operation Change Process

Authors

  • Jerry Chen, Well-Architected Geo Solutions Architect.

Contributors

  • Rich Boyd, Well-Architected Operational Excellence Pillar Lead.
  • Phong Le, Well-Architected Geo Solutions Architect.

Well-Architected Best Practices

This lab helps you to exercise the following Well-Architected Best Practices in your operation change process:

  • OPS07-BP05 - Make informed decisions to deploy systems and changes
  • OPS06-BP01 - Plan for unsuccessful change

Introduction

When a failure occurs during a production change window on AWS, your operations team has to investigate the root cause and, potentially, revert the environment to the last functioning version. The team is often under heavy pressure to identify the root cause within the change window in order to make a Go or No-Go decision.

To accelerate troubleshooting, you need a way to determine whether the change failure was caused by an active AWS service event before moving on to application-related investigation. You can do this manually by checking the AWS Health Dashboard or by opening an AWS Support case to engage AWS Support engineers.

However, you can also use the AWS Health API together with AWS Systems Manager to build an AWS Health aware operation change process. The change pipeline checks the health status of AWS services and confirms that there are no active AWS service events before it starts the change execution, so that the change is not impacted by a service event.
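The pre-change health check described above can be sketched in Python as follows. This is a minimal illustration, not the lab's actual runbook: the function names (`blocking_events`, `go_no_go`, `fetch_open_issues`) and the watched service and Region values are assumptions made for the example. It assumes the account has a support plan that grants access to the AWS Health API, and that boto3 is installed wherever `fetch_open_issues` is called.

```python
from typing import Dict, Iterable, List


def blocking_events(events: Iterable[Dict], services: Iterable[str],
                    regions: Iterable[str]) -> List[Dict]:
    """Return the events that are open issues for the watched services and Regions."""
    watched_services = {s.upper() for s in services}
    watched_regions = {r.lower() for r in regions}
    return [
        e for e in events
        if e.get("statusCode") == "open"
        and e.get("eventTypeCategory") == "issue"
        and e.get("service", "").upper() in watched_services
        and e.get("region", "").lower() in watched_regions
    ]


def go_no_go(events: Iterable[Dict], services: Iterable[str],
             regions: Iterable[str]) -> str:
    """Hypothetical decision helper: block the change if any matching event is active."""
    return "NO-GO" if blocking_events(events, services, regions) else "GO"


def fetch_open_issues(services: Iterable[str], regions: Iterable[str]) -> List[Dict]:
    """Query the AWS Health API for open issues (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the decision logic above has no dependencies

    # Assumption: the Health API is reached through its us-east-1 endpoint.
    health = boto3.client("health", region_name="us-east-1")
    paginator = health.get_paginator("describe_events")
    event_filter = {
        "services": list(services),
        "regions": list(regions),
        "eventStatusCodes": ["open"],
        "eventTypeCategories": ["issue"],
    }
    events: List[Dict] = []
    for page in paginator.paginate(filter=event_filter):
        events.extend(page["events"])
    return events
```

A change pipeline could call `go_no_go(fetch_open_issues(["EC2"], ["us-east-1"]), ["EC2"], ["us-east-1"])` as its first step and stop when the result is `NO-GO`. Keeping the decision logic separate from the API call also makes it possible to simulate a service event by passing in a hand-built event list.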

Goals:

  • Build a Systems Manager automation runbook to encapsulate the automated change process.
  • Create a Change Template through AWS Systems Manager Change Manager to build an AWS health aware operation change process.
  • Simulate a service event and validate that the change process blocks the production change execution while an AWS service event is active.

Prerequisites:

Costs

NOTE: If you complete this lab, you will be billed for any applicable AWS resources used that are not covered by the AWS Free Tier.

Steps: