03 Dec The Role of Testing in Avoiding Failure
In the “The Role of Testing in Fixing” blog post, the pattern below was introduced. Our Subaru had been starting poorly; the battery tested bad and was replaced, and we never lost the use of the car.
BAD → Test … Fix … Test → GOOD
Let’s go further in seeing the value of testing. As in the adage “a stitch in time saves nine,” testing can help avoid production outages.
Recently, my kids mentioned their Toyota Corolla steering wheel was shaking oddly, and the car was pulling dramatically to the RIGHT. I did the wrong thing and started worrying that it might be complicated, expensive stuff in the steering or in the suspension, instead of just considering, you guessed it … what can we test? Googled it, no real help. Called my Dad, who has fixed cars all his life. (Mechanical skills skipped a generation and have landed with my son and oldest daughter.) He recommended inspecting the right front tire, looking for any evidence of the steel belt going bad, and even swapping in the spare. (Pure gold reminder: Change ONE thing at a time and discover if you get different results.) After both efforts, nada.
Enter system failure. 8 AM Monday morning. “Dad, the LEFT front tire is flat!!” Things went from BAD to WORSE. Production outage!
Which is exactly my point. If we had pressed on in testing or, if necessary, taken it in for a professional appraisal, we would have avoided the actual failure. Don’t do what I did with our car. Do the right thing with your software systems. Keep thinking and keep testing. Regarding thinking, Kepner-Tregoe’s Problem Solving approach advises asking questions about the present situation, which can lead to inquiries like:
- Where else could the problem be occurring but is not? (It could pull to the left)
- What do we know changes over time where the problem is occurring? (Tires wear out)
- What else could we test? (The left front tire)
If we had just swapped the spare with the LEFT front tire (who knew the steel belt going bad on the left tire would cause the car to VEER, not pull, to the right), we would have avoided the family production outage, which ended up requiring some car juggling for over a week.
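For software folks, the "change ONE thing at a time" habit can be sketched in a few lines of Python. This is only an illustration, not a real diagnostic tool: the names `isolate_fault`, `drives_straight`, and the dictionary keys are all hypothetical, and the health check is a stand-in for whatever test you can actually run against your system.

```python
from typing import Callable, Dict, List


def isolate_fault(
    baseline: Dict[str, str],
    spares: Dict[str, str],
    is_healthy: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Swap in one spare at a time; return the swaps that make the check pass."""
    culprits = []
    for component, spare in spares.items():
        trial = dict(baseline)    # fresh copy, so each trial changes exactly one thing
        trial[component] = spare
        if is_healthy(trial):     # re-test after every single change
            culprits.append(component)
    return culprits


# Hypothetical stand-in for a test drive: the car drives straight
# only when the left front tire is not the worn one.
def drives_straight(config: Dict[str, str]) -> bool:
    return config["left_front_tire"] != "worn"


car = {"right_front_tire": "original", "left_front_tire": "worn"}
spares = {"right_front_tire": "spare", "left_front_tire": "spare"}

suspects = isolate_fault(car, spares, drives_straight)
print(suspects)  # ['left_front_tire']
```

Swapping the right front tire changes nothing, so the loop moves on to the next candidate instead of stopping, which is exactly the step we skipped with the Corolla.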
In the world of business, we cannot tolerate system outages like this (think Black Friday or Cyber Monday), so if your system is acting cranky and you’re not sure why, don’t give up. Don’t panic. Don’t assume. Keep testing. Stay the course. Think, test. Think some more, test some more. You won’t be sorry.
This is the third post in the 3-part series on “The Role of Testing in … Creating, Fixing, and Avoiding Failure”.