One place I worked put out a major release of its enterprise software, and shortly afterward a bug appeared. A serious bug. The business was not happy. The Director of Software Engineering was under pressure. The bug had never been reported before, and the system had processed a large volume of activity every day since the last major release six months earlier. The Director was confident the bug had been introduced with the new release. Because of the sense of urgency he didn’t wait for a full analysis. He made the decision to roll back to the prior version. But he was wrong. Rolling back didn’t fix the issue. The bug was pre-existing. It was unfortunate timing that the first time conditions were right to trigger the bug was the day after a major release.

People sometimes refer to software as ‘hardening’ in Production. There’s a pervasive presumption that working Production software is generally safer and less risky than newly developed software. The idea is that the longer the software has been in use in Production without change, the less likely it is to have unknown bugs. But hardening is a misnomer. Software doesn’t cure like concrete.

The assumption behind ‘hardening’ is that Production is the ultimate test: that the volume and variety of activity in a Production environment are more comprehensive than any test regimen could be. But how good a test is Production? What if 80% of typical Production activity exercises only 20% of the code? What happens when the unusual occurs? Many organizations test for load and scale issues, but I don’t hear of organizations commonly measuring (or inferring) code coverage in Production. Could the unit tests and QA tests actually be more comprehensive than typical Production scenarios? Could ‘hardening’ represent an unsafe assumption?
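One rough way to infer Production coverage is to compare the code paths that show up in Production logs against the full list of paths the application has. The Python sketch below is only an illustration under stated assumptions: the path names, the `production_coverage` function, and the premise that your logs record one entry per path hit are all hypothetical, not a description of any particular tool.

```python
from collections import Counter

def production_coverage(all_paths, observed_hits):
    """Return (fraction of paths exercised, sorted list of paths never hit).

    all_paths: every code path the application has (hypothetical names).
    observed_hits: one entry per path hit, e.g. parsed from Production logs.
    """
    hits = Counter(observed_hits)
    exercised = {path for path in all_paths if hits[path] > 0}
    never_hit = sorted(set(all_paths) - exercised)
    return len(exercised) / len(all_paths), never_hit

# Hypothetical application with five paths; Production traffic only touches three.
all_paths = ["login", "search", "checkout", "refund", "year_end_rollover"]
observed = ["login", "search", "search", "checkout", "login"]

coverage, never_hit = production_coverage(all_paths, observed)
print(coverage)   # 0.6
print(never_hit)  # ['refund', 'year_end_rollover']
```

In this made-up example, the paths that Production never touched are the ones where a latent bug could sit for months, however long the software has been running.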

A colleague pointed out there can also be an anthropomorphizing factor at work. Who do you trust more — the new recruit or the veteran? Even with known flaws the veteran can seem like a safer choice than the unknown recruit. “The devil you know is better than the devil you don’t.” But software releases are not people and you may not know the devil you know as well as you think you know it.

Program testing can be used to show the presence of bugs, but never to show their absence.

— Edsger W. Dijkstra

I once wrote the Dijkstra quote on a whiteboard as part of a presentation. The technical folk in the room nodded in recognition and agreement. The non-technical manager became agitated and vehemently disagreed.

To manage risk, an organization working on the supposition that ‘hardened’ Production software is safer may place strict controls on Production releases. Because of the effort involved in a release, releasing quarterly may be considered a fast schedule. Because of the amount of time between releases, each release tends to be large, which increases the perceived risk.

If the potential for regression issues can be minimized (say, through automated unit tests), which becomes the greater risk: introducing new bugs with a new release, or not fixing latent bugs in the existing Production code? If the risk is in not releasing, then releases need to happen faster. Instead of quarterly, what if releases were biweekly? The work would be chunked into smaller, more frequent releases. Bugs that aren’t critical enough to require a special immediate fix could be fixed in Production in 2 weeks instead of 3 months. The software could potentially adapt more quickly to changing needs and user feedback.
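The difference in waiting time can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic, not a model of any real release process. It assumes a non-critical bug is found at a uniformly random point in the release cycle, that its fix is ready right away, and that the fix ships with the next scheduled release. The function name `fix_wait_days` and the 91-day quarter are my own choices for illustration.

```python
def fix_wait_days(cycle_days):
    """Return (average, worst-case) days a ready fix waits for the next release.

    Assumes the bug is found at a uniformly random point in the cycle,
    so on average half the cycle remains before the next release.
    """
    average = cycle_days / 2
    worst_case = cycle_days
    return average, worst_case

for name, cycle in [("quarterly", 91), ("biweekly", 14)]:
    average, worst = fix_wait_days(cycle)
    print(f"{name}: average {average} days, worst case {worst} days")

# quarterly: average 45.5 days, worst case 91 days
# biweekly: average 7.0 days, worst case 14 days
```

Under those assumptions, a known bug lives in Production about six and a half times longer on a quarterly schedule than on a biweekly one.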

Build unit tests. Release often. The only way to improve software is to change it.