One post that caught my attention on Hacker News a few weeks ago was this one:
If you haven’t read it yet, it is about a software bug in Boeing’s 787 Dreamliner that could lead to a power failure. The workaround is to power-cycle the airplane before it has been continuously powered on for 248 days.
Yes, you read that right. It is the same fix we have been applying to our PCs, routers, printers and everything else since the beginning of time (January 1st, 1970).
This is not an isolated problem. It happens at Boeing, Microsoft, automakers and elsewhere. If you write any kind of software, it can and will happen to you too. During my internship at a German automaker, one of my first assignments was to find the source of a long-standing bug (it only showed up after a few hours of running) that made the vehicle’s system lose its real-time synchronization. After a few weeks of debugging on the bench with oscilloscopes and network analyzers, I found the cause: an unhandled integer overflow in a C macro! Simple error, simple fix.
Even with software quality tools such as static analyzers and tests, we are not safe from this kind of error. As software keeps eating the world, we should expect these bugs to show up more and more in our daily lives.
Every system has bugs. That’s a fact: a bug-free system does not exist.
It is our job as software engineers to write systems that are as safe, correct and simple as possible, and to have an established process for dealing with the bugs that will be discovered. 🐛