A huge tech fail

We all want our information systems to be interconnected. We expect to withdraw money from an ATM in Paris, use our charge card to pay for a dinner in Tokyo, and get our boarding pass on our phone in Singapore. It’s all enabled by a technology and network infrastructure that connects computers worldwide. The convenience is unparalleled in human history, but there’s also a downside that we just experienced.

A faulty software update issued by the security company CrowdStrike has shut down businesses, hospitals, airlines, banks, and institutions around the world. Experts will determine the exact cause and the steps needed to fix it, but we have to ask: how has the tech industry created a situation where one person performing one update on one piece of software can affect millions of people around the world? And if it’s so easy to do by mistake, imagine what our adversaries could do.

From the New York Times, “The fallout, which was immediate and inescapable, highlighted the brittleness of global technology infrastructure. The world has become reliant on Microsoft and a handful of cybersecurity firms like CrowdStrike. So when a single flawed piece of software is released over the internet, it can almost instantly damage countless companies and organizations that depend on the technology as part of everyday business.”

This was just one bit of software that was instantly transmitted around the world – like a virus – causing billions of dollars in damages, canceled flights stranding people at airports, businesses unable to function, and surgeries put on hold.

This particular piece of software operates at the “kernel” level, deep within computers running Microsoft Windows, where it has critical influence over the operation of the computer and its components. It runs at this level to better detect cyberattacks, but that also means that when it fails, it has a much bigger impact on the computer and is much more difficult to fix.

Imagine: software designed to prevent outages caused the biggest outage of all time.

How could something be designed so that, when it goes awry, it affects so many all at once? Why aren’t there redundancies, checks, failsafes, and backups built into its design? And wouldn’t you think an update would be rolled out slowly to a small number of users before going worldwide? Did anyone even think this through when it was designed? Was there a total failure to imagine these apocalyptic scenarios? Or, even worse, was the company so frugal that it was not a priority?

Unfortunately, this is neither unique nor the first time a failure of imagination or frugality in technology has produced unexpected results. Facebook said it never expected its product to be used for nefarious purposes; its intent was simply to bring people together. We know how that’s worked out.

These companies have very smart engineers who create very complex products, but are they so ignorant that they can’t anticipate what might go wrong?

Tech companies don’t put nearly as much effort into planning for their products’ misuse as they do into loading them with features. Perhaps it’s wishful thinking that keeps them from spending what’s needed to prevent these bad scenarios: the more they imagine, the more work is needed to develop countermeasures, which eats into profits. Doing nothing is rewarded with fatter profits.

What’s really scary is that we now have AI founders saying much the same about artificial intelligence. They tell us they will be wise and careful in deploying AI technology and that we should just trust them. In fact, a group of employees responsible for safety recently quit OpenAI, the maker of ChatGPT, because their resources had been cut. These AI companies are even pushing back against some very basic safety regulations proposed by the government.

With hardware products, companies certify safety and performance by testing and meeting FCC, UL, and other regulatory agency requirements. We can at least see and touch the product and figure out its risks. Not so with software. It’s often too complicated for outsiders to assess its risks, so we are left to trust the companies.

Company size is also an issue. Many companies are so large and dominant that there’s little competition. When there’s a failure, it’s often massive, yet they are rarely penalized or suffer any lasting effects. There’s no competitor to jump in.

In this case, CrowdStrike’s CEO has apologized, the company will fix the problem, and business will proceed as usual, leaving billions of dollars in damages in its wake.

This is the tech world we live in today, and it’s only going to get worse.