
As cloud computing continues to shape the modern business landscape, recent disruptions have demonstrated how fragile even the most advanced systems can be.
The Amazon Web Services outage knocked hundreds, possibly thousands, of services and companies offline. The lengthy list includes FanDuel, Lyft, Instacart, Delta, Trello, Slack, Zoom, and HBO Max. They all scrambled to restore operations, exposing the risks of overreliance on a single provider.
The outage originated in AWS’s US-EAST-1 region (Northern Virginia), its largest and most heavily used region. That meant many of AWS’s core services and major clients were hit.
The root cause was a Domain Name System (DNS) resolution failure that left services unable to look up the addresses of the servers they needed to reach. When DNS fails, problems ripple far and fast. Because of AWS’s dominance (it powers a large portion of the internet’s infrastructure), the impact was outsized: When AWS sneezes, the internet watches.
In a blog post, leaders at Livonia, Mich.-based STACK Cybersecurity said people should “think of DNS as the phone book of the internet.”
“First used in the early 1980s, DNS represents interconnected servers that store registered domain names and Internet Protocol (IP) addresses,” they wrote. “DNS is the magic ingredient that allows users to interact with devices on the internet without having to remember long strings of numbers.”
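For readers who want to see the phone-book idea in practice, the short Python sketch below asks the operating system's resolver to turn a name into IP addresses and returns an empty list when the lookup fails, which is roughly what applications ran into during the outage. It is an illustration only: the `resolve` helper is a hypothetical name, the hostnames are placeholders, and nothing here reflects how AWS or any affected company actually performs lookups.

```python
import socket


def resolve(hostname):
    """Return the sorted unique IP addresses for hostname, or [] if the lookup fails."""
    try:
        # Ask the operating system's resolver, the same path most applications use.
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name could not be resolved: the "phone book" had no usable entry.
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" normally resolves without any network access.
    print(resolve("localhost"))
    # The ".invalid" top-level domain is reserved and should never resolve.
    print(resolve("no-such-host.invalid"))
```

An application that gets an empty list back has to decide what to do next, such as retrying, using a cached address, or failing over to another provider, which is where the contingency planning discussed in this article comes in.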
The real lesson from the AWS outage is that “innovation and resilience must go hand in hand,” according to Rich Miller, Founder and CEO of STACK Cybersecurity, which provides outsourced IT solutions and cybersecurity to businesses across the country.
“We can’t control every failure, but we can control how well we prepare for one,” Miller said. “Strong contingency planning separates disruption from disaster.
“The AWS outage was more than a technical failure,” Miller added. “It was a leadership test. In an age where every company depends on digital infrastructure, resilience has to be part of the business model, not an afterthought.”
Amazon said the outage of its cloud computing service was resolved Monday evening, after a problem disrupted internet use around the world, taking down a broad range of online services, including social media, gaming, food delivery, streaming and financial platforms.
A report from The Associated Press pointed out that the day-long disruption — and the frustration that came with it — served as the latest reminder that 21st century society is increasingly dependent on just a handful of companies for much of its internet technology, which seems to work reliably until it suddenly breaks down.
AWS is still the world’s largest cloud provider, though Microsoft and Google are picking up new business, and is hardly the first to suffer an outage. Moreover, it’s not easy for customers to jump ship, especially given the current capacity crunch at data centers, according to the AP report.
“The outage will likely fuel customers wanting to spread their infrastructure between multiple clouds, which could be a positive for smaller vendors like Google,” Bloomberg Intelligence analyst Anurag Rana told the AP. Still, he said, it’s unlikely to result in any meaningful market share loss for Amazon due to the difficulty of shifting work between clouds and industrywide capacity constraints.
Lee Clements is founding partner of Adaptive Data Networks, which delivers high-performance cloud solutions through scalable virtual machines, hybrid cloud integration, and dedicated infrastructure support. He said that since the launch of AWS and competing cloud platforms almost 20 years ago, companies have been “quick to attempt” to transform capital expenditures into operational expenditures and to remove supposed “operational complexities” by going cloud-native.
“While on the face, there’s nothing wrong with that approach, major cloud providers like Amazon, Microsoft, and Google are by their very nature extremely operationally complex — as proven by this most recent cascading failure of services that impacted large swaths of the internet,” Clements said. “ADN believes firmly in innovating quickly, while also maintaining reliable core services that our customers can rely on – customers that were not impacted by the most recent complexity-induced AWS outage.”
STACK Cybersecurity’s Miller cautioned business leaders not to treat things like backup plans as merely an expense to reduce. “Forward-thinking companies are now treating redundancy, backup planning, and cross-platform strategies as investments in continuity rather than costs to minimize,” Miller said. “The next disruption may be unpredictable, but the ability to recover quickly is entirely within reach for those who plan ahead.”




