Nearly a week after Optus’ major network outage, which affected 10.2 million Australians and 400,000 businesses, the company has finally pinned down the cause – a routine software upgrade.
Last Wednesday, an unprecedented outage at telecommunications provider Optus left millions of Australians without mobile and internet services as the company struggled to restore its services over a 12- to 14-hour period.
For many customers, the outage served as a stark reminder of how essential telcos are to our daily lives – with impacts extending to businesses, banks, hospitals and even public transport.
Five days after the outage took place, Optus says it has finally identified the root cause.
In a statement issued late Monday afternoon, Optus described how its network was impacted by “changes to routing information from an international peering network” following a “routine software upgrade”.
Optus said at approximately 4:05am, these changes to routing information spread through multiple layers of its network and “exceeded preset safety levels” on some key routers – effectively disconnecting said routers from the Optus IP Core network.
“These routing information changes propagated through multiple layers in our network and exceeded preset safety levels on key routers which could not handle these,” said Optus.
“This resulted in those routers disconnecting from the Optus IP Core network to protect themselves.”
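Optus has not named the safeguard involved. One common mechanism of this kind is a cap on how many routes a router will accept from a neighbour, and the sketch below assumes that is roughly what happened. The `Router` class, the `max_routes` value and the address ranges are all hypothetical and are not taken from Optus’ statement.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy router with a hypothetical cap on the routes it will hold."""
    name: str
    max_routes: int  # assumed stand-in for the "preset safety level"
    routes: set = field(default_factory=set)
    connected: bool = True

    def receive_update(self, prefixes):
        """Accept advertised prefixes; disconnect if the cap would be exceeded."""
        if not self.connected:
            return False
        candidate = self.routes | set(prefixes)
        if len(candidate) > self.max_routes:
            # Protective shutdown: forget learned routes and leave the network.
            self.routes.clear()
            self.connected = False
            return False
        self.routes = candidate
        return True


def make_prefixes(count):
    """Generate `count` distinct made-up /24 prefixes."""
    return [f"10.{i // 256}.{i % 256}.0/24" for i in range(count)]


router = Router("core-1", max_routes=1000)
normal = router.receive_update(make_prefixes(900))   # within the cap
flood = router.receive_update(make_prefixes(5000))   # sudden surge of routes
print(normal, flood, router.connected)
```

In this toy model the router stays offline after the surge until someone intervenes, which is consistent with Optus’ account of staff having to reconnect or reboot some routers on site.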
Optus further described some of its restoration processes, which it says required a “large-scale effort” involving physical reconnection and reboots for some routers.
“The restoration required a large-scale effort of the team and in some cases required Optus to reconnect or reboot routers physically, requiring the dispatch of people across a number of sites in Australia,” the telco said.
“This is why restoration was progressive over the afternoon.”
The company conceded investigations into the issue “took longer” than it would have liked, and further shared that it has made changes to its network to “address this issue so that it cannot occur again”.
When asked by Information Age what these changes involved, Optus did not immediately respond.
Experts’ suspicions confirmed
Only hours after the outage began at 4:05am Wednesday, experts were quick to suggest software upgrades and configuration problems were to blame.
On 8 November, Matt Tett, managing director of network testing facility Enex TestLab, told Guardian Australia he suspected the outage could be tied to a configuration issue and would likely require a router to be physically reconnected.
On the same day, internet infrastructure giant Cloudflare noted a slew of Border Gateway Protocol (BGP) announcements coming from an Optus-owned node around the same period the outage began.
BGP is effectively the routing protocol of the internet – networks use it to advertise which addresses they can reach, and routers use those advertisements to choose paths for delivering internet traffic.
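As a rough illustration of route selection, the sketch below picks between two advertisements for the same destination by preferring the one that crosses fewer networks. Real BGP routers weigh several other attributes and operator policies before path length, so this is a simplification. The peer names are invented and the network numbers come from the range reserved for documentation.

```python
def best_route(advertisements):
    """Return the advertisement with the shortest AS path (ties: first seen)."""
    return min(advertisements, key=lambda ad: len(ad["as_path"]))


# Two hypothetical advertisements for the same destination prefix.
ads = [
    {"next_hop": "peer-a", "as_path": [64496, 64497, 64499]},
    {"next_hop": "peer-b", "as_path": [64498, 64499]},
]

chosen = best_route(ads)
print(chosen["next_hop"])  # peer-b, the path through fewer networks
```

When advertisements like these change in bulk, every router that hears them has to recompute its choices, which is how a change at one peering point can ripple through many layers of a network.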
Cloudflare’s information led early users on popular discussion site Reddit to theorise the issue was tied either to a software upgrade or to BGP problems encountered during changes to routing information.
Despite early calls from experts and tech communities largely mirroring Optus’ eventual findings, Optus chief executive Kelly Bayer Rosmarin was at first sceptical of theories suggesting the outage was tied to software upgrade issues.
"It's highly unlikely, our systems are actually very stable," Rosmarin told ABC Radio Sydney last Wednesday morning.
"We provide great coverage to customers, this is a very, very rare occurrence.”
Optus said it is “committed to learning” from what has occurred, and is continuing to work with international vendors and partners to “increase the resilience” of its network.
Scammers target Optus customers
On 9 November, one day after the outage took place, the Australian Competition and Consumer Commission’s Scamwatch warned Australians of a scam offering false compensation to affected Optus customers.
The scam claimed to offer compensation from the company and encouraged recipients to click a malicious link with “myoptus” in the domain.
“Hi there, we apologise for yesterday’s network outage. We are offering compensation for all customers impacted,” the message read.
Meanwhile, after Rosmarin rebuffed widespread calls for compensation, Optus has offered 200GB of data for small businesses and consumers, and unlimited data on weekends for eligible prepaid customers until the end of the year.
The telco is facing both a government investigation and a Senate inquiry following the outage, the latter of which will hold its first public hearings on Friday.
“We will also support and will fully cooperate with the reviews being undertaken by the Government and the Senate,” said Optus.