Multiple component failures were to blame for a recent spate of outages at the Australian Taxation Office (ATO).
The ATO has released a detailed report into the massive IT systems outages it experienced in December 2016 and February 2017.
The 16-page report is based on findings from an ATO internal review, and on technical advice and a separate report prepared by consultancy PwC.
It says the main cause of the outages was “multiple component failures on the ATO’s primary Storage Area Network (SAN).”
The SAN was supplied by Hewlett-Packard Enterprise (HPE), the half of the old HP Corporation that looks after large systems and systems integration. HPE not only supplied the SAN but also operated it, in a managed services relationship.
PwC’s review found that the impact of the outage was compounded by the design and configuration of the HPE SAN, saying there was an “over-emphasis on performance features rather than stability or resilience.”
Nevertheless, the ATO’s report is not critical of HPE, and even praises the company. In his foreword, the Commissioner of Taxation, Chris Jordan, says “HPE acted immediately, making available resources from across the globe. HPE staff cooperated and communicated openly with us, working tirelessly around the clock, including through the Christmas period, to ensure our systems and services were installed as quickly as possible.”
Quizzed on the matter at the Senate Estimates on 30 May, Mr Jordan said that the ATO had come to an agreement with HPE that “recoups key costs incurred by the ATO, and provides additional and higher grade IT equipment.”
He did not mention the amount of the financial compensation paid by HPE to the ATO.
The report says that the ATO’s business continuity mechanisms, communications and engagement “worked well, but into the future need to be more inclusive of our partners.”
This seems to be at odds with what Mr Jordan told the Senate Estimates hearing, where he admitted that “the recovery was slower because some of the actual recovery tools themselves required for that restoration were stored on the same SAN that failed.”
He went on to talk about the “heightened sense of responsibility and accountability” that the outages had caused.
The report identifies a number of areas for improvement, which it classifies into five ‘themes’:
- principles informing the ATO’s IT design
- correcting the identified faults
- enhancing the ATO’s capability to support infrastructure design and IT governance
- incident responses for the ATO and the wider tax system
- managing communication and business resumption with stakeholders.
The report says the ATO is “addressing each of these areas.” Each section of the report, based on these themes, contains a number of recommendations designed to improve processes if such outages occur in future.
As to compensation, the ATO says it has received only a “handful of claims” and that discretions and waivers were applied on occasion, when there were problems in dealing with businesses and individuals and the outages were identified as the cause.
The Institute of Public Accountants (IPA) has called for compensation for accountants who could not work during the outage, but none has been forthcoming.
The report, which can be read in full here, contains five pages of detailed appendices outlining a timeline of the outages and the ATO’s response at each stage.