Last month’s news of the impactful failure of a significant computer-based information/communication system was sadly not news at all – a Google search for “computer information communication system failure” yields 1,530,000,000 results!
Information and Communication Technology (ICT) system failures differ in two ways from failures of other complex, mission-critical engineering artefacts such as airliners, bridges and nuclear power plants. First, whereas failures of these other artefacts are rare, ICT failures are a sadly routine part of modern life.
Second, whereas failures of these other artefacts are big news that is not easily forgotten, news of ICT failures quickly fades away (perhaps overtaken by news of the next one).
For example, the spectacular Tacoma Narrows Bridge collapse in 1940 is notable, but how many other such collapses stand out?*
This twofold distinction is of course no accident. The frequency, indeed ubiquity, of ICT system failures leads to a psychological defence mechanism on the part of victims: denial of the problems, or at least indifference to their consequences.
But the consequences of ICT failures are immense: for example, if the dollar costs of these failures were instead invested in life-saving health care, then on conservative estimates many thousands of lives could be saved per annum.
Imagine what would happen if thousands of lives per annum were at risk from, say, aerospace/aeronautical engineering failures.
But wait a minute – they are!
And we have the means to manage this risk.
In Australia, we have the ATSB (Australian Transport Safety Bureau), in the UK there’s the AAIB (Air Accidents Investigation Branch), and in the USA the NTSB (National Transportation Safety Board).
As a result, and despite sometimes spectacular imagery, long-distance air travel is reputedly safer than road travel to and from the airport.
It’s evident that there are two distinct pathways to the establishment of reliable, well-engineered approaches to mission-critical systems.
The first (as perhaps evident in civil engineering) is the passage of millennia over which processes of natural selection lead to the gradual emergence of sound practice, not without occasional failures but (eventually) as exceptions rather than the rule.
The second (as in aerospace engineering) is a kind of “fast-forward” approach where the “learning from experience” that might otherwise take centuries is accelerated, dramatically so, by a social process of compulsory reflection on failures and imposition of remedial correction to “best practice”.
It is unarguable that ‘information engineering’ (our term for the approach to the development of computer-based information and communication systems that can be relied upon to deliver positive social and economic outcomes) can’t wait for hundreds or thousands of years to achieve the quality we expect from civil engineering.
Rather, it needs the accelerated approach that delivers results, daily if not hourly, for aerospace.
In short, as a first step, we call on those responsible for the expenditure of public monies on ICT, and/or for approving the provision of critical social and economic services by organisations with their own computer-based information/communication systems, to introduce a systematic process of review of failures of such systems.
This process would be analogous to that of the ATSB/AAIB/NTSB, with a view to identifying departures from best practice (especially in the development stage) and, most importantly, to accelerating the identification of best practice in what remains a relatively immature engineering discipline (if, on current outcomes, it can be called that at all).
An “Information Engineering Safety Board” (IESB) with ATSB-style power and responsibilities, at least in the public sector, is long overdue.
To conclude, aerospace engineering and its ATSB/AAIB/NTSB manifestations have shown us how to “spin up” a new engineering discipline within a short hundred or so years; the proposed IESB should be able to do the same for information engineering.
Without it, or something much like it, we have hundreds of years (at least) of our current misery ahead.
We call on Commonwealth and State Auditors-General to exercise their independent powers of office to make our preferred vision an early reality.
We also call upon the key relevant professional societies (ACS and Engineers Australia) to lend us their support.
* For the period 2011-2020, Wikipedia records a mere two or so bridge failures per annum worldwide that can be attributed to engineering errors (i.e. excluding extraneous factors such as tsunamis, landslides, reckless overloading, etc.).
Paul Bailes FACS, FIEAust, is Emeritus Professor of Computer Science at The University of Queensland with over four decades of experience in software development, research, teaching and consulting.
Christine Cornish FACS has over four decades of experience in software development and ICT program management and delivery for major public safety, health, education, finance, retail and resource projects in the public and private sectors.
Dr Nicholas Tate FACS CP, FBCS CITP, FRAS, is President of the Australian Computer Society (ACS) and has 50 years’ experience in Information and Communication Technologies.