The perils of ‘black box’ software

The use of ‘black box’ analytics software is creating risks for businesses and society, warns Cynthia Rudin, Professor of Computer Science, Electrical Engineering, and Statistical Science at Duke University.

Speaking at this week’s Institute of Analytics Professionals of Australia national conference, Professor Rudin said that the lack of transparency in many commercial ‘black box’ systems often renders them ineffectual and can create disastrous outcomes for those affected by the results.

“A black box predictive model is a formula that's either too complicated to understand or it's proprietary, meaning that it's somebody's secret sauce, and they don't want you to know what it is,” explained Professor Rudin.

The risks are going to grow, she believes, as machine learning is applied to many ‘black box’ systems: “Now people are starting to use machine learning for medical decision making, for loan decisions, for self-driving cars, and for all manner of other things that you really don't want to go wrong.”

Poor implementations of commercial ‘black box’ packages can cause great harm, she continued, citing the example of criminal risk assessment software, widely used and criticised in the United States, which has produced wildly inconsistent evaluations of individuals’ likelihood of re-offending.

One example she gave was of Glenn Rodriguez, a US convict who found he was being denied parole due to a typo in his criminal record.

“After the parole board hearing, he compared his scoresheet to someone else's, and he noticed an error in his criminal history features, and the model had up to 137 factors in it. So it wasn't that easy to spot an error,” Professor Rudin explained.

“This is not how the criminal justice system is supposed to work, right? Typos are not supposed to determine people's prison sentences.”

Coupled with inconsistent data, ‘black box’ systems are often ineffectual and inaccurate, she says.

“I’ve never met a big dataset I can trust,” Professor Rudin told the IAPA national conference.

Much of the risk can be overcome by using open systems with accountable and transparent algorithms, she believes, citing her own experience of working in predictive analytics for the energy industry.

As a case study of what can be done by well-implemented machine learning applied to open systems, Professor Rudin cited an algorithm used for evaluating the risk of brain seizures that improved patient outcomes and boosted doctors’ productivity.

Other presenters at the two-day online conference included author and analytics advisor Tom Davenport, former Netflix VP Gibson Biddle, and the IAPA Top 10 Analytics Leaders.

IAPA Managing Director, Annette Slunjski, said the conference was important in connecting the Australian analytics community after a difficult year.

“2020 has been a unique year, full of unknowns and uncertainty thanks to COVID,” she said.

“But uncertainty is where analytics stands tall. While some businesses and roles were downsized, many organisations invested to expand their analytics capability to get better insight when they needed it most.

“It was also important to support and connect the analytics community in a social-distancing-appropriate way – enter the virtual IAPA Conference.

“We created two mornings where our analytics community could be inspired by world leaders like Tom Davenport, Cynthia Rudin, Dr Pamela Peele and Gibson Biddle; learn from the experiences of practitioners like Silvio Giorgio and Sveta Freidman; get a better understanding of the ethics and governance of data from Kay Firth-Butterfield and Dr Phillip Gould; and finally connect (virtually) with other analytics professionals to discuss key topics in analytics today.

“While analytics embraces technical skills and advanced technology, connections and communication in the community are beneficial for both the analytics professional and the work they do.”