It started out as a side project during his university studies, but Kieran Lindsay believes his automated Minerva research assistant could be a game-changer for academics struggling to keep up with the flood of publications – and an important step towards explainable AI.

A University of Technology Sydney (UTS) undergraduate and aspiring academic, Lindsay – who is nearing the end of a seven-year journey that also includes a Bachelor of Laws and Bachelor of Biomedical Science dual degree – spent eight months developing AcademicID (AcID), a platform that links research and researchers across a broad range of disciplines.

“I was motivated to build something that was going to help me throughout my career,” Lindsay told Information Age, “so I taught myself how to do a lot of this coding, and to get feedback from academics and others.”

Built using the Bubble.io codeless development tool and hosted on Microsoft Azure, Minerva – a core feature of AcID – taps the same OpenAI GPT-3 API used by ChatGPT, the conversational AI system introduced in November to showcase GPT-3’s natural-language capabilities.

In just a few short weeks, ChatGPT has sent academics to panic stations as experimentation sees the platform breezing through tasks ranging from passing intense US medical school entry exams to helping students cheat on assignments and even writing malware code.

The AI’s capabilities have kicked off a firestorm of debate as schools and universities scramble to prevent its use by students even as researchers consider how it might be used to streamline the process of information retrieval, analysis, and synthesis.

They’re walking a fine line: academics “want to save time,” Lindsay explained, “and all the ones I’ve spoken to since ChatGPT came out love the idea that they’ll be able to find information quickly.”

“But it’s not reliable enough to use in academia because it can spit out incorrect information – and that’s increasingly so as you get into more complex areas.”

Minerva lets students and researchers enter a plain-language query, which GPT-3 uses to generate a plain-English precis of the issue they’re interested in.

Where it goes further, however, is by also feeding key parts of those responses into Semantic Scholar – an AI-powered database that allows keyword-based searching of more than 209 million academic articles.

By correlating the Semantic Scholar results with the GPT-3 output, Minerva provides a list of relevant resources that substantiate the content of the AI-written summary – saving users the effort of conducting their own literature review.

“By having the academic literature right there, it allows users to verify the information model much more quickly than poring over databases to do a separate literature review,” Lindsay explained.

“They can ask Minerva to explain a topic – and they get the explanation as well as a list of relevant academic papers. We’re leveraging the two different API providers to provide a better and more complete service based on what academics need.”
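The flow described above (question in, AI summary out, key terms passed to a literature search) can be sketched in a few lines of Python. This is an illustrative sketch only, not Minerva's actual implementation, which was built in Bubble.io and has not been published. The function names, the naive word-frequency keyword picker, and the `summarise` callable standing in for a GPT-3 request are all assumptions made for the example; the search function assumes Semantic Scholar's public Graph API paper-search endpoint.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# Small illustrative stopword list; a real system would use something richer.
STOPWORDS = {"that", "this", "with", "from", "have", "which", "their", "into", "about"}


def extract_keywords(summary: str, max_terms: int = 5) -> List[str]:
    """Naive stand-in for picking 'key parts' of a summary:
    the most frequent words longer than three letters, ties kept in order of first use."""
    counts: Dict[str, int] = {}
    for word in re.findall(r"[a-z][a-z\-]+", summary.lower()):
        if len(word) > 3 and word not in STOPWORDS:
            counts[word] = counts.get(word, 0) + 1
    ranked = sorted(counts, key=lambda w: -counts[w])  # stable sort keeps first-use order
    return ranked[:max_terms]


def search_semantic_scholar(keywords: List[str], limit: int = 5) -> List[dict]:
    """Keyword search against Semantic Scholar's paper-search endpoint (network call)."""
    params = urllib.parse.urlencode(
        {"query": " ".join(keywords), "limit": limit, "fields": "title,year,url"}
    )
    url = "https://api.semanticscholar.org/graph/v1/paper/search?" + params
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("data", [])


def answer_with_sources(
    question: str,
    summarise: Callable[[str], str],  # hypothetical wrapper around a language-model call
    search: Callable[[List[str]], List[dict]] = search_semantic_scholar,
) -> dict:
    """Return the AI-written summary alongside papers a reader can use to check it."""
    summary = summarise(question)
    keywords = extract_keywords(summary)
    return {"summary": summary, "keywords": keywords, "papers": search(keywords)}
```

Passing the summariser and the search in as arguments keeps the two API providers separate, so either can be swapped or stubbed out for testing without touching the rest of the flow.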

The challenges of explainability

Minerva is about more than just facilitating access to research: as the initial excitement over ChatGPT turns into dispassionate examination of its limitations, it has become clear that the system’s well-documented systemic inaccuracies – many of which stem from a training data set that leaves it unaware of anything that happened after 2021 – mean that much of what it produces is demonstrably wrong.

OpenAI recently updated the model to improve its accuracy, but the ongoing issues with both accuracy and contemporaneity highlight the persisting risks of AI’s opacity, and the attribution issues that projects like Minerva are attempting to fix.

It’s a challenge long ago flagged by AI researchers who have raised concerns about assuming data can solve every problem accurately, instead pushing for ways to build ‘explainable AI’ – getting AI engines to annotate their work so human beings can judge its foundational assumptions.

Companies embracing AI have made frequent mistakes, ranging from a lack of transparency around algorithms to a lack of understanding of whether data is accurate, fair, or representative of ever-changing business models.

“When we start talking about AI, there are a lot of opportunities to augment processes [like bank mortgage approvals] and make it faster, better, and smarter,” Gartner senior director analyst Sumit Agarwal noted during the recent Gartner Data & Analytics Summit.

“But when it comes down to approval or rejection at the decision, this is where the AI stumbles: it doesn’t give you that good clarity on what is the decision, and the limitations in technology, and what can be done about that.”

“Don’t do explainability just because regulators are saying that they need to do it,” Agarwal said. “The whole goal is that you’re trying to create a better model – and we have to invest in making the predictions interpretable and consumable by whoever is the end user.”