The meteoric rise of ChatGPT has brought a sudden proliferation of AI-generated text, raising an important question: how can we detect whether a blog post, book, or student’s paper was written by AI?
Researchers at the University of Maryland may have a solution, but companies would need to embed it into the large language models used by products like Microsoft’s Bing search engine or ChatGPT.
In a preprint paper released in late January, the researchers referred to their idea as a “watermark” that can be implemented “with negligible impact on text quality”.
The watermark cleverly exploits the way language models generate text.
When you give something like ChatGPT a written prompt, it builds a response based on what is most likely to come next in the sequence, given the context.
What the researchers propose is that the system creates a list of ‘green’ tokens (words or word fragments) before each token is generated, and then gives tokens on the ‘green’ list a higher probability of being chosen.
The result is AI-generated text that is still readable and coherent, but which can be easily identified because it is largely made up of ‘green’ tokens.
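To make the idea concrete, here is a minimal toy sketch of that biasing step in Python. It is not the researchers’ code: the constants GAMMA (the fraction of the vocabulary marked ‘green’) and DELTA (the bonus added to green tokens), and the helper names green_list and watermarked_sample, are illustrative stand-ins for the γ and δ parameters described in the paper.

```python
import hashlib
import random

import numpy as np

GAMMA = 0.5   # fraction of the vocabulary marked 'green' at each step (illustrative)
DELTA = 2.0   # logit bonus added to green tokens (illustrative)


def green_list(prev_token: int, vocab_size: int) -> set[int]:
    """Derive this step's 'green' token set by seeding a PRNG with a hash
    of the previous token, so the same list can be recomputed later."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(GAMMA * vocab_size)])


def watermarked_sample(logits: np.ndarray, prev_token: int) -> int:
    """Add DELTA to the logits of green tokens, then sample as usual."""
    greens = green_list(prev_token, len(logits))
    biased = logits.copy()
    for t in greens:
        biased[t] += DELTA
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))
```

Because the bonus is a soft nudge rather than a hard rule, a token the model strongly prefers can still win even if it is not on the green list, which is how the scheme keeps its “negligible impact on text quality”.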
There are some issues with the proposal that need to be worked out, including that certain sequences will be hard to detect because they are so common.
One example in the paper is “the quick brown fox jumps over the lazy dog” – swapping out words would make the phrase read strangely, and a person is just as likely as a computer to complete it in exactly the same way.
The researchers recognise that “the ability to detect and audit the usage of machine-generated text” is a “key principle of harm reduction for large language models” and acknowledge that more research is required.
“The watermark is computationally simple to verify without access to the underlying model, false positive detections are statistically improbable, and the watermark degrades gracefully under attack,” the paper says.
“Further, the proposed scheme can be retro-fitted to any existing model that generates text via sampling from a next token distribution, without retraining.”
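That claim about verification without the model follows from the construction above: a detector only needs the tokenizer and the hashing scheme, since it can recompute each step’s green list from the preceding token and check how many tokens landed on it. Continuing the toy sketch, and again as an assumption-laden illustration rather than the paper’s actual code, detection can be expressed as a simple one-proportion z-test.

```python
def count_green(tokens: list[int], vocab_size: int) -> int:
    """Count tokens that fall in the green list derived from their predecessor."""
    return sum(
        1 for prev, tok in zip(tokens, tokens[1:])
        if tok in green_list(prev, vocab_size)
    )


def z_score(tokens: list[int], vocab_size: int) -> float:
    """Human text should land near a GAMMA fraction of green tokens by chance;
    watermarked text lands far above it, giving a large z-score."""
    n = len(tokens) - 1                      # number of scored tokens
    hits = count_green(tokens, vocab_size)
    return (hits - GAMMA * n) / (GAMMA * (1 - GAMMA) * n) ** 0.5
```

A long passage of unwatermarked human writing would hit the green list roughly half the time under these toy settings, so a score several standard deviations above that is strong evidence of the watermark, which is why false positives are “statistically improbable”.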
OpenAI has produced a classifier for ChatGPT, but says it is “not fully reliable”.
Rather than a watermark, OpenAI is throwing more machine learning at the problem by training a language model on AI-generated text so it can learn to spot it.
OpenAI said its classifier incorrectly identifies human-written text as being AI-generated about nine per cent of the time.
The rise of OpenAI’s ChatGPT quickly became a problem for the education system, which scrambled to respond to the potential of AI plagiarism on an unprecedented scale.
But it has also been an issue for independent publishers, which have seen their open submission queues swamped with AI-generated text.
Neil Clarke, editor of science fiction magazine Clarkesworld, last month said he had to close submissions because the swarm of AI-generated science fiction stories was overwhelming.
“We don’t have a solution for the problem,” he tweeted. “We have some ideas for minimising it, but the problem isn’t going away.
“Detectors are unreliable. Pay-to-submit sacrifices too many legit authors. Print submissions are not viable for us.”