Donald Trump may have fired the head of the US Copyright Office in protest, but he hasn’t changed the findings of a contentious report that concluded training generative AI (genAI) tools on copyrighted content can be illegal – and that genAI firms should negotiate rights with content creators.
“Various uses of copyrighted works in AI training are likely to be transformative,” the recently released pre-publication version of the report – the third and final part in the Register of Copyrights’ Copyright and Artificial Intelligence series – concluded.
“Several stages in the development of genAI involve using copyrighted works in ways that implicate owners’ exclusive rights,” the report said, but “the extent to which they are fair… will depend on what works were used, from what source, for what purpose, and with what controls on outputs.”
While genAI models built for purposes like analysis or research “are unlikely to substitute for expressive works in training,” the report found, genAI giants’ commercial models raise different issues because their consumption of copyrighted works is “unprecedented in scope and scale.”
OpenAI has previously claimed that genAI is “impossible” without using copyrighted content, arguing that such training qualifies as a ‘fair use’ exception under US copyright law – a contention that the US Copyright Office has now quashed.
“Making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries,” it found.
Stretching fair use past breaking point
It’s a stunning rebuff to an AI industry that has long taken an ends-justify-the-means approach to copyright, with legal experts warning of mass copyright violations ever since OpenAI’s ChatGPT burst onto the tech scene in late 2022.
Today’s Internet is flooded with automated bots collecting data to feed genAI platforms, with over half of all online traffic now generated by bots and so-called ‘grey bots’, including Anthropic’s ClaudeBot, TikTok’s Bytespider, DeepSeek’s DeepSeekBot and others.
And while Wikipedia’s Creative Commons licence means its content is considered fair game for scrapers – a practice it’s now discouraging by offering a smaller data set for AI developers – genAI’s love of YouTube videos and content publishers like The New York Times has been problematic.
Such practices are a “direct threat” to Australian creative industries, a Senate committee was told last year, months after the government established an AI copyright reference group – and Australian authors rebelled after finding genAI systems had been fed hoards of stolen novels.
Indeed, genAI firms haven’t been above using illegal means to accumulate training content – with employees of Meta, for one, caught training its genAI engine with the Library Genesis (LibGen) archive of 88 million copyrighted and paywalled books and scholarly works.
Where to from here?
Faced with the prospect of paying billions in licensing fees to increasingly agitated content creators – and what Meta called “massive administration problems” if genAI model creators have to find and negotiate with every content creator – AI giants are pushing back.
OpenAI, for its part, has told US and UK authorities that copyright policy must “[preserve] American AI models’ ability to learn from copyrighted material…. America has so many AI startups, attracts so much investment…. largely because the fair use doctrine promotes AI development.”
Last month, Twitter co-founder Jack Dorsey’s call for authorities to “delete all IP law” was seconded by Elon Musk, whose close ties with Trump have driven the US President to a position so sympathetic to AI giants that he fired US Copyright Office director Shira Perlmutter.
That move was decried by Committee on House Administration member Joseph Morelle as a “brazen, unprecedented power grab with no legal basis… less than a day after [Perlmutter] refused to rubber-stamp Elon Musk’s efforts to mine troves of copyrighted works to train AI models.”
Days later, two replacements from Trump’s Department of Justice were denied entry to the Library of Congress, which houses the Copyright Office. The president had also fired longtime Librarian of Congress Carla Hayden, who appointed Perlmutter in 2020.
That office’s refusal to genuflect to the genAI industry – and its assertions that “voluntary licensing may be workable” and that “collective licensing can play a significant role in AI training” – confirm that it prefers a wait-and-see approach that encourages organic growth over intervention.