The New York Times has launched a lawsuit against ChatGPT’s owner OpenAI and Microsoft over the use of the media giant’s content to train their generative artificial intelligence platforms.

The landmark lawsuit addresses one of the most significant issues to have emerged with the surge in popularity of generative artificial intelligence (genAI) over the last year, and could shape the future of the transformative technology.

OpenAI launched the ChatGPT generative artificial intelligence tool in November 2022.

By January the following year, it had become the fastest-growing consumer software application in history, with more than 100 million users.

Microsoft has invested more than $19 billion ($US13 billion) into OpenAI and has incorporated ChatGPT into its Bing search engine.

The New York Times filed a 69-page complaint in the Federal District Court in Manhattan against OpenAI and Microsoft, accusing the companies of operating business models based on “mass copyright infringement”.

The media titan alleges that OpenAI trained its large language models (LLMs) on content produced by the New York Times, and that ChatGPT then reproduces this content for users, effectively making it a competitor.

“Defendants’ unlawful use of The Times’s work to create artificial intelligence products that compete with it threatens The Times’s ability to provide that service,” the New York Times lawsuit said.

“Defendants’ generative artificial intelligence tools rely on large language models that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.”

The complaint alleges that the companies gave “particular emphasis” to work produced by the New York Times when building their models.

“Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment,” it said.

“Powered by LLMs containing copies of Times content, defendants’ genAI tools can generate output that recites Times content verbatim, closely summarises it and mimics its expressive style, as demonstrated by scores of examples.”

The New York Times approached Microsoft and OpenAI about these copyright issues in April last year in an attempt to find an “amicable resolution”, which may have involved a commercial agreement and “technological guardrails”.

But these discussions did not prove fruitful, and by August there were already rumblings of a potential lawsuit.

An OpenAI spokesperson said that the company was “moving forward constructively” in discussions with the New York Times, and that it was “surprised and disappointed” by the lawsuit.

“We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models,” the spokesperson said.

“We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.”

The complaint does not specify an exact amount of monetary compensation, but says that OpenAI and Microsoft should be held responsible for “billions of dollars in statutory and actual damages” due to the “unlawful copying and use of The Times’s uniquely valuable works”.

It also wants the companies to destroy any AI models that have been trained on its copyrighted material, which would include ChatGPT.

The New York Times has requested a jury trial in relation to the complaint, which OpenAI and Microsoft are yet to respond to.

The complaint includes more than 100 examples of ChatGPT reproducing New York Times articles almost word-for-word, effectively bypassing the newspaper’s paywall.

One of these relates to a New York Times article titled ‘The Secrets Hamas Knew About Israel’s Military’.

In response to a query, Bing’s generative AI platform reproduced all but two of the article’s first 396 words.

Copyright has emerged as a major issue relating to the explosive growth of generative artificial intelligence products.

Last year it was revealed that a dataset of 183,000 pirated books, including ones by Australian authors, had been used to train AI models.

This led to a number of authors launching lawsuits last year.

Getty Images has also sued the maker of another generative AI tool, one that creates images from user prompts, over its use of Getty’s copyrighted photos.

OpenAI is reportedly negotiating with a number of media companies to reach commercial agreements over the use of their content.

According to The Information, OpenAI is offering as little as $1.5 million ($US1 million) to these media companies for the use of their content.

The federal government is also looking to tackle the issue and last month launched an AI copyright reference group, which will assist with future copyright challenges emerging from the increased use of generative AI.

“AI gives rise to a number of important copyright issues, including the material used to train AI models, transparency of inputs and outputs, the use of AI to create imitative works, and whether and when AI-generated works should receive copyright protection,” Attorney-General Mark Dreyfus said late last year.