Even as OpenAI pursues its self-proclaimed mission to build artificial general intelligence, a hypothetical AI system that would possess human-equivalent intelligence, the ChatGPT-maker remains embroiled in controversy over both the creation and use of its AI models.
Concerns of rampant copyright infringement have shadowed OpenAI and its peers since the launch of ChatGPT, resulting in a steadily growing list of lawsuits against the startup.
Related: Popular social media platform to sell user data to the company behind ChatGPT
OpenAI has regularly refuted the bulk of the claims brought against it, arguing that it is fair use to train its highly commercialized AI models on publicly accessible content without the permission, credit or compensation of the people responsible for creating that content. The U.S. Copyright Office has yet to weigh in on the issue.
At the same time, OpenAI is pursuing content licensing deals with publishers and platforms and has said that it would “be impossible to train today’s leading AI models without using copyrighted materials.”
OpenAI has not, however, denied that it has used copyrighted works in the training of its models.
Here’s a list of all the copyright lawsuits that have been filed against OpenAI.
Related: Human creativity persists in the era of generative AI
The Intercept, Raw Story and AlterNet
On Feb. 28, the same law firm filed two new copyright cases on behalf of a trio of news organizations: one for The Intercept and another, filed jointly, for Raw Story and AlterNet.
Both suits claim that OpenAI violated the Digital Millennium Copyright Act (DMCA) of 1998, which prohibits removing copyright information — such as author and title — from an article with the intent of concealing copyright infringement.
Both suits cite data from Copyleaks finding that nearly 60% of ChatGPT responses contained plagiarized content. They state that OpenAI knowingly chose to strip copyright information from the articles it used to train ChatGPT, essentially training the chatbot “not to acknowledge or respect copyright, not to notify ChatGPT users when the responses they received were protected by journalists’ copyrights and not to provide attribution when using the works of human journalists.”
“Defendants had reason to know that ChatGPT would be less popular and would generate less revenue if users believed that ChatGPT responses violated third-party copyrights,” The Intercept’s suit says.
Read The Intercept’s complaint here.
Read Raw Story and AlterNet’s complaint here.
Related: Copyright expert predicts result of NY Times lawsuit against Microsoft, OpenAI
The New York Times
In The New York Times’ lawsuit, filed at the end of last year, DMCA violations are only one of many claims.
The suit — which was also brought against Microsoft — alleges copyright infringement in both the input and output of OpenAI’s models, and goes further to allege unfair competition and trademark dilution.
“Defendants’ use of Times content encoded within models and live Times content processed by models produces outputs that usurp specific commercial opportunities of The Times,” the suit claims.
The Times is seeking to hold both firms accountable for “billions of dollars in statutory and actual damages that they owe for the unlawful copying and use of The Times’s uniquely valuable works.”
“It is hard to dispute that, if generative AI companies were required to reveal their training data, there would be more lawsuits, and they would be more likely to be successful. People are holding off suing because they can’t be certain what’s in the datasets. Copyright law is…”
— Ed Newton-Rex (@ednewtonrex), February 29, 2024
“If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission,” a Times spokesperson said at the time. “They have not done so.”
OpenAI, according to a Feb. 27 filing, is seeking to dismiss several parts of the lawsuit, including claims of direct copyright infringement, contributory infringement, copyright management information removal and unfair competition by misappropriation.
OpenAI said in the filing that the Times “paid someone to hack OpenAI’s products,” adding that ChatGPT “is not in any way a substitute for a subscription” to the Times.
“In the real world, people do not use ChatGPT or any other OpenAI product for that purpose. Nor could they,” OpenAI said.
Ian Crosby, the Times’ lead counsel, said in an emailed statement that the filing importantly does not dispute the company’s use of copyrighted material.
“What OpenAI bizarrely mischaracterizes as ‘hacking’ is simply using OpenAI’s products to look for evidence that they stole and reproduced The Times’ copyrighted works. And that is exactly what we found,” Crosby said. “In fact, the scale of OpenAI’s copying is much larger than the 100-plus examples set forth in the complaint.”
Read the Times’ lawsuit here.
Read OpenAI’s filing here.
Read the docket for the case here.
Related: OpenAI accuses New York Times of paying someone to hack ChatGPT
The Authors Guild
The Authors Guild filed a class action suit against Microsoft and OpenAI in September of last year on behalf of tens of thousands of authors — including George R.R. Martin, Michael Connelly, John Grisham and David Baldacci — alleging willful violation of copyright laws.
The suit claims that the two companies “reproduced and appropriated” the copyrighted work of tens of thousands of authors to “train their artificial intelligence models,” and further, that this violation of copyright law was done knowingly.
“Without Plaintiffs’ and the proposed class’ copyrighted works, Defendants would have a vastly different commercial product,” Rachel Geman, a partner with Lieff Cabraser and co-counsel for the plaintiffs and the proposed class, said in a statement at the time. “Defendants’ decision to copy authors’ works, done without offering any choices or providing any compensation, threatens the role and livelihood of writers as a whole.”
More deep dives on AI:
Think tank director warns of the danger around ‘non-democratic tech leaders deciding the future’
George Carlin resurrected – without permission – by self-described ‘comedy AI’
Artificial Intelligence is a sustainability nightmare — but it doesn’t have to be
The complaint seeks an injunction prohibiting the companies from violating authors’ copyright, in addition to actual damages and statutory damages of $150,000 per infringed work.
“Defendants copied Plaintiffs’ works and then fed them into their large language models, algorithms designed to output human-seeming text responses to users’ prompts and queries,” the suit says. “These algorithms are at the heart of Defendants’ massive commercial enterprise. And at the heart of these algorithms is systematic theft on a mass scale.”
Read the Authors Guild complaint here.
Read the docket for the case here.
Related: Senate Judiciary Committee seeks to build new framework to rein in Big Tech
Sarah Silverman and Paul Tremblay
Author Paul Tremblay brought a class action suit against OpenAI in June of 2023, similarly alleging rampant copyright infringement. His case was combined with an almost identical case, brought against OpenAI by Sarah Silverman.
Both complaints accuse OpenAI of copyright violations, violations of the DMCA, unfair competition, negligence and unjust enrichment, and seek damages in addition to permanent injunctive relief.
Tremblay’s complaint notes the existence and use of several “books” datasets in AI training, which together include hundreds of thousands of books. The suit also notes that OpenAI has never disclosed any information about the dataset used to train ChatGPT.
LLM “output is entirely and uniquely reliant on the material in its training dataset. Plaintiffs and Class members did not consent to the use of their copyrighted books as training material for ChatGPT. Nonetheless, their copyrighted materials were ingested and used to train ChatGPT,” the suit reads. “Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works — something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works.”
OpenAI, the suit says, benefits commercially and profits “richly” from the use of plaintiffs’ copyrighted materials.
The court on Feb. 13 heard OpenAI’s motion to dismiss and dismissed all but one of the plaintiffs’ claims, leaving intact the unfair competition claim that OpenAI used the plaintiffs’ work to generate profit without their permission.
Read Silverman’s complaint here.
Read Tremblay’s complaint here.
Read the docket for the case here.
Related: Big Tech clashes with Texas, Florida at Supreme Court
Copyright expert’s take
Copyright expert and Cornell professor of digital and information law James Grimmelmann told TheStreet in January that the core issue in each of these copyright cases comes down to differing views of fair use.
“The AI companies are working in a mental space where putting things into technology blenders is always okay,” he said. “The media companies have never fully accepted that. They’ve always taken the view that ‘if you’re training or doing something with our works that generates value we should be entitled to part of it.'”
He said that both the Times’ case and the Authors Guild’s case are strong ones, adding that he is sure that both sides will be willing to negotiate.
“They want to be cut in on this. They want an arrangement where, if their works are being used for this valuable training, they get a cut of the royalties,” he said. “They have enough copyrights that they have something to offer. And they’re also professionals. They do media deals as part of how they stay in business.”
More deep dives on AI:
Deepfake porn: It’s not just about Taylor Swift
Cybersecurity expert says the next generation of identity theft is here: ‘Identity hijacking’
Deepfake program shows scary and destructive side of AI technology
This list does not include the lawsuits filed against other AI companies, such as Midjourney and Stability AI. It also does not include the non-copyright lawsuits filed against OpenAI, such as a recent class-action privacy suit, filed Wednesday, alleging unlawful and unfair business practices, negligence, invasion of privacy and unjust enrichment, among other things.
That suit seeks, among other remedies, to permanently restrain OpenAI from the conduct at issue in the complaint.
“Together with Defendants’ scraping of our digital footprints — comments, conversations we had online yesterday, as well as 15 years ago — Defendants now have enough information to create our digital clones, including the ability to replicate our voice and likeness and predict and manipulate our next move using the technology on which the Products were built,” the suit reads.
“They can also misappropriate our skill sets and encourage our own professional obsolescence. This would obliterate privacy as we know it.”
Contact Ian with tips and AI stories via email, [email protected], or Signal 732-804-1223.
Related: Marc Benioff and Sam Altman at odds over core values of tech companies