Understanding AI

The first copyright ruling on generative AI training is a win for AI labs

New ruling provides a blueprint for AI companies to stay on the right side of the law.

Timothy B. Lee
Jun 24, 2025
I took a short break from Agent Week to write up a very important copyright ruling. Stay tuned for more Agent Week content in the coming days.


On Monday, a California federal judge ruled that Anthropic “downloaded for free millions of copyrighted books in digital form from pirate sites on the internet.”

Normally, it would be bad news for a judge to write that about your company. But the ruling is actually good news for Anthropic—and even better news for the broader AI industry. That’s because—if it’s upheld on appeal—it will give AI companies a clear blueprint for training models without running afoul of copyright.

The plaintiffs are three authors who sued Anthropic last August, arguing that Anthropic had infringed copyright by training Claude using their books. It’s a class-action lawsuit seeking to represent thousands of authors whose books were included in the training data for Anthropic’s Claude models. Anthropic had asked the judge to rule that copyright’s fair use doctrine allowed it to train on these books.

“They wanted to get a knockout on fair use across the board,” Cornell legal scholar James Grimmelmann told me. Instead, the judge handed down a split decision: some aspects of Anthropic’s training were fair use, but others weren’t.

The part of the ruling that went against Anthropic is going to sting; Anthropic could wind up owing authors hundreds of millions of dollars for past copyright infringement.

But the other half of the ruling is far more important because it’s the first time a court has said it’s legal to train AI models using copyrighted content without permission from rights holders.

Piracy problems

Anthropic was founded by a group of former OpenAI researchers with deep connections to the academic AI research community. Traditionally, that community did not worry very much about copyright. And for good reason: not only does copyright law take a lenient attitude toward academic research generally, but most early AI models also had little commercial value.

So when Anthropic was preparing to train the first Claude model in 2021, it did what AI researchers had always done: download a bunch of training data from the Internet without worrying about its copyright status.

“In January or February of 2021, Anthropic cofounder Ben Mann downloaded Books3, an online library of 196,640 books that he knew had been assembled from unauthorized copies of copyrighted books—that is, pirated,” wrote Judge William Alsup in his Monday ruling. “In June 2021, Mann downloaded in this way at least five million copies of books from Library Genesis, or LibGen, which he knew had been pirated. And in July 2022, Anthropic likewise downloaded at least two million copies of books from the Pirate Library Mirror, or PiLiMi, which Anthropic knew had been pirated.”

Anthropic insists that all of this copying was legal because copyright law allows copyrighted works to be used for transformative purposes. For example, a 2015 ruling held that it was legal for Google to scan millions of in-copyright books for a book search engine. The appeals court in that case held that a search engine for books was a transformative use that didn’t compete with the books themselves—and hence was allowed under copyright’s fair use doctrine.

Anthropic argued that the same logic applies to its own training process because (like Google) it never distributed any books to users. But Judge Alsup was scathing about this argument.

“There is no decision holding… that pirating a book that could have been bought at a bookstore was reasonably necessary to writing a book review, conducting research on facts in a book, or creating an LLM,” he wrote. “Such piracy of otherwise available copies is inherently, irredeemably infringing, even if the pirated copies are immediately used for the transformative use and immediately discarded.”

So that’s the bad news, from Anthropic’s perspective. The case isn’t over; there is still going to be a trial, and Anthropic could try to convince a judge that this is all a big misunderstanding. But it seems likely that Anthropic is going to lose this part of the case and will owe money to thousands of book authors.

Grimmelmann told me that plaintiffs could be eligible for statutory damages that range from $750 to $30,000 per infringed work. With hundreds of thousands of works at issue, that could easily cost Anthropic hundreds of millions of dollars. It might even reach into the billions.

Creating a digital library was fair use


Presumably, Anthropic would rather not pay authors hundreds of millions of dollars. But that could be a small price to pay for the other half of Alsup’s ruling, which clears a path for training AI models on copyrighted data in the future.
