Copyright lawsuits pose a serious threat to generative AI
Stable Diffusion and other AI systems are in uncharted legal waters.
I’m a journalist with a computer science master’s degree. In the past I’ve written for the Washington Post, Ars Technica, and other publications.
This is the first edition of my newsletter, Understanding AI. It will explore how AI works and how it’s changing the world. If you like it, please sign up to have future articles sent straight to your inbox.
The AI software Stable Diffusion has a remarkable ability to turn text into images. When I asked the software to draw “Mickey Mouse in front of a McDonalds sign,” for example, it generated the picture you see above.
Stable Diffusion can do this because it was trained with hundreds of millions of example images harvested from across the web. Some of these images were in the public domain or had been published under permissive licenses such as Creative Commons. Many others were not—and the world’s artists and photographers aren’t happy about it.
In January, three visual artists filed a class-action copyright lawsuit against Stability AI, the startup that created Stable Diffusion. In February, the image licensing giant Getty filed a lawsuit of its own.
“Stability AI has copied more than 12 million photographs from Getty Images’ collection, along with the associated captions and metadata, without permission from or compensation to Getty Images,” Getty wrote in its lawsuit.
Legal experts tell me that these are uncharted legal waters.
“I'm more unsettled than I've ever been about whether training is fair use in cases where AIs are producing outputs that could compete with the input they were trained on,” the Cornell legal scholar James Grimmelmann told me.
Generative AI is such a new technology that the courts have never ruled on its copyright implications. There are some strong arguments that copyright’s fair use doctrine allows Stability AI to use the images. But there are also strong arguments on the other side. There’s a real possibility that the courts could decide that Stability AI violated copyright law on a massive scale.
That would be a legal earthquake for this still nascent industry. Building cutting-edge generative AI would require getting licenses from thousands, perhaps even millions, of copyright holders. The process would likely be so slow and expensive that only a handful of large companies could afford to do it. Even then, the resulting models likely wouldn’t be as good. And smaller companies might be locked out of the industry altogether.
A “complex collage tool”?
The plaintiffs in the class-action lawsuit describe Stable Diffusion as a “complex collage tool” that contains “compressed copies” of its training images. If this were true, the case would be a slam dunk for the plaintiffs.
But experts say it’s not true. Eric Wallace, a computer scientist at the University of California, Berkeley, told me in a phone interview that the lawsuit had “technical inaccuracies” and was “stretching the truth a lot.” Wallace pointed out that Stable Diffusion is only a few gigabytes in size—far too small to contain compressed copies of all or even very many of its training images.
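Wallace’s point is easy to sanity-check with back-of-the-envelope arithmetic. The numbers below are rough assumptions of mine (an image count in line with the “hundreds of millions” mentioned above and an aggressively compressed 10 kilobytes per image), not figures from the lawsuits:

```python
# Rough, assumed figures; not numbers from the lawsuits or from Stability AI.
training_images = 600_000_000        # "hundreds of millions" of training images
bytes_per_image = 10 * 1024          # assume heavy JPEG compression: 10 KB each
model_size_bytes = 4 * 1024**3       # a Stable Diffusion checkpoint is a few GB

dataset_bytes = training_images * bytes_per_image
print(f"Compressed training set: ~{dataset_bytes / 1024**4:.1f} TB")
print(f"Model checkpoint:        ~{model_size_bytes / 1024**3:.0f} GB")
print(f"The model is roughly {dataset_bytes / model_size_bytes:,.0f}x too small to hold copies")
```

Even with those generous assumptions, the training set comes to thousands of gigabytes, while the model itself is only a few.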
In reality, Stable Diffusion works by first converting a user’s prompt into a latent representation: a list of numbers summarizing the contents of the image. Just as you can identify a point on the earth’s surface based on its latitude and longitude, Stable Diffusion characterizes images based on their “coordinates” in the “picture space.” It then converts this latent representation into an image.
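For readers who want to see that pipeline end to end, here is a minimal sketch using the open-source diffusers library (the model name and output file are just examples, and it assumes a machine with a CUDA GPU):

```python
from diffusers import StableDiffusionPipeline
import torch

# Load publicly released Stable Diffusion v1.5 weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Prompt in, picture out. Under the hood, the prompt is encoded into numbers,
# a denoising network works in the latent "picture space," and a decoder
# turns the final latent representation into pixels.
image = pipe("a watercolor painting of a Golden Retriever at the beach").images[0]
image.save("golden_retriever_beach.png")
```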
Let’s make this concrete by looking at an example from this excellent article about Stable Diffusion’s latent space:
If you ask Stable Diffusion to draw “a watercolor painting of a Golden Retriever at the beach,” it will produce a picture like the one in the upper-left corner of this image. To do this, it first converts the prompt to a corresponding latent representation—that is, a list of numbers summarizing the elements that are supposed to be in the picture. Maybe a positive value in the 17th position indicates a dog, a negative number in the 54th position represents a beach, a positive value in the 73rd position means a watercolor painting, and so forth.
I just made those numbers up for illustrative purposes; the real latent representation is more complicated and not easy for humans to interpret. But in any event there will be a list of numbers that correspond to the prompt, and Stable Diffusion uses this latent representation to generate an image.
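If you want to see those numbers for yourself, the first step is easy to reproduce. Stable Diffusion’s first version uses OpenAI’s CLIP text encoder to turn a prompt into numbers, and Hugging Face’s transformers library exposes it directly. This is a rough sketch of just that one step, not the whole system:

```python
from transformers import CLIPTokenizer, CLIPTextModel
import torch

# The text encoder used by Stable Diffusion v1.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a watercolor painting of a Golden Retriever at the beach"
tokens = tokenizer(prompt, padding="max_length", max_length=77, return_tensors="pt")
with torch.no_grad():
    numbers = text_encoder(tokens.input_ids).last_hidden_state

print(numbers.shape)      # torch.Size([1, 77, 768]): tens of thousands of numbers
print(numbers[0, 0, :5])  # the first few, not individually meaningful to humans
```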
The pictures in the other three corners were also generated by Stable Diffusion using the following prompts:
Upper right: "a still life DSLR photo of a bowl of fruit"
Lower left: "the eiffel tower in the style of starry night”
Lower right: "an architectural sketch of a skyscraper”
The point of the six-by-six grid is to illustrate that Stable Diffusion’s latent space is continuous: the software can not only draw an image of a dog or a bowl of fruit, it can also draw images that are “in between” a dog and a bowl of fruit. The third picture on the top row, for example, depicts a slightly fruit-looking dog sitting on a blue dish.
Or look along the bottom row. As you move from left to right, the shape of the building gradually changes from the Eiffel Tower to a skyscraper, while the style changes from a Van Gogh painting to an architectural sketch.
The continuous nature of Stable Diffusion’s latent space enables the software to generate latent representations—and hence images—for concepts that were not in its training data. There probably wasn’t an “Eiffel Tower drawn in the style of ‘Starry Night’” image in Stable Diffusion’s training set. But there were many images of the Eiffel Tower and, separately, many images of “Starry Night.” Stable Diffusion learned from these images and was then able to produce an image that reflected both concepts.
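Here is a toy illustration of that “in between” idea. The vectors below are random stand-ins, not real latent representations, but walking from one point to another in equal steps is exactly what the rows and columns of the grid are doing:

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for two latent representations (NOT the real vectors).
dog_on_beach = rng.normal(size=768)    # pretend: the watercolor dog at the beach
bowl_of_fruit = rng.normal(size=768)   # pretend: the still life bowl of fruit

for t in np.linspace(0.0, 1.0, 6):
    in_between = (1 - t) * dog_on_beach + t * bowl_of_fruit
    # Decoding each in_between point would yield a picture that is partly
    # dog and partly bowl of fruit, like the top row of the grid.
    print(f"t={t:.1f}  first coordinates: {np.round(in_between[:3], 2)}")
```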
Teaching Stable Diffusion to draw
How does Stable Diffusion learn to do this? A novice painter might go to an art museum and try to make exact copies of famous paintings. The first few attempts won’t be very good, but she’ll get a little better with each attempt. If she keeps at it long enough, she’ll master the styles and techniques of the paintings she copies.
The process for training an image-generation network like Stable Diffusion is similar, except that it happens at a vastly larger scale. The training process uses a pair of networks designed to first map an image into latent space and then reproduce the original image using only its latent representation.
Much like a novice painter, the system initially does a terrible job; the first images generated by the network will look like random noise. But after each image, the software grades itself on its success or failure and adjusts its parameters so it’ll do a slightly better job on the next image.
A key word here is slightly: each training image is only supposed to have a small influence on the network’s behavior. The network learns general features of dogs, beaches, watercolor paintings, and so on. But it’s not supposed to learn how to reconstruct any particular training image. Doing this is known as “overfitting,” and network designers work hard to avoid it.
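Here is a deliberately stripped-down sketch of that loop, written in PyTorch. It is not Stable Diffusion’s actual architecture (the real system is a diffusion model with far more moving parts); it just illustrates the encode, decode, grade, and nudge-the-parameters-slightly cycle described above:

```python
import torch
import torch.nn as nn

# A toy encoder/decoder pair: squeeze each image into a short list of numbers
# (the latent representation), then try to rebuild the image from those numbers.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 128))
decoder = nn.Sequential(nn.Linear(128, 64 * 64 * 3), nn.Unflatten(1, (3, 64, 64)))

params = list(encoder.parameters()) + list(decoder.parameters())
# A small learning rate keeps each image's influence small ("slightly better
# each time"), and weight decay is one common guard against overfitting.
optimizer = torch.optim.Adam(params, lr=1e-4, weight_decay=1e-2)
loss_fn = nn.MSELoss()

fake_training_images = torch.rand(16, 3, 64, 64)  # stand-in for real training data

for image in fake_training_images:
    batch = image.unsqueeze(0)
    latent = encoder(batch)                # image -> list of numbers
    reconstruction = decoder(latent)       # list of numbers -> image
    loss = loss_fn(reconstruction, batch)  # grade how far off the copy is
    optimizer.zero_grad()
    loss.backward()                        # work out how to do slightly better
    optimizer.step()                       # nudge the parameters a little
```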
This is important because copyright law protects creative expression but not facts about the world. You can copyright a particular painting of a dog, but you can’t copyright the fact that dogs have two eyes, four legs, a tail, and so forth. So a network that avoids overfitting will be on safer legal ground.
The case for fair use
In the mid-2000s, Google started scanning books in libraries to create a book search engine. Authors responded by suing both Google and its library partners for copyright infringement.
Google argued that its scanning was fair use, emphasizing that the scanned books would never be shown to users. In a pair of rulings in 2014 and 2015, an appeals court sided with Google and its library partners. “The result of a word search is different in purpose, character, expression, meaning, and message from the page (and the book) from which it is drawn,” the court held in its 2014 ruling.
Other copyright rulings have pointed in the same direction. In 2009, a different appeals court rejected a copyright lawsuit against the anti-plagiarism service Turnitin. Students had sued, arguing that the company infringed their copyrights by keeping copies of their essays without permission. The court disagreed, noting that Turnitin never published the students’ essays and that the service wasn’t a substitute for them.
In short, the law provides a lot of leeway for what legal scholar Matthew Sag calls non-expressive uses of copyrighted works—uses where copyrighted works are only ever “read” by a computer program, not a human being.
Stability AI hasn’t responded to the lawsuits yet, but the experts I talked to expect the company to compare Stable Diffusion to services like Google Book Search and Turnitin. It will likely point out that training images are only ever “viewed” by computer programs, not human beings. Some experts, including Sag, argue that this ought to be a winning argument for Stability AI.
I’m not so sure. As we’ve seen, a key assumption for a “non-expressive use” defense is that Stable Diffusion only learns uncopyrightable facts—not creative expression—from its training images. That’s mostly true. But it’s not entirely true. And the exceptions could greatly complicate Stability AI’s legal defense.
Stable Diffusion’s copying problem
Here’s one of the most awkward examples for Stability AI:
This example comes courtesy of this paper by researchers at Google and several universities. On the left is an image of Anne Graham Lotz, daughter of famed evangelist Billy Graham, from Stable Diffusion’s training data (it’s also on Lotz’s Wikipedia page). On the right is a picture of Lotz generated by Stable Diffusion. It’s not a perfect copy, but it clearly is a copy.
Stable Diffusion doesn’t generate direct copies like this very often. Researchers tried to reproduce 350,000 images from Stable Diffusion’s training set, but only succeeded with 109 of them—a success rate of 0.03 percent. And in ordinary use—with users who aren’t trying to get the software to reproduce training images—verbatim copies like this should be even less common.
Here’s another example from a recent research paper. These computer scientists used training image captions as Stable Diffusion prompts and found that they could generate an image similar to a training image 1.88 percent of the time. They used a much looser test for image similarity—loose enough that it’s not clear how many of these cases would trigger copyright liability. And again, ordinary users should see this much less frequently since unlike the researchers they’re not trying to produce copies of training images.
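Neither paper’s exact similarity test is reproduced here, but the basic shape of such a check is simple to sketch (the file names are hypothetical, and real studies use more robust perceptual measures than raw pixel differences):

```python
import numpy as np
from PIL import Image

def rough_similarity(path_a: str, path_b: str, size: int = 256) -> float:
    """Resize both images to the same dimensions and compare them pixel by pixel."""
    a = np.asarray(Image.open(path_a).convert("RGB").resize((size, size)), dtype=np.float32) / 255.0
    b = np.asarray(Image.open(path_b).convert("RGB").resize((size, size)), dtype=np.float32) / 255.0
    return 1.0 - float(np.mean((a - b) ** 2))  # 1.0 means pixel-identical

# Hypothetical file names, for illustration only:
# if rough_similarity("training_image.png", "generated_image.png") > 0.99:
#     print("The generated image is suspiciously close to a training image")
```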
And here’s a funny example from Getty’s lawsuit against Stability AI:
On the left is a Getty-owned image. On the right is an image generated by Stable Diffusion—complete with a distorted Getty watermark. The images under the watermark seem different enough not to raise copyright concerns. But reproducing the watermark does not look great for Stable Diffusion.
A final example is the Mickey Mouse image at the top of this article. It’s perfectly legal for Stable Diffusion to produce original photos of real people, since human faces don’t get copyright protection. But producing images of a cartoon character like Mickey Mouse—even wholly original images—may infringe Disney’s copyrights. Stable Diffusion also draws other animated characters like Batman, Superman, and Bart Simpson well enough to potentially infringe copyright.
Fair use is on shaky ground
The risk for Stability AI isn’t just that the owners of specific images or characters could sue for copyright infringement, as Getty has done already. The larger concern, Sag told me, is that these infringing outputs could “make the whole fair use defense unravel.”
The core question in fair use analysis is whether a new product acts as a substitute for the product being copied, or whether it “transforms” the old product into something new and distinctive. In the Google Books case, for example, the courts had no trouble finding that a book search engine was a new, transformative product that didn’t in any way compete with the books it was indexing.
Google wasn’t making new books. Stable Diffusion is creating new images. And while Google could guarantee that its search engine would never display more than three lines of text from any page in a book, Stability AI can’t make a similar promise. To the contrary, we know that Stable Diffusion occasionally generates near-perfect copies of images from its training data.
Beyond these specific legal arguments, Stability AI may find it has a vibes problem. The legal criteria for fair use are subjective and give judges some latitude in how to interpret them. And one factor that likely influences the thinking of judges is whether a defendant seems like a “good actor.” Google is a widely respected technology company that tends to win its copyright lawsuits. Edgier companies like Napster tend not to.
Stable Diffusion has a capability that seems likely to rub judges the wrong way: you can ask the software for an image “in the style of” any prominent artist—including artists who are still alive and making art. It’s not clear if copyright law protects this kind of “style.” But even copyright doves told me they understood why this feature of Stable Diffusion had many artists up in arms.
One of the most important factors judges consider in fair use analysis is the effect of a use on the market for the original work. Stability AI will undoubtedly argue that the overwhelming majority of the images Stable Diffusion generates are original enough that they won’t undermine the market for any particular image in its training set.
But it’s easy to see how Stable Diffusion could undermine the market for existing works in the aggregate.
“People are talking about how they'll use Stable Diffusion with other tools to put together high-quality brochures or presentations in ways that they would have licensed stock photos or commissioned illustration in the past,” James Grimmelmann told me.
If Stable Diffusion is able to generate new paintings “in the style of” a living artist, that is likely to depress demand for all of that artist’s past and future work. And Stable Diffusion is only able to do this because it was trained on the artist’s previous work—without paying the artist a dime. It’s easy to imagine a judge concluding that this tips the scales against a finding of fair use.
It’s not just Stable Diffusion
So far I’ve focused on Stable Diffusion, but the legal issues here extend beyond any one product or company. Stable Diffusion is an open-source product that has been incorporated into other image generation tools, including Midjourney. Midjourney is also named as a defendant in the class-action lawsuit against Stability AI.
OpenAI and Microsoft are also facing a lawsuit over GitHub Copilot, a code-completion AI built on OpenAI’s Codex, a descendant of GPT-3. It’s probably only a matter of time before these companies face lawsuits for using copyrighted text to train ChatGPT and GPT-4. OpenAI’s DALL-E, Google’s Bard, and other generative AI systems may also be vulnerable to litigation if plaintiffs can show they were trained with copyrighted material.
If plaintiffs win—and based on my reporting, that seems like a real possibility—that would throw this nascent industry into chaos. Many—perhaps even most—companies offering generative image and language models could be forced to shut them down. Companies would scramble to assemble public domain and licensed data sets.
Large companies like Google, Microsoft, and Meta would have an inherent advantage here. Not only would they have the cash to sign licensing deals with major copyright holders like Getty, they may also be able to get permission to use user data to train their models.
I think the long-term result would be to further entrench these large tech companies. Some of them already have leading positions in this emerging technology thanks to heavy spending on research and development. But they face competition from rivals like Stability AI, a startup that managed to train Stable Diffusion for around $600,000.
If Stability AI loses these lawsuits, however, the cost of training a cutting-edge model will rise dramatically. It may become effectively impossible for newcomers to train competitive models of their own. That won’t mean the end of AI startups—the big companies will likely license out their models for use by smaller companies. But it would represent a dramatic change in the structure of the industry.
Thanks to Amanda Levendowski, Mark Lemley, Pam Samuelson, and Vikash Sehwag for sharing their expertise with me for this article.
This is the first edition of Understanding AI, a newsletter that explores how AI works and how it’s changing the world. If you like it, please sign up to have future articles sent straight to your inbox.
The "almost exact copy" problem seems solvable? It should be possible to construct a similarity score between a generated image and any of the source images, based on some set of criteria.
Great article! I think the big entertainment companies are the 800lb gorilla, waiting in the wings. I wouldn't want to be on the other side of Disney's copyright lawyers. Disney also has the connections in government to lobby for favourable legislation. I think it's also a little telling that the entertainment companies haven't been making a big stink about it. Disney would love a tool that spits out an endless stream of Spider-man colouring books.