44 Comments

Thanks for the most definitive, balanced, and thorough article on the NYT vs. OpenAI case. I definitely learned a thing or two (but not 1,000 things, because at that scale, we're veering dangerously out of "fair use" territory).

Do you know what prompts were used to have ChatGPT create some of the examples you gave? I am curious how the request was crafted. I am interested in whether the person creating the prompt was trying to get ChatGPT to generate the material to prove that copyrighted material was part of the training data, or was just being cheap and trying to get around the paywall. I am researching GenAI and prompt engineering for a class and wanted to be able to show how the response was generated. Thank you for an excellent article.

The MP3.com case was decided very quickly, but the Texaco (9) and Google (10) cases took years to resolve. At the speed OpenAI is developing, will anything in the case against (more or less) GPT-3 be relevant against GPT-13?

To me it seems simple. If it's publicly available, i.e. in a tweet, on an open website, etc., it's fair game for AI or anyone else. It has the same measure of privacy that would be expected on a bulletin board at a coffee shop or town square, i.e. none. To claim that, since it's your writing, AI can't use it but anyone walking by can, makes zero sense. Now if it's taking advantage of private material behind a paywall, or not attributing sources, that's different. But I don't think that's what the lawsuit is about.

This article changed my mind on the legal question. Great job!

It seems like you could solve this with some creative training. For example, when you save copyrighted training data, replace every space or double space after a sentence with 4 half-spaces. Then either train or hardcode the AI to refuse the most likely token when the last token in the prompt is 4 half-spaces.

By continuously kicking it off track, this should make it difficult to reproduce near-exact copies. Also, not allowing the most likely token to be chosen should tend to make the output perform worse under any RLHF. That in turn should make the AI attempt to avoid exact quotes of copyrighted material.

Indeed, it's likely that it will come to grasp that token strings containing 4 half-spaces are fundamentally different.

Now, there are likely problems with what I just suggested. It immediately comes to mind that quotations would need to be an exception. My point, though, is that this is just an off-the-cuff idea. It seems like a more serious investigation could solve this problem.
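
A rough sketch of the two steps in the idea above (marking sentence breaks in copyrighted training text, then refusing the top-ranked token right after a marker) might look like the following. Everything here is an assumption for illustration: `MARKER` is a stand-in string for the "4 half-spaces" token, the function names are made up, and the `scores` dictionary is a toy substitute for a real model's token probabilities. It is not how any actual model is trained or sampled.

```python
import re

# Hypothetical stand-in for the "4 half-spaces" token described above.
MARKER = "<4hs>"


def mark_sentence_breaks(text: str) -> str:
    """Replace one or two spaces after sentence-ending punctuation with MARKER."""
    return re.sub(r"(?<=[.!?]) {1,2}(?=\S)", MARKER, text)


def pick_next_token(context: list[str], scores: dict[str, float]) -> str:
    """Greedy choice, except that the top-scoring token is refused
    when the last token in the context is MARKER."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if context and context[-1] == MARKER and len(ranked) > 1:
        return ranked[1]  # skip the most likely token
    return ranked[0]


if __name__ == "__main__":
    print(mark_sentence_breaks("First sentence. Second one.  Third."))
    toy_scores = {"the": 0.9, "a": 0.5, "an": 0.1}  # made-up numbers
    print(pick_next_token(["Hello", MARKER], toy_scores))
    print(pick_next_token(["Hello", "world"], toy_scores))
```

The sketch only forces a detour at the first token after each marker, so a model could still drift back to the memorized text afterward; whether that is enough to prevent near-exact copies is exactly the open question raised above.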

This is a great, interesting article. Thank you. Funny that I read it today, because I used DALL-E last night to create an image of Pokemon playing pool at a bar. (To be a cover image used for a blog post.) I kept reprompting it to change the image to my liking. Some of the responses would list it as “…cute animated animals…” clearly trying to bypass using the word “Pokemon,” but some responses didn’t even bother disguising it. There were enough weird non-Pokemon animals featured, but play with it long enough, and it will spit out Pikachus.

A lot of the time it seems like IP law cases are won by the attorneys who can holler "screw you" the loudest.

As I've said, there is a root issue here: OpenAI et al. are arguing that AIs have, or should have, the same rights as people, that is, to read anything they please, and to recombine & repurpose that reading as they please, as long as they avoid producing close replicas of copyrighted works.

There is a big problem with this theory: AIs, unlike people, can be "owned" by a commercial entity, and as a result, must be considered to function as extensions of that commercial entity, rather than as independent agents.

If I had, for example, a personally-controlled AI - that is, one to which I and no commercial entity had access - I think I would be within my rights to read it any books from the library I pleased, or to show it any paintings at the museum I pleased, and to ask it to reproduce these in part or in whole, just as I could in theory memorize a written work or reproduce a painting for myself. I might even be within my rights to utilize a personal AI so trained for commercial purposes, or in the course of employment - it's not doing anything I couldn't do, with enough time, or wouldn't be allowed to do.

There are much greater restrictions on corporations & their agents than there are on persons, and I think these AI companies would be wise to steer clear of any arguments like theirs here that essentially rest on AI personhood...you're not allowed to own a person.

This is a clear and balanced article. I personally read way better with my ears, and found this worth running through an AI narrator for easy listening. Let me know if this isn't something that you want to exist and want me to get rid of it.

https://askwhocastsai.substack.com/p/why-the-new-york-times-might-win?sd=pf

Really nice article, and fun to bring up mp3.com. I originally thought you were going to talk about Aereo as well, but I had to look it up and that one wasn't about Fair Use for place shifting at all.

Overall, a clear and balanced response to what I wrote. Well written. Well thought out.

We’re putting out a note tomorrow that comes out somewhat differently than your piece. I’d love your thoughts when it’s out.

Maybe not! I think the NYT and other entities suing under similar premises will ultimately lose. There is no difference between a person reading the NYT and then regurgitating what they read to friends and an AI "reading" the NYT and then providing answers to questions using that digested info.

------

OpenAI Seeks to Dismiss Parts of The New York Times’s Lawsuit

The artificial intelligence start-up argued that its online chatbot, ChatGPT, is not a substitute for a New York Times subscription.

By Cade Metz and Katie Robertson

Feb. 27, 2024

OpenAI filed a motion in federal court on Monday that seeks to dismiss some key elements of a lawsuit brought by The New York Times Company.

The Times sued OpenAI and its partner Microsoft on Dec. 27, accusing them of infringing on its copyrights by using millions of its articles to train A.I. technologies like the online chatbot ChatGPT. Chatbots now compete with the news outlet as a source of reliable information, the lawsuit said.

In the motion, filed in U.S. District Court for the Southern District of New York, the defendants argue that ChatGPT “is not in any way a substitute for a subscription to The New York Times.”

“In the real world, people do not use ChatGPT or any other OpenAI product for that purpose,” the filing said. “Nor could they. In the ordinary course, one cannot use ChatGPT to serve up Times articles at will.”

...

https://www.nytimes.com/2024/02/27/technology/openai-new-york-times-lawsuit.html

One thought I have: literally everything entered into these needs to be classified as PII and the heaviest of the Mahler’s 6th hammers must be swung at anyone who violates the privacy of a user.
