We need to talk about facial recognition
Voluntary restraint by big tech companies isn't a sustainable solution.
GPT-4 was originally trained to understand both text and images. But when OpenAI officially released GPT-4 back in March, it was as a text-only product. The image recognition capabilities were held back for additional testing.
On Monday, OpenAI started to roll out a version of GPT-4 with image capabilities. Soon paying customers will be able to upload an image and then have a conversation with ChatGPT about it. OpenAI also released a “system card” detailing the risks created by image recognition and the steps the company is taking to mitigate them.
Some of these risks are variants of issues OpenAI has struggled with since the advent of ChatGPT. For example, if asked whether to hire a person shown in a photograph, early versions of GPT-4 would sometimes respond by invoking racial stereotypes. To prevent this, OpenAI says it has trained GPT-4 not to answer prompts asking for “ungrounded inference” about people.
OpenAI also encourages people not to use GPT-4’s vision capabilities for safety-critical purposes such as getting a medical diagnosis or identifying whether a particular mushroom is safe to eat. Analogous concerns apply to purely text-based chatbots.
But AI systems that understand images could create at least two problems that are genuinely novel. The first is related to CAPTCHAs, those annoying image-based puzzles you sometimes have to solve to access a website. If an AI system were able to solve these puzzles as well as a human being, it could open websites to a flood of spam, data scraping, and other unwelcome automated behavior.
The other big new potential problem is facial recognition. Humans are good at distinguishing among the faces of our friends and family. But until recently, there hasn’t been a way to recognize the faces of strangers on a society-wide scale. Powerful facial recognition is starting to change that, which could have huge social implications.
GPT-4 was never designed to be a search engine for faces, but OpenAI is nevertheless giving facial recognition a wide berth. Not only does the public version of GPT-4 decline to identify people in images—even famous people—the software will also refuse to answer questions about whether two images show the same person.
The dangers of facial recognition
I’ve been thinking about facial recognition a lot recently because I just listened to Your Face Belongs to Us, an excellent new book from New York Times reporter Kashmir Hill. The book tells the story of Clearview AI, a startup that built a search engine for faces that is now widely used by law enforcement. Hill revealed the company’s existence in 2020. Her book uses the Clearview story to explore the important policy issues raised by facial recognition technology.
Bad facial recognition technology has been around for many years, but Clearview’s technology is uncannily accurate. Upload a photo of a person to Clearview and it’ll show other photos of the same person, along with details about where each photo came from. These results may reveal the person’s name, which is the first step to figuring out their employer, home address, and other sensitive information.
Technology like this has obvious benefits. Police have used Clearview to help identify child molesters and bank robbers, for example. But it also has some real downsides—especially if it becomes available to the general public.
“This technology is exploding,” Hill told me in an interview. “It’s reaching a point where if we don't do something about it, make some rules, it's going to get away from us.”
Until recently, each of us could be confident that our identities would be unknown to strangers we encountered in public unless we made an affirmative choice to reveal them. Facial recognition technologies have the potential to flip that presumption.
For example, think of a young woman who meets a stranger at a bar. He asks her for her number. She politely declines. In a pre-AI age, that would be the end of the story. Once she left the bar, he’d have no idea who she was and no way to track her down.
But in the future, the man might be able to pull out his phone, snap a photo, and upload it to a facial recognition app. That might enable him to show up uninvited at her home or workplace the next day.
“There was a lot of stuff we could mine”
One of the big themes of Your Face Belongs to Us is just how easy it has become to build software like this. Clearview AI was the brainchild of the Australian entrepreneur Hoan Ton-That. Before creating Clearview, Ton-That’s best-known product was ViddyHo, an unsuccessful video-sharing app whose viral marketing strategy was so spammy that the New York Times labeled it a phishing scam in 2009.
By his own admission, Ton-That wasn’t an expert on facial recognition when he started building Clearview around 2017.
“I couldn’t have figured it all out myself,” Ton-That said. Luckily for him, a community of academic machine learning researchers had done a lot of the heavy lifting and published their findings for the world to see. “There was a lot of stuff we could mine,” Ton-That said.
Industry leaders recognized the potential for facial recognition long before Ton-That started working on the technology. But they made a conscious choice not to build products that could create severe privacy risks.
“We built that technology, and we withheld it,” Google chairman Eric Schmidt said at a technology conference in 2011.
Microsoft, Amazon, and Facebook all experimented with facial recognition technologies in the 2010s, but none of them built a Clearview-like product. In 2020, both Microsoft and Amazon announced that they would stop allowing law enforcement to use their face recognition technology. Facebook decided to shut down its facial recognition feature in 2021.
In short, Clearview’s success wasn’t due to a technological breakthrough. Ton-That just had fewer scruples than the leaders of major tech companies.
The importance of data
“Getting a facial recognition algorithm is pretty easy now,” Hill said. “What set Clearview apart is the database.”
Clearview scraped a wide range of websites containing images of ordinary people, and the company claims its database now holds 30 billion images.
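To make Hill’s point concrete: once you have a scraped database of labeled face embeddings, the search step itself is almost trivial. Here’s a minimal sketch, with a hypothetical embed_face() standing in for any off-the-shelf face-embedding model (the function name and module are placeholders, not a real package). The few lines of matching code are the easy, commodity part; the hard-to-replicate asset is the scraped database they run over.

```python
import numpy as np

# Hypothetical stand-in for any off-the-shelf face-embedding model:
# it maps a photo to a fixed-length vector. The specific model doesn't matter much.
from some_face_model import embed_face  # assumption, not a real package

def search_faces(probe_photo, db_embeddings, db_labels, top_k=5):
    """Return the most similar faces in a precomputed database.

    db_embeddings: (N, d) matrix of normalized embeddings scraped from the web.
    db_labels: the name or source URL associated with each row.
    """
    probe = embed_face(probe_photo)
    probe = probe / np.linalg.norm(probe)
    scores = db_embeddings @ probe              # cosine similarity against every scraped face
    best = np.argsort(scores)[::-1][:top_k]     # highest-scoring matches first
    return [(db_labels[i], float(scores[i])) for i in best]
```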
One early source of data was Venmo, which for years made people’s payment history public by default. Venmo would show a real-time feed of payments being made by random Venmo users, including a photo of each user. By loading this feed over and over again, Clearview built a huge database with the names and photos of Venmo users.
Over the last five years, a lot of websites have tightened up their privacy rules and cracked down on scraping. Venmo shut down its public feed in 2021, for example. So it wouldn’t be easy for a newcomer to build a database as large as Clearview’s.
I think we may see a similar pattern in other parts of the AI industry. In the long run, data—not algorithms or computing power—may be the scarcest and most valuable input for training AI systems.
On Monday, Getty announced a new AI image generation tool. This is particularly significant because Getty is simultaneously suing Stability AI, creator of the image generation software Stable Diffusion. Getty argues Stability AI infringed copyright when it included Getty images in its training data.
Getty partnered with Nvidia to build its image generation tool. It advertises the product as “commercially safe”—a not-so-subtle reference to the copyright lawsuits Getty and others have filed.
The Verge’s Emilia David writes that photos generated by Getty’s tool “look better than expected” and “felt more human than when I tried the same prompt with Stable Diffusion.”
If Getty and other copyright holders win their lawsuits, then OpenAI and its rivals will need to scramble to license images and text so they can retrain their AI models. And Getty is one of the few companies with millions of high-quality images that can potentially be licensed in a single transaction.
But maybe Getty’s own AI offering will become popular enough that the company will see little reason to relinquish control over its intellectual property. In that case, AI might entrench the power of large content creators, like Getty, rather than disrupting them.
Facial recognition is a real AI safety problem
Lately I’ve been frustrated with the vagueness of the “AI safety” debate. A lot of people want stricter oversight of AI. But we’re far from a consensus about the nature of the threat—to say nothing of what to do about it.
But one AI-related harm that seems crystal clear is abuse of facial recognition. Face search engines will be a boon for stalkers and abusive partners. And currently, there are few if any legal restrictions on the technology at the federal level.
Right now Clearview is only available to law enforcement thanks to an Illinois law and savvy litigation by the American Civil Liberties Union. Illinois is one of the few states that regulate facial recognition, and the ACLU sued Clearview in 2020, arguing that its technology violated Illinois law. In a 2022 settlement, Clearview agreed to stop offering its technology to individual customers, not only in Illinois but nationwide. So in effect, an Illinois law is helping to protect the privacy of all Americans.
But other startups have begun offering Clearview-like technology to the general public, and there’s no guarantee that the same litigation strategy will work against them. Ultimately, protecting the public against facial recognition technology nationwide is going to require federal legislation.
Writing a good law may not be easy. I’m an avid user of Google Photos, which offers a face recognition feature that only works within a single user’s photo library. I’d be sad if an overbroad law made this feature illegal.
Facial recognition technology also has huge potential to benefit blind people. OpenAI gave thousands of blind users early access to an image recognition app called “Be My AI” built using GPT-4. Reviews were generally positive, but blind users told OpenAI that “they want to use Be My AI to know the facial and visible characteristics of people they meet, people in social media posts, and even their own images—information that a sighted person can obtain simply by standing in any public space or looking in a mirror.”
“We hope someday to be able to find a way to empower the blind and low-vision community to identify people—just like sighted people do—while addressing concerns around privacy and bias,” OpenAI wrote.
So I think a total ban on private use of facial recognition technology would go too far. What’s needed is a law that prohibits the use of facial recognition technology to identify total strangers, while allowing people to use it to identify people they already know.
And while there are serious concerns with the abuse of facial recognition by law enforcement, I also see clear benefits to allowing law enforcement to use facial recognition technologies with appropriate safeguards. For example, we might want to require police departments to obtain a warrant before they’re allowed to search a database like Clearview. But I do want police officers looking for pedophiles or bank robbers to have access to facial recognition technology.
“One thing I want to be a takeaway from this book is that privacy laws can work,” Hill told me. “Europe basically kicked Clearview out. We can legislate the future we want.”
There is a clear and simple distinction between the use cases you are worried about — Clearview-style "who is this person?" — and the use cases you want to preserve for the general public — "a face recognition feature that only works within a single user’s photo library." It's the difference between identifying someone you don't know from a photo and identifying a photo of someone you already know.
This distinction suggests a technical basis for legislation. Facial recognition is relatively safe when *the user brings the dataset* of faces to match against, but very dangerous when *the app brings the dataset*. So a face-recognition law could prohibit recognition against an app-supplied database except by law enforcement with a warrant, while allowing apps to perform recognition against a user-supplied dataset (perhaps capped at some reasonable number of images or distinct people the app can identify). This would allow companies to build sophisticated models that are good at face recognition in general but do not have specific identified faces embedded in them. Instead, the models could be fine-tuned on an individual user's photos. Apple, for example, does face recognition on-device. (https://machinelearning.apple.com/research/recognizing-people-photos)
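Here is a minimal sketch of what the "user brings the dataset" case could look like in code. The embed_face() function and the numeric limits are placeholders rather than any particular vendor's API; the point is that the app only ever compares a photo against people the user has already labeled, and declines to guess about strangers.

```python
import numpy as np

# Hypothetical stand-in for whatever face-embedding model the app ships on-device;
# it maps a photo to a fixed-length vector. Nothing below assumes a specific vendor.
from some_face_model import embed_face  # assumption, not a real package

MATCH_THRESHOLD = 0.6     # illustrative similarity cutoff
MAX_GALLERY_PEOPLE = 50   # the kind of cap a law might place on user-supplied galleries

def identify(probe_photo, user_gallery):
    """Match a photo only against a gallery the user supplied: {name: [photos of that person]}.

    The app never consults a database of strangers; it can only recognize people
    the user has already labeled, which is the "user brings the dataset" case above.
    """
    if len(user_gallery) > MAX_GALLERY_PEOPLE:
        raise ValueError("gallery exceeds the permitted size")

    probe = embed_face(probe_photo)
    probe = probe / np.linalg.norm(probe)
    best_name, best_score = None, MATCH_THRESHOLD
    for name, photos in user_gallery.items():
        for photo in photos:
            ref = embed_face(photo)
            ref = ref / np.linalg.norm(ref)
            score = float(probe @ ref)          # cosine similarity
            if score > best_score:
                best_name, best_score = name, score
    return best_name  # None means "stranger": the app declines to guess
```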
Just on your earlier point about captchas: the writing has been on the wall for some time. Existing algorithms can already do quite a good job on captchas, not to mention services that simply farm the task out to human workers.
Google has released v3 of its widely used reCAPTCHA service, and notably, it doesn't use captchas at all. The algorithms Google built to judge from browser and session signals whether a visitor is likely to be a bot, which it previously used to decide whether a captcha was necessary, now produce a risk score, and that score alone is the product. If your score falls below a site's threshold, you don't get a puzzle as a fallback; the site can simply refuse to let you in.
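For what it's worth, consuming that score on the server side is straightforward. A rough sketch, assuming reCAPTCHA v3's standard siteverify endpoint; the threshold and action name here are illustrative, and real sites tune them per page.

```python
import requests

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"
SCORE_THRESHOLD = 0.5  # illustrative cutoff; sites tune this per action

def allow_request(token, secret_key, expected_action="login"):
    """Ask Google to score a reCAPTCHA v3 token and decide whether to let the request through."""
    resp = requests.post(
        VERIFY_URL,
        data={"secret": secret_key, "response": token},
        timeout=5,
    )
    result = resp.json()

    # v3 returns no challenge, just a success flag, the action name, and a 0.0-1.0 risk score.
    if not result.get("success"):
        return False
    if result.get("action") != expected_action:
        return False
    return result.get("score", 0.0) >= SCORE_THRESHOLD
```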