Ubiquitous cameras plus AI will change how we understand the world
OpenAI's GPT-4V has an impressive ability to understand images.
Last week I wrote about OpenAI releasing GPT-4V, a new version of its flagship language model that has the ability to understand images. A few days later, Microsoft researchers released a fascinating paper exploring some of the model’s many new capabilities.
There was a lot to cover—so much that the report ran for 166 pages. It found that GPT-4V can “read” text, recognize landmarks and everyday objects, understand charts (with some difficulty), and solve visual puzzles.
GPT-4V also has an impressive ability to reason about cause and effect. For example, given a scrambled sequence of five images of someone making sushi, GPT-4V was not only able to recognize what the images showed, it was also able to put them in the correct order.
In short, GPT-4V is able to reason about images with the same kind of flexibility ChatGPT shows when reasoning about text. And because OpenAI is selling it as a new capability of ChatGPT, I think a lot of people are going to think about it as mainly a new feature for the chatbot.
But that’s a mistake: the most important use for image recognition technologies like GPT-4V isn’t chatting with individual users. It’s the ability to glean value from huge databases of images.
Get ready for cameras everywhere
Over the last week, I’ve been listening to Ashlee Vance’s new book When the Heavens Went on Sale. Vance wrote my favorite biography of Elon Musk, so when I bought his new book I expected it to be a sequel focusing on the progress of SpaceX since 2015.
But Vance’s book turned out to be more interesting than that: it’s a book about a new crop of rocket and satellite companies that have sprung up in SpaceX’s wake. Companies like Rocket Lab and Astra are building smaller rockets designed to put objects into low-earth orbit—and to do it so cheaply that a single company can afford to fly hundreds of satellites.
One of the first companies to do this was Planet Labs, which began launching satellites in 2013. The company’s satellites photograph every square mile of the planet every day, and Planet sells its constantly updating image database to corporate and government customers.
Still other companies use this data, plus AI software, to provide clients with timely insights about the world. For example, several years ago a company called Orbital Insight figured out that the shadows on floating-roof oil tanks could provide an estimate of how much oil was inside the tanks. By monitoring oil tanks around the world, the company could produce daily estimates of global oil inventories. Orbital Insight also estimates the sales of retail stores by counting the number of cars in their parking lots.
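To make the oil tank idea concrete, here is a rough sketch of the geometry as I understand it. This is my own simplified illustration, not the company's actual method. The roof of a floating-roof tank sits on top of the oil, so as the tank empties the roof sinks and the tank wall casts a longer shadow onto it. If you can measure that interior shadow and the shadow the whole tank casts on the ground in the same image, the ratio of the two gives the empty fraction of the tank. The function names, the measurements, and the capacity figure below are all hypothetical.

```python
def estimate_fill_fraction(interior_shadow_m: float, exterior_shadow_m: float) -> float:
    """Estimate how full a floating-roof tank is from two shadow lengths.

    Assumptions (mine, for illustration only):
    - exterior_shadow_m is the shadow the full tank wall casts on flat ground,
      so it is proportional to the tank height H.
    - interior_shadow_m is the shadow the wall casts onto the floating roof,
      so it is proportional to the empty height (H - roof height).
    - Both are measured in the same image, so the sun angle cancels out.
    - The tank is wide enough that the interior shadow is not cut off by the far wall.
    """
    if exterior_shadow_m <= 0:
        raise ValueError("exterior shadow length must be positive")
    if interior_shadow_m < 0:
        raise ValueError("interior shadow length cannot be negative")
    empty_fraction = interior_shadow_m / exterior_shadow_m
    # Clamp to [0, 1] because real measurements are noisy.
    return min(1.0, max(0.0, 1.0 - empty_fraction))


def estimate_volume_barrels(fill_fraction: float, capacity_barrels: float) -> float:
    """Convert a fill fraction into a volume, given a known (or assumed) capacity."""
    return fill_fraction * capacity_barrels


if __name__ == "__main__":
    # Hypothetical measurements from a single satellite image.
    fill = estimate_fill_fraction(interior_shadow_m=5.0, exterior_shadow_m=20.0)
    print(f"Estimated fill: {fill:.0%}")
    print(f"Estimated volume: {estimate_volume_barrels(fill, 500_000):,.0f} barrels")
```

The hard part in practice is the image recognition step that finds the tanks and measures the shadows, which is exactly where better vision models help. Once that is done, the arithmetic is simple enough to run across every tank in a database of images.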
Orbital Insight began doing analysis like this long before the advent of GPT-4V. But models like GPT-4V will make satellite imagery even more valuable. More and more companies, non-profits, and maybe even individuals may be able to combine abundant satellite imagery with powerful image recognition models to gain new insights about the world. That, in turn, will boost demand for more satellite imagery.
And satellites are just one potential source of new images. More and more people are installing “smart doorbells” with cameras on them. In a decade or two, self-driving cars will be ubiquitous in major cities—and every single one will have several cameras on board.
Cameras are already common in factories and warehouses. Better image recognition software will let companies manage these facilities more efficiently—from tracking inventory levels to making sure workers wear safety equipment (and don’t slack off too much). Of course, factory workers might hate this, but the potential economic gains will make the trend difficult to resist.
I’m not going to make any specific predictions about how any of this might play out, but I’m pretty sure it’s going to be important. Until recently, computers needed human help to understand what was in an image. Now computers can glean a ton of actionable information directly from images. And that data can then become an input to other software.
Maybe we need a new batch of privacy legislation to sharply limit who can collect image data and how they can use it. Or maybe we’ll just have to get used to our every action being monitored and logged by AI software—at least when we’re out in public. But either way, it’s something we all need to start thinking about, because it’s going to have an impact on all of our lives.
A world where people live under total surveillance and the sky is blocked out by a million low earth orbit satellites ranks pretty damn high on the dystopian nightmare scale.
"Or maybe we’ll just have to get used to our every action being monitored and logged by AI software—at least when we’re out in public. "
Personally, I think the public is just going to get used to it. This is because well-meaning parents pushed for video cameras in schools to 'keep children safe'. As a result, kids grew up knowing that they are under surveillance all the time (or at least often) and really don't care, since it is normal. While *I* may think ubiquitous cameras are a bad idea, the next generation of voters will likely shrug.