Embers of autoregression in the latest o1 model
The latest OpenAI model isn't much better than GPT-4o at understanding images.
Last Tuesday, I asked Understanding AI readers to donate to GiveDirectly, a nonprofit that gives cash directly to some of the poorest people in the world. I’m thrilled to report that 78 of you stepped up and gave a total of $23,765—more than double my goal of $10,000. Today I will fulfill my pledge and give $5,000. Thank you so much to everyone who participated!
Back in September, OpenAI released two new models: o1-mini and o1-preview. These names hinted that a third model—neither “mini” nor a preview—would be out soon. OpenAI finally released that model to the public on Thursday. It’s simply called o1, and I’ve spent the last few days putting it through its paces.
When I tested o1-mini and o1-preview in September, I came away impressed. The initial o1 models aced almost all of the text-based challenges I’d used on earlier models. However, I wasn’t able to test these models on images because OpenAI had not yet activated their image capabilities.
With the release of the full o1 model on Thursday, I was finally able to test its visual capabilities. And here the new model shows little improvement over previous LLMs.
This underscores a point I made in my September piece: o1 dramatically improves performance in certain domains, like math and coding, where answers can be checked automatically. But in a lot of other domains, o1 represents an incremental improvement at best.
I also found that o1’s performance is sensitive to the way a problem is represented. For example, sometimes o1 is able to solve a problem when it’s described using words, but fails to solve the same problem if it’s presented as a diagram or photograph. This kind of brittleness could be a significant obstacle as people try to use these models to solve complex, real-world tasks.