Understanding AI

Where frontier language models are today

Five top US labs have had major new releases in the last two months.

Timothy B. Lee
Apr 17, 2025

Each time a frontier lab released a major new model in 2024, I put it through its paces and wrote about the experience. But in the new year I have not kept up with this practice. This is partly because companies have released so many new models. But it’s also because benchmarking is getting more difficult.

In early 2024, I could trick leading LLMs with simple brain teasers like “what weighs more, a pound of bricks or two pounds of feathers?” Google’s Gemini 1.0 Ultra model, for example, claimed they weighed the same.

Today’s models are much smarter. They are rarely fooled by simple trick questions. Over the last year I’ve developed questions that fool newer models, but they’ve gotten increasingly baroque and disconnected from real-world use cases.

So I need to recalibrate my approach. Part of that will be talking to more people about how models perform in the real world. Another part will be writing about AI products (like Deep Research, Operator, and Claude Code) rather than stand-alone models.

Still, readers might appreciate a brief primer on the major model releases that have occurred in the two months since I wrote about o3-mini in February. OpenAI, Anthropic, Google, Meta, and xAI have all had major releases during this period. In this article I’ll briefly describe the strengths and weaknesses of each one.

Several new models from OpenAI

OpenAI executives presented the new o3 and o4-mini models in a Wednesday video. (OpenAI)

Ever since the release of ChatGPT in 2022, OpenAI has enjoyed narrative command over the chatbot market. OpenAI is to chatbots what Tesla is to electric cars and Apple is to smartphones. ChatGPT recently became the world’s most downloaded mobile app by some measures. And CEO Sam Altman recently hinted that ChatGPT could have as many as one billion weekly active users.

“There will be a lot of people with great models,” Altman said. “We will try to build the best product.”

As OpenAI works to cement its lead in the consumer chatbot market, it is also experimenting with multiple new models that push the state of the art in different directions:
