Discussion about this post

Lucas Wiman:

The simplest task I've found that breaks every LLM is:

> Please multiply the following two numbers 45*58 using the "grade school" arithmetic algorithm, showing all steps including carries.

You can choose longer random numbers to increase the difficulty. I found that o1 could usually multiply five digit numbers but not six. 4o could multiply 2-digit or sometimes 3-digit numbers. Given that the number of steps in long multiplication is quadratic in the number of digits, that's a pretty big improvement!
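The "grade school" algorithm the comment asks for can be sketched in a few lines of Python — a hypothetical illustration, not code from the post — computing one partial product per digit of the multiplier, tracking carries explicitly, then summing the shifted partials:

```python
def grade_school_multiply(a: int, b: int) -> int:
    """Multiply a * b digit by digit, printing each partial product."""
    a_digits = [int(d) for d in reversed(str(a))]  # least-significant first
    partials = []
    for shift, bd in enumerate(int(d) for d in reversed(str(b))):
        carry = 0
        row = []
        for ad in a_digits:
            # divmod propagates the carry exactly as in hand calculation
            carry, digit = divmod(ad * bd + carry, 10)
            row.append(digit)
        if carry:
            row.append(carry)
        partial = int("".join(map(str, reversed(row)))) * 10 ** shift
        partials.append(partial)
        print(f"{bd} x {a} (shifted by {shift}) -> {partial}")
    total = sum(partials)
    print(f"sum of partial products = {total}")
    return total

grade_school_multiply(45, 58)  # partials 360 and 2250, total 2610
```

Counting the inner-loop iterations makes the quadratic claim concrete: multiplying two n-digit numbers takes n partial products of n digit-multiplications each, so n² single-digit steps before the final addition.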

Rob Nelson:

Your headline had me expecting that you had discovered quite a bit more than minor improvements to GPT's ability to solve language puzzles. The fact that the latest release has become slightly better at playing language games falls squarely in the "that's neat, but what does this really get us?" category, sort of like the new podcasting capabilities of NotebookLM.

I know you avoid scare quotes around "reason" when it comes to LLMs, and I agree that semantic wrangling over our descriptive vocabulary doesn't get us anywhere. However, "doing fairly crude pattern-matching" and "quickly get bogged down" seem more like ordinary machine learning capabilities than an "extraordinary ability."
