Humans do not store millions of previously encountered tokens. We use that raw data early on to painstakingly build world models (intuition).
Then, when we solve a problem, we work iteratively, step by step, based on experience, with frequent validation against observed outcomes and our world models.
Is this a theory or model supported by academic work, or just your intuition for how we "store" real-world data and apply it to problems? I'd be interested in researching the topic, if there are relevant papers!
Here's a paper that appears relevant: Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204.
Now, it is fair to say that brute-force methods like LLMs can compensate for our inadequate modeling of the brain to an extent; the question is just how much one can push on context size as opposed to more architectural work.
Thank you! I'll check it out within the next few days. I'm of the opinion that insights into the brain's architecture will be fruitful for increasing the intelligence of LLMs in the coming years, even though I'm not excited about what a future with more powerful LLMs would look like.
Yes, my hunch (which I might write about more later in the week) is that AI models will eventually need the ability to create more complex data structures so they can build world models. I have no idea how to do this, but I think it's unlikely that a flat list of word vectors is ever going to get the job done.
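As a rough illustration of what I mean (a made-up sketch, not a proposal for how to actually do it), compare the flat sequence of embeddings an LLM consumes with even a trivially structured, queryable state:

```python
# Illustrative sketch only: contrasting a "flat list of word vectors"
# with a more structured, queryable world-model representation.
# All names and values here are hypothetical.
from dataclasses import dataclass, field

# Roughly what an LLM context is: an ordered list of vectors.
flat_context: list[list[float]] = [
    [0.12, -0.40, 0.77],   # embedding for "the"
    [0.05, 0.31, -0.22],   # embedding for "cup"
    [0.90, -0.10, 0.44],   # embedding for "falls"
]

# One possible richer structure: entities plus typed relations,
# maintained as a state that can be queried and updated.
@dataclass
class Entity:
    name: str
    properties: dict[str, object] = field(default_factory=dict)

@dataclass
class WorldModel:
    entities: dict[str, Entity] = field(default_factory=dict)
    # Each relation is a (subject, relation, object) triple.
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def query(self, subject: str, relation: str) -> list[str]:
        return [o for s, r, o in self.relations if s == subject and r == relation]

world = WorldModel()
world.entities["cup"] = Entity("cup", {"material": "ceramic", "fragile": True})
world.entities["table"] = Entity("table", {"height_m": 0.75})
world.relations.append(("cup", "on_top_of", "table"))

print(world.query("cup", "on_top_of"))  # ['table']
```

The point isn't this particular structure; it's that the second representation persists as a state the model can query and update, rather than something re-derived from a token stream each time.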
It is likely that an LLM has some internal representations, but only as much as can be inferred from text and images. So not very deep ones.
The industry seems to be heavily focused now on work-by-imitation, which has a fair chance of working if the AI agent can call tools for validation or generate code that queries simulations, which are world models we build for them.
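Here's a toy sketch of what that validation loop could look like (everything below is hypothetical; the "simulation" is just a closed-form physics formula standing in for a real world model):

```python
# Toy sketch of "propose an answer, then validate it against a simulation".
# The proposal function stands in for an imitation-trained model's guess;
# the simulation stands in for a world model we built for the agent.

def simulate_fall_time(height_m: float, g: float = 9.81) -> float:
    """World-model stand-in: time for an object to fall from a given height."""
    return (2 * height_m / g) ** 0.5

def propose_answer(height_m: float) -> float:
    # Deliberately rough guess, standing in for a model's first answer.
    return height_m * 0.5

def answer_with_validation(height_m: float, tolerance: float = 0.05) -> float:
    guess = propose_answer(height_m)
    checked = simulate_fall_time(height_m)
    # If the guess disagrees with the simulation, defer to the simulation.
    if abs(guess - checked) > tolerance:
        return checked
    return guess

print(round(answer_with_validation(5.0), 2))  # ~1.01 seconds
```

The interesting part is the division of labor: the imitation side only has to propose, while the simulation carries the world-model burden.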
Perhaps something like Meta's recently published LCM (Large Concept Models) https://ai.meta.com/research/publications/large-concept-models-language-modeling-in-a-sentence-representation-space/ is worth exploring further to advance world-model building.
Are we sure the word vectors aren't serving as powerful world models?
Considering the failure of LLMs to answer simple questions about the physical world and other questions that are obvious to human intuition (see: SimpleBench), I think the answer isn't so clear.
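One way to make the word-vectors-as-world-models question concrete is a probe like the sketch below (the vectors are invented for illustration; a real test would use pretrained embeddings and many more word pairs):

```python
# Minimal probe sketch: do embeddings encode a simple physical relation
# such as relative size? Vectors here are made up for illustration only.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; imagine they came from a pretrained model.
vectors = {
    "ant":      [0.1, 0.9, 0.2],
    "elephant": [0.8, 0.1, 0.3],
    "small":    [0.2, 0.8, 0.1],
    "large":    [0.9, 0.2, 0.2],
}

# If relative size is encoded, "ant" should sit closer to "small" than to "large".
print(cosine(vectors["ant"], vectors["small"]) > cosine(vectors["ant"], vectors["large"]))  # True
```

Probes like this do find some structure in embeddings, but SimpleBench-style failures suggest that whatever is there doesn't extend to the kind of physical reasoning humans find obvious.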