9 Comments
Andy X Andersen:

Humans do not store millions of previously encountered tokens. We use that raw data early on to painstakingly build world models (intuition).

Then, when we solve a problem, we work iteratively, step by step, drawing on experience and frequently validating against observed outcomes and our world models.

Ben:

Is this a theory or model supported by academic work, or just your intuition for how we "store" real-world data and apply it to problems? I'd be interested in researching the topic, if there are relevant papers!

Andy X Andersen:

Here's a paper that appears relevant: Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204.

Now, it is fair to say that brute-force methods like LLMs can compensate, to an extent, for our inadequate modeling of the brain; the question is just how much one can push on context size as opposed to more architectural work.

Ben:

Thank you! I'll check it out within the next few days. I'm of the opinion that insights into the brain's architecture will be fruitful for increasing the intelligence of LLMs in the coming years, even though I'm not excited about what a future with more powerful LLMs would look like.

Timothy B. Lee:

Yes, my hunch (which I might write about more later in the week) is that AI models will eventually need the ability to create more complex data structures so they can build world models. I have no idea how to do this, but I think it's unlikely that a flat list of word vectors is ever going to get the job done.
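To make the contrast concrete, here is a minimal, purely illustrative Python sketch: a "flat" sequence of word vectors on one side, and a hypothetical structured world model (entities with typed relations) on the other. The class and field names are invented for illustration and do not correspond to any existing system.

```python
import numpy as np
from dataclasses import dataclass, field

# A "flat" representation: just an ordered list of word vectors,
# with no explicit entities, relations, or persistence over time.
flat_context = [np.random.rand(768) for _ in range(12)]  # 12 tokens, 768-dim each

# A hypothetical structured alternative: entities and typed relations
# that a reasoning step could query and update explicitly.
@dataclass
class Entity:
    name: str
    attributes: dict = field(default_factory=dict)

@dataclass
class WorldModel:
    entities: dict = field(default_factory=dict)   # name -> Entity
    relations: list = field(default_factory=list)  # (subject, predicate, object)

    def add_fact(self, subject: str, predicate: str, obj: str) -> None:
        for name in (subject, obj):
            self.entities.setdefault(name, Entity(name))
        self.relations.append((subject, predicate, obj))

    def query(self, subject: str, predicate: str) -> list[str]:
        return [o for s, p, o in self.relations if s == subject and p == predicate]

wm = WorldModel()
wm.add_fact("cup", "is_on", "table")
wm.add_fact("table", "is_in", "kitchen")
print(wm.query("cup", "is_on"))  # ['table']
```

The point of the contrast is that the structured form can be queried and updated fact by fact, while the flat form only encodes such structure implicitly, if at all.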

Andy X Andersen:

It is likely that an LLM has some internal representations, but only to the extent that they can be inferred from text and images, so they are not very deep.

The industry now seems heavily focused on work-by-imitation, which has a fair chance of working if the AI agent can call tools for validation or generate code that queries simulations, which serve as world models we build for them.
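As a rough illustration of that validation loop (not any specific product's API; `Simulator` and `propose_step` are hypothetical stand-ins), the pattern might look like this in Python:

```python
# Hypothetical sketch of an agent that validates each proposed step
# against a simulator (a world model we build for it) before committing.
import random

class Simulator:
    """Toy stand-in for a physics or domain simulation."""
    def check(self, state: int, action: int) -> bool:
        # Accept the action only if it keeps the state non-negative.
        return state + action >= 0

def propose_step(state: int) -> int:
    # Stand-in for the model proposing the next action by imitation.
    return random.choice([-2, -1, 1, 2])

def run_agent(steps: int = 10) -> int:
    sim, state = Simulator(), 0
    for _ in range(steps):
        for _ in range(5):                # retry a few times if validation fails
            action = propose_step(state)
            if sim.check(state, action):  # validate against the world model
                state += action
                break
    return state

print(run_agent())
```

The simulator does the grounding the model itself lacks: proposals that contradict the world model are simply rejected before they take effect.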

Rafael Teixeira:

Perhaps Meta's recently published LCM (Large Concept Models, https://ai.meta.com/research/publications/large-concept-models-language-modeling-in-a-sentence-representation-space/) is worth exploring further to advance world-model building.
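The published LCM work operates in a sentence-embedding space. As a loose, hypothetical sketch of that general idea (not Meta's actual architecture, training setup, or API; dimensions and model choice are made up for illustration), one could imagine predicting the next sentence embedding instead of the next token:

```python
# Loose sketch of the "concept model" idea: predict the embedding of the
# next sentence from the embeddings of previous sentences, rather than
# predicting the next token. All shapes and layer choices are illustrative.
import torch
import torch.nn as nn

EMB_DIM = 256  # assumed sentence-embedding dimension

class NextConceptPredictor(nn.Module):
    def __init__(self, dim: int = EMB_DIM):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(dim, dim)

    def forward(self, sentence_embs: torch.Tensor) -> torch.Tensor:
        # sentence_embs: (batch, num_sentences, dim)
        hidden = self.encoder(sentence_embs)
        return self.head(hidden[:, -1])  # predicted embedding of the next sentence

model = NextConceptPredictor()
context = torch.randn(1, 8, EMB_DIM)  # 8 prior sentence embeddings
target = torch.randn(1, EMB_DIM)      # embedding of the actual next sentence
loss = nn.functional.mse_loss(model(context), target)
loss.backward()
```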

Jim:

Are we sure the word vectors aren't serving as powerful world models?

Ben:

Considering the failure of LLMs to answer simple questions about the physical world and other questions that are obvious to human intuition (see: SimpleBench), I think the answer isn't so clear.
