Humans do not store millions of previously encountered tokens. We use that raw data early on to painstakingly build world models (intuition).
Then, when we solve a problem, we work iteratively, step by step, based on experience, with frequent validation against observed outcomes and our world models.
Is this a theory or model supported by academic work, or just your intuition for how we "store" real-world data and apply it to problems? I'd be interested in researching the topic, if there are relevant papers!
Here's a paper that appears relevant: Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204.
Now, it is fair to say that brute-force methods like LLMs can compensate for our inadequate modeling of the brain to an extent; the question is just how much one can push on context size as opposed to more architectural work.
Thank you! I'll check it out within the next few days. I'm of the opinion that insights into the brain's architecture will be fruitful for increasing the intelligence of LLMs in the coming years, even though I'm not excited about what a future with more powerful LLMs would look like.
Yes, my hunch (which I might write about more later in the week) is that AI models will eventually need the ability to create more complex data structures so they can build world models. I have no idea how to do this, but I think it's unlikely that a flat list of word vectors is ever going to get the job done.
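As a rough illustration of what I mean (a made-up sketch, not a proposal for how to actually do it), compare the flat sequence of embeddings an LLM consumes with even a trivially structured, queryable state:

```python
# Illustrative sketch only: contrasting a "flat list of word vectors"
# with a more structured, queryable world-model representation.
# All names and values here are hypothetical.
from dataclasses import dataclass, field

# Roughly what an LLM context is: an ordered list of vectors.
flat_context: list[list[float]] = [
    [0.12, -0.40, 0.77],   # embedding for "the"
    [0.05, 0.31, -0.22],   # embedding for "cup"
    [0.90, -0.10, 0.44],   # embedding for "falls"
]

# One possible richer structure: entities plus typed relations,
# maintained as a state that can be queried and updated.
@dataclass
class Entity:
    name: str
    properties: dict[str, object] = field(default_factory=dict)

@dataclass
class WorldModel:
    entities: dict[str, Entity] = field(default_factory=dict)
    # Each relation is a (subject, relation, object) triple.
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def query(self, subject: str, relation: str) -> list[str]:
        return [o for s, r, o in self.relations if s == subject and r == relation]

world = WorldModel()
world.entities["cup"] = Entity("cup", {"material": "ceramic", "fragile": True})
world.entities["table"] = Entity("table", {"height_m": 0.75})
world.relations.append(("cup", "on_top_of", "table"))

print(world.query("cup", "on_top_of"))  # ['table']
```

The point isn't this particular structure; it's that the second representation persists as a state the model can query and update, rather than something re-derived from a token stream each time.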
It is likely that an LLM has some internal representations, but only as much as can be inferred from text and images. So not very deep ones.
The industry seems to be heavily focused now on work-by-imitation, which has a fair chance of working if the AI agent can call tools for validation or generate code that queries simulations, which are world models we build for them.
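Here's a toy sketch of what that validation loop could look like (everything below is hypothetical; the "simulation" is just a closed-form physics formula standing in for a real world model):

```python
# Toy sketch of "propose an answer, then validate it against a simulation".
# The proposal function stands in for an imitation-trained model's guess;
# the simulation stands in for a world model we built for the agent.

def simulate_fall_time(height_m: float, g: float = 9.81) -> float:
    """World-model stand-in: time for an object to fall from a given height."""
    return (2 * height_m / g) ** 0.5

def propose_answer(height_m: float) -> float:
    # Deliberately rough guess, standing in for a model's first answer.
    return height_m * 0.5

def answer_with_validation(height_m: float, tolerance: float = 0.05) -> float:
    guess = propose_answer(height_m)
    checked = simulate_fall_time(height_m)
    # If the guess disagrees with the simulation, defer to the simulation.
    if abs(guess - checked) > tolerance:
        return checked
    return guess

print(round(answer_with_validation(5.0), 2))  # ~1.01 seconds
```

The interesting part is the division of labor: the imitation side only has to propose, while the simulation carries the world-model burden.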
Perhaps something like Meta's recently published LCM (Large Concept Models) https://ai.meta.com/research/publications/large-concept-models-language-modeling-in-a-sentence-representation-space/ is worth exploring further to advance world-model building.
Are we sure the word vectors aren't serving as powerful world models?
Considering the failure of LLMs to answer simple questions about the physical world and other questions that are obvious to human intuition (see: SimpleBench), I think the answer isn't so clear.
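One way to make the word-vectors-as-world-models question concrete is a probe like the sketch below (the vectors are invented for illustration; a real test would use pretrained embeddings and many more word pairs):

```python
# Minimal probe sketch: do embeddings encode a simple physical relation
# such as relative size? Vectors here are made up for illustration only.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; imagine they came from a pretrained model.
vectors = {
    "ant":      [0.1, 0.9, 0.2],
    "elephant": [0.8, 0.1, 0.3],
    "small":    [0.2, 0.8, 0.1],
    "large":    [0.9, 0.2, 0.2],
}

# If relative size is encoded, "ant" should sit closer to "small" than to "large".
print(cosine(vectors["ant"], vectors["small"]) > cosine(vectors["ant"], vectors["large"]))  # True
```

Probes like this do find some structure in embeddings, but SimpleBench-style failures suggest that whatever is there doesn't extend to the kind of physical reasoning humans find obvious.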