94 Comments
Aug 2, 2023 · edited Aug 2, 2023

Great explainer! Very interesting stuff.

I think I can help clear a couple of things up. It's important to understand that these models *are NOT reasoning*; they're just doing math. When they are able to accomplish theory-of-mind-type tasks, it's because the text they were trained on was written by humans with minds. E.g., GPT-2 didn't "figure out" that John giving a drink to John doesn't make sense; it has no idea what makes sense and what does not. Humans do, however, and they write accordingly. So what it's doing is mathematically determining, having been trained on text written by humans, that a human is very unlikely to arrange words in that way. It's far more likely that the other noun in the compound subject would be the recipient of the drink. Likewise with the mislabeled popcorn bag.

It's able to do these things not because it "knows" anything, and certainly not because it reasoned about the question, but simply because of statistical probabilities. This is why the models improve so much with scale: more examples, i.e., more data points, make the statistics more accurate. The exact process it goes through to get there is opaque to us, but the _principle_ is clear and relatively simple, the more so because of your excellent explanation above. E.g., with the TiKZ unicorn, the model isn't "understanding" what a unicorn is; rather, there are text descriptions of unicorns in the training data, along with descriptions of shapes and colors, of how to draw things, of how the drawing tools in TiKZ work, etc., and it has seen enough examples to bring those things together.
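To make that concrete, here's a minimal sketch of the principle (my own illustration, assuming the Hugging Face transformers package and the small public GPT-2 checkpoint; the example sentences are stand-ins, not the article's): the model simply assigns a higher probability to the arrangement of words that humans actually produce.

```python
# A minimal sketch: score two word arrangements by the probability GPT-2
# assigns them. No reasoning involved, just which sequence is statistically
# more typical of human-written text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def log_prob(text: str) -> float:
    """Sum of the log-probabilities the model assigns to each token in `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Each position's logits predict the *next* token, so shift by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    next_ids = ids[0, 1:]
    return log_probs[torch.arange(next_ids.shape[0]), next_ids].sum().item()

# The sensible arrangement scores higher, not because the model reasons about
# who wants the drink, but because people rarely write the second sentence.
print(log_prob("John and Mary went to the bar, and John gave a drink to Mary."))
print(log_prob("John and Mary went to the bar, and John gave a drink to John."))
```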

Emily Bender is exactly right when she calls these models "stochastic parrots." No amount of increasing complexity can ever turn a nonrational, purely deterministic, mathematical process into a rational understanding of truth. Thus, "hallucinations." These models are something like a cultural mirror: if, when we gaze into them, what we see looks human, it's because we are human. It is decidedly NOT because the mirror has spontaneously become human.

Jul 27, 2023 · Liked by Sean Trott

Very good explainer. I also appreciated the last section that goes into a bit of philosophy and theories about how people learn.

Jul 30, 2023 · Liked by Sean Trott

Thank you! Please continue with this type of analysis. I'd be curious to see a similar breakdown of what backpropagation is and how it works.


Thank you so much Tim and Sean for writing this article - it’s the best overview I’ve read of how LLMs work and touches on some fascinating research areas.

It’s obvious to me now that we’re way past the point where the research into how these models work can keep up with the pace of development of the models themselves. This has me both excited and scared at the same time - a feeling I’m becoming very familiar with at the moment 🤓.

For me this means a couple of things:

- We definitely need to find ways to accelerate the research, and I think we're getting to the point where we need research AI models to help us understand AI models. I think we'll see a big increase in the development of specialised AI models to help us with all aspects of LLMs, from research through to alignment.

- Philosophy and the Social Sciences are becoming, and will continue to become, much more important in how we think about, discuss, and evaluate the future of LLMs and generative AI. This is almost the 'top-down' approach to understanding LLMs, and when paired with more technical research it probably gives us the best handle on how we should focus our energies around LLMs in the future.

Thank you again - really looking forward to your next article!

Jul 30, 2023 · Liked by Sean Trott

Excellent and incisive explanatory piece. Great use of visual analogies to explain the computer science, and of social science to elucidate the consequences of these models.


Great review! I was wondering whether and how LLMs deal with punctuation. I would assume punctuation marks could be assigned their own vectors, but their meaning is far less concrete and operates primarily to *modify* the meaning of words and sentences. For example, there is a WORLD of difference between:

"Rachael Ray finds inspiration in cooking her family and her dog"

and

"Rachael Ray finds inspiration in cooking, her family, and her dog"

Any brief explanation of how punctuation and other language modifiers (accent marks in non-English languages, perhaps) work would be very welcome!
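For what it's worth, here's a quick sketch of how punctuation is typically handled at the tokenization step (my own illustration, using OpenAI's tiktoken package; nothing here is specific to the article): the comma becomes an ordinary token with its own ID, and therefore its own vector inside the model.

```python
# A quick sketch (assuming OpenAI's `tiktoken` package): punctuation marks are
# just tokens. The comma gets its own ID and, inside the model, its own vector,
# which the attention layers can use to modify the surrounding words' meanings.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one of OpenAI's public tokenizers

for text in [
    "Rachael Ray finds inspiration in cooking her family and her dog",
    "Rachael Ray finds inspiration in cooking, her family, and her dog",
]:
    ids = enc.encode(text)
    print([enc.decode([i]) for i in ids])
# The second sentence contains two extra "," tokens; everything downstream
# (embeddings, attention, feed-forward layers) treats them like any other token.
```

Accented characters work much the same way in byte-level tokenizers like this one: they are just bytes that get grouped into subword tokens, each of which ends up with its own vector.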


Hey Tim, fantastic article on understanding large language models! Your use of visual analogies and insights into social science really make it accessible. Looking forward to more insightful breakdowns in the future! Keep up the great work!

Jul 27, 2023 · Liked by Sean Trott

This explanation is wonderful, thank you!


Great explainer, but you only spent one sentence on the downside: you don't know how it works. This logic is being used to weed out resumes, set insurance rates, etc. It is drastically impacting people's lives. It is being hailed as impartial, but it is actually the opposite. It is replacing human decision-makers with third-rate reasoning skills. The written word is only a small part of human communication. Building an entire system using only that as input is like eating only the crust and thinking you ate a piece of apple pie.


This may be the best Poetry 101 lesson I have encountered in years. All good poets know language vectors & how to both use them and topple them.


Thank you for the gentle primer. For the first time, I’ve read something that gave me an explanation of the mechanisms behind it. Do you have any concerns about biased algorithms that can be introduced to (or maybe are already in) ChatGPT?


Thanks for an awesome writeup. I have two questions:

1) Is the model "seeded" with certain rules in the Translators that are provided by humans? Like ... did *we* tell it what the difference is between a noun and a verb, or is that something that it learns from the training data itself?

2) How do the conclusions drawn from the Translators wind up back in the main vector space? Does the model need to know that "Warsaw" was correct to make an update to the "Poland" vector?
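On question 1, my understanding is no: nothing like a noun/verb distinction is seeded by humans; the weights start out random, and whatever grammatical structure the model ends up representing is picked up from the training data. On question 2, each layer's output is added back onto every word's vector (the "residual stream"), so whatever the layers compute flows straight into the representation the next layer reads; the check against the correct answer ("Warsaw") happens only during training, when backpropagation nudges the weights. Here is a minimal sketch of that wiring in the standard transformer pattern (my own illustration, not the article's code; I'm reading "Translators" as the layers of the transformer, i.e. the attention plus feed-forward steps):

```python
# A minimal sketch of one transformer layer's wiring (my own illustration).
# The key point for question 2: each step's output is *added back* onto every
# word's vector, so whatever the layer concludes becomes part of the vector
# that all later layers read. Nothing here is hand-seeded by humans; every
# weight matrix starts out random and is adjusted during training by comparing
# the model's predictions against the actual next word.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(        # feed-forward step: expand, nonlinearity, project back
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, sequence, d_model): one vector per token.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out                 # attention's conclusions join the word vectors...
        x = x + self.ffn(self.norm2(x))  # ...and so do the feed-forward layer's.
        return x

block = TransformerBlock()
tokens = torch.randn(1, 5, 64)           # five stand-in word vectors
print(block(tokens).shape)               # torch.Size([1, 5, 64]): same space, updated vectors
```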


Tim, great article! I'm wondering if we could translate your post into Chinese and share it with the AI community in China. We would credit your name and keep the original link at the top of the translated version. Thank you.


Thanks for this! Very good.

Given every step is observable, there is no inherent mystery as to what these algorithms are doing. It's just impossible to map a process whose states evolve so wildly. It's so complicated it's enough to make me want to stick to stuff I can easily measure. God bless you guys!

It's such a wacky, evolving state transition matrix (of a sort) that, given a bunch of preceding words, outputs another. I don't know much about it, but that's a safe place to start with "AI": your common sense.
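For what it's worth, here's a toy version of that "state transition matrix" picture (my own sketch, nothing from the article): estimate the next word from counts of word pairs. A real LLM conditions on thousands of preceding tokens through learned weights rather than a literal matrix, which is exactly why it is so hard to map, but the overall shape of the process (preceding words in, a distribution over the next word out) is the same.

```python
# A toy "state transition matrix": the probability of the next word given only
# the previous word, estimated from bigram counts in a tiny corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the dog sat on the cat".split()

transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def next_word_distribution(word: str) -> dict:
    """Row of the 'transition matrix' for `word`, as a dict of probabilities."""
    counts = transitions[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))  # {'cat': 0.5, 'mat': 0.25, 'dog': 0.25}
```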

Given this is all about "language," there is a grave error committed on this entire topic: the use of the word "intelligence." We are putting "artificial" in front of it, but have we defined organic "intelligence"?

What was my impulse to write this comment? How or why did I divert from what I (should) be doing to share these words? Why do I assert an opinion on AI when I am not an algorithm guy in the space? Is this "intelligent" of me to do? Do I do it to share my "intelligence" or to prompt a response?

This is all, and everywhere, "machine learning." Combining words is not "intelligence." Consider for yourself how very limited language is in conveying sentiment: you have a sentiment to convey, but language is such a narrow channel for it. This is one reason why some people don't say much.

An LLM could have helped me tab through this comment and type it more quickly than I did, with fewer of the errors I'm prone to commit. It otherwise has no idea whence "intelligence" emanates.


In "The feed-forward step", is the rightmost green neuron missing an arrow? It only has two


Really excellent article; you made a densely complex topic understandable, thank you. But it leaves so many more questions! The way the model weights probable outcomes based on what it learned from its training data raises the question of the quality of that data and the biases inherent within it. Fascinating.
