joshdavham • yesterday at 7:31 AM

I take your point that they are mostly orthogonal in practice, but with that being said, I think understanding how these AIs were created is still helpful.

For example, I believe that if we were to ask the average developer why LLMs behave randomly, they would not be able to answer. This to me exposes a fundamental hole in their knowledge of AI. Obviously one shouldn't feel bad about not knowing the answer, but I think we'd benefit from understanding the basic mathematical and statistical underpinnings of these things.

Replies

Al-Khwarizmi • yesterday at 10:52 AM

You can still understand that quite well without understanding backprop, though.

All you need is:

- Basic understanding of how a Markov chain can generate text (generating each word using corpus statistics on the previous few words).

- Understanding that you can then replace the Markov chain with a neural model which gives you more context length and more flexibility (words are now in a continuous space so you don't need to find literally the same words, you can exploit synonyms, similarity, etc., plus massive training data also helps).

- Finally, you add the instruction tuning (among all the plausible continuations the model could choose, teach it to prefer the ones humans prefer - e.g. answering a question rather than continuing with a list of similar questions. You give the model cookies or slaps so it learns to prefer the answers humans prefer).

- But the core is still like in the Markov chain (generating each word using corpus statistics on the previous words).
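
As a rough illustration of the first bullet, here is a toy word-level Markov chain in Python. It is only a sketch under stated assumptions: the tiny corpus, the context length of two words, and the names `build_chain` and `generate` are all made up for the example, and a real LLM replaces the lookup table with a neural model.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        chain[context].append(words[i + order])
    return chain


def generate(chain, seed, length=10, rng=None):
    """Extend `seed` one word at a time, sampling from the observed followers."""
    if rng is None:
        rng = random.Random()
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break  # this context never appeared in the corpus
        # Followers are stored with repetition, so frequent words are picked more often.
        out.append(rng.choice(followers))
    return " ".join(out)


# Hypothetical toy corpus, just for illustration.
corpus = (
    "the cat sat on the mat and the cat sat on the sofa "
    "and the dog sat on the mat"
)
chain = build_chain(corpus, order=2)
result = generate(chain, ("the", "cat"), length=8, rng=random.Random(0))
print(result)
```

After "on the", the table holds "mat" twice and "sofa" once, so "mat" is roughly twice as likely to be generated. That is the "corpus statistics on the previous few words" idea in its simplest form.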

I often give dissemination talks on LLMs to the general public and I have the feeling that with this mental model, you can basically know everything a lay user needs to know about how they work (you can explain things like hallucinations, stochastic nature, relevance of training data, relevance of instruction tuning, dispelling myths like "they always choose the most likely word", etc.) without any calculus at all; although of course this is subjective and maybe some people will think that explaining it in this way is heresy.
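
To make the "they always choose the most likely word" myth concrete, here is a small Python sketch that compares always taking the top word with sampling from the distribution. The probabilities in `next_word_probs` are invented for the example, and `greedy` and `sample` are hypothetical helper names, not any library's API.

```python
import random
from collections import Counter

# Invented next-word probabilities for some context such as "the cat sat on the".
next_word_probs = {"mat": 0.5, "sofa": 0.3, "floor": 0.2}


def greedy(probs):
    """Always return the single most likely word (the 'myth' behaviour)."""
    return max(probs, key=probs.get)


def sample(probs, rng):
    """Draw one word at random, in proportion to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(42)
counts = Counter(sample(next_word_probs, rng) for _ in range(1000))

print("greedy pick:", greedy(next_word_probs))
print("sampled picks:", dict(counts))
```

The greedy pick is the same on every call, while the sampled picks spread across all three words roughly in proportion to their probabilities. That spread is where the run-to-run variation comes from.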
