Hacker News

kojoru · yesterday at 7:18 AM · 2 replies

I'm wondering: how can understanding gradient descent help in building AI systems on top of LLMs? To me it feels like the skills of building "AI" are almost orthogonal to the skills of building on top of "AI".


Replies

joshdavham · yesterday at 7:31 AM

I take your point that they are mostly orthogonal in practice, but that said, I still think understanding how these AIs were created is helpful.

For example, I believe that if we were to ask the average developer why LLMs behave randomly, they would not be able to answer. To me, this exposes a fundamental hole in their knowledge of AI. Obviously one shouldn't feel bad about not knowing the answer, but I think we'd all benefit from understanding the basic mathematical and statistical underpinnings of these things.
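To illustrate the point about randomness: LLMs produce a probability distribution over the vocabulary and then sample from it, typically after scaling by a temperature. A minimal sketch of that sampling step (toy logits and vocabulary size, not any particular model's API):

```python
import numpy as np

rng = np.random.default_rng()

def sample_next_token(logits, temperature=1.0):
    """Sample a token index from model logits via temperature-scaled softmax."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# Toy logits for a 4-token vocabulary: sampling makes output non-deterministic,
# whereas greedy decoding (temperature -> 0) would always pick token 3.
logits = [1.0, 0.5, 0.2, 2.0]
print(sample_next_token(logits, temperature=0.8))
```

Run it twice and you can get different tokens from identical input, which is the "randomness" in question; at very low temperature the distribution collapses onto the argmax and the output becomes effectively deterministic.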

HarHarVeryFunny · yesterday at 3:32 PM

Sure, but that'd be similar to being a software developer who doesn't understand, even roughly, what a compiler does. In a world full of neural-network-based technology, it'd be a bit lame for a technologist not to have at least a rudimentary understanding of how it works.

Nowadays, fine-tuning LLMs is becoming quite mainstream, so even if you're not training neural nets of any kind from scratch, not understanding how gradients are used in the training (and fine-tuning) process is going to limit your ability to fully work with the technology.
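The gradient mechanism referred to above can be shown in miniature. Here is a toy sketch (illustrative only, not any library's API) of gradient descent fitting a single weight w in y = w·x by minimizing mean squared error; training and fine-tuning an LLM apply the same update rule to billions of weights:

```python
import numpy as np

# Toy data generated by the "true" weight w = 2
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w = 0.0    # initial guess
lr = 0.05  # learning rate

for _ in range(200):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)  # d/dw of mean((w*x - y)^2)
    w -= lr * grad                      # gradient descent step

print(round(w, 3))  # prints 2.0
```

Fine-tuning differs mainly in that w starts from pretrained values rather than from scratch, and often only a subset of the weights is updated.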