Hacker News

lsy · yesterday at 7:42 PM · 10 replies

If you have a decent understanding of how LLMs work (you put in basically every piece of text you can find, get a statistical machine that models text really well, then use contractors to train it to model text in conversational form), then you probably don't need to consume a big diet of ongoing output from PR people, bloggers, thought leaders, and internet rationalists. That seems likely to get you going down some millenarian path that's not helpful.
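The "statistical machine that models text" part can be seen in miniature with a character-level bigram model. This is a toy sketch (real LLMs use transformers over subword tokens), but the underlying idea of estimating next-token probabilities from data is the same:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Estimate P(next char | current char) from raw text by counting."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        counts[cur][nxt] += 1
    # Normalize the counts into conditional probabilities.
    return {c: {n: k / sum(f.values()) for n, k in f.items()}
            for c, f in counts.items()}

model = train_bigram("the cat sat on the mat")
# model["t"] tells you which characters tend to follow "t" in the data.
```

Scale the same idea up to trillions of tokens and a far more expressive model class, and you get the pre-training half of the description above; the contractor-driven conversational tuning is a separate stage on top.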

Despite the feeling that it's a fast-moving field, most of the differences in actual models over the last years are in degree and not kind, and the majority of ongoing work is in tooling and integrations, which you can probably keep up with as it seems useful for your work. Remembering that it's a model of text and is ungrounded goes a long way to discerning what kinds of work it's useful for (where verification of output is either straightforward or unnecessary), and what kinds of work it's not useful for.


Replies

crystal_revenge · yesterday at 8:16 PM

I strongly agree with this sentiment, and found the blog's list of "high signal" accounts to be more a list of self-promoters (there are some good people on there who I've interacted with a fair bit, but that list is more 'buzz' than insight).

I also have not experienced the post's claim that "Generative AI has been the fastest moving technology I have seen in my lifetime." I can't speak for the author, but I've been in this field from when "SVMs are the new hotness and neural networks are a joke!" through the entire explosion of deep learning and the insane number of DL frameworks around the 20-teens, all within a decade (remember implementing restricted Boltzmann machines and pre-training?). Similarly, I saw "don't use JS for anything other than enhancing the UX" give way to single-page webapps being the standard in the same timeframe.

Unless someone's aim is to be on that list of "high signal" people, it's far better to just keep your head down until you actually need these solutions. As an example, I left webdev work around the time of backbone.js, one of the first attempts at front-end MVC for single-page apps. Then the great React/Angular wars began, and I just ignored it. A decade later I was working with a webdev team and learned React in a few days; I'm very glad I did not stress about "keeping up" during that period of non-stop change. Another example: just 5 years ago everyone was trying to learn how to implement LSTMs from scratch... only to have that architecture become essentially obsolete with the rise of transformers.

Multiple times over my career I've learned the lesson that "moving fast" is another way of saying "immature". One would find more success learning about the GLM (or, god forbid, learning to identify survival-analysis problems) and all of its still-underappreciated uses for day-to-day problem solving (old does not imply obsolete) than learning the "prompt hack of the week".
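For the curious, a GLM is not exotic machinery: a Poisson regression (log link) can be fit with a short iteratively reweighted least squares loop. This is a minimal sketch in plain NumPy with made-up simulated data, not a production fitter; in practice you would reach for statsmodels or R's glm:

```python
import numpy as np

def fit_poisson_glm(x, y, iters=25):
    """Fit a Poisson GLM with log link via iteratively reweighted
    least squares (Fisher scoring)."""
    X = np.column_stack([np.ones(len(y)), x])  # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)            # mean under the log link
        W = mu                           # Poisson variance equals the mean
        z = X @ beta + (y - mu) / mu     # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

# Counts simulated with rate exp(0.5 + 1.2 * x) should be (approximately)
# recovered as beta ≈ [0.5, 1.2].
```

The whole fitter is a dozen lines, and the same loop handles logistic, gamma, and other GLM families by swapping the link and variance functions.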

thorum · yesterday at 8:07 PM

Beyond a basic understanding of how LLMs work, I find most LLM news fits into one of these categories:

- Someone made a slightly different tool for using LLMs (may or may not be useful depending on whether existing tools meet your needs)

- Someone made a model that is incrementally better at something, beating the previous state-of-the-art by a few % points on one benchmark or another (interesting to keep an eye on, but remember that this happens all the time and this new model will be outdated in a few months - probably no one will care about Kimi-K2 or GPT 4.1 by next January)

I think most people can comfortably ignore that kind of news and it wouldn’t matter.

On the other hand, some LLM news is:

- Someone figured out how to give a model entirely new capabilities.

Examples: RL and chain of thought. Coding agents that actually sort of work now. Computer use. True end-to-end multimodal models. Intelligent tool use.

Most people probably should be paying attention to those developments (and trying to look forward to what’s coming next). But the big capability leaps are rare and exciting enough that a cursory skim of HN posts with >500 points should keep you up-to-date.

I’d argue that, as with other tech skills, the best way to develop your understanding of LLMs and their capabilities is not through blogs or videos etc. It’s to build something. Experience for yourself what the tools are capable of, what does and doesn’t work, what is directly useful to your own work, etc.

godelski · today at 1:28 AM

To be honest, this is even mostly true on the research side of things. Granted, 99% of research has always been incremental (which is okay! Don't let Reviewer #2 put you off), and lots of papers are filled with fluff. That is, it's manageable if you have a strong background in these systems (honestly, a math background goes a long way toward generalizing here, since lots of papers are just "we tried this math idea", and if you already knew the idea, you'd have a good guess as to its effects).

I think it is easy for it to feel like the field is moving fast while it actually isn't. But I learned a lesson where I basically lost a year when I had to take care of my partner. I thought I'd be way behind when coming back but really not much had changed.

I think gaining this perspective can help you "keep up". Even if you are having a hard time now, this might suggest that you just don't have enough depth yet. Which is perfectly okay! Just might encourage you to focus on different things so that you can keep up. You can't stay one step behind if you first don't know how to run. Or insert some other inspirational analogy here. The rush is in your head, not in reality.

nerdsniper · today at 4:03 AM

I have a very good idea of how various models work. But the business I run benefits immensely from utilizing the latest models, whether that's ultra-low-latency YOLO-style models or "SOTA" high-performing ViTs, LLMs, etc.

I maintain a funnel sucking up all the PR stuff, but I skip straight to the papers, benchmarks, and GitHub repos.

alphazard · yesterday at 8:44 PM

When explaining LLMs to people, the high-level architecture is often what they find the most interesting. Not the transformer, but the token-by-token prediction strategy (autoregression), and the fact that the model doesn't always choose the most likely token, but samples a token in proportion to its likelihood.

The minutiae of how next token prediction works is rarely appreciated by lay people. They don't care about dot products, or embeddings, or any of it. There's basically no advantage to explaining how that part works since most people won't understand, retain, or appreciate it.
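For readers who do want a notch more detail than the lay explanation, that sampling step, picking a token in proportion to its likelihood rather than always taking the argmax, fits in a few lines. This is a toy illustration with made-up logits, not any particular library's API:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token id in proportion to its softmax probability."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Greedy decoding would always return the argmax; sampling at
# temperature 1.0 sometimes picks less likely tokens, which is why
# the same prompt can produce different completions.
```

Lowering the temperature concentrates the distribution on the most likely token (approaching greedy decoding), while raising it flattens the distribution.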

bravesoul2 · today at 1:39 AM

Agreed. I played with a few code assistants and I don't see any stark differences in capability. Mostly UI: do you want it in your editor, on the terminal, in the browser, etc. Because the competition is so fierce, everything hyped is quite good.

helloplanets · yesterday at 9:11 PM

It's not a model of text, though. It's a model of multiple types of data. Pretty much all modern models are multimodal.

panarchy · yesterday at 9:10 PM

AI research was so interesting pre-transformers (it was starting to get a bit wild around GPT-2, IIRC), but now the signal-to-noise ratio is so low, with every internet sensationalist and dumb MBA jumping on the bandwagon.

victorbjorklund · yesterday at 10:22 PM

Indeed. We have only had a few really big shifts since the launch of GPT-3. The rest has just been bigger and more optimized models, plus tooling around the models.

qsort · yesterday at 8:11 PM

I agree, but with the caveat that it's probably a bad time to fall asleep at the wheel. I'm very much a "nothing ever happens" kind of guy, but I see a lot of people who aren't taking the time to actually understand how LLMs work, and I think that's a huge mistake.

Last week I showed some colleagues how to do some basic things with Claude Code and they were like "wow, I didn't even know this existed". Bro, what are you even doing.

There is definitely a lot of hype, and the lunatics on LinkedIn are having a blast, but to put it mildly, I don't think it's a bad investment to experiment a bit with what's possible with the SOTA.
