
myrmidon · today at 11:20 AM

Agree completely with your position.

I do think, though, that the lack of online learning is a bigger drawback than most people believe, because it can often be hidden/obfuscated by training for the benchmarks, basically.

This becomes very visible when you compare performance on more specialized tasks that LLMs were not specifically trained for, e.g. playing games like Pokemon or Factorio: general-purpose LLMs lag far behind humans there.

But it's only a matter of time until we solve this IMO.


Replies

ACCount37 · today at 11:51 AM

By now, I subscribe to "you're just training them wrong".

Pre-training a base model on text datasets teaches that model a lot, but it doesn't teach it to be good at agentic, long-horizon tasks.

Which is why there's a capability gap there: the gap companies have to close "in post" with things like RLVR (reinforcement learning with verifiable rewards).
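
To make the RLVR idea concrete, here is a minimal sketch: sample completions from the model, score each one with a reward that can be checked mechanically (no human labeler, no learned reward model), and nudge the policy toward high-reward outputs. The toy below (the arithmetic task, the exact-match reward, the tabular REINFORCE update) is entirely illustrative and nothing like a production post-training pipeline.

```python
import math
import random

# Toy RLVR loop: a tabular "policy" over candidate completions for a
# single prompt ("2 + 2 = ?"), trained with REINFORCE against a reward
# that is checked mechanically. Purely illustrative.

CANDIDATES = ["3", "4", "5"]            # possible completions
logits = {c: 0.0 for c in CANDIDATES}   # the "pretrained" preferences

def probs():
    """Softmax over the current logits."""
    w = {c: math.exp(l) for c, l in logits.items()}
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}

def verifiable_reward(answer: str) -> float:
    """The defining trait of RLVR: the reward is a hard check
    (exact match here; unit tests or proof checkers in practice)."""
    return 1.0 if answer == "4" else 0.0

LR, BASELINE = 0.5, 0.5
for _ in range(300):
    p = probs()
    answer = random.choices(list(p), weights=list(p.values()))[0]
    advantage = verifiable_reward(answer) - BASELINE
    for c in CANDIDATES:
        # d log pi(answer) / d logit_c = 1{c == answer} - p(c)
        logits[c] += LR * advantage * ((c == answer) - p[c])

final = probs()
print(max(final, key=final.get))        # should print "4"
```

The "verifiable" part is what makes this scale: whenever success on a long-horizon agentic task can be checked automatically, the same loop applies without any human in the reward path.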