By now, I subscribe to the "you're just training them wrong" camp.
Pre-training a base model on text datasets teaches it a lot, but it doesn't teach it to be good at agentic or long-horizon tasks.
That's why there's a capability gap there: a gap companies have to close "in post" with things like RLVR (reinforcement learning with verifiable rewards).
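
For anyone who hasn't seen RLVR spelled out, the core loop is: sample outputs from the model, check each one with a programmatic verifier (no human labels), and reinforce whatever passed. Here's a toy sketch of that loop. The arithmetic task, the tabular "policy", and the REINFORCE-style update are all stand-ins I made up for illustration; real RLVR runs the same loop over an LLM with algorithms like PPO or GRPO.

```python
import math
import random

# Toy task: answer "2 + 3". The "policy" is just a softmax over candidate strings.
candidates = ["4", "5", "6", "23"]
logits = {c: 0.0 for c in candidates}

def softmax(logits):
    m = max(logits.values())
    exps = {c: math.exp(v - m) for c, v in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def sample(probs):
    # Draw one candidate according to the policy's probabilities.
    r, acc = random.random(), 0.0
    for c, p in probs.items():
        acc += p
        if r <= acc:
            return c
    return candidates[-1]

def verify(answer):
    # Verifiable reward: check correctness programmatically, 1.0 or 0.0.
    return 1.0 if answer == str(2 + 3) else 0.0

lr = 0.5
for step in range(200):
    probs = softmax(logits)
    answer = sample(probs)
    reward = verify(answer)
    # REINFORCE-style update: push up the log-prob of the sampled answer,
    # scaled by its advantage over the policy's expected reward.
    baseline = sum(p * verify(c) for c, p in probs.items())
    advantage = reward - baseline
    for c in candidates:
        grad = (1.0 if c == answer else 0.0) - probs[c]
        logits[c] += lr * advantage * grad

print(softmax(logits))  # probability mass should concentrate on "5"
```

The appeal for agentic and long-horizon work is exactly that `verify` step: if you can check the end state of a rollout (tests pass, task completed), you can train on thousands of trajectories without labeling every intermediate step.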