It's fascinating to think about the space of problems which are amenable to RL scaling of these...

mccoyb • today at 2:11 PM • 7 replies • view on HN

It's fascinating to think about the space of problems which are amenable to RL scaling of these probability distributions.

Before, we didn't have a fast (we had to rely on human cognition) way to try problems - even if the techniques and workflows were known by someone. Now, we've baked these patterns into probability distributions - anyone can access them with the correct "summoning spell". Experts will naturally use these systems more productively, because they know how to coerce models into the correct conditional distributions which light up the right techniques.

One question this raises to me is how these models are going to keep up with the expanding boundary of science. If RL is required to get expert behavior into the models, what happens when experts start pushing the boundary faster? In 2030, how is Anthropic going to keep Claude "up-to-date" without either (a) continual learning with a fixed model (expanding context windows? seems hard) or (b) continual training (expensive)?

Crazy times.

Replies

sosodev • today at 5:40 PM

My understanding, from listening/reading what top researchers are saying, is that model architectures in the near future are going to attempt to scale the context window dramatically. There's a generalized belief that in-context learning is quite powerful and that scaling the window might yield massive benefits for continual learning.

It doesn't seem that hard because recent open weight models have shown that the memory cost of the context window can be dramatically reduced via hybrid attention architectures. Qwen3-next, Qwen3.5, and Nemotron 3 Nano are all great examples. Nemotron 3 Nano can be run with a million token context window on consumer hardware.

➕ show 1 reply

Aerroon • today at 2:28 PM

A bit related: open weights models are basically time capsules. These models have a knowledge cut off point and essentially forever live in that time.

➕ show 3 replies

lxgr • today at 2:35 PM

Data sharing agreements permitting, today's inference runs can be tomorrow's training data. Presumably the models are good enough at labeling promising chains of thought already.

I could totally imagine "free" inference for researchers under the condition that the reasoning traces get to be used as future training data.

➕ show 3 replies

visarga • today at 4:28 PM

> In 2030, how is Anthropic going to keep Claude "up-to-date"

I think the majority of research, design and learning goes through LLMs and coding agents today, considering the large user base and usage it must be trillions of tokens per day. You can take a long research session or a series of them and apply hindsight - what idea above can be validated below? This creates a dense learning signal based on validation in real world with human in the loop and other tools, code & search.

andsoitis • today at 4:17 PM

> Experts will naturally use these systems more productively, because they know how to coerce models into the correct conditional distributions which light up the right techniques.

Part of it comes down to “knowing” what questions to ask.

➕ show 1 reply

baq • today at 4:45 PM

> In 2030, how is Anthropic going to keep Claude "up-to-date"

In 2030 Anthropic hopes Claude will keep Anthropic "up-to-date" on its progress on itself.

I'm only half joking here.

DeathArrow • today at 3:02 PM

They can use LORA.

alt Hacker News

Replies