dvt • today at 12:09 AM • 2 replies • on HN

What excites me most about these new models that generate thousands of tokens per second is that you can essentially do multi-shot prompting (+ nudging) without the user even noticing, potentially fixing some of the weird hallucinatory or non-deterministic behavior we sometimes end up with.
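
A rough back-of-envelope sketch of why extra shots can go unnoticed: if attempts are decoded one after another, generation time is attempts × output tokens per attempt ÷ decode rate. The workload below (3 attempts of 500 output tokens each, compared at 100 and 1000 tok/s) is an illustrative assumption, not a measurement, and the helper `retry_wall_time` is hypothetical. It also ignores prompt processing, network latency and time-to-first-token.

```python
def retry_wall_time(attempts: int, tokens_per_attempt: int, tokens_per_second: float) -> float:
    """Seconds spent decoding when every attempt runs sequentially.

    Assumption: ignores prompt processing, network latency and time-to-first-token.
    """
    return attempts * tokens_per_attempt / tokens_per_second


# Hypothetical workload: first shot plus two nudged retries, 500 output tokens each.
for rate in (100, 1000):
    print(f"{rate:>5} tok/s -> {retry_wall_time(3, 500, rate):.1f} s")
```

Under these assumptions, the three attempts take 15.0 s at 100 tok/s but 1.5 s at 1000 tok/s, which is roughly the difference between a noticeable wait and something that feels like a single response.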

Replies

volodia • today at 2:17 AM

That is also our view! We see Mercury 2 as enabling very fast iteration for agentic tasks. A single shot at a problem might be less accurate, but because the model has a shorter execution time, it enables users to iterate much more quickly.

lostmsu • today at 3:41 AM

Regular models are very fast if you do batch inference. GPT-OSS 20B gets close to 2k tok/s on a single 3090 at bs=64 (might be misremembering details here).
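
A quick sanity check on the batching point, assuming the ~2k tok/s figure (which the commenter flags as possibly misremembered) is aggregate throughput across all 64 sequences and is split evenly among them. Under that simplification, each individual request decodes at about 31 tok/s, so batching raises total throughput without making any single response arrive faster. The helper `per_sequence_rate` is hypothetical.

```python
def per_sequence_rate(aggregate_tok_s: float, batch_size: int) -> float:
    """Tokens/s seen by one sequence, assuming aggregate throughput is split evenly across the batch."""
    return aggregate_tok_s / batch_size


# Figures from the comment above (unverified): ~2000 tok/s aggregate at batch size 64.
rate = per_sequence_rate(2000, 64)
print(f"{rate:.2f} tok/s per sequence")  # 31.25 tok/s per sequence
```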
