Hacker News

jryio · yesterday at 6:15 PM

Degrading model performance at inference in a data center vs. stripping thinking tokens: the effect is the same.

Sure, they didn't change the GPUs they're running or the quantization, but if valuable information is removed and the model performs worse as a result, performance was degraded.

In the same way that uptime doesn't care about the incident cause: if you're down, you're down, and no one cares that it was 'technically DNS'.


Replies

sroussey · yesterday at 6:21 PM

I thought these days the thinking tokens sent by the model (as opposed to those used internally) were just for the user's benefit. When you send the convo back, you have to strip the thinking stuff before the next turn. Or is that just local models?
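For context on the stripping step mentioned above: a minimal sketch, assuming a DeepSeek-R1-style model that wraps its chain of thought in `<think>...</think>` tags inside the assistant message (tag name and message shape are assumptions; other APIs expose reasoning in a separate field instead):

```python
import re

# Assumption: the model emits reasoning inline as <think>...</think>.
# Before resending the history for the next turn, those spans are
# removed from prior assistant messages; user messages are untouched.
THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)

def strip_thinking(messages):
    """Return a copy of the chat history with thinking spans removed
    from assistant turns."""
    cleaned = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_RE.sub("", msg["content"]).strip()}
        cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>Trivial arithmetic.</think>4"},
]
print(strip_thinking(history)[1]["content"])  # → 4
```

Whether the client or the serving layer does this varies; hosted APIs that return reasoning in a separate field handle it server-side, while local setups often leave it to the caller.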