Hacker News

SXX yesterday at 7:02 PM (3 replies)

I think your demo needs more realistic thinking logs, because thinking usually burns at least 2x to 3x as many tokens as the code itself, and much more for harder tasks.


Replies

unglaublich yesterday at 7:24 PM

Indeed. At 30 tok/s, make it pause for 20 seconds while the "thinking" is streaming (and hidden); that's the real experience.
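
For a rough sense of where a pause like that comes from, here is a minimal sketch of the arithmetic, assuming the 2x to 3x thinking-to-code ratio mentioned upthread. The function name, its parameters, and the 200-token snippet size are all made up for illustration; they are not from the demo being discussed.

```python
def thinking_pause_seconds(code_tokens: int,
                           thinking_ratio: float,
                           tokens_per_second: float) -> float:
    """Estimate how long a demo should stall while hidden thinking tokens stream.

    code_tokens       -- size of the visible code output (hypothetical figure)
    thinking_ratio    -- hidden thinking tokens per visible code token (assumed 2x-3x)
    tokens_per_second -- generation speed, e.g. 30 tok/s
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    hidden_tokens = code_tokens * thinking_ratio
    return hidden_tokens / tokens_per_second


# A 200-token snippet with 3x thinking overhead at 30 tok/s:
# 600 hidden tokens / 30 tok/s = 20 seconds of apparent silence.
pause = thinking_pause_seconds(200, 3.0, 30.0)
print(f"pause for {pause:.0f} s before showing any code")
```

Under these assumptions, a demo would call something like `time.sleep(pause)` before streaming the visible code; harder tasks would just mean a larger `thinking_ratio`.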

sig_kill yesterday at 10:23 PM

You should check out https://tokey.ai. I made it a few months ago, and it has all of these suggestions.

redox99 yesterday at 8:30 PM

Yes, it should use actual output from some of the open models.