geor9e • yesterday at 6:23 PM

If the trick were genuinely useful and had been well circulated months ago, the resource-starved inference providers would have squeezed it dry already, instead of wasting 60% of their tokens while waiting for users to implement it themselves with 5 minutes of effort.

Replies

Klathmon • yesterday at 8:49 PM

That's like saying quantization isn't real because the frontier labs aren't using it in their production inference.

This is a lossy process; it produces worse results. It might be worth it in some situations, but applying it to everything would just make your SOTA model worse.

solenoid0937 • yesterday at 7:03 PM

[flagged]
