gwerbin • yesterday at 10:48 PM • 2 replies

I don't think that's what these small models are for. They are for things like text summarization and generating a title for your AI session. Maybe Haiku occupies a weird zone where it's overpowered for those tasks but underpowered for anything more sophisticated. But, for example, I recently used it on an agentic reasoning task (reading a chunk of information and drawing a written conclusion, not writing code) and it did just fine. A more powerful model would have been a waste of money.

Replies

SwellJoe • yesterday at 11:01 PM

Sure, but it's priced higher than many better models. I'm not saying use the biggest models for everything. I'm saying Haiku is not a great deal as small models go. You can even self-host a model that is competitive if you've got a pretty beefy machine.

Haiku costs $1/$5 per million tokens (input/output). DeepSeek V4 Flash, a stronger model, is only $0.0028/$0.14/$0.28. That first number is the cached-input price, and DeepSeek caching is crazy efficient. So using DeepSeek V4 Flash costs about an order of magnitude less than Haiku, and it performs better.

I have a Claude subscription because I'm willing to pay a premium for the best model for coding, one that doesn't waste as much of my time doing dumb stuff. But, if I need something other than Claude Code, I'm using something other than Claude models. Why burn money for no benefit?

Oh, also, Haiku chews tokens like crazy. In my benchmarks it used three times more tokens than the next-highest model. Of course, security bug hunting is not in its wheelhouse, so it's not fair to judge it on that one thing, but if it's more expensive per token and also burns a lot more tokens, it ends up being a lot more expensive.
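
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the prices quoted above are USD per million tokens, ignores the cached-input price entirely, and uses a made-up workload (2M input tokens, 500k output tokens). Applying the 3x token multiplier against DeepSeek specifically is also an assumption, since the benchmark figure was relative to the next-highest model, whichever that was. The `Pricing` class and `job_cost` helper are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-model prices, assumed to be USD per million tokens."""
    input_per_m: float
    output_per_m: float


def job_cost(p: Pricing, input_tokens: int, output_tokens: int,
             token_multiplier: float = 1.0) -> float:
    """Cost in USD for one job; token_multiplier scales total token usage."""
    base = input_tokens * p.input_per_m + output_tokens * p.output_per_m
    return base * token_multiplier / 1_000_000


# Prices as quoted in the comment (uncached input / output only).
haiku = Pricing(input_per_m=1.00, output_per_m=5.00)
flash = Pricing(input_per_m=0.14, output_per_m=0.28)

# Hypothetical workload, not taken from any benchmark.
IN_TOK, OUT_TOK = 2_000_000, 500_000

flash_cost = job_cost(flash, IN_TOK, OUT_TOK)
haiku_same_tokens = job_cost(haiku, IN_TOK, OUT_TOK)
haiku_3x_tokens = job_cost(haiku, IN_TOK, OUT_TOK, token_multiplier=3.0)

print(f"Flash:            ${flash_cost:.2f}")
print(f"Haiku, 1x tokens: ${haiku_same_tokens:.2f} "
      f"({haiku_same_tokens / flash_cost:.1f}x Flash)")
print(f"Haiku, 3x tokens: ${haiku_3x_tokens:.2f} "
      f"({haiku_3x_tokens / flash_cost:.1f}x Flash)")
```

With this particular input/output mix the per-token price gap alone works out to roughly 10x, and the token multiplier stacks on top of it. A different mix, or heavy cache hits, would shift the ratio.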

not_kurt_godel • yesterday at 10:55 PM

Haiku/Flash/small models are underpowered for literally anything where being non-false-positively correct on details matters at least like 25%. (That's not to say they are only correct 25% of the time, it's definitely more than that, but they're blatantly confidently wrong often enough that the wasted time is a significant net negative for me, even on relatively trivial tasks.)
