bel8 • yesterday at 8:13 PM • 14 replies

It's a start and I welcome competition, but I don't think I've ever used small cloud models like Haiku 4.5. They are cute, but for serious coding they tend to waste your expensive time.

And this certainly won't bring me back to GitHub Copilot, which I cancelled yesterday.

GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs: https://www.reddit.com/r/GithubCopilot

I have since changed to DeepSeek Flash on high, which is Sonnet+ level for almost free.

If I feel I still need smarter models, I might sign up for $20/mo Codex to use GPT 5.5 which, in my opinion, is the best I can access right now.

Replies

fnordpiglet • yesterday at 10:16 PM

I use larger models to organize work into a topologically sorted task graph and pin smaller models to the tasks depending on their complexity, with a larger model evaluating the work and patching where necessary. This uses Haiku quite often for routine work. By doing this I’m able to do multi-hour, highly complex work with superior results and a much lower bill: a parent orchestrator can get a massive amount of labor done within a single context window by organizing work effectively, reviewing quality, and integrating where needed. I don’t use Haiku directly, but it’s often 30-40% of any major effort’s token use. This improves time to completion as well as cost, and I find Haiku is better at following literal instructions and plans without “second guessing,” while Opus-class models second-guess in their thinking constantly.

As such, Haiku isn’t a waste of my time; it saves me enormous amounts of time. But I spent a large amount of time building the orchestration system up front and iterating on it to get here. Interestingly, I found my experience as a director and later a distinguished engineer gave me the tools to build it and get it working well and reliably end to end. The dynamics of multi-agent workflows of varying capability are not a lot different from the dynamics of a 1000-engineer organization.
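For readers who want something concrete, the routing idea in this comment could be sketched roughly as below. This is only an illustrative sketch, not the commenter's actual system: the task names, the 1-10 complexity scores, the threshold of 5, and the "small"/"large" tier labels are all made-up assumptions, and no real model API is called. It only shows dependency-ordered scheduling with a tier pinned to each task.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: name -> (complexity 1-10, set of prerequisite task names).
# Both the names and the scores are invented for illustration.
TASKS = {
    "design_schema": (8, set()),
    "write_migration": (3, {"design_schema"}),
    "update_models": (4, {"design_schema"}),
    "integrate": (9, {"write_migration", "update_models"}),
}


def pick_tier(complexity, threshold=5):
    """Pin routine tasks to a small model and harder ones to a large model.

    The threshold is an arbitrary assumption, not a recommended value.
    """
    return "small" if complexity < threshold else "large"


def plan(tasks):
    """Return (task, tier) pairs in an order that respects dependencies."""
    # TopologicalSorter expects a mapping of node -> its predecessors.
    graph = {name: deps for name, (_, deps) in tasks.items()}
    order = TopologicalSorter(graph).static_order()
    return [(name, pick_tier(tasks[name][0])) for name in order]


schedule = plan(TASKS)
for name, tier in schedule:
    print(f"{name}: {tier}")
```

In a real harness, the large-model review and patching step the comment mentions would run after each small-model task; that part is left out here.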

SwellJoe • yesterday at 10:30 PM

I've been doing benchmarking of various models for finding hard security bugs, and my faith in Haiku (and Sonnet, even) has dropped precipitously in the process. Self-hosted Qwen 3.6 27B consistently outperforms both for finding security bugs, which was a shocking result. I expected Qwen to be around Haiku level, maybe a little worse, and I definitely expected it to be worse than Sonnet.

And, DeepSeek and MiMo perform much better than Haiku and Sonnet, near Opus/GPT 5.5 levels, at a fraction of the cost.

There's seemingly no reason to ever use Haiku or Sonnet, if you're not getting it for free or as part of a subscription (that you don't usually saturate).

GaryBluto • yesterday at 8:34 PM

Almost exactly the same story here. I've also had little to no refusals from DeepSeek, with its Chinese values meaning substantially less friction when it comes to things like reverse engineering, finding copyrighted files, working with dubiously-sourced source code, et cetera. I don't think I'd go back to Copilot even if they dropped prices by 90%.

lambda • yesterday at 9:32 PM

Yeah, seems like this is in the range of Qwen 3.6, Gemma 4, Nemotron 3 Super, and the like. There are a lot of models, including much smaller, cheaper ones (like Qwen 3.6 35B-A3B), that are similarly competitive with Haiku. I can run these on my laptop; I don't need to rent them from Microsoft.

I suppose if you're reeling at the new Copilot bill but want to stay in their ecosystem, this gives you something to use, but for most folks, there's a plethora of better options.

eli • yesterday at 11:38 PM

Makes sense as part of a larger coding workflow, especially if it’s fast. Using a trillion-parameter model to figure out how to call a targeted edit tool or generate a commit message is a waste. The same goes for narrow tasks like “make the background darker” or “rename this function and update callers.”

nate • yesterday at 8:52 PM

The small stuff has its place. I have this Safari extension and needed a way to quickly title people's chat histories. Haiku is the fast, cheap thing to come up with decent titles for blocks of text. I feel like there's a bunch of those little things lying around that you need a model for. I'm even finding Apple's Foundation Model is super useful for stuff like that. Even summarizing an article. It's like equally awful at doing it, but gets enough done to still be useful as a way to be like "oh yeah, this article is actually worth reading."

hparadiz • yesterday at 8:46 PM

The $20/month ChatGPT plan that comes with Codex is good value. Even just having premium ChatGPT is nice. I get rate limited regularly but it still lets me do most things.

bbstats • yesterday at 11:38 PM

What application/UI are you using DeepSeek Flash high on? Still Copilot or something else?

vidarh • yesterday at 10:09 PM

Haiku does quite well if given a detailed plan. That means much more detail than you would otherwise provide, but you can still save over e.g. having Opus or Sonnet do everything by having them expand their initial plans into more specific levels of detail and feed those to Haiku (or similar-level models).

I personally wouldn't use models of that class directly, though - I'd use them in a harness as a "backend" for more capable models. And Haiku itself, as opposed to other smaller models, is still expensive.

alkonaut • yesterday at 8:51 PM

Won’t (presumably) all the market actors converge on similar pricing? If OpenAI stopped operating on subsidies and charged the true costs, and their most token-hungry customers were the ones that switched to Anthropic and others, then those providers' pricing model switch would also be around the corner.

Unless of course we’re thinking Copilot will be more expensive than others longer term. But is that a reasonable assumption?

verdverm • yesterday at 8:24 PM

I've been having really good results with DeepSeek-v4-flash, qwen-3.6-moe, and the older gemini-3-flash-preview. (Recent Geminis suck hard.)

Small models are more than enough for the majority of tasks these days. Plan and review with the bigger ones, let the little ones explore and implement.

OpenCode Go is $10/month for the open weight models with nice quotas: https://opencode.ai/go

partiallypro • yesterday at 9:10 PM

> "GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs"

AI is expensive and it has been heavily subsidized. If you think $20/mo flat for Codex/Claude will last versus a more usage-based model, you're in for a shock. Especially once these companies go public and have to meet investor expectations.

emsign • yesterday at 8:47 PM

I wonder when THEY make it illegal to vote with your wallet.
