Hacker News

andai, last Sunday at 12:22 AM

My experience lines up with the article. The agentic stuff only works with the biggest models. (Well, "works"... OpenAI Codex took 200 requests with o4-mini to change like 3 lines of code...)

For simple changes I actually found smaller models better because they're so much faster. So I shifted my focus from "best model" to "stupidest I can get away with".

I've been pushing that idea even further. If you give up on agentic workflows, you can go surgical. At that point even 100x smaller models can handle it. Just tell the model what to do and let it hand you the diff.
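Concretely, the loop is something like this (a minimal sketch; the client and model name are placeholders, not my actual setup):

    from openai import OpenAI

    # Placeholder client/model: any OpenAI-compatible endpoint works here.
    client = OpenAI()
    MODEL = "some-small-model"

    def surgical_edit(path, instruction):
        """Send one file plus an instruction; ask for a unified diff back."""
        source = open(path).read()
        prompt = (
            f"File: {path}\n{source}\n\n"
            f"Task: {instruction}\n"
            "Reply with ONLY a unified diff for this file, no commentary."
        )
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Eyeball the diff, then apply it with `git apply` or `patch`.
    print(surgical_edit("src/app.py", "rename get_user to fetch_user"))

The model never touches the filesystem; you review every patch before applying it.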

Also, I found the "fumble around my filesystem" approach wasteful at my scale, where I can mostly fit the whole codebase into the context. So I just dump src/ into the prompt. (Other people's projects are a lot more boilerplatey, so I'm testing ultra-cheap models like gpt-oss-20b for code search. For that, I think you can go even cheaper...)
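The "dump src/ into the prompt" step really is just concatenation. A sketch, with the file glob and size guard as illustrative assumptions:

    from pathlib import Path

    def dump_codebase(root="src", max_bytes=200_000):
        """Concatenate every source file into one prompt-ready blob."""
        parts = []
        for path in sorted(Path(root).rglob("*.py")):
            parts.append(f"### {path}\n{path.read_text(errors='ignore')}")
        blob = "\n\n".join(parts)
        if len(blob) > max_bytes:
            raise ValueError("too big for the context window; trim or chunk")
        return blob

    prompt = dump_codebase() + "\n\nTask: where does request routing happen?"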

Patent pending.


Replies

statenjason, last Sunday at 1:38 AM

Aider, as a non-agentic coding tool, strikes a nice balance between efficiency and effectiveness. Using tree-sitter to build a repo map means less filesystem digging. No MCP, but it can run shell commands, so it uses the utilities I'm already familiar with. Combined with Cerebras as a provider, the turnaround on prompts is instant; I can stay involved rather than waiting on multiple rounds of tool calls. It's my go-to for smaller-scale projects.
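For anyone curious what a repo map buys you, here's the gist (a Python-only approximation using the ast module; Aider's real map is tree-sitter based and multi-language):

    import ast
    from pathlib import Path

    # A "repo map": top-level classes and functions per file, so the model
    # sees the structure of the codebase without reading whole files.
    def repo_map(root="."):
        lines = []
        for path in sorted(Path(root).rglob("*.py")):
            try:
                tree = ast.parse(path.read_text(errors="ignore"))
            except SyntaxError:
                continue
            names = [
                node.name
                for node in tree.body
                if isinstance(node, (ast.FunctionDef,
                                     ast.AsyncFunctionDef,
                                     ast.ClassDef))
            ]
            if names:
                lines.append(f"{path}: " + ", ".join(names))
        return "\n".join(lines)

    print(repo_map("src"))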

hpincket, last Sunday at 5:55 AM

I am developing the same opinion. I want something fast and dependable. Getting into a flow state is important to me, and I just can't do that when I'm waiting for an agentic coding assistant to terminate.

I'm also interested in smaller models for their speed. That, or a provider like Cerebras.

Then, if you narrow the problem domain, you can increase dependability. I'm curious to hear more about your "surgical" tools.

I rambled about this on my blog about a week ago: https://hpincket.com/what-would-the-vim-of-llm-tooling-look-...

chewz, last Sunday at 8:21 AM

I agree. I find even Haiku good enough at managing the flow of the conversation and consulting larger models - Gemini 2.5 Pro or GPT-5 - for programming tasks.

For the last few days I've been experimenting with driving Codex (via MCP, i.e. "codex mcp") from Gemini CLI, and it works like a charm. Gemini CLI mostly uses Flash underneath, but that's good enough for formulating problems and re-evaluating answers.

Same with Claude Code: I ask it (via MCP) to consult Gemini 2.5 Pro.

Never had much success using Claude Code as an MCP server, though.

The original idea of course comes from Aider: using main, weak, and editor models all at once.
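Stripped of the MCP plumbing, the weak/strong split is basically triage. A toy sketch, with placeholder model names on an OpenAI-compatible endpoint:

    from openai import OpenAI

    client = OpenAI()
    WEAK = "cheap-router-model"    # e.g. a Haiku/Flash-class model
    STRONG = "expensive-solver"    # e.g. a GPT-5/Gemini-Pro-class model

    def ask(question):
        # The cheap model answers trivial questions and flags hard ones.
        triage = client.chat.completions.create(
            model=WEAK,
            messages=[{
                "role": "user",
                "content": "Answer if trivial, otherwise reply with "
                           "exactly ESCALATE.\n\n" + question,
            }],
        ).choices[0].message.content
        if triage.strip() != "ESCALATE":
            return triage
        # Hard problem: hand it to the big model.
        return client.chat.completions.create(
            model=STRONG,
            messages=[{"role": "user", "content": question}],
        ).choices[0].message.content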

wahnfrieden, last Sunday at 5:54 AM

They don't allow model switching below GPT-5 in Codex CLI anymore (without an API key), because it's not recommended. Try it with thinking=high and it's quite an improvement over o4-mini. o4-mini is more like gpt-5-thinking-mini, but they don't allow that for Codex. gpt-5-thinking-high is more like o1, or maybe o3-pro.

tunesmith, last Monday at 3:32 AM

For those who don't know, OpenAI Codex CLI will now work with your ChatGPT Plus or Pro account. They barely announced it, but it's on their GitHub page. You don't have to use an API key.

mathiaspoint, last Sunday at 12:47 PM

I use a 500-million-parameter model for editor completions because I want those to be nearly instantaneous, and the plugin makes 50+ completion requests every session.
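For reference, the calls look roughly like this against a local OpenAI-compatible server (llama.cpp, vLLM, and similar expose one); the URL, model name, and StarCoder-style FIM tokens are assumptions, not my exact setup:

    from openai import OpenAI

    # Local server, so there's no network round-trip to a cloud provider.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

    def complete(prefix, suffix):
        resp = client.completions.create(
            model="tiny-coder-500m",  # placeholder for a ~500M model
            prompt=f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>",
            max_tokens=32,            # keep responses short and fast
            temperature=0.2,
            stop=["\n\n"],
        )
        return resp.choices[0].text

    print(complete("def add(a, b):\n    return ", "\n"))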

seunosewa, last Sunday at 11:14 AM

You should try GLM 4.5; it's better in practice than Kimi K2 and Qwen3 Coder, but it's not getting much hype.

SV_BubbleTime, last Sunday at 3:51 AM

> (Well, "works"... OpenAI Codex took 200 requests with o4-mini to change like 3 lines of code...)

Let's keep things in perspective: I have, multiple times in my life, spent days on what ended up being maybe three lines of code.