Hacker News

skydhash today at 4:13 PM

That would be more convincing if you gave two or more examples of what there is to learn.


Replies

elliotbnvl today at 4:27 PM

Go off and run a comparison of Qwen 3.6 27B and GLM 5.1 GGUF (https://huggingface.co/ubergarm/GLM-5.1-GGUF) at IQ2_KL 261.988 GiB (2.985 BPW) and let me know if you learn anything.

Or maybe just compare Hermes vs OpenClaw for long-horizon personal agentic tasks. Which one performs better in offline inference personal finance analysis tasks?

Or read up on how the `/code-review` workflow works in Opus 4.8 and give me a guess as to how long it'll take Codex to implement it and which tool would be more appropriate for your engineering team (don't forget to include enterprise API token costs in workflows – it can spin up 100 agents in thirty seconds).

If you can figure out how to secure agents that have simultaneous access to personal data and the internet, so they can run unsupervised while avoiding the lethal trifecta (Willison, 2025), let me know.

xendo today at 4:40 PM

Even with examples it's still not convincing. I'm working on real products so I don't have time to waste comparing models that won't be relevant next month.

naasking today at 4:27 PM

Using AI effectively for long-horizon tasks, like maintaining a large codebase, is a wide-open field. No single AI is good at it autonomously. That means finding the right balance of testing, formal specification of pre/post-conditions and invariants, and manual review.

It's like having a naive but super knowledgeable junior developer starting under you. It's obvious you'd learn a lot about how to communicate, how to frame tasks, how to write specifications, and what kind of follow-up you need to do to ensure good results.
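
To make "pre/post-conditions and invariants" concrete, here is a minimal sketch of what such a specification might look like when handed to an AI (or a junior developer) alongside tests. The `Ledger` type and its `deposit` method are hypothetical names invented for illustration, not taken from any real codebase, and plain assertions stand in for whatever formal specification tooling a team actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Hypothetical example type: the balance must always equal the sum of entries."""

    entries: list[int] = field(default_factory=list)
    balance: int = 0

    def check_invariant(self) -> None:
        # Invariant: the cached balance always matches the recorded entries.
        assert self.balance == sum(self.entries), "invariant violated"

    def deposit(self, amount: int) -> int:
        # Precondition: the caller must pass a positive amount.
        if amount <= 0:
            raise ValueError("amount must be positive")
        old_balance = self.balance

        self.entries.append(amount)
        self.balance += amount

        # Postcondition: the balance grew by exactly `amount`.
        assert self.balance == old_balance + amount, "postcondition violated"
        self.check_invariant()
        return self.balance


def test_deposit() -> None:
    # A small test that exercises the contract from the outside.
    ledger = Ledger()
    assert ledger.deposit(5) == 5
    assert ledger.deposit(7) == 12
    try:
        ledger.deposit(0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a non-positive amount")
    ledger.check_invariant()


if __name__ == "__main__":
    test_deposit()
```

The point of the sketch is the division of labor: the contract states what must hold, the test checks it mechanically, and manual review is still needed to judge whether the contract itself is the right one.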