I think the advancements around models and such are still somewhat interesting but its all the hype ...

jimmyjazz14 • yesterday at 8:47 PM • 1 reply • view on HN

I think the advancements around models and such are still somewhat interesting but its all the hype around peripheral things like OpenClaw, agentic workflows and other hyped up AI-adjacent news that are getting pretty old.

Replies

Aerroon • yesterday at 8:57 PM

I think the workflows can be really interesting to read about. The other week I read a reddit post how someone got Qwen3.5 35B-A3B to go from 22.2% on the 45 hard problems of swebench-verified to 37.8% (opus 4.6 gets 40%).

All they essentially did was tell the LLM to test and verify whether the answer is correct with a prompt like the following:

>"You just edited X. Before moving on, verify the change is correct: write a short inline python -c or a /tmp test script that exercises the changed code path, run it with bash, and confirm the output is as expected."

Now whether this is true, I don't know, but I think talking about this kind of stuff is cool!

alt Hacker News

Replies