I think the advancements around models and such are still somewhat interesting, but it's all the hype around peripheral things like OpenClaw, agentic workflows, and other hyped-up AI-adjacent news that is getting pretty old.
I think the workflows can be really interesting to read about. The other week I read a Reddit post about how someone got Qwen3.5 35B-A3B to go from 22.2% on the 45 hard problems of SWE-bench Verified to 37.8% (Opus 4.6 gets 40%).
Essentially, all they did was tell the LLM to test and verify that its change is correct, with a prompt like the following:
>"You just edited X. Before moving on, verify the change is correct: write a short inline python -c or a /tmp test script that exercises the changed code path, run it with bash, and confirm the output is as expected."
Now whether this is true, I don't know, but I think talking about this kind of stuff is cool!
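
For anyone who wants to poke at the idea, here is a rough sketch of what that verify step might look like in a harness. To be clear, this is my guess, not the code from the post: `verification_prompt` and `run_check` are hypothetical names, and I'm assuming the harness injects the prompt after each edit and runs whatever check the model writes in a subprocess.

```python
import subprocess
import sys

# Prompt text taken from the quote above; "{path}" stands in for the "X" placeholder.
VERIFY_PROMPT = (
    "You just edited {path}. Before moving on, verify the change is correct: "
    "write a short inline python -c or a /tmp test script that exercises the "
    "changed code path, run it with bash, and confirm the output is as expected."
)


def verification_prompt(path: str) -> str:
    """Build the follow-up message to send to the model after it edits `path`."""
    return VERIFY_PROMPT.format(path=path)


def run_check(snippet: str, timeout: float = 30.0) -> tuple[bool, str]:
    """Run a model-written Python snippet (hypothetical helper).

    Returns (passed, combined_output). A non-zero exit code, e.g. from a
    failed assert, counts as a failed check.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr
```

So after an edit you'd send `verification_prompt("some/file.py")` to the model, then feed the output of `run_check(...)` back into the conversation so it can fix things if the check failed. A real agent would run this in a sandbox rather than directly on your machine.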