
mohsen1 • today at 7:24 AM • 5 replies

I've experimented with agentic coding/engineering a lot recently. My observation is that software that is easily tested is perfect for this sort of agentic loop.

In one of my experiments I had the simple goal of "making Linux binaries smaller to download using better compression" [1]. Compression is perfect for this: it is easily validated (binary -> compress -> decompress -> binary), so each iteration either makes a measurable dent or the attempt is thrown out.
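A minimal sketch of that validation gate, to make the round trip concrete. This is not the harness from the linked repo: the function names, the stand-in sample data, and the use of Python's standard `zlib` and `lzma` modules as candidate codecs are all assumptions for illustration.

```python
import lzma
import zlib
from typing import Callable, Optional, Tuple

# A candidate is a (compress, decompress) pair. Hypothetical shape, for illustration.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def roundtrip_size(data: bytes, codec: Codec) -> Optional[int]:
    """Return the compressed size, or None if the round trip is not lossless."""
    compress, decompress = codec
    try:
        packed = compress(data)
        if decompress(packed) != data:
            return None
    except Exception:
        return None  # a crashing candidate counts as a failed attempt
    return len(packed)


def accept(data: bytes, candidate: Codec, best_size: int) -> bool:
    """Keep a candidate only if it is lossless and strictly smaller than the best so far."""
    size = roundtrip_size(data, candidate)
    return size is not None and size < best_size


# Stand-in input; a real run would read an actual binary from disk.
sample = bytes(range(256)) * 64 + b"\x00" * 4096

baseline = roundtrip_size(sample, (zlib.compress, zlib.decompress))

# A candidate whose decompressor loses data is rejected, however small its output.
broken_ok = accept(sample, (zlib.compress, lambda b: b""), baseline)

# A lossless candidate that does not shrink the data is rejected too.
identity_ok = accept(sample, (lambda b: b, lambda b: b), baseline)
```

The point of the design is that the gate is deterministic and cheap: an iteration cannot "argue" its way past it, and anything that fails is simply discarded.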

Lessons I learned from my attempts:

- Do not micro-manage. AI is probably good at coming up with ideas and does not need your input too much

- Test harness is everything, if you don't have a way of validating the work, the loop will go stray

- Let the iterations experiment. Let AI explore ideas and break things in its experiments. An iteration might take longer, but those experiments are valuable for the next iteration

- Keep some .md files as a scratch pad between sessions so each iteration in the loop can learn from previous experiments and attempts

[1] https://github.com/mohsen1/fesh

Replies

medi8r • today at 8:07 AM

You have to have really good tests, as it fucks up in strange ways that people don't (I think because experienced programmers run loops in their brain as they code).

Good news: agents are good at open-ended work like adding new tests and finding bugs. Do that. Also do unit tests and Playwright. Testing everything via web driving seemed insane pre-agents, but now it's more than doable.

skapadia • today at 12:19 PM

"Test harness is everything, if you don't have a way of validating the work, the loop will go stray"

This is the most important piece to using AI coding agents. They are truly magical machines that can make easy work of a large number of development, general purpose computing, and data collection tasks, but without deterministic and executable checks and tests, you can't guarantee anything from one iteration of the loop to the next.

MartyMcBot • today at 1:38 PM

the .md scratch pad point is underrated, and the format matters more than people realize.

summaries ("tried X, tried Y, settled on Z") are better than nothing, but the next iteration can mostly reconstruct them from test results anyway. what's actually irreplaceable is the constraint log: "approach B rejected because latency spikes above N ms on target hardware" means the agent doesn't re-propose B the next session. without it, every iteration rediscovers the same dead ends.

ended up splitting it into decisions.md and rejections.md. counter-intuitively, rejections.md turned out to be the more useful file. the decisions are visible in the code. the rejections are invisible — and invisible constraints are exactly what agents repeatedly violate.
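a rough sketch of what the rejections side could look like, assuming one markdown bullet per rejected approach. the line format and the helper names (`log_rejection`, `load_rejections`, `constraints_preamble`) are made up for illustration, not taken from any particular tool.

```python
import tempfile
from pathlib import Path
from typing import Dict

MARKER = "**: rejected because "  # hypothetical line format, chosen for this sketch


def log_rejection(path: Path, approach: str, reason: str) -> None:
    """Append one rejected approach and the reason it failed."""
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- **{approach}{MARKER}{reason}\n")


def load_rejections(path: Path) -> Dict[str, str]:
    """Parse the log back into {approach: reason}; a missing file means no constraints yet."""
    rejected: Dict[str, str] = {}
    if not path.exists():
        return rejected
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("- **") and MARKER in line:
            approach, reason = line[4:].split(MARKER, 1)
            rejected[approach] = reason
    return rejected


def constraints_preamble(path: Path) -> str:
    """Render the rejections as text to prepend to the next session's prompt."""
    rejected = load_rejections(path)
    return "\n".join(f"Do not re-propose {a}: {r}" for a, r in rejected.items())


# Usage: record a dead end in one session, surface it at the start of the next.
workdir = Path(tempfile.mkdtemp())
log_path = workdir / "rejections.md"
log_rejection(log_path, "approach B", "latency spikes above N ms on target hardware")
preamble = constraints_preamble(log_path)
```

keeping the file append-only and one line per rejection means the agent can write to it mid-session without having to rewrite earlier entries.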

octoclaw • today at 10:03 AM

[dead]

CloakHQ • today at 9:18 AM

The test harness point is the one that really sticks for me too. We've been using agentic loops for browser automation work, and the domain has a natural validation signal: either the browser session behaves the way a real user would, or it doesn't. That binary feedback closes the loop really cleanly.

The tricky part in our case is that "behaves correctly" has two layers - functional (did it navigate correctly?) and behavioral (does it look human to detection systems?). Agents are fine with the first layer but have no intuition for the second. Injecting behavioral validation into the loop was the thing that actually made it useful.

The .md scratch pad between sessions is underrated. We ended up formalizing it into a short decisions log - not a summary of what happened, just the non-obvious choices and why. The difference between "we tried X" and "we tried X, it failed because Y, so we use Z instead" is huge for the next session.
