mappu • yesterday at 9:21 PM • 1 reply

In my harness I implemented apply_patch to simply take unified diffs for patch -p1. I was shocked to see how bad models are at generating them. I started logging diff failures to analyse:

- All models are terrible at generating line numbers for a proper diff; give up on them

- Some models (Owl-alpha) must have been post-trained on Codex transcripts, because they occasionally force Codex's V4A patch format into whatever diff tool is available

- Codex puts a lot of guidance in its system prompt about the desired patch style, such as making larger hunks instead of granular ones

Replies

fractorial • yesterday at 10:00 PM

In my harness, I implemented tool_edit as a subset of Rob Pike’s Sam editor syntax [0].

It only needs ~650 tokens of system prompt to work. It’s pretty stellar.

[0] https://9p.io/sys/doc/sam/sam.html
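For readers unfamiliar with Sam's command language, here is a rough Python sketch of the kind of subset a tool_edit could accept. fractorial doesn't say which commands they kept, so everything below is an assumption: the function names, the fixed "/" delimiter, searching from the start of the file instead of from dot, and the lack of compound addresses. It supports two address forms, /re/ and "," (whole file), and the commands d, c/text/, a/text/, i/text/ and x/re/ command, with \n in inserted text meaning a newline.

```python
import re


def _take_delimited(cmd: str, i: int) -> tuple[str, int]:
    """Return (body, next_index) for a '/.../' field starting at cmd[i]."""
    if i >= len(cmd) or cmd[i] != "/":
        raise ValueError(f"expected '/' at position {i} in {cmd!r}")
    end = cmd.find("/", i + 1)  # assumption: bodies never contain '/'
    if end == -1:
        raise ValueError(f"unterminated /.../ in {cmd!r}")
    return cmd[i + 1:end], end + 1


def _parse_address(cmd: str, text: str) -> tuple[tuple[int, int], str]:
    """Parse a leading ',' or '/re/' address into a (start, end) range ('dot')."""
    if cmd.startswith(","):
        return (0, len(text)), cmd[1:]
    if cmd.startswith("/"):
        pattern, j = _take_delimited(cmd, 0)
        m = re.search(pattern, text)  # simplification: Sam searches forward from dot
        if m is None:
            raise ValueError(f"no match for /{pattern}/")
        return (m.start(), m.end()), cmd[j:]
    raise ValueError(f"command must start with an address: {cmd!r}")


def _run(cmd: str, text: str, s: int, e: int) -> str:
    """Apply one command to text[s:e] and return the whole new text."""
    cmd = cmd.lstrip()
    op = cmd[:1]
    if op == "d":
        return text[:s] + text[e:]
    if op in ("c", "a", "i"):
        body, _ = _take_delimited(cmd, 1)
        body = body.replace("\\n", "\n")
        if op == "c":
            return text[:s] + body + text[e:]
        if op == "a":
            return text[:e] + body + text[e:]
        return text[:s] + body + text[s:]
    if op == "x":
        pattern, j = _take_delimited(cmd, 1)
        matches = list(re.finditer(pattern, text[s:e]))
        # Edit from the last match backwards so earlier offsets stay valid.
        for m in reversed(matches):
            text = _run(cmd[j:], text, s + m.start(), s + m.end())
        return text
    raise ValueError(f"unknown command: {cmd!r}")


def sam_edit(text: str, cmd: str) -> str:
    """Apply one hypothetical Sam-style edit command to text."""
    (s, e), rest = _parse_address(cmd.strip(), text)
    return _run(rest, text, s, e)
```

For example, sam_edit(src, ",x/foo/c/bar/") renames every foo in the file, and sam_edit(src, "/old_line\\n/d") deletes one line. Because every address is a regex rather than a line number, the model never has to count lines.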
