logoalt Hacker News

jahooma11/08/20241 replyview on HN

Sure! I git cloned some open source projects, and wrote a script (with Codebuff) to pick commits and individual diffs of files. For each of those, I had Claude write a sketch of what changed from the old file to the new.

This is all the data I need: the old file, the sketch of how Claude would update it, and the ground truth diff that should be produced. I compiled this into the ideal conversation where the assistant responds with the perfect patch, and that became the training set. I think I had on the order of ~300 of these conversations for the first run, and it worked pretty well.

I came up with more improvements too, like replacing all the variant placeholder comments like "// ... existing code ..." or "# ... (keep the rest of the function)" with one [[*REPLACE_WITH_EXISITNG_CODE*]] symbol, and that made it more accurate


Replies

abossy11/08/2024

Very interesting, thanks!