behaviors, today at 9:19 AM

It's funny, because that task is very diverse. Any LLM will use the codebase it is given as a template (at least the free-tier models do).

My "software as a contract of behaviors" approach works like a program bench (I even cross-tested buildouts). I made an entire corpus layout so that multi-agent, multi-platform builds can be compared, and even went ahead and ran 50 contracts as an example. It honestly showed areas for improvement and distinct differences between the models' code.

  {contract_name}/
  └── submissions/
      └── {date}_{os}_{agent}_{model}_{stack}/
          ├── {contract}.osc.md
          ├── osc.osc.md
          └── results/
              └── {contract}.snapshot.json

That's it: compare against the same contract, or find a new contract to compare with. Lots of signed, hash-pinned files are all you need to reproduce software from nothing with an LLM.
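For anyone who wants to play with that layout, here is a rough Python sketch of building and parsing the submission directory names and pinning every file in a submission to a SHA-256 digest. It is not taken from the osc repo: the names `Submission`, `submission_dir`, `parse_submission`, and `pin_hashes` are hypothetical, it assumes none of the five fields contains an underscore, and it only covers hashing, not signing.

```python
import hashlib
from pathlib import Path
from typing import NamedTuple


class Submission(NamedTuple):
    """The five fields of a submission directory name (hypothetical type)."""
    date: str
    os: str
    agent: str
    model: str
    stack: str


def submission_dir(root: Path, contract: str, sub: Submission) -> Path:
    """Build {contract}/submissions/{date}_{os}_{agent}_{model}_{stack}."""
    return root / contract / "submissions" / "_".join(sub)


def parse_submission(name: str) -> Submission:
    """Split a directory name back into its fields.

    Assumption: no field contains an underscore itself.
    """
    parts = name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {name!r}")
    return Submission(*parts)


def pin_hashes(directory: Path) -> dict[str, str]:
    """Map each file's path (relative to directory) to its SHA-256 hex digest."""
    return {
        p.relative_to(directory).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }
```

Two submissions for the same contract can then be checked by running `pin_hashes` on each directory and seeing whether the digest for `{contract}.osc.md` matches, which would confirm both builds started from the same pinned contract text.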

Programbench is close to that (they have a nice paper/article here), but I don't like the work used. Having software to start with is not a bench of writing code but of reverse engineering.

github/s1ugh34d/osc