Something that may be interesting for readers of this thread: this project was possible only once I started telling Opus that it needed to keep a file with all the implementation notes, accumulating everything we discovered during the development process. The file also carried clear instructions that it must be kept updated, and re-read ASAP after each context compaction. This is what enabled Opus to do such a big coding task in a reasonable amount of time without losing track. Check the file IMPLEMENTATION_NOTES.md in the GitHub repo for more info.
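To give an idea of the shape such a file can take, here is a minimal sketch of the preamble (illustrative only; the actual contents are in IMPLEMENTATION_NOTES.md in the repo):

```markdown
# IMPLEMENTATION_NOTES.md

## Standing instructions for the assistant
- Re-read this file FIRST after every context compaction, before touching code.
- Update it whenever a decision is made or something surprising is discovered.
- Keep entries short: what was tried, what happened, what to do next.

## Current state
- <module>: <status, open issues>

## Discoveries
- <date>: <thing we learned, and what it changes>
```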
Very cool!
Yep, a constantly updated spec is the key. Wrote about this here:
https://lukebechtel.com/blog/vibe-speccing
I've also found it helpful to have it keep an "experiment log" at the bottom of the original spec, or in a separate document, which it must update whenever things take "a surprising turn".
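A minimal template for such an entry (hypothetical format, not taken from any real spec):

```markdown
## Experiment log
- [date] What we tried, what we expected, what actually happened,
  and why it was surprising. One entry per surprise, newest last.
```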
Do you plan on writing about the other lessons you learned, which you mentioned in the README? As a big fan of your software and writing for many years, I would deeply appreciate your perspective on using these tools!
This is amazing. Is there any way you could share the log of prompts you used, and anything else beyond the implementation notes, that helped you reach such a result? I'd love to learn from your experience and the steps you took. Thank you!
There are multiple task-management tools for Claude and other LLMs that let the model define tasks, add implementation notes, and (crucially) add sub-tasks and dependencies. I'm using Beads (https://github.com/steveyegge/beads), and I think it really improves the outcome, especially for larger projects.
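The core idea is a dependency-aware task graph the model can read and write; a generic sketch of one record (this is not Beads' actual schema, just the shape of the idea):

```json
{
  "id": "task-42",
  "title": "Port the attention kernel to plain C",
  "notes": "See IMPLEMENTATION_NOTES.md for the open questions.",
  "subtasks": ["task-43", "task-44"],
  "blocked_by": ["task-40"],
  "status": "in_progress"
}
```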
Was the LLM using vision capabilities to verify the correctness of its work? If so, how did you guide that verification?
It's funny watching people rediscover well-established paradigms. Suddenly everyone's recreating software design documents [0].
People can say what they want about LLMs reducing intelligence/ability; the clear trend is that people are getting more organized, documenting things better, enforcing constraints, and thinking in higher-level patterns. And there's renewed interest in formal verification.
LLMs will force the skilled, employable engineer to chase both maintainability and productivity from the start, in order to maintain a competitive edge with these tools. At least until robots replace us completely.
[0] https://www.atlassian.com/work-management/knowledge-sharing/...
> No Python runtime, no PyTorch, no CUDA toolkit required at inference time.
This is amazing, Salvatore! Please spend some more time here and free us from the CUDA toolkit and Python.
So would Codex do that task with a regular spec and no recompacting?
This development work-cycle pattern lends itself nicely to Antigravity, which does about 80% of this out of the box and can be nudged to do the rest with a little prompting.
Salvatore - this is cool. I'm a fan of using Steve Yegge's Beads for this - it generally cuts the markdown-file cruft significantly.
Did you run any benchmarks? I'm curious whether the Python stack is faster or slower than a pure-C vibe-coded inference tool.
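A crude way to compare would be wall-clock tokens per second over the same prompt. A minimal sketch, assuming hypothetical command lines for each stack (the binary names and flags here are made up; substitute the real invocations):

```python
# Crude tokens/sec comparison: run each stack on the same prompt and
# time the whole process. The commands below are placeholders.
import subprocess
import time

PROMPT = "Explain how rotary position embeddings work."
N_TOKENS = 256

commands = {
    "pure-C": ["./inference", "-p", PROMPT, "-n", str(N_TOKENS)],
    "pytorch": ["python", "generate.py", "--prompt", PROMPT,
                "--max-tokens", str(N_TOKENS)],
}

for name, cmd in commands.items():
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    elapsed = time.perf_counter() - start
    print(f"{name}: {N_TOKENS / elapsed:.1f} tok/s (wall clock)")
```

Note that wall clock includes process startup and prompt processing, so a fairer comparison would time prefill and decode separately; the Python stack pays a startup cost that a compiled binary doesn't.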