The landing page for the demo game "Voxel Velocity" mentions "<Enter> start" at the bottom, but <Enter> actually changes selection. One would think that after 7mm tokens and use of a QA agent, they would catch something like this.
It's interesting, isn't it? On the one hand the game is quite impressive. Although it doesn't have anything particularly novel (and it shouldn't, given the prompt), it still would have taken me several days, probably a week, working nonstop. On the other hand, there's plenty of paper cuts.
I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.
It's interesting, isn't it? On the one hand the game is quite impressive. Although it doesn't have anything particularly novel (and it shouldn't, given the prompt), it still would have taken me several days, probably a week, working nonstop. On the other hand, there's plenty of paper cuts.
I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.