This reminds me of when I tried to let Claude port an Android libgdx-based game to a WASM-based libgdx version, so I could play the game in the browser.
No matter how much I tried to force it to stick to a mostly line-by-line port, it kept trying to "improve" the code. At some point it had to undo everything as it introduced a number of bugs. I asked it: "What should I add to your prompt so you won't do this again?" and it gave me this:
### CRITICAL LESSON: Don't "Improve" During Porting
- **BIGGEST MISTAKE: Reorganizing working code**
- **What I did wrong:** Tried to "simplify" by splitting `createStartButton()` into separate creation and layout methods
- **Why it failed:** Introduced THREE bugs:
1. Layout overlap (getY() vs getY() - getHeight())
2. Children not sized (Group.setSize() doesn't affect children)
3. Origins not updated (scaling animations broken)
- **The fix:** Deleted my "improvements" and copied the original Android pattern faithfully
- **Root cause:** Arrogance - assuming I could improve production-tested code without understanding all the constraints
- **Solution:** **FOLLOW THE PORTING PRINCIPLES ABOVE** - copy first, don't reorganize
- **Time wasted:** ~1 hour debugging self-inflicted bugs that wouldn't exist if I'd just copied the original
- **Key insight:** The original Android code is correct and battle-tested. Your "improvements" are bugs waiting to happen.
I like Claude's self-reflection. Unfortunately, even adding this to CLAUDE.md didn't fix it, and it kept taking wrong turns, so I had to abandon the effort.
Was this Claude Code? If you tried it with one file at a time in the chat UI I think you would get a straight-line port, no?
Edit: It could be because Rust works a little differently from other languages, so a 1:1 port is not always possible or idiomatic. I haven't done much with Rust, but whenever I try porting something to Rust with LLMs, it imports like 20 cargo crates first (even when there were no dependencies in the original language).
Also, Rust for gamedev was a painful experience for me, because Rust hates globals (and has nanny totalitarianism, so there's no way to tell it "actually I am an adult, let me do the thing"), so you have to do weird workarounds for it. GPT started telling me some insane things like, oh it's simple, you just need this Rube Goldberg machine of macro crates. I thought it was tripping balls until I joined a Rust Discord and got the same advice. I just switched back to TS and redid the whole thing on the last day of the jam.
For anything large like this, I think it's critical that you port over the tests first, and then essentially force it to get the tests passing without mutating the tests. This works nicely for stuff that's very purely functional, a lot harder with a GUI app though.
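One way to enforce the "without mutating the tests" part is to fingerprint the ported test files and refuse any change set that touches them. This is only a minimal sketch: the helper names `fingerprint` and `changed_tests` are made up for illustration, and it assumes the tests live in a single directory tree.

```python
import hashlib
from pathlib import Path


def fingerprint(test_dir: Path) -> dict[str, str]:
    """Map each file under test_dir (by relative path) to the SHA-256 of its bytes."""
    return {
        str(p.relative_to(test_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(test_dir.rglob("*"))
        if p.is_file()
    }


def changed_tests(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the paths that were added, removed, or edited since the baseline."""
    return sorted(
        path
        for path in baseline.keys() | current.keys()
        if baseline.get(path) != current.get(path)
    )
```

Take the baseline right after porting the tests, rerun `fingerprint` after each agent session, and reject the session's changes if `changed_tests` returns anything. It doesn't stop the agent from gaming the tests in other ways, but it does catch the most common shortcut of editing an assertion until it passes.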
Worth pointing out that your IDE/plugin usually adds a whole bunch of prompts before yours - let alone the prompts that the model hosting provider prepends as well.
This might be what is encouraging the agent to apply "best practices" such as unrequested improvements. Looking at mine:
>You are a highly sophisticated automated coding agent with expert-level knowledge across many different programming languages and frameworks and software engineering tasks - this encompasses debugging issues, implementing new features, restructuring code, and providing code explanations, among other engineering activities.
I could imagine that an LLM could well interpret that to mean "improve things as you go". Models (like humans) don't respond well to instructions phrased in the negative (don't think about pink monkeys... now we're both thinking about them).
Sonnet 4.5 had this problem. Opus 4.5 is much better at focusing on the task instead of getting sidetracked.
One thing that might help recover from an ignored CLAUDE.md without much manual intervention is the code-review plugin [1], which spawns agents that check whether the changes conform to the rules specified in CLAUDE.md.
[1] https://github.com/anthropics/claude-code/blob/main/plugins/...
Tangential, but doesn't libgdx have native web support?
I wish there was a feature to say "you must re-read X" after each compaction.
That’s a terrible prompt, more focused on flagellating itself for getting things wrong than actually documenting and instructing what’s needed in future sessions. Not surprising it doesn’t help.
Well, it's close to AGI. Can you really expect AGI to follow simple instructions from dumbos like you when it can do the work of god?
Claude doesn't know why it acted the way it did; it is only predicting why it acted. I see people falling for this trap all the time.