I've been experimenting with a similar concept myself. In my opinion, the linter loop is the only thing that can keep the agent sane, and if anyone can generalize the bun+tsc loop to other tasks, that would finally be a way to trust LLM output.
I was annoyed at how Claude Code ignores my CLAUDE.md and skills, so I looked for ways to extend type checking to them. I wrote a wrapper on top of claude-agents-sdk that reads my CLAUDE.md and skills and compiles them into rules, which can be linter rules or custom checking scripts. It then hooks into all tools and runs the checks. The self-improving part kicks in when a rule doesn't work: I run the tool in review mode with the session id, and it proposes fixes and improves the rule checkers (not the md files). So it's kind of like vibe coding the rules, and it definitely lowers the bar for me to maintain them. Repo: https://github.com/chebykinn/agent-ruler
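To make the "compile the markdown into checkers, then run them on every tool call" idea concrete, here's a toy Python sketch. It is not how agent-ruler actually works: the `- Never run: <cmd>` rule syntax, the `compile_rules`/`run_checks` names, and the tool-call shape (a `"Bash"` tool name with a `{"command": ...}` input dict) are all hypothetical, and wiring it into the SDK's tool hooks is left out.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical checker type: inspects one tool call, returns a violation message or None.
Checker = Callable[[str, dict], Optional[str]]


@dataclass
class Rule:
    source: str    # the CLAUDE.md line this rule was compiled from
    check: Checker


def forbid_bash_substring(needle: str) -> Checker:
    """Build a checker that flags Bash tool calls whose command contains `needle`."""
    def check(tool_name: str, tool_input: dict) -> Optional[str]:
        if tool_name == "Bash" and needle in tool_input.get("command", ""):
            return f"blocked by CLAUDE.md: {needle}"
        return None
    return check


def compile_rules(md_text: str) -> list[Rule]:
    """Toy 'compiler': turns bullets like '- Never run: <cmd>' into checkers.

    Lines it doesn't recognize are skipped; compiling free-form instructions
    would need something much smarter than a regex.
    """
    rules = []
    for line in md_text.splitlines():
        match = re.match(r"-\s*Never run:\s*(.+)", line.strip())
        if match:
            rules.append(Rule(line.strip(), forbid_bash_substring(match.group(1).strip())))
    return rules


def run_checks(rules: list[Rule], tool_name: str, tool_input: dict) -> list[str]:
    """Run every rule against one tool call; an empty list means the call is allowed."""
    violations = []
    for rule in rules:
        message = rule.check(tool_name, tool_input)
        if message is not None:
            violations.append(message)
    return violations


CLAUDE_MD = """
# Project rules
- Never run: npm install
- Prefer bun over node
"""

rules = compile_rules(CLAUDE_MD)
print(run_checks(rules, "Bash", {"command": "npm install lodash"}))  # ['blocked by CLAUDE.md: npm install']
print(run_checks(rules, "Bash", {"command": "bun add lodash"}))      # []
```

The point of splitting it this way is that the checkers are plain code: when one misfires, a review pass can patch the checker itself rather than rewording the markdown and hoping the model listens.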
You could try Wes McKinney's roborev.