We've repeatedly seen these test-driven LLM rewrites produce absolute garbage.
Got any specific examples? I believe you, I'd just like some concrete examples to show my coworkers.