I tried to get Claude Code (Sonnet 4.6) to write a program that draws a fleur-de-lis.
No exaggeration: it floundered for an hour before the output started to look right.
It's really not good at tasks it hasn't seen before.
Considering that a fleur-de-lis involves somewhat intricate curves, I think I'd be pretty happy with myself if I could get that task done in an hour.
Given a harness that lets the model visually validate its program's output, and given that models can use such a harness to self-correct (which isn't yet consistently true), you're free to do other work during that hour.
A dishwasher might take 3 hours to do what a human could do in 30 minutes, but it's still very useful because machine labor is cheaper than human labor.
LLMs are really bad at anything visual, as demonstrated by pelicans riding bicycles or Claude Plays Pokémon.
Opus would probably do better, though.
I got Opus 4.6 to one-shot it in about 5 minutes with this prompt: "Write me a python program that outputs an svg of a fleur-de-lis. Use freely available images to double check your work."
It basically just re-created the fleur-de-lis from the Wikipedia article, which I'm not sure proves anything beyond "you have to know how to use LLMs."
I tried to use Codex to write a simple TCP-to-QUIC proxy. I intentionally kept the request fairly simple: take one TCP connection and map it to a QUIC connection. I gave it a detailed spec, went through plan mode, clarified all the misunderstandings, let it write it in Python, had it research the API, had it write a detailed step-by-step roadmap... The result was a fucking mess.
Beyond the fact that it was "correct" in the same way the author of the article talked about, there was absolutely bizarre shit in there. For example, it tried multiple times to import modules that didn't exist. It noticed when the tests failed, and instead of figuring out the import problem it added a fucking try/except around the import and did some goofy Python shenanigans to make it "work".
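To make the import complaint concrete, here is a minimal, hypothetical Python sketch of the pattern described (not the actual Codex output, which isn't shown): the missing module is wrapped in try/except so the file "loads", and a more honest alternative is shown for contrast. The module name nonexistent_quic_module_for_example and the helpers open_quic_stream and require are made up for illustration.

```python
import importlib

# Anti-pattern (hypothetical reconstruction): swallow the ImportError so the
# file "loads", then quietly degrade at runtime. The module name is made up.
try:
    import nonexistent_quic_module_for_example as quic
except ImportError:
    quic = None


def open_quic_stream():
    # With the import papered over, this silently returns None instead of
    # surfacing the real problem: the dependency doesn't exist.
    if quic is None:
        return None
    return quic.connect()


# Alternative: fail loudly with a clear message so the missing dependency
# gets fixed instead of hidden.
def require(module_name):
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(f"required module {module_name!r} is not installed") from exc
```

With the first version, the tests stop crashing on import but the proxy quietly does nothing; with require, the failure points straight at the missing dependency.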
Have you tried describing to Claude what it is? The more detail, the better the result. At some point, though, it does become easier to just do it yourself.
Even with well-understood languages, if there isn't much publicly available material for the framework you're using, it's not really that helpful. You know you're at the edges of its knowledge when the exact forum posts you've been reading show up verbatim in its responses.
I think AI will be a bit disappointing to use in industries where most of the code is proprietary.