> if it has a lot of relevant data that it was trained on
This became evident to me the moment I tried to have these models work on some PowerShell tasks for me. Even Opus today struggles with PowerShell.
Since anything in PS is probably some internal sysadmin tool, there's not much public code out there outside of Microsoft's documentation. Plus the Verb-Noun naming scheme makes it really easy to just hallucinate cmdlets (which it does, often). It's easier to have the LLM just do things in Python using the M365 Graph API than with any of the provided PowerShell cmdlets.
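For a sense of what the Python route looks like, here's a rough sketch that lists a few users through Graph's REST endpoint using only the standard library. The `/v1.0/users` endpoint and the `$select`/`$top` query options are real Graph features; the function names are made up for illustration, and I'm assuming you already have an access token from somewhere (MSAL, for instance), which this doesn't cover.

```python
import json
import urllib.parse
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_list_users_request(access_token, select=("displayName", "mail"), top=10):
    """Build (but do not send) a GET request for the Graph /users endpoint.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Keep "$" and "," literal so the OData options stay readable in the URL.
    query = urllib.parse.urlencode(
        {"$select": ",".join(select), "$top": str(top)}, safe="$,"
    )
    return urllib.request.Request(
        f"{GRAPH_BASE}/users?{query}",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


def list_users(access_token):
    """Send the request and return the 'value' array from the JSON response."""
    # Assumes the token is valid and has permission to read users.
    with urllib.request.urlopen(build_list_users_request(access_token)) as resp:
        return json.load(resp).get("value", [])
```

The point is that there's nothing to hallucinate here beyond a URL and a couple of query options, both of which are easy to check against the docs.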
OTOH, I've been using Claude for a lot of Swift & SwiftUI work lately and it has no problems there. I'd imagine there's even less publicly available training data for that, so to be honest I'm not entirely sure why it fails so badly at PowerShell.
I have deepseek or grok write bash-likes in pwsh often enough to wonder what sort of things you're doing in pwsh...
I use it to wrap ping.exe with colors and fewer columns, for example. Another is a yt-dlp wrapper that fetches 480p video plus best audio with English subtitles, no playlist; it works on a surprising number of video sites.
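The yt-dlp wrapper is roughly this shape. This is the same idea sketched in Python rather than pwsh, and it's a guess at the flags, not the actual script: I'm reading "480p bestaudio" as "video capped at 480p plus the best audio track". The yt-dlp options used (`-f`, `--write-subs`, `--sub-langs`, `--no-playlist`) exist; the function names are just for illustration.

```python
import subprocess


def build_ytdlp_command(url, max_height=480, sub_lang="en"):
    """Return the yt-dlp argument list for one video (hypothetical helper)."""
    # Best video at or below max_height merged with best audio,
    # falling back to the best single combined format at that height.
    fmt = f"bv*[height<={max_height}]+ba/b[height<={max_height}]"
    return [
        "yt-dlp",
        "-f", fmt,
        "--write-subs",           # download subtitles if the site offers them
        "--sub-langs", sub_lang,  # only the requested subtitle language
        "--no-playlist",          # fetch just the video, not the whole playlist
        url,
    ]


def fetch(url):
    """Run yt-dlp and return its exit code; assumes yt-dlp is on PATH."""
    return subprocess.run(build_ytdlp_command(url), check=False).returncode
```

Passing the arguments as a list avoids any shell quoting trouble with the brackets and `<=` in the format selector.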
It does make cmdlets up, you're right there.