logoalt Hacker News

hu3today at 6:24 PM0 repliesview on HN

> we found that GPT-5.2 cannot even compute the parity of a short string like 11000, and GPT-5.2 cannot determine whether the parentheses in ((((()))))) are balanced.

I think there is a valid insight here which many already know: LLMs are much more reliable at creating scripts and automation to do certain tasks than doing these tasks themselves.

For example if I provide an LLM my database schema and tell it to scan for redundant indexes and point out wrong naming conventions, it might do a passable but incomplete job.

But if I tell the LLM to code a python or nodejs script to do the same, I get significantly better results. And it's often faster too to generate and run the script than to let LLMs process large SQL files.