Hacker News

xnorswap · yesterday at 4:02 PM

Claude is really good at specific analysis, but really terrible at open-ended problems.

"Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.

"Hey claude, anything I could do to improve Y?", and it'll struggle beyond the basics that a linter might suggest.

It enthusiastically suggested a library for <work domain> and was all "Recommended" about it, but when I pointed out that the library had already been considered and rejected because of <issue>, it understood and wrote up why that library suffered from that issue and was therefore unsuitable.

There's a significant blind spot in current LLMs around blue-sky thinking and creative problem solving. They can do structured problems very well, and they can transform unstructured data very well, but they deal with unstructured problems poorly.

That may well change, so I don't want to embed that thought too deeply into my own priors, because the LLM space evolves rapidly. I wouldn't want to find myself blind to progress because I'd written LLMs off for a class of problems.

But right now, the best way to help an LLM is to have a deep understanding of the problem domain yourself, and to leverage it for the grunt work you'd find boring.


Replies

pdntspa · yesterday at 4:23 PM

That's why you treat it like a junior dev. You do the fun stuff of supervising the product, overseeing design and implementation, breaking up the work, and reviewing the outputs. It does the boring stuff of actually writing the code.

I am phenomenally productive this way, I am happier at my job, and its quality of work is extremely high as long as I occasionally have it stop and self-review its progress against the style principles articulated in its AGENTS.md file (as it tends to forget a lot of rules like DRY).

order-matters · yesterday at 6:50 PM

TBH I think its ability to structure unstructured data is what makes it a powerhouse tool, and there is so much juice to squeeze there that we can make process improvements for years even if it doesn't get any better at general intelligence.

If I had a PDF printout of a table, the workflow I used to need to get it back into a table data structure for automation was hard (annoying): dedicated OCR tools with limitations on their inputs, multiple models within the tool for the different ways the paper the table was printed on might be formatted. It took hours for each new input format.

Now I can take a photo of something with my phone and get a data table in about 30 seconds.
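
That photo-to-table step is now a few lines against a vision model API. A minimal sketch, assuming the Anthropic Python SDK (the model name and file name are placeholders, not what the commenter used):

    # Hedged sketch: photo of a printed table -> CSV, via the Anthropic Messages API.
    import base64
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    with open("table_photo.jpg", "rb") as f:  # placeholder file name
        image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": image_b64}},
                {"type": "text",
                 "text": "Transcribe this table as CSV, headers included. "
                         "Output only the CSV."},
            ],
        }],
    )
    print(reply.content[0].text)  # ready to feed into pandas, a spreadsheet, etc.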

People seem so desperate to outsource their thinking to these models, operating them at the limits of their capability, but I've been having a blast using them to cut through so much tedium: problems that weren't unsolved, but that required enough specialized tooling and custom config that they were left alone unless you really had to.

This fits into what you're saying about using it to do the grunt work I find boring, I suppose, but it feels like a bit more than that: it has opened a lot of doors to spaces where the grunt work wasn't worth doing for the end result before, but now it is.

mbesto · yesterday at 4:44 PM

> There's a significant blind spot in current LLMs around blue-sky thinking and creative problem solving. They can do structured problems very well, and they can transform unstructured data very well, but they deal with unstructured problems poorly.

While this is true in my experience, the inverse doesn't follow: LLMs are very good at helping me go through a structured process of thinking about architectural and structural design, and then helping build a corresponding specification.

More specifically, the "idea honing" part of this proposed process works REALLY well: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/

This part especially: "Each question should build on my previous answers, and our end goal is to have a detailed specification I can hand off to a developer. Let's do this iteratively and dig into every relevant detail. Remember, only one question at a time."
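
A rough sketch of what driving that loop programmatically can look like, assuming the Anthropic Python SDK (the model name is a placeholder and the system prompt paraphrases the one quoted above):

    # Hedged sketch of the "idea honing" loop: one question per turn,
    # accumulating answers until you're ready to ask for the spec.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    SYSTEM = (
        "Ask me one question at a time so we can develop a detailed spec. "
        "Each question should build on my previous answers, and our end goal is "
        "a specification I can hand off to a developer. Remember, only one "
        "question at a time."
    )

    history = [{"role": "user", "content": "Here's the idea I want to spec out: ..."}]
    while True:
        reply = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model name
            max_tokens=500,
            system=SYSTEM,
            messages=history,
        )
        question = reply.content[0].text
        print(question)
        answer = input("> ")  # answer, or type "done" to stop and write the spec
        if answer.strip().lower() == "done":
            break
        history.append({"role": "assistant", "content": question})
        history.append({"role": "user", "content": answer})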

asmor · yesterday at 4:32 PM

This is it. It doesn't replace the higher-level knowledge part very well.

I asked Claude to fix a pet peeve of mine: spawning a second process inside an existing Wine session (pretty hard if you use umu, since it runs in a user namespace). I asked it to write me a Python server to spawn another process to pass through a file handler "in Proton", and it went into a long loop of trying to find a way to launch into an existing Wine session from Linux, with tons of environment variables that didn't exist.

Then I specified "a server to run in Wine using Windows Python" and it got more things right, except it tried to use named pipes for IPC, which, surprise surprise, doesn't work for talking to the Linux side. Only after I specified "local TCP socket" did it start to go right. Had I written all those technical constraints and made the design decisions in the first message, it'd have been a one-hit success.
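
For illustration, the final design has roughly the shape of the sketch below. This is not asmor's actual code; the port and the one-command-per-connection protocol are invented for the example. Run under Windows Python inside the Wine session, anything it spawns lands in that session:

    # Hedged sketch: Windows-Python server inside the Wine session. Named pipes
    # can't bridge to the Linux side, but a loopback TCP socket can.
    import socket
    import subprocess

    HOST, PORT = "127.0.0.1", 7777  # arbitrary local port

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        while True:
            conn, _ = srv.accept()
            with conn:
                cmd = conn.recv(4096).decode("utf-8").strip()
                if cmd:
                    # Popen inherits the existing Wine session/prefix
                    subprocess.Popen(cmd, shell=True)
                    conn.sendall(b"spawned\n")

The Linux side then only needs something like echo 'notepad.exe' | nc 127.0.0.1 7777, assuming the user namespace still shares the host's loopback interface.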

james_marks · yesterday at 4:12 PM

This is a key part of the AI love/hate flame war.

It's very easy to write it off when it spins out on open-ended problems, without seeing just how effective it can be once you zoom in.

Of course, zooming in that far gives back some of the promised gains.

Edit: typo

dolftax · yesterday at 7:55 PM

The structured vs open-ended distinction here applies to code review too. When you ask an LLM to "find issues in this code", it'll happily find something to say, even if the code is fine. And when there are actual security vulnerabilities, it often gets distracted by style nitpicks and misses the real issues.

Static analysis has the opposite problem: very structured and deterministic, but limited to predefined patterns and prone to overwhelming you with false positives.

The sweet spot seems to be giving structure to what the LLM should look for, rather than letting it roam free on an open-ended "review this" prompt.
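
As a toy illustration of giving that structure (the checklist and the output shape are invented for the example):

    # Hedged sketch: constrain the review to an explicit checklist and a
    # machine-readable output, instead of an open-ended "review this".
    CHECKS = [
        "SQL built by string concatenation with user input",
        "secrets or API keys committed in source",
        "unvalidated input reaching subprocess, exec, or eval",
    ]

    def review_prompt(code: str) -> str:
        numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(CHECKS))
        return (
            "Review the code below ONLY for these issues:\n"
            f"{numbered}\n"
            'Report each finding as JSON: {"check": <number>, "line": <number>, '
            '"note": <string>}. If nothing matches, return []. '
            "Do not comment on style.\n\nCODE:\n" + code
        )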

We built Autofix Bot[1] around this idea.

[1] https://autofix.bot (disclosure: founder)

ericmcer · yesterday at 7:49 PM

Exactly. If you visualize software as a bunch of separate "states" (UI state, app state, DB state), then our job is to mutate states and synchronize those mutations across the system. LLMs are good at mutating a specific state in a specific way. They are trash at designing what shape a state's data should be, and they are bad at figuring out how/why to propagate mutations across a system.

mkw5053 · today at 2:24 AM

I've had reasonable success having it ultrathink through every possible X (exhaustively) and their trade-offs, and then give me a ranked list and rationale for its top recommendations. I almost always choose the top one, but just reading the list and then giving it next steps has worked really well for me.

plufz · yesterday at 4:09 PM

I think slash commands are great for helping Claude with this. I have many, like /code:dry and /code:clean-code, each with a semi-long prompt and references to longer docs, to review code from a specific perspective. I think it at least improves Claude a bit in this area - like processes or templates for thinking in broader ways. But yes, I agree it struggles a lot here.

d-lisp · yesterday at 6:39 PM

I remember a problem I had while quick-testing notcurses. I tried ChatGPT, which produced a lot of weird but kinda believable statements: that I had to include wchar.h and define a specific preprocessor macro, AND that I had to place the notcurses includes, the other includes, and the macros in a specific order.

My sentiment was "that's obviously a weird, unintended hack", but I wanted to test quickly, and well... it worked. Later, reading the man pages, I acknowledged that I actually needed to pass specific flags to gcc in place of the GPT-advised solution.

I think these kinds of value-based judgements are hard for LLMs to emulate; it's hard for them to identify a single source as the most authoritative in a sea of less authoritative (but more numerous) sources.

cyral · yesterday at 4:25 PM

Using plan mode in Cursor (or asking Claude to come up with a plan first) makes it pretty good at generic "how can I improve" prompts. It can spend more effort exploring the codebase and thinking before implementing.

giancarlostoro · yesterday at 4:33 PM

> "Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.

This is true. As for the open-ended side, I use Beads with Claude Code: I ask it to identify things based on criteria (even if it's open-ended), then I ask it to make tasks, and when it's done I ask it to research and pose clarifying questions for those tasks. This works really well.

cultofmetatron · yesterday at 5:28 PM

> There's a significant blind spot in current LLMs around blue-sky thinking and creative problem solving.

That's called job security!

theshrike79 · yesterday at 8:39 PM

Codex is better for the latter style. It takes its time, mulls things over, investigates, and sometimes finds a nugget of gold.

Claude is for getting shit done; it's not at its best on long research tasks.

kccqzy · yesterday at 4:09 PM

Not at all my experience. I've often tried things like telling Claude that SIMD code I wrote performed poorly and that I needed some ideas to make it go faster. Claude usually does a good job rewriting the SIMD to use different, faster operations.

andai · yesterday at 6:16 PM

The current paradigm is we sorta-kinda got AGI by putting dodgy AI in a loop:

until works { try again }

The stuff is getting so cheap and so fast... a sufficient increment in quantity can produce a phase change in quality.
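
Sketched out, with the model call left as a stub, since any concrete agent API here would be an assumption:

    # Hedged sketch of the loop: generate, test, feed failures back, repeat.
    import subprocess

    def generate_patch(feedback: str) -> None:
        """Stub for the LLM call that edits the code, given the last failures."""

    def run_tests() -> tuple[bool, str]:
        proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    # until works { try again } -- capped so cost stays bounded
    feedback = ""
    for attempt in range(10):
        generate_patch(feedback)
        works, feedback = run_tests()
        if works:
            break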

ludicrousdispla · yesterday at 5:57 PM

>> "Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.

Back in the day, we would just do this with a search engine.

fudged71 · yesterday at 4:23 PM

This tells me that we need to build 1000 more linters, of all kinds.

awesome_dude · yesterday at 11:27 PM

My experience with Claude has been that having it "review" my code produces some helpful feedback and refactoring suggestions, but it also falls short in other areas.

ljm · yesterday at 10:57 PM

I am basically rawdogging Claude these days: no MCPs or anything else. I just lay down all of the requirements, the suggestions, and the hints, and let it go to work.

When I see my colleagues use an LLM they are treating it like a mind reader and their prompts are, frankly, dogshit.

It shows that articulating a problem is an important skill.
