Claude is good at assembling blocks, but still falls apart at creating them

315 points • by bblcla • 01/14/2026 • 237 comments

Comments

woeirua • 01/16/2026

It's just amazing to me how fast the goal posts are moving. Four years ago, if you had told someone that an LLM would be able to one-shot either of those first two tasks, they would've said you're crazy. The tech is moving so fast. I slept on Opus 4.5 because GPT 5 was kind of an air ball, and I just started using it in the past few weeks. It's so good. Way better than almost anything that's come before it. It can one-shot tasks that we never would've considered possible before.

simonw • 01/15/2026

I'm not entirely convinced by the anecdote here where Claude wrote "bad" React code:

> But in context, this was obviously insane. I knew that key and id came from the same upstream source. So the correct solution was to have the upstream source also pass id to the code that had key, to let it do a fast lookup.

I've seen Claude make mistakes like that too, but then the moment you say "you can modify the calling code as well" or even ask "any way we could do this better?" it suggests the optimal solution.

My guess is that Claude is trained to bias towards making minimal edits to solve problems. This is a desirable property, because six months ago a common complaint about LLMs was that you'd ask for a small change and they would rewrite dozens of additional lines of code.

I expect that adding a CLAUDE.md rule saying "always look for more efficient implementations that might involve larger changes and propose those to the user for their confirmation if appropriate" might solve the author's complaint here.
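The article's React code isn't reproduced in the thread, so here is a minimal, language-neutral sketch in Python of the two lookups being contrasted. `Item`, `find_by_key_scan`, `index_by_id`, and `find_by_id` are hypothetical names, not the author's code. The first function only receives `key` and has to scan every item; the other two assume the upstream source also passes `id`, so the callee can use a prebuilt index.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical record: `id` and `key` come from the same upstream source.
    id: str
    key: str
    label: str


def find_by_key_scan(items: list[Item], key: str) -> Item | None:
    # "Minimal edit" approach: the callee only has `key`,
    # so it walks the whole list on every lookup (O(n)).
    for item in items:
        if item.key == key:
            return item
    return None


def index_by_id(items: list[Item]) -> dict[str, Item]:
    # Build the index once (O(n)); later lookups are O(1) on average.
    return {item.id: item for item in items}


def find_by_id(index: dict[str, Item], item_id: str) -> Item | None:
    # "Change the caller too" approach: upstream passes `id`
    # alongside `key`, so the callee can look the item up directly.
    return index.get(item_id)
```

The index only pays off once the caller can supply `id`, which is the cross-file change the comment suggests explicitly inviting Claude to propose.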

maxilevi • 01/15/2026

LLMs are just really good search. Ask it to create something and it's searching within the pretrained weights. Ask it to find something and it's semantically searching within your codebase. Ask it to modify something and it will do both. Once you understand it's just search, you can get really good results.

disconcision • 01/15/2026

I've yet to be convinced by any article, including this one, that attempts to draw boxes around what coding agents are and aren't good at in a way that is robust on a 6 to 12 month horizon.

I agree that the examples listed here are relatable, and I've seen similar in my uses of various coding harnesses, including, to some degree, ones driven by Opus 4.5. But my general experience with using LLMs for development over the last few years has been that they progressed:

1. From initially being able, at best, to assemble simple procedural or compositional sequences of commands or functions to accomplish a basic goal, perhaps meeting tests or type checking, but with no overall coherence,

2. To being able to structure small functions reasonably,

3. To being able to structure large functions reasonably,

4. To being able to structure medium-sized files reasonably,

5. To being able to structure large files, and small multi-file subsystems, somewhat reasonably.

So the idea that they are now falling down on the multi-module or multi-file or multi-microservice level is neither particularly surprising to me nor particularly indicative of future performance. There is a hierarchy of scales at which abstraction can be applied, and it seems plausible to me that the march of capability improvement is a continuous push upwards in the scale at which agents can reasonably abstract code.

Alternatively, it could be that there is a legitimate discontinuity here, at which anything resembling current approaches will max out, but I don't see strong evidence for it here.

lordnacho • 01/15/2026

By and large, I agree with the article. Claude is great and fast at doing low-level dev work: getting the syntax right in some complicated mechanism, executing an edit-execute-readlog loop, making multi-file edits.

This is exactly why I love it. It's smart enough to do my donkey work.

I've revisited the idea that typing speed doesn't matter for programmers. I think it's still an odd thing to judge a candidate on, but appreciate it in another way now. Being able to type quickly and accurately reduces frustration, and people who foresee less frustration are more likely to try the thing they are thinking about.

With LLMs, I have been able to try so many things that I never tried before. I feel that I'm learning faster because I'm not tripping over silly little things.

mikece • 01/14/2026

In my experience Claude is like a "good junior developer": it can do some things really well, FUBARs other things, but on the whole it's something to which tasks can be delegated if things are well explained. If/when it gets to the ability level of a mid-level engineer, it will be revolutionary. Typically a mid-level engineer can be relied upon to do the right thing with no/minimal oversight, can figure out incomplete instructions, and can deliver quality results (and even train up the juniors on some things). At that point the only reason to have human junior engineers is so they can learn their way up the ladder to being an architect responsible for coordinating swarms of Claude Agents to develop whole applications and complete complex tasks and initiatives.

Beyond that, what can Claude do? Analyze the business and market as a whole and decide on product features, industry inefficiencies, gap analysis, and then define projects to address those and coordinate fleets of agents to change or even radically pivot an entire business?

I don't think we'll get to the point where all you have is a CEO and a massive Claude account, but it's not completely science fiction the more I think about it.

ChicagoDave • 01/16/2026

I have several projects that counter this article. Not sure why, but I’ve extracted clean, readable, well-constructed, and well-tested code.

I might write something up at some point, but I can share this:

https://github.com/chicagodave/devarch/

New repo with guides for how I use Claude Code.

michalsustr • 01/15/2026

This article resonates with exactly how I think about it as well. For example, at minfx.ai (a Neptune/wandb alternative), we cache time series that can contain millions of floats for fast access. Any engineer worth their title would never make a copy of these and would pass around pointers for access instead. Opus, when stuck in a place where passing the pointer was a bit more difficult (due to async and Rust lifetimes), would just make the copy rather than rearchitect, or at least stop and notify the user. Many such examples of ‘lazy’ and thus bad design.
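The commenter's code is Rust, where the usual fix is sharing the buffer through a reference-counted pointer such as `Arc<[f64]>` instead of cloning a `Vec<f64>`. The sketch below is only a standard-library Python analogue of that choice, using a hypothetical `SeriesCache` class (not minfx.ai's code): `view` hands out a zero-copy `memoryview` over the cached buffer, while `copy` duplicates every float.

```python
from __future__ import annotations

from array import array


class SeriesCache:
    """Hypothetical in-memory cache of float time series."""

    def __init__(self) -> None:
        self._series: dict[str, array] = {}

    def put(self, name: str, values: list[float]) -> None:
        # Typecode "d" stores C doubles contiguously, much like a Vec<f64>.
        self._series[name] = array("d", values)

    def view(self, name: str) -> memoryview:
        # Zero-copy: the returned view reads the cached buffer directly.
        return memoryview(self._series[name])

    def copy(self, name: str) -> array:
        # The "lazy" fix: allocates and fills a second buffer,
        # doubling memory for a series with millions of points.
        return array("d", self._series[name])
```

One side effect worth knowing: while a view is alive, the cached array cannot be resized (`append` raises `BufferError`), which loosely resembles Rust refusing to mutate a value that is currently borrowed.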

alphazard • 01/15/2026

This sounds suspiciously like the average developer, which is what the transformer models have been trained to emulate.

Designing good APIs is hard, being good at it is rare. That's why most APIs suck, and all of us have a negative prior about calling out to an API or adding a dependency on a new one. It takes a strong theory of mind, a resistance to the curse of knowledge, and experience working on both sides of the boundary, to make a good API. It's no surprise that Claude isn't good at it; most humans aren't either.

joshcsimmons • 01/15/2026

IDK I've been using opus 4.5 to create a UI library and it's been doing pretty well: https://simsies.xyz/ (still early days)

Granted, it was building on top of Tailwind (shifting over to Radix after the layoff news). Begs the question: what is a Lego?

Scrapemist • 01/15/2026

Eventually you can show Claude how you solve problems, and explain the thought process behind it. It can apply these learnings but it will encounter new challenges in doing so. It would be nice if Claude could instigate a conversation to go over the issues in depth. Now it wants quick confirmation to plough ahead.

0xbadcafebee • 01/16/2026

I don't think it's possible to make an AI a "Senior Engineer", or even a good engineer, by training it on random crap from the internet. It's got a million brains' worth of code in it. That means bad patterns as well as good. You'd need to remove the bad patterns for it not to "remember" and regurgitate them. I don't think prompts help with this either, it's like putting a band-aid on head trauma.

Havoc • 01/16/2026

> Claude can’t create good abstractions on its own

LLMs definitely can create abstractions and boundaries, e.g. most will lean towards a pretty clean front-end vs. back-end split even without hints. Or work out a data structure that fits the need. Or split things into free-standing modules. Or structure a plan into phases.

So this really just boils down to "good" abstractions, which is subject to model improvement.

I really don’t see a durable moat for us meatbags in this line of reasoning

iamacyborg • 01/15/2026

Here’s an example of a plan I’m working on in CC. It’s very thorough, albeit it required a lot of handholding and fact-checking on a number of points, as its first few passes didn’t properly anonymise data.

https://docs.google.com/document/u/0/d/1zo_VkQGQSuBHCP45DfO7...

machiaweliczny • 01/16/2026

Yeah, that's my current gripe, but I think this just needs some good examples in AGENTS.md (I've done some for hooks and it kinda works, but I need to remind it). I need a good AGENTS.md that explains what a good abstraction boundary is and how to define one. The problem is I'm not sure I know how to put it into words; if anyone has an idea, please let me know.

EGreg • 01/16/2026

This is exactly what we found out a year ago for all AI builders. But what is the best way to convince early investors of this thesis? They seem to be all-in on just building everything from scratch end-to-end. Here is what we built:

https://engageusers.ai/ecosystem.pdf

malka1986 • 01/16/2026

I am making an app in Elixir.

100% of code is made by Claude.

It is damn good at making "blocks".

However, Elixir seems to be a language that works very well for LLMs, cf. https://elixirforum.com/t/llm-coding-benchmark-by-language/7...

joduplessis • 01/16/2026

Recently I've put Claude/others to use in some agentic workflows with easy menial/repetitive tasks. I just don't understand how people are using these agents in production. The automation is absolutely great, but it requires an insane amount of hand-holding and cleanup.

iamleppert • 01/16/2026

I use Claude daily and I 100% disagree with the author. The article reeks of someone who doesn't understand how to manage context appropriately, describe their requirements, or build up a task iteratively with a coding agent. If you have certain requirements or want things done in a certain way, you need to be explicit, and the order in which you do things matters a lot for how efficiently it completes the task and for the quality of the final output. By default it's very good at doing the least amount of work to just make something work, but that's not always what you want. Sometimes it is. I'd much prefer that as the default mode of operation over something that makes a project out of every little change.

The developers who aren't figuring out how to leverage AI tools and make them work for them are going to get left behind very quickly. Unless you're in the top tier of engineers, I'm not sure how one can blame the tools at this point.

anshumankmr • 01/16/2026

IDK, it's been pretty solid (but it does mess up), which is where I come in. But it has helped me work with Databricks (reading/writing from it) and train a model using it for some of our customers, though it's NOT in prod.

doug_durham • 01/15/2026

Did the author ask it to make new abstractions? In my experience, when it produces output that I don't like, I ask it to refactor it. These models have an understanding of all modern design patterns. Just ask it to adopt one.

esafak • 01/15/2026

> Claude doesn’t have a soul. It doesn't want anything.

Ha! I don't know what that has to do with anything, but this is exactly what I thought while watching Pluribus.

jondwillis • 01/16/2026

Regardless, yet another path to the middle class is closing for a lot of people. RIP (probably me too)

geldedus • 01/17/2026

The level of anti-AI cope is so entertaining!

lxe • 01/16/2026

Eh. This is yet another "I tried AI to do a thing, and it didn't do it the way I wanted it, therefore I'm convinced that's just how it is... here's a blog about it" article.

"Claude tries to write React, and fails"... how many times? What's the rate of failure? What have you tried to guide it to perform better?

These articles are similar to HN 15 years ago, when people wrote "Node.js is slow and bad".

MarginalGainz • 01/16/2026

This mirrors my experience trying to integrate LLMs into production pipelines.

The issue seems to be that LLMs treat code as a literary exercise rather than a graph problem. Claude is fantastic at the syntax and local logic ('assembling blocks'), but it lacks the persistent global state required to understand how a change in module A implicitly breaks a constraint in module Z.

Until we stop treating coding agents as 'text predictors' and start grounding them in an actual AST (Abstract Syntax Tree) or dependency graph, they will remain helpful juniors rather than architects.
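A minimal sketch of the kind of global view the comment describes, assuming a hand-written module dependency map (the module names are hypothetical; a real tool would extract imports from the AST). `affected_by` is an illustrative helper, not an existing library function: it inverts the import edges and does a breadth-first walk to list every module that directly or transitively depends on the changed one.

```python
from __future__ import annotations

from collections import deque


def affected_by(change: str, deps: dict[str, set[str]]) -> set[str]:
    """Return every module that directly or transitively imports `change`.

    `deps` maps each module to the set of modules it imports.
    """
    # Invert the edges: for each module, which modules import it?
    dependents: dict[str, set[str]] = {}
    for module, imports in deps.items():
        for imported in imports:
            dependents.setdefault(imported, set()).add(module)

    # Breadth-first walk over the reverse edges, starting at the change.
    seen: set[str] = set()
    queue = deque([change])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    # In a cyclic graph the walk can reach the starting module; drop it.
    seen.discard(change)
    return seen
```

Walking reverse edges rather than forward imports is the key choice here: the question is "what could break if module A changes?", not "what does module A use?". For a chain where `b` imports `a`, `c` imports `b`, and `z` imports `c`, `affected_by("a", deps)` returns `{"b", "c", "z"}`.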

mklyachman • 01/15/2026

[flagged]
