Hacker News

GolDDranks · yesterday at 4:56 PM · 16 replies

I feel like I'm taking crazy pills. The article starts with:

> you give it a simple task. You’re impressed. So you give it a large task. You’re even more impressed.

That has _never_ been the story for me. I've tried, and I've gotten some good pointers and hints about where to go and what to try, a result of LLMs' extensive if shallow reading, but when it comes to concrete problem solving or code/script writing, I'm _always_ disappointed. I've never gotten a satisfactory code/script result from them without a tremendous amount of pushback: "do this part again with ...", do that, don't do that.

Maybe I'm just a crank with too many preferences. But I hardly think so. The minimum requirement should be for the code to work. It often doesn't. Feedback helps, sure. But if you've got a problem where a simple, contained feedback loop isn't that easy to build, the only source of feedback is yourself. And that's when you are exposed to the stupidity of current AI models.


Replies

b33j0r · yesterday at 5:12 PM

I usually do most of the engineering and it works great for writing the code. I’ll say:

> There should be a TaskManager that stores Task objects in a sorted set, with the deadline as the sort key. There should be methods to add a task and pop the current top task. The TaskManager owns the memory when the Task is in the sorted set, and the caller to pop should own it after it is popped. To enforce this, the caller to pop must pass in an allocator and will receive a copy of the Task. The Task will be freed from the sorted set after the pop.

> The payload of the Task should be an object carrying a pointer to a context and a pointer to a function that takes this context as an argument.

> Update the tests and make sure they pass before completing. The test scenarios should relate to the use-case domain of this project, which is home automation (see the readme and nearby tests).
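
For concreteness, a minimal sketch of what that spec describes, in TypeScript (the allocator/ownership details in the original prompt assume manual memory management, so pop() returning a defensive copy is only an approximation; the shape otherwise follows the prompt):

```typescript
// Sketch of the spec above. The prompt's pop-with-allocator contract is
// approximated here: pop() removes the stored Task and hands the caller
// its own copy, so the TaskManager no longer "owns" it afterward.

interface Task {
  deadline: number;                  // sort key, e.g. epoch milliseconds
  context: unknown;                  // pointer-to-context analogue
  run: (context: unknown) => void;   // function taking the context
}

class TaskManager {
  private tasks: Task[] = [];        // kept sorted by deadline, soonest first

  add(task: Task): void {
    // Binary-search insertion keeps the array sorted by deadline.
    let lo = 0;
    let hi = this.tasks.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.tasks[mid].deadline <= task.deadline) lo = mid + 1;
      else hi = mid;
    }
    this.tasks.splice(lo, 0, task);
  }

  // Removes the earliest-deadline task and returns a copy that the caller
  // owns, mirroring "the caller to pop should own it after it is popped".
  pop(): Task | undefined {
    const top = this.tasks.shift();
    return top === undefined ? undefined : { ...top };
  }
}
```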

threethirtytwo · yesterday at 6:22 PM

I think it’s usage patterns. It is you in a sense.

You can't deny that someone like Ryan Dahl, the creator of Node.js, declaring that he no longer writes code is objectively contrary to your own experience. Something is different.

I think you and other deniers try one prompt, see the issues, and stop.

Programming with AI is like tutoring a child. You teach the child, tell it where it made mistakes and you keep iterating and monitoring the child until it makes what you want. The first output is almost always not what you want. It is the feedback loop between you and the AI that cohesively creates something better than each individual aspect of the human-AI partnership.

Balinares · yesterday at 9:47 PM

Nah, I'm with you there. I've yet to see even Opus 4.5 produce something close to production-ready -- in fact Opus seems like quite a major defect factory, given its consistent tendency toward hardcoding case-by-case workarounds for issues caused by its own bad design choices.

I think uncritical AI enthusiasts are just essentially making the bet that the rising mountains of tech debt they are leaving in their wake can be paid off later on with yet more AI. And you know, that might even work out. Until such a time, though, and as things currently stand, I struggle to understand how one can view raw LLM code and find it acceptable by any professional standard.

giancarlostoro · yesterday at 8:44 PM

The secret sauce for me is Beads. Once Beads is set up, you make the tasks and refine them, and by the end each task is a very detailed prompt. I have Claude ask me clarifying questions, do research on best practices, etc.

Because of Beads I can have Claude do a code review for serious bugs and issues and sure enough it finds some interesting things I overlooked.

I have also seen my peers in the reverse engineering field make breakthroughs emulating runtimes that have no (or only limited) existing implementations, all from the ground up, mind you.

I think the key is thinking of yourself as an architect / mentor for a capable and promising Junior developer.

jasondigitized · yesterday at 5:30 PM

I feel like I am taking crazy pills. I am getting code that works from Opus 4.5. It seems like people are living in two separate worlds.

jjice · yesterday at 6:00 PM

I've found that the thing that made it really click for me was having reusable rules (each agent accepts these differently) that tell it the patterns and structure you want.

I have ones that describe what kinds of functions get unit vs integration tests, how to structure them, and the general kinds of test cases to check for (they love writing way too many tests IME). It has reduced the back and forth I have with the LLM telling it to correct something.

Usually the first time it does something I don't like, I have it correct it. Once it's in a satisfactory state, I tell it to write a Cursor rule describing the situation BRIEFLY (it gets way too verbose by default) and how to structure things.
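
A sketch of what one of those rules might end up looking like (assuming Cursor's .mdc project-rule format with frontmatter; the fields and the specific guidance here are illustrative, not from my actual rules):

```
---
description: Test structure and scope for this repo
globs: ["src/**/*.test.ts"]
---

- Pure functions get unit tests; anything touching IO gets an integration test.
- One describe block per function under test; test names state expected behavior.
- Cover the happy path, one boundary case, one failure case. Nothing more.
```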

That has made writing code with LLMs so much more enjoyable for me.

ActorNightly · yesterday at 6:16 PM

It's really becoming a good litmus test of someone's coding ability whether they think LLMs can do well on complex tasks.

For example, someone may ask an LLM to write a simple HTTP web server, and it can do that fine, and they consider that complex, when in reality it's really not.
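
For reference, here is roughly what that task amounts to, a minimal Node HTTP server (TypeScript sketch):

```typescript
// A minimal HTTP server in Node: the sort of boilerplate an LLM has seen
// thousands of times in training data, which is why it handles it so well.
import { createServer } from "node:http";

createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end(`Hello from ${req.url ?? "/"}\n`);
}).listen(8080, () => console.log("listening on :8080"));
```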

Obscurity4340 · today at 12:17 AM

It helps to write out the prompt in a separate text editor so you can edit it and try to describe what the input is and what output you want, as well as try to describe and catch likely or iteratively observed issues.

Try a gamut of sample inputs and observe where it's going awry, then describe the error to it and see what it does.

nozzlegear · yesterday at 6:07 PM

You're not taking crazy pills, this is my exact experience too. I've been using my wife's eCommerce shop (a headless Medusa instance, which has pretty good docs and even their own documentation LLM) as a 100% vibe-coded project using Claude Code, and it has been one comedy of errors after another. I can't tell you how many times I've had it go through the loop of Cart + Payment Collection link is broken -> Redeploy -> Webhook is broken (can't find payment collection) -> Redeploy -> Cart + Payment Collection link is broken -> Repeat. And it never seems to remember the reasons it had done something previously – despite it being plastered 8000 times across the CLAUDE.md file – so it bumbles into the same fuckups over and over again.

A complete exercise in frustration that has turned me off of all agentic code bullshit. The only reason I still have Claude Code installed is because I like the `/multi-commit` skill I made.

dev_l1x_be · yesterday at 4:59 PM

Well, one way of solving this is to keep giving it simple tasks.

SCdF · yesterday at 5:20 PM

I am getting workable code with Claude on a 10 kLOC TypeScript project. I ask it to make plans, then execute them step by step. I have yet to try something larger, or something more obscure.

__grob · yesterday at 6:43 PM

It still amazes me that so many people can see LLMs writing code as anything less than a miracle in computing...

echohack5 · yesterday at 5:12 PM

I have found AI great in a lot of scenarios, but if I have a specific workflow, then the answer is specific and the AI will get it wrong 100% of the time. You have a great point here.

A trivial example is your happy-path git workflow. I want:

- pull main

- make new branch in user/feature format

- Commit, always sign with my ssh key

- push

- open pr

but it always will:

- not sign commits

- not pull main

- not know to rebase if changes are in flight

- make a million unnecessary commits

- not squash when making a million unnecessary commits

- have no guardrails when pushing to main (oops!)

- add too many comments

- commit message too long

- spam the pr comment with hallucinated test plans

- incorrectly attribute itself as co-author in some guerrilla marketing effort (fixable with config, but whyyyyyy -- also this isn't just annoying, it breaks compliance in a lot of places and fundamentally misunderstands the whole point of authorship, which is copyright -- and AIs can't own copyright)

- not make DCO compliant commits ...

Commit spam is particularly bad for bisect bug hunting and for ref performance issues at scale. Sure, I can enforce squash-and-merge on my repo, but why am I relying on that if the AI is so smart?

All of these things are fixed with aliases / magit / cli usage, using the thing the way we have always done it.
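
For example, a couple of the items above reduce to a few lines of git config (a sketch, assuming git >= 2.34 for SSH commit signing; the alias name and key path are placeholders):

```
# ~/.gitconfig sketch: enforce SSH-signed commits and script the happy path.
[gpg]
    format = ssh
[user]
    signingkey = ~/.ssh/id_ed25519.pub   # placeholder path to your public key
[commit]
    gpgsign = true                       # sign every commit by default
[alias]
    # "git start my-feature": update main, then branch as user/my-feature
    start = "!f() { git switch main && git pull && git switch -c \"$USER/$1\"; }; f"
```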

GolDDranks · yesterday at 5:01 PM

Just a supplementary fact: I'm in the advantageous position, compared to the AI, that in cases where it's hard to provide that automatic feedback loop, I can run and test the code at my discretion, whereas the AI model can't.

Yet. Most of my criticism comes not after running the code, but after _reading_ the code. It wrote code. I read it. And I am not happy with it. No need even to run it; it's shit at a glance.

causalscience · yesterday at 8:52 PM

You're not crazy, I'm also always disappointed.

My theory is that the people who are impressed are trying to build CRUD apps or something like that.

t55 · yesterday at 5:12 PM

[flagged]
