Hacker News

ericmcer · yesterday at 4:44 PM · 6 replies

So what do I do differently then?

Hypothetically, you have a simple out-of-bounds error because a function is getting an empty string and does something like `""[5]`.

Opus will add a bunch of length & nil checks to "fix" this, but the actual issue is that the string should never be empty. The nil checks just paper over a deeper problem: you probably need a schema-level check for minimum string length.
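A minimal sketch of the difference, in Rust (all function names here are hypothetical illustrations, not code from the comment):

```rust
// The hypothetical bug: slicing without checking the length,
// which panics when the string is empty.
fn first_five(s: &str) -> &str {
    &s[..5] // panics for "", or any string shorter than 5 bytes
}

// The "papered over" fix: a length check at the use site.
// The empty string is still flowing through the system.
fn first_five_guarded(s: &str) -> &str {
    if s.len() >= 5 { &s[..5] } else { s }
}

// The deeper fix: reject the empty string where it enters the
// system, so downstream code can assume it is non-empty.
fn parse_username(raw: &str) -> Result<String, String> {
    if raw.is_empty() {
        Err("username must not be empty".to_string())
    } else {
        Ok(raw.to_string())
    }
}
```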

At that point, do you just tell it "no, delete all that; the string should never be empty" and let it figure that out? Or do I basically need to pseudo-code it: "add a check for empty strings to this file on line 145"? Or do I just YOLO it, knowing the error is gone, so it's no longer my problem?

My bigger point is: how does an LLM know that this seemingly small problem is indicative of some larger failure? Let's say this string is `user.username`, which means users can set their name to empty, which means an entire migration is probably necessary. All the AI is going to do is smoosh the error message and kick the can down the road.


Replies

UI_at_80x24 · yesterday at 5:10 PM

I have encountered the exact same kind of frustration, and no amount of prompting seems to prevent it from "randomly" happening.

`the error is on line #145 fix it with XYZ and add a check that no string should ever be blank`

It's the randomness that is frustrating, and the fact that the fix would often be quicker to type in manually drives me crazy. I also fear that all the "rules" I add to claude.md are wasting my available tokens, leaving it without enough room to process my actual request.

julian37 · yesterday at 5:21 PM

Use planning+execution rather than one-shotting, it'll let you push back on stuff like this. I recommend brainstorming everything with https://github.com/obra/superpowers, at least to start with.

Then work on making sure the LLM has all the info it needs. In this example it sounds like perhaps your hypothetical data model would need to be better typed and/or documented.
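One hedged sketch of what "better typed" could mean for the username example upthread (a hypothetical Rust newtype; the names are illustrative):

```rust
// A newtype whose constructor enforces the non-empty invariant.
// Once validation happens at the boundary, an empty username
// cannot exist anywhere downstream, so per-call-site length
// checks become unnecessary.
#[derive(Debug, Clone, PartialEq)]
pub struct Username(String);

impl Username {
    /// Returns None for empty or whitespace-only input.
    pub fn new(raw: &str) -> Option<Username> {
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            None
        } else {
            Some(Username(trimmed.to_string()))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Functions then take a `Username` instead of a raw `&str`, and the type documents the invariant for the LLM as well as for human readers.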

But yeah, as of today it won't pick up on smells the way you do, at least not without extra skills/prompting. You'll find that comforting or annoying depending on where you stand...

hombre_fatal · yesterday at 6:20 PM

Always start an implementation in Claude Code's plan mode. It's much more comprehensive than going straight to implementation. I'd never read their plan-mode prompt before, but it deep-dives into the code, peripheral files, call sites, documentation, existing tests, etc.

You get a better solution, but also a plan file that you can review. And, just as important, that you can have another agent review. I've found that Codex is really good at reviewing plans.

I have an AGENTS.md prompt explaining that plan-file review involves ranking the top findings by severity, explaining the impact of each, and recommending a fix for each one, and finally recommending a simpler directional pivot for the plan if one exists.

So, start the plan in Claude Code, type "Review this plan: <path>" in Codex (or another Claude Code agent), and cycle the findings back into Claude Code to refine the plan. When the plan is updated, write "Plan updated" to the reviewer agent.
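Sketched as an AGENTS.md section, the review instructions described above might look like this (a hypothetical reconstruction, not the commenter's actual file):

```markdown
## Reviewing a plan file

When asked to review a plan:

1. Read the plan and the code it touches.
2. Rank the top findings by severity.
3. For each finding, explain its impact and recommend a fix.
4. Finally, recommend a simpler directional pivot for the plan,
   if one exists.
```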

You should get much better results with this; it's capable of much better architecture-level changes rather than narrow, topical solutions.

If that's still not working well enough for you, maybe you could use more support, like a type system and more explicit goals in AGENTS.md?

dpkirchner · yesterday at 5:05 PM

Not the person you're replying to, but yes, sometimes I do tell the agent to remove the cruft. Then I back up a few messages in the context and reword my request. Instead of just saying "fix this crash" or whatever, I say "this is crashing because the string is empty; however, it shouldn't be empty, so figure out why it's empty". And I might have it add some tests to ensure that the code isn't returning or passing along empty strings.
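In the spirit of that last step, such a test might look like this (a hypothetical Rust example; `display_name` is an illustrative helper, not code from the thread):

```rust
// Hypothetical helper under test: it must never hand back an
// empty string, falling back to a default instead.
fn display_name(raw: Option<&str>) -> String {
    match raw {
        Some(s) if !s.trim().is_empty() => s.trim().to_string(),
        _ => "anonymous".to_string(),
    }
}
```

A regression test then asserts that no input, including `None` or whitespace, produces an empty result.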

echelon · yesterday at 5:53 PM

1. I'm working in Rust, so it's a very safe, low-defect language. I suspect that has a tremendous amount to do with my successes. "Nulls" (Option<T>) and "errors" (Result<T,E>) must be handled, and the AST encodes a tremendous amount about the state, the flow, and how to deal with things. I don't feel as comfortable with Claude Code's TypeScript and React output; it does work, but it can be much more imprecise. And I only trust it with greenfield Python; editing existing Python code has been sloppy. The Rust experience is downright magical.

2. I architecturally describe every change I want made. I don't leave it up to the LLM to guess. My prompts might be overkill, but they result in 70-80ish% correctness in one shot. (I haven't measured this, and I'm actually curious.) I'll paste in file paths, method names, and struct definitions, and ask Claude for concrete changes. I'll expand "plumb the foo field through the query and API layers" into as much detail as necessary. My prompts can be several paragraphs long.

3. I don't attempt an entire change set or PR with a single prompt. I work iteratively as I would naturally work, just at a higher level and with greater and broader scope. You get a sense of what granularity and scope Claude can be effective at after a while.
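The point in (1) can be illustrated minimally (a hedged sketch; `first_char` is a hypothetical example, not the commenter's code):

```rust
// Option<T> forces the empty case into the type signature: the
// caller cannot use the char without first deciding what to do
// when the string is empty.
fn first_char(s: &str) -> Option<char> {
    s.chars().next() // None for "", instead of a panic
}

// A caller must unpack the Option explicitly, e.g.:
//   match first_char(input) {
//       Some(c) => { /* use c */ }
//       None    => { /* the empty case, handled up front */ }
//   }
```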

You can't one-shot stuff. You have to work iteratively. A single PR might take multiple round trips of incremental change. It's like being a "film director" or "pair programmer" writing code: I give exacting specifications and directions.

The power is in how fast these changes can be made and how closely they map to your expectations. And also in how little it drains your energy and focus.

This also gives me a chance to code review at every change, which means by the time I review the final PR, I've read the change set multiple times.
