Hacker News

maccard · today at 4:56 PM · 13 replies

> try it, you might actually be amazed.

I keep being told this and the tools keep falling at the first hurdle. This morning I asked Claude to use a library to load a toml file in .net and print a value. It immediately explained that it was an easy file format to parse and didn’t need a library. I undid that, went back to plan mode, and it picked a library, added it, and claimed it was done. Except the code didn’t compile.

Three iterations of trying to get Claude to make it compile later (it kept changing random lines around the clearly problematic line), I fixed it myself by following the example in the readme, and told Claude.
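For context on how small the readme fix was: if the library in question was Tomlyn (the one the replies guess at), its basic usage amounts to a couple of lines. This is a sketch of Tomlyn's documented `Toml.ToModel` API; the file name and key are invented for illustration:

```csharp
using System;
using System.IO;
using Tomlyn;
using Tomlyn.Model;

// Load a TOML file and print one value.
// "config.toml" and the "title" key are hypothetical examples.
var text = File.ReadAllText("config.toml");
TomlTable model = Toml.ToModel(text);   // parses the document into a TomlTable
Console.WriteLine(model["title"]);      // TOML values map to plain CLR objects
```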

I then asked Claude to parse the rest of the toml file, whereupon it blew away the compile fix I had made.

This isn’t an isolated experience - I hit these fundamental blocking issues with pretty much every attempt to use these tools that isn’t “implement a web page”, and even when it does that it’s not long before it gets tangled up in something or other…


Replies

krastanov · today at 5:04 PM

This is fascinating to me. I completely believe you, and I will not bother you with all the common "but did you try to tell it this or that" responses, but this is such a different experience from mine. I did the exact same task with Claude in the Julia language last week, and everything worked perfectly. I am now in the habit of adding "keep it simple, use only public interfaces, do not use internals, be elegant and extremely minimal in your changes" to all my requests or SKILL.md or AGENTS.md files (because of the occasional failure like the one you described). But generally speaking, such complete failures have been so rare for me that it is amazing to see that others have had such a completely different experience.

linsomniac · today at 7:16 PM

My friend, with all due respect, I don't think this is a problem with the AI.

I don't know anything about DotNet, but I just fired up Claude Code in an empty directory and asked it to create an example dotnet program using the Tomlyn library. It chugged away, and ~5 minutes later I did a "grep Deserialize *" in the project; it came up with exactly the line you wanted it to produce (in your comment here): var model = TomlSerializer.Deserialize<TomlTable>(tomlContent)!;

The full results of what it produced are at https://github.com/linsomniac/tomlynexample

That includes the prompt I used, which is:

Please create a dotnet sample program that uses the library at https://github.com/xoofx/Tomlyn to parse the TOML file given on the command line. Please only use the Tomlyn library for parsing the TOML file. I don't have any dotnet tooling installed on my system, please let me know what is needed to compile this example when we get there. Please use an agent team consisting of a dotnet expert, a qa expert, a TOML expert a devils advocate and a dotnet on Linux expert.

I can't really comment on the code it produced (as I said, I don't use dotnet; I had to install it on my system to try this), so I can't judge the approach. 346 lines in Program.cs seems like a lot for an example TOML program, but I know Claude Code tends to do full error checking, etc., and it seems to have a lot of "pretty printing" code.
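For comparison, the core of what that prompt asks for can fit in far fewer lines. A hedged sketch using Tomlyn's documented `Toml.ToModel` API (not the code in the linked repo), with no pretty-printing or detailed error handling:

```csharp
using System;
using System.IO;
using Tomlyn;
using Tomlyn.Model;

// Usage: dotnet run -- path/to/file.toml
if (args.Length != 1)
{
    Console.Error.WriteLine("usage: tomlyn-example <file.toml>");
    return 1;
}

// Parse the file into a TomlTable and dump its top-level keys.
TomlTable table = Toml.ToModel(File.ReadAllText(args[0]));
foreach (var (key, value) in table)
    Console.WriteLine($"{key} = {value}");

return 0;
```

Whether the extra ~330 lines of validation and formatting are a feature or noise probably depends on what you asked for.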

fxtentacle · today at 6:08 PM

Same here. While LLMs sometimes work surprisingly well, I also encounter edge cases where they fail surprisingly badly, multiple times per day. My guess is that other people maybe just don't bother to check what the AI says, which would cause them not to notice omission errors.

Like when I was trying to find a physical store again with ChatGPT Pro 5.4 and asked it to prepare a list of candidates, but the shop just wasn't in the list, despite GPT claiming it was exhaustive. When I then found it manually and asked GPT for advice on how I could improve my prompting in the future, it went full "aggressively agreeable" on me with "Excellent question! Now I can see exactly why my searches missed XY - this is a perfect learning opportunity. Here's what went wrong and what was missing: ..." followed by four sections with four subsections each.

It's great to see the AI reflect on how it failed. But it's also kind of painful if you know that it'll forget all of this the moment the text is sent to me and that it will never ever learn from this mistake and do better in the future.

ffsm8 · today at 6:47 PM

> This morning I asked Claude to use a library to load a toml file in .net and print a value.

Legit this morning Claude was essentially unusable for me

I could explicitly state things it should adjust and it wouldn't do it.

Not even after specifying again, reverting everything, and eventually reprompting from the beginning, etc. Even super trivial frontend things like "extract [code] into a separate component".

After 30 minutes of that I relented and went on to read a book. After lunch I tried again and its intelligence was back to normal.

It's so uncanny to experience how much its performance changes. I strongly suspect Anthropic is doing something whenever its intelligence drops so much, especially because it's always temporary, yet repeatable across sessions while it's occurring... until it's normal again.

But ultimately just speculation, I'm just a user after all

DougN7 · today at 5:18 PM

I have similar experiences. It has worked about half the time, but the code has to be pretty simple. I've had many experiences where we work on something complicated for an hour and end up with good, compiling code, but then an edge case comes to mind that I ask about, and Claude tells me the whole approach is doomed and will never work for that case. It has even apologized a few times for misleading me :) I feel like it's this weird mix of brilliant moron. But yeah, ask for a simple HTML page with a few fields and it rocks.

foobarchu · today at 6:17 PM

The best experiences I have are those where I can describe what I want done in detail. Rather than asking it to add TOML parsing, I would tell it exactly which library to use ahead of time, reducing the number of decisions the model has to make. Some of the most effective use-cases are when you have a reference to give it, e.g. "add x feature the same way as in this other project that is also in the workspace", or "make the changes I made to the contents of directory X in git commit <sha here>, but applied to directory Y instead". In both cases it's a lot of copy/paste, then tweaking an obvious value (like replacing "dev" with "QA" everywhere).

I try to give the model as little freedom as possible. That usually means it's not being used for novel work.

ball_of_lint · today at 7:14 PM

Did you give claude access to run the compile step?

I remember having to write code on paper for my CS exams, and they expected it to compile! It was hard, but I mostly got there. Definitely made a few small mistakes though.

phromo · today at 5:58 PM

I don't know why, but I find performance on C#/.NET to be several generations behind. It's sometimes right, of course, but my general experience is that if you pull the generation slot machine in just about any other language, it will work better. I regularly do Python, TypeScript, Ruby, and Rust with a better experience. It's even hard to find benchmarks where C# is included.

wrs · today at 5:18 PM

I’m honestly baffled by this. I don’t want to tell you “you’re holding it wrong” but if this is your normal experience there’s something weird happening.

Friday afternoon I made a new directory and told Claude Code I wanted to make a Go proxy so I could have a request/callback HTTP API for a 3rd party service whose official API is only persistent websocket connections. I had it read the service’s API docs, engage in some back and forth to establish the architecture and library choices, and save out a phased implementation plan in plan mode. It implemented it in four phases with passing tests for each, then did live tests against the service in which it debugged its protocol mistakes using curl. Finally I had it do two rounds of code review with fresh context, and it fixed a race condition and made a few things cleaner. Total time, two hours.

I have noticed some people I work with have more trouble, and my vague intuition is it happens when they give Claude too much autonomy. It works better when you tell it what to do, rather than letting it decide. That can be at a pretty high level, though. Basically reduce the problem to a set of well-established subproblems that it’s familiar with. Same as you’d do with a junior developer, really.

JakeStone · today at 6:04 PM

If Claude ends up grabbing my C# TOML library: in my defense, I wrote it when the TOML format first came out, over a dozen years ago, and never did anything more with it. Sorry.

senordevnyc · today at 7:42 PM

This is so perplexing to me. I've definitely hit these kinds of issues (which usually result in me cursing at the agent in all caps while telling it to get its shit together!), but it's almost always a long way into a session, where I know context rot is an issue, and the assumption it's making is a dumb one but it's also in the middle of a complex task... I just haven't had anything remotely like the situation you're describing, where Opus 4.6 can't make a simple change and verify that it compiles, can't look up docs, can't follow your instructions, etc. Bizarre.

roncesvalles · today at 5:25 PM

Did you use the best model available to you (Opus 4.6)? There is a world of difference between using the highest model vs the fast one. The fast ones are basically useless and it's a shame that all these tools default to it.
