I'm reminded, once again, of the recent "vibe coded" OCaml fiasco [0].
The PR author had zero understanding of why their entirely LLM-generated contribution was viewed so suspiciously.
The article validates a significant point: it is one thing to have passing tests and produce output that resembles correctness; it's something entirely different for that output to be good and maintainable.
I'm humbled by the maintainer's answer [0]. Must be great to work with people like him who have infinite patience and composure.
[0] https://github.com/ocaml/ocaml/pull/14369#issuecomment-35565...
Damn... "AI has a very deep understanding of how this code works. Please challenge me on this." this person is something else. Just... wow.
The AI legal analysis seemed to be the nail in the coffin.
Adding AI-generated comments is, IMHO, one of the rudest uses of AI.
> however it's something entirely different for that output to be good and maintainable
People aren't prompting LLMs to write good, maintainable code, though. They're assuming that because we've collectively decided that good, maintainable code is the goal, it must be the goal of an LLM too. That isn't true. LLMs don't care about our goals; they solve problems probabilistically based on the content of their training data, context, and prompting. Presumably, if you take all the code in the world and throw it in a mixer, what comes out is not our Platonic ideal of the best possible code but something more like a Lovecraftian horror that happens to get the right output.

That's actually encouraging, because it suggests that with better prompting, context, and training we might be able to guide an LLM toward knowing what good and bad look like (based on the fact that we know). The future is looking great.
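To make that "right output, Lovecraftian shape" point concrete, here's a contrived OCaml sketch of my own (nothing from the actual PR): both functions return the same results, so a test suite can't tell them apart, but only one is code you'd want to maintain.

    (* "Happens to get the right output": mutable state, a sentinel
       exception for control flow, and a needless intermediate array. *)
    exception Done

    let sum_weird lst =
      let acc = ref 0 in
      let arr = Array.of_list lst in
      (try
         for i = 0 to Array.length arr do
           if i = Array.length arr then raise Done;
           acc := !acc + arr.(i)
         done
       with Done -> ());
      !acc

    (* The version a reviewer can actually read. *)
    let sum lst = List.fold_left ( + ) 0 lst

    let () = assert (sum_weird [1; 2; 3] = sum [1; 2; 3])

Passing tests are blind to the difference; the maintainer who has to touch that file in five years is not.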
However, we also need to be aware that 'good, maintainable code' is often not actually the ideal output of a developer. In businesses everywhere the goal is 'whatever works right now, and to hell with maintainability'. When a business is three months from failing, spending time writing good code that you can keep working on in ten years feels like wasted effort. So really, most code that's written doesn't need to be good or maintainable; it just needs to work. And if you look at the code a lot of businesses are running, it doesn't even do that. LLMs are a step forward in just getting stuff to work in the first place.
If we can get to 'bug free' at the unit level using AI, then AI is useful. Above individual units of code - logic, architecture, security, etc. - things still have to come from the developer, because AI can't yet hold the context of a complete application. Once it can, we can tackle 'tech debt free', because almost all tech debt lives at that higher level. I don't think we'll get there for a long time.
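As a rough illustration of that unit-versus-architecture split (a hypothetical sketch, not anything from the thread): a unit-level check can pin down one function's behaviour completely while saying nothing about whether that function belongs in the design at all.

    (* Unit-level confidence: List.rev applied twice is the identity.
       A real project would use a property-testing library like qcheck. *)
    let rev_involutive lst = List.rev (List.rev lst) = lst

    let () =
      assert (rev_involutive []);
      assert (rev_involutive [1]);
      assert (rev_involutive [1; 2; 3]);
      print_endline "unit-level checks passed"

None of those assertions can tell you whether the unit fits the surrounding architecture, handles security correctly, or adds tech debt - that judgement still sits with the developer.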
I get infuriated just from reading that; I wish I had as much patience as the maintainers on that project.
I just read that whole thread and I think the author made the mistake of submitting a 13k loc PR, but other than that - while he gets downvoted to hell on every comment - he's actually acting professionally and politely.
I wouldn't call this a fiasco; to me it reads more like the ability to create huge amounts of code - whether the end result works well or not - breaking the traditional model of open source. Small contributions can be verified, and the merit-vs-maintenance-effort trade-off can at least be assessed somewhat realistically.
I have no stake in the "vibe coding sucks" vs "vibe coding rocks" debate, and I read that thread as an outsider. I cannot help but find the PR author's attitude absolutely okay, while the compiler folks come across as very defensive. I do agree with them that submitting a huge PR without prior discussion cannot be the way forward. But that's almost orthogonal to the question of whether AI-generated code is or is not of value.
If I were the author, I would probably take my 13k loc proof-of-concept implementation, chop it into bite-size steps that are easy to digest, and try to get them integrated into the compiler one by one, while being totally upfront about the final goal. You'd need to be ready to accept criticism and requests for changes, but it should not be too hard to have your AI of choice incorporate those into your code base.
I think the author's main mistake wasn't the vibe coding; it was dreaming up his own personal ideal of a huge feature and then going ahead and single-handedly implementing the whole thing without involving anyone from the actual compiler project. You cannot blame the maintainers for not being crazy about accepting such a huge blob.
>Here's my question: why did the files that you submitted name Mark Shinwell as the author?
>Beats me. AI decided to do so and I didn't question it.
Haha that's comedy gold, and honestly a good interview screening situation - you'd instantly pass on the candidate!