Hacker News

dzonga | last Monday at 10:37 PM

pretty decent article - but what it misses is that most of these agents are trained on bad code - which is open source.

so what does this mean in practice? for people working on proprietary systems (where cost will never go down), the code is not on GitHub - maybe it's hosted on an internal VCS like Bitbucket. the agents were never trained on that code. yeah, they might help with docs (but are they using the latest docs?)

for others, the agents spit out bad code, make assumptions that don't hold, and call APIs that don't exist or have been deprecated.
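to make the hallucinated-API point concrete, here's a toy sanity check (my own sketch - `api_exists` is a hypothetical helper, not an existing tool) that verifies an agent-suggested call actually resolves before you trust it:

```python
import importlib

def api_exists(module_name: str, attr_path: str) -> bool:
    """Cheap sanity check for AI-suggested calls: does this
    dotted attribute path actually resolve in the module?"""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True

# a real stdlib API vs. one an agent might hallucinate:
print(api_exists("json", "dumps"))      # True
print(api_exists("json", "to_string"))  # False
```

it won't catch deprecated-but-still-present APIs, which is exactly where the experienced builder comes in.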

for each of those you need an experienced builder who has 1. technical know-how and 2. domain expertise. so has the cost of experienced builder(s) gone down? I don't think so - I think it has gone up.

what people are vibecoding out there is mostly tools / apps that deal in closed systems (never really interact with the outside world), scripts where AI can infer based on what was done before, etc. but are these people building anything new?

I have also noticed there's a huge conflation between cost & complexity. ZIRP drove people to build software on very complex abstractions - e.g. Kubernetes, Next.js, microservices - hence people thought they needed huge armies of engineers. however, we also know the inverse is true: most software can be built by teams of 1-3 people. we have countless proofs of this.

so people think the way to reduce cost is to use AI agents instead of addressing the problem head-on - building software in a simpler manner. will AI help? yeah, but not to the extent of what is being sold or written about daily.


Replies

Tepix | last Monday at 10:48 PM

> these agents are trained on bad code - which is open source.

This is doubtful and not what I've seen in over 30 years in the industry. People who are ashamed of their source code don't make it open source. In general, open source code will be of higher quality than closed source.

Sure, these days you will need to avoid github repositories made by students for their homework assignments. I don't think that's a problem.

simonw | last Monday at 10:41 PM

The idea that LLMs were trained on miscellaneous scraped low-quality code may have been true a year ago, but I suspect it is no longer true today.

All of the major model vendors are competing on how well their models can code. The key to getting better code out of the model is improving the quality of the code that it is trained on.

Filtering training data for high-quality code is easier than filtering for high-quality data of other types.
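As an illustration of why code is easier to filter than prose, here is a toy quality gate (my own sketch with invented criteria, not any vendor's actual pipeline): code can be mechanically checked for validity in a way free text cannot.

```python
import ast

def passes_quality_filter(source: str) -> bool:
    """Toy heuristic filter: keep only snippets that parse as valid
    Python and contain at least one documented function.
    (Hypothetical criteria, purely for illustration.)"""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    return any(ast.get_docstring(f) for f in funcs)

good = 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
bad = 'def add(a b): return a+b'  # invalid syntax

print(passes_quality_filter(good))  # True
print(passes_quality_filter(bad))   # False
```

Real pipelines would layer on much stronger signals (linters, test pass rates, repo popularity), but the point stands: the checks are objective.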

My strong hunch is that the quality of code being used to train current frontier models is way higher than it was a year ago.