Hacker News

amluto, yesterday at 5:39 PM

That blog post really makes it look like it's graded by an LLM's estimate of an OSS maintainer's review. I see three issues:

1. That estimate could easily be wrong.

2. That estimate is, of course, usable in RL training. This isn't an inherently bad thing, and this is more or less what has improved coding models so much lately. But it does mean that other companies could and surely will do this sort of training, and Anthropic probably did too.

3. OSS maintainers are far from perfect, and there's an unfortunate uncanny valley-like effect in which a coding model can produce code that is just convincing enough to pass review even though it's actually totally wrong. I don't know whether this is a specific issue here.


Replies

rdedev, today at 12:48 AM

There is also the possibility that an LLM judge would be happy with code that looks LLM-generated, while a maintainer for a specific project might not merge it for stylistic reasons.
