Hacker News

tibbar, yesterday at 7:58 PM

LLMs often seem to have trouble determining the severity of a bug/incident/problem in a vacuum. If you run an LLM over 1000 items in parallel and ask "is this bad?" about each one, it will come up with reasons for an item to be bad far more often than it would if it were considering all 1000 at the same time.
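
To make the two setups concrete, here is a rough sketch of the difference. Everything in it is hypothetical: `ask` stands in for whatever function sends a prompt to a model and returns its reply, and the prompt wording is only illustrative, not a recommendation.

```python
from typing import Callable, List


def judge_in_isolation(items: List[str], ask: Callable[[str], str]) -> List[str]:
    """One prompt per item: the model never sees the other items."""
    return [ask(f"Is this bad? Answer yes or no.\n\n{item}") for item in items]


def judge_in_context(items: List[str], ask: Callable[[str], str]) -> str:
    """One prompt for the whole batch: the model can compare items to each other."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))
    return ask(
        "Here are all the reports. Rank them from most to least severe, "
        "and say which ones are not worth acting on.\n\n" + numbered
    )
```

In the first function every call starts from scratch, so there is no baseline to compare an item against. In the second, the prompt itself carries the baseline. A batch of 1000 may not fit in one prompt, so in practice this would probably need chunking, which the sketch does not attempt.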