> This is the garbage in, garbage out principle in action. The utility of a model is bottlenecked by its inputs. The more garbage you have, the more likely hallucinations will occur.
Good read, but I wouldn't fully extend the garbage in, garbage out principle to LLMs. These massive LLMs are trained on internet-scale data, which includes a significant amount of garbage, and they still perform pretty well. Hallucinations stem more from missing or misleading context than from noise alone. Tech-debt-heavy codebases, though unstructured, still provide information-rich context.