> We write unit tests for the happy path, maybe a few edge cases we can imagine, but what about the inputs we'd never consider? Many times we assume that LLMs are handling these scenarios by default.
Do we?
I've seen companies advertise with LLM-generated claims (~Best company for X according to ChatGPT), and I've seen (political) discussions where LLM opinions were held up as "evidence".
So it's pretty safe to say some (many?) people lend inappropriate credence to LLM outputs. It's eating our minds.
What's interesting to me about this, reckless as it is, is that the conversation has begun to shift toward balancing LLMs with rigorous methods. These people seem to be selling some kind of AI hype product backed by shoddy engineering, and even they are picking up on the vibe. I think this is a really promising sign for the future.
When "we" = "developers we imagined when using LLMs to generate this marketing slop based on a contrived scenario", then sure!
The original claim for TDD is that you write tests for all your edge cases. Inputs you didn't consider don't matter, because they're covered by the edges. If you can only accept inputs from 2-7 (inclusive), you check 1, 2, 7, and 8; if those pass, you assume the rest of the range works.
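A minimal sketch of that boundary-value idea, using a hypothetical `accept` function standing in for whatever validates the input (the function name and range are just illustrative, not from any real codebase):

```python
def accept(n: int) -> bool:
    """Return True only for inputs in the valid range 2..7 inclusive."""
    return 2 <= n <= 7

# Test just outside and just inside each boundary; if these four pass,
# the interior values (3..6) are assumed to behave the same.
assert accept(1) is False  # just below the lower bound
assert accept(2) is True   # lower bound itself
assert accept(7) is True   # upper bound itself
assert accept(8) is False  # just above the upper bound
```

Of course, that assumption only holds when the implementation treats the range uniformly; a special case buried at 5 would sail right past these four checks, which is the commenter's point about inputs you never consider.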