TFA's whole point is that there is no easy way to tell whether LLM output is correct. Driving mistakes provide instant feedback on whether the output of whatever AI is driving is correct. Bad comparison.
Many of the things LLMs output can be validated in a feedback loop, e.g., programming: it's easy to validate generated code with a compiler, unit tests, etc. LLMs will excel in processes that can provide a validating feedback loop.
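A minimal sketch of what such a loop could look like, assuming a hypothetical generate_code() LLM call; the validation harness itself is just py_compile plus pytest:

    import os
    import subprocess
    import tempfile

    def validate(code: str) -> tuple[bool, str]:
        """Syntax-check the candidate, then run its unit tests."""
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "candidate.py")
            with open(path, "w") as f:
                f.write(code)
            # Step 1: does it even compile?
            result = subprocess.run(
                ["python", "-m", "py_compile", path],
                capture_output=True, text=True,
            )
            if result.returncode != 0:
                return False, result.stderr
            # Step 2: do the unit tests pass?
            result = subprocess.run(
                ["python", "-m", "pytest", path],
                capture_output=True, text=True,
            )
            return result.returncode == 0, result.stdout

    def feedback_loop(prompt: str, max_attempts: int = 5) -> str | None:
        feedback = ""
        for _ in range(max_attempts):
            # generate_code is a stand-in for whatever LLM API you use
            code = generate_code(prompt + feedback)
            ok, output = validate(code)
            if ok:
                return code
            # Feed the compiler/test errors back into the next attempt
            feedback = f"\n\nPrevious attempt failed:\n{output}"
        return None

The point is that the compiler and test suite act as an oracle the loop can query automatically, which driving (and most real-world tasks) doesn't give you until after the mistake has already happened.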