No one should expect LLMs to give correct answers 100% of the time. It's inherent to the tech for them to be confidently wrong.
Code needs to be checked.
References need to be checked.
Any facts or claims need to be checked.
"confidently" is a feature selected in the system prompt.
As a user you can influence that behavior.
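For example, if you're calling a model through an API instead of a chat UI, you usually get to set the system message yourself. A rough sketch of what that looks like, assuming the OpenAI Python SDK's chat.completions interface (the model name and prompt wording are just illustrative):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # System message nudging the model away from unqualified confidence.
    SYSTEM_PROMPT = (
        "You are a careful assistant. When you are not certain of a fact, "
        "say so explicitly, give a rough confidence level, and point out "
        "what the user should verify independently."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Summarize the main claims in this paper."},
        ],
    )

    print(response.choices[0].message.content)

That doesn't make the answers more correct; it just surfaces the uncertainty so you know what to go check.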
According to the benchmarks here, they're claiming up to 97% accuracy. That ought to be good enough to trust them, right?
Or maybe these benchmarks are all wrong.