I agree, and I'd put it this way: LLMs sound so convincing, presenting their work through rose-colored glasses and promising to give you more if you keep going.
It's a 50/50 chance whether it turns out to be right or walks you off a cliff.
Either way, the trip itself stays the same: beautiful, five-star-plus travel.
Also, spotting an error and telling the LLM about it makes things worse in most cases, because the LLM wants to please you: it apologizes and changes course.
The moment I find myself in such a situation, I usually save or cancel the session and start from scratch, or pivot with drastic measures.
Gemini to me is the most unpredictable LLM while GPT works best overall for me.
Gemini recently gave me two different answers to the same question. This was an intentional test: I was bored and wanted to see what happens if you simply open a new chat and paste the same prompt, everything else being equal.
Reasoning doesn't help much in the coding domain for me, because what the LLM comes up with as an explanation is very high-level and only formally correct.
I google more because of LLMs than I did before, because essentially what I'm witnessing is someone producing something I have to verify before I press the button it comes with. And you only find out shortly afterwards whether the polished button actually works or gives you a warm welcome to hell.
>LLM wants to please you
I was using Copilot and asked it a question about a PDF file (a concept search). It turned out the file was images of text. I was anticipating that and had the text ready to paste in.
Instead, it started writing an OCR program in Python.
I stopped it after several minutes.
Often Copilot says it can't do something (sometimes it's even correct); that's preferable to the try-hard behaviour here.
> Gemini to me is the most unpredictable LLM while GPT works best overall for me.
This nails an important point, IMHO. I've absolutely noticed this, for better or worse. Gemini can produce surprisingly excellent things, but its unpredictability makes me go for GPT when I only want to ask once.
Reusing the same prompt several times is something I've started doing too. The contrast is often illuminating.
In one case, it made a thoroughly convincing argument that an approach was justified. The second time it made exactly the opposite argument, which was equally compelling.
I now see LLMs as persuasion machines.