I wish I had as many positive experiences as some other HNers seem to have with LLMs. I'm not saying I've had zero positive experiences, but the number of negative experiences is so high that it's just scary.
Yesterday, Thanksgiving, there was a Google Doodle. Clicking the doodle led to a Gemini prompt for planning to have Thanksgiving dinner ready on time. It had a schedule with lots of prep the day before and then a timeline for the day of. It had you cooking the dinner rolls, then said something like "take them out and keep them warm," followed by cooking something else in the oven. I asked, "How do I keep them warm when something else is cooking in the oven?" It proceeded to give me a revised timeline that contradicted its original timeline and also made no sense in and of itself. I asked it about the contradiction and the error; it apologized and gave a completely new third timeline that was different from the first two and also nonsense. This was Google's own Gemini promotion!
All it really needed to do in response to my first query was say something like "put a towel over the rolls and leave them on top of the oven"... Maybe? But then, it had told me to spread butter over the rolls as soon as they came out of the oven, so I'd have asked, "Won't the towel suck up all the butter?"
This is one example of the many times LLMs have failed me (ChatGPT, Gemini). For direct code generation, my subjective experience is that it fails 5 times out of 6. For Stack Overflow-type questions it succeeds 5 times out of 6. For non-code questions it depends on the type of question. But when it fails, it fails so badly that I'm somewhat surprised it ever works.
And yeah, the whole world is running headfirst into massive LLM usage, like this one using it for short reviews of authors. Ugh!!!
You're not supposed to look so closely!
It seems to me that most LLM fans are impressed by glancing at a result ("It works!") and never really think about the flaws in the answer or look at the code in detail.
> I'm not saying I've had zero positive experiences, but the number of negative experiences is so high that it's just scary.
Just for shits and giggles I decided to let Copilot (whatever the default model in VS Code is) write a Makefile for a simple avr-gcc project. I can't remember exactly what prompt I gave it, but it was something along the lines of "given this Makefile that is old but works, write a new Makefile for this project that doesn't have one," plus a link to a simple gist I wrote years ago.
Fuuuuuuuuck me.
It's 2,500 lines long. It's not just bigger than the codebase it's supposed to build; it's just about bigger than all the C files in all the avr-gcc projects in that entire chunk of my ~/devel/ directory combined. I couldn't even begin to make sense of what it's trying to do.
It mostly looks like it's declaring variables over and over, concatenating more bits on as it goes. I don't know for sure, though.
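For scale: the entire job fits in roughly 25 lines. This is a rough sketch from memory, not my actual gist; the MCU, clock speed, and target name are placeholders, not the real project's values:

    MCU    = atmega328p        # placeholder part
    F_CPU  = 16000000UL        # placeholder clock
    TARGET = main
    SRCS   = $(wildcard *.c)
    OBJS   = $(SRCS:.c=.o)

    CC     = avr-gcc
    CFLAGS = -mmcu=$(MCU) -DF_CPU=$(F_CPU) -Os -Wall

    all: $(TARGET).hex

    # Link all objects into an ELF, then strip it down to an Intel hex image
    $(TARGET).elf: $(OBJS)
    	$(CC) $(CFLAGS) -o $@ $^

    $(TARGET).hex: $(TARGET).elf
    	avr-objcopy -O ihex -R .eeprom $< $@

    %.o: %.c
    	$(CC) $(CFLAGS) -c -o $@ $<

    clean:
    	rm -f $(OBJS) $(TARGET).elf $(TARGET).hex

    .PHONY: all clean

That's the whole thing. Whatever the 2,500 lines are doing, it isn't this.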
I won't be using it.
It's truly remarkable that Google put an absurd Wrong Answers Only generator in front of their primary cash cow 18 months ago, and in that time their share price has nearly doubled.
It's wrong nearly every time I search for anything. Ironically, in writing this comment, I tried asking it for the GOOG share price the day before AI Overviews launched, and it got that wrong too.