> It is very much like playing an instrument.
Or it is more like playing a slot machine and you imagine the rest.
A poor analogy, depending on the setting, because you can't adjust the odds on a slot machine, and the ROI is negative by design. If that's your experience, yeah, I wouldn't use an LLM either.
Instruments are pseudo-random until you know what you're doing. Slot machines are just slot machines.
It is a bit of both. A non-deterministic instrument and a predictable slot machine.
This is how I feel whenever I see bold, all-caps instructions in a system prompt, or when someone claims they conducted "research" and found the magic prompt template that makes the model pay out.
Maybe it works some of the time, but it isn't a solution that works every time.
It reminds me of people hovering near a slot machine, waiting to grab it the moment someone gets up without a payout, as if they've solved slot machines.
While I don't mind putting something in a loop until the tests pass, I'm less comfortable doing that when providers are silently rerouting to lower-quality models or, in Google's case, burning quota faster to ease their own server load without being transparent about what the "standard limits" are to begin with. [1]
I'm hopeful I'll be more comfortable with these "slot machines" once frontier models can run locally on hardware I can actually afford. Then I'd know exactly what I'm getting, instead of jumping at shadows while providers play tricks behind the scenes to ease their own load, never admitting that customers get less for their money as the service grows more popular.
[1]: https://support.google.com/gemini/answer/16275805?hl=en&sjid...