I think you severely overestimate your understanding of how these systems work. We’ve been beating the dead horse of “next-character approximation” for the last five years in these comments. If that were all there was to it, a global maximum would have been reached long ago.
Play around with some frontier models; you’ll be pleasantly surprised.
Did I miss a fundamental shift in how LLMs work?
Until they change that fundamental piece, they are literally that: programs that use math to determine the most likely next token.
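To make “most likely next token” concrete, here’s a minimal greedy-decoding sketch (assuming a Hugging Face-style causal LM; “gpt2” is just an illustrative stand-in, and real deployments usually sample from the distribution rather than take the argmax):

```python
# Greedy autoregressive decoding: repeatedly pick the single most likely
# next token and feed it back in. "gpt2" is only an example model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(10):
        logits = model(input_ids).logits                 # scores over the whole vocabulary
        next_id = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)  # most likely next token
        input_ids = torch.cat([input_ids, next_id], dim=-1)             # append and repeat

print(tokenizer.decode(input_ids[0]))
```

That loop is the whole inference-time mechanism; the debate is over what the model has to learn internally to make those next-token predictions good.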