That's... a good result, actually. No, I'm serious.
This reads exactly like my inner thought process on a novel or tricky task I'm asked to solve, especially when I know I'm tired (or drunk, back when I still consumed alcohol regularly) and need to spell everything out (out loud or in a text file).
Hell, it's exactly how I'd expect a kid who just learned about fractions to think. I have a vague recollection of processing such tasks this explicitly as a kid, until I understood the topic.
LLMs pulling this off reliably? That's huge progress. I used to say[0] that GPT-4 is best imagined as a 4-year-old kid that memorized half the Internet. But this? This is 8-year-old stuff.
--
[0] - I currently prefer comparing it to an "inner voice", and comparing its performance and propensity to hallucinate to a smart schoolkid who's being asked questions by the teacher about things they only read about but didn't fully process, and who's pressured into giving some answer, since saying "I don't know" means an instant F and public humiliation. Such a kid is forced to extrapolate on the spot, but if they're smart enough and remember enough, they'll often get it at least partially right. I know that from personal experience :).
Yeah, I don't know if someone thought this was bad or something, but it seems like valid reasoning. We may need to give these models a sense of which things require more or less detailed reasoning, and better knowledge of what they know internally, but as for the "research work" the model claims to be doing, it seems like it's doing a good job.
The poster also shared https://preview.redd.it/u8vs29hq5w2e1.png?width=2704&format=... in a comment, which did get the intended laugh out of me, but even that seems fair enough. I'm currently traveling in a country where most people speak a language I don't know well. You better believe I've been thinking through even trivial greetings, considering the setting, formality, appropriate follow-ups, etc.