Just yesterday I had a moment
Claude's code in a conversation said - “Yes. I just looked at tag names and sorted them by gut feeling into buckets. No systematic reasoning behind it.”
It has gut feelings now? I confronted for a minute - but pulled out. I walked away from my desk for an hour to not get pulled into the AInsanity.
>It has gut feelings now?
I would say hard no. It doesn't. But it's been trained on humans saying that in explaining their behavior, so that is "reasonable" text to generate and spit out at you. It has no concept of the idea that a human-serving language model should not be saying it to a human because it's not a useful answer. It doesn't know that it's not a useful answer. It knows that based on the language its been trained on that's a "reasonable" (in terms of matrix math, not actual reasoning) response.
Way too many people think that it's really thinking and I don't think that most of them are. My abstract understanding is that they're basically still upjumped Markov chains.
It has a lot. I find by challenging it often, getting it to explain it's assumptions, it's usually guessing.
This can be overcome by continuously asking it to justify everything, but even then...