Hacker News

nfw2, last Friday at 1:17 AM

The latest reasoning models don't claim 2 + 2 = 55, and it's hard to find them making any sort of obviously false claim, or refusing to admit a mistake when you point one out.


Replies

taormina, last Friday at 3:25 AM

I can't go a full conversation without obviously false claims. They will insist that you are correct and that your correction is completely correct, despite that also being wrong.

citizenpaul, last Friday at 9:55 PM

It was clearly a simplified example. Like I said, endless bikeshedding.

Here is a real one. I was using the much-lauded new Gemini 3? last week and wanted it to do something in a slightly specific way, for reasons. I told it so explicitly and added it to the instructions: DO NOT USE FUNCTION ABC.

It immediately used FUNCTION ABC. I asked it to read back its instructions to me. It confirmed what I put there. So I asked it again to change it to another function. It told me that FUNCTION ABC was not in the code, even though it was clearly right there in the code.

I did a bit more prodding and it adamantly insisted that the code it generated did not exist, again and again and again. Yes, I tried reversing it to USE FUNCTION XYZ. It still wanted to use ABC.