NitpickLawyer • last Monday at 7:56 PM • 3 replies

> It also performs poorly in a chat without tools, exhibiting an enthusiasm for hallucination. I’m currently working on a replication of this with full tool access, including bash/Python, which may allow this model to be competitive.

How is that a serious phrase in '26? I mean, I have no idea if this fine-tune is good, haven't tried it, but testing a (clearly) agentic model without tool access and expecting it to work is crazy, no? What was he even testing?!

Replies

nodja • last Monday at 8:47 PM

Last thing you want a model to do is hallucinate a tool call and its outputs...

vikingcat • last Monday at 8:10 PM

Maybe expecting it to recognize its limitations without tools instead of hallucinating. But yeah, not wholly useful. Its performance (and proclivity to hallucinate) with tools is what really matters.

reactordev • last Monday at 9:07 PM

Visual Inspection Before Execution… it’s all vibe…