Hacker News

pton_xd today at 6:14 PM

"in this paper we primarily evaluate the LLM itself without external tool calls."

Maybe this is a factor?


Replies

simianwords today at 6:42 PM

No tools were used.
