Hacker News

margalabargala, today at 2:48 PM

And therefore it scores worse on benchmarks?


Replies

XCSme, today at 3:11 PM

Also, Claude/Fable models are quite bad at instruction following: https://artificialanalysis.ai/evaluations/ifbench

XCSme, today at 3:08 PM

On some it does, yes, and also in real usage.

It avoided answering 2 of the 21 tests in this specific benchmark, so its maximum possible score is already capped at about 90%.
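For anyone who wants to check the arithmetic, here is a rough sketch. It assumes every test is weighted equally and that an unanswered test counts as a failure; neither is stated in the thread, and `max_score` is just a hypothetical helper name.

```python
def max_score(total_tests: int, unanswered: int) -> float:
    """Best achievable score if every answered test is correct.

    Assumes equal weighting and that unanswered tests score zero.
    """
    if total_tests <= 0 or not 0 <= unanswered <= total_tests:
        raise ValueError("need total_tests > 0 and 0 <= unanswered <= total_tests")
    return (total_tests - unanswered) / total_tests


ceiling = max_score(total_tests=21, unanswered=2)
print(f"{ceiling:.1%}")  # 90.5%
```

So 19 of 21 answered puts the ceiling at roughly 90.5%, before counting any wrong answers.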
