And therefore it scores worse on benchmarks?

margalabargala • today at 2:48 PM • 2 replies

XCSme • today at 3:11 PM

Also, Claude/Fable models are quite bad at instruction following: https://artificialanalysis.ai/evaluations/ifbench

XCSme • today at 3:08 PM

On some it does, yes, and also in real usage.

It avoided answering 2 of the 21 tests in this specific benchmark, so its maximum possible score is already capped at about 90% (19/21).

