Hacker News

RA_Fisher, today at 11:21 AM (3 replies)

I don’t think there are other models near Fable’s capabilities.


Replies

HarHarVeryFunny, today at 1:36 PM

That remains to be seen.

It's notable that Anthropic are still using SWE-bench as a coding benchmark rather than the newer, more difficult DeepSWE, which shows them well behind GPT 5.5.

https://deepswe.datacurve.ai/

Bear in mind that marketing feats such as solving an Erdős problem are the result of concerted RL training to impart those narrow capabilities. How much any benchmark result, or "early access" paid shill vibe report, reflects improved performance on more general real-world use cases remains to be seen.

fc417fc802, today at 12:18 PM

For how long though? The past two months have seen a ridiculous number of model releases.

ImPostingOnHN, today at 1:03 PM

Why don't you think that? What I've read is that other models can find the same bugs.