How do you qualify what makes a model "Mythos class", and how do you reliably test for it?

cassianoleal • today at 4:32 PM • 1 reply

generalizations • today at 4:50 PM

Presumably a deepswe benchmark, which IIRC puts GLM 5.2 between Opus 4.8 and Fable.
