First impression: Third-party benchmarks or gtfo. Personally, I've never heard of either of these companies before. We're just supposed to take their word that they've matched the best models on the market?
Sakana describes their model as an "Orchestration Model." Does that mean it's actually a bunch of different models glued together?
My impression is that the answer is yes: it purports to apply the glue on the fly, in some dynamic way, rather than being a new model amalgam.
See also contemporaneous reaction at:
https://news.ycombinator.com/item?id=48624782 (6 days ago, 244 points, 133 comments)
Their release post was on HN recently. The comments seemed to think that it was similar to OpenRouter, not an actual model.
Did Anthropic give you third-party benchmarks? Is that what you said to them? Yes, they're important, but the attitude is wrong.
Is it actually that hard to make good models, or is it mostly about the resources you have for training? (This is a genuine question; I really don't know.) I'm sure it's not trivial, but does it really take world-class secret knowledge to build on existing, known techniques? I feel like there's still tons of low-hanging fruit to explore, and time and resources are the limiting factors.