Hacker News

nwienert, yesterday at 4:01 PM

Minimax is nowhere near Opus in my tests, though oddly, for me at least, 4.6 felt worse than 4.5. I haven't used Minimax extensively, but I have an API-driven test suite for a product, and even Sonnet 4.6 outperforms it in my testing, unless something has changed in the last month.

One example: I have a multi-stage distillation/knowledge-extraction script that takes a Discord channel and answers questions about it. I have a hardcoded 5k-message test set, for which I wrote 20 questions myself after analyzing it.

In my harness Minimax wasn't even getting half of them right, whereas Sonnet got 100%. Granted, this isn't code, but my usage on pi felt about the same.
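
For anyone wanting to run a similar comparison, here is a rough sketch of what a fixed-question scoring harness could look like. It is not the actual harness described above: the names `accuracy`, `QUESTIONS`, and `stub_model` are hypothetical, the sample questions are made up, and case-insensitive keyword matching is an assumed grading rule (a real setup might use exact match or a grader model instead).

```python
from typing import Callable, Sequence

# (question, expected keyword) pairs. Hypothetical placeholder data,
# standing in for a hand-written question set over a fixed message dump.
QA = tuple[str, str]

QUESTIONS: list[QA] = [
    ("Who proposed the migration?", "alice"),
    ("Which database was chosen?", "postgres"),
]


def accuracy(answer_fn: Callable[[str], str], qa_pairs: Sequence[QA]) -> float:
    """Return the fraction of questions whose answer contains the expected keyword.

    Assumption: a case-insensitive substring match counts as correct.
    """
    if not qa_pairs:
        raise ValueError("need at least one question")
    hits = sum(
        1
        for question, expected in qa_pairs
        if expected.casefold() in answer_fn(question).casefold()
    )
    return hits / len(qa_pairs)


def stub_model(question: str) -> str:
    """Hypothetical stand-in for a real model call; knows only one answer."""
    canned = {"Who proposed the migration?": "Alice suggested it."}
    return canned.get(question, "I don't know.")


print(accuracy(stub_model, QUESTIONS))  # 0.5
```

Swapping `stub_model` for a function that calls each model's API over the same frozen question set gives directly comparable scores per model.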