Hacker News

windex, today at 5:58 AM

What I do is ask Claude or Codex to run models on Ollama, test them sequentially on a bunch of tasks, and rate the outputs. 30 minutes later I have a model that fits. It even tested the abliterated models.
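
For anyone who wants to see roughly what that loop looks like, here is a minimal sketch, not what the agent actually wrote. It assumes Ollama is serving on its default local port and uses the `/api/generate` endpoint with streaming off. The `score` function is a crude keyword check standing in for the rating step (in the workflow above the agent does the rating), and `rank_models`, the task format, and the `generate` parameter are all made up for illustration.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Tuple

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(model: str, prompt: str) -> str:
    """Send one prompt to a local Ollama model and return the full reply."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["response"]


def score(output: str, keywords: List[str]) -> float:
    """Stand-in rater: fraction of expected keywords found in the output."""
    if not keywords:
        return 0.0
    text = output.lower()
    return sum(k.lower() in text for k in keywords) / len(keywords)


def rank_models(
    models: List[str],
    tasks: List[Dict],
    generate: Callable[[str, str], str] = ollama_generate,
) -> List[Tuple[str, float]]:
    """Run each model on every task in turn; return (model, mean score), best first."""
    totals = {}
    for model in models:  # one model at a time, so only one is loaded
        scores = [score(generate(model, t["prompt"]), t["keywords"]) for t in tasks]
        totals[model] = sum(scores) / len(scores) if scores else 0.0
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

To use it, pass the model names you have already pulled plus a task list like `[{"prompt": "Capital of France?", "keywords": ["paris"]}]` to `rank_models`. The `generate` argument is there so you can swap in a fake function and check the ranking logic without a server running.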