Hacker News

simonw, yesterday at 8:12 PM

It's generally one-shot-only - whatever comes out the first time is what I go with.

I've been contemplating a fairer version where each model gets 3-5 attempts and then selects which of its rendered images is "best".
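
A minimal sketch of that best-of-N idea, assuming two hypothetical callables that are not part of any real library: `generate(prompt)` returns one attempt from the model, and `pick_best(prompt, candidates)` asks the same model which attempt it prefers and returns its index. Rendering each attempt to an image before judging is left out here.

```python
from typing import Callable, List, Tuple


def best_of_n(
    generate: Callable[[str], str],
    pick_best: Callable[[str, List[str]], int],
    prompt: str,
    attempts: int = 3,
) -> Tuple[str, List[str]]:
    """Collect several attempts, then let the judge choose one.

    `generate` and `pick_best` are hypothetical stand-ins for model calls.
    Returns the chosen attempt plus all candidates for inspection.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    # Each attempt is an independent call with the same prompt.
    candidates = [generate(prompt) for _ in range(attempts)]

    # The judge returns the index of its preferred candidate.
    choice = pick_best(prompt, candidates)
    if not 0 <= choice < len(candidates):
        raise ValueError(f"judge returned an invalid index: {choice}")

    return candidates[choice], candidates


if __name__ == "__main__":
    # Deterministic stubs so the sketch runs without any model access.
    fake_outputs = iter(["<svg>1</svg>", "<svg>22222</svg>", "<svg>333</svg>"])

    def fake_generate(prompt: str) -> str:
        return next(fake_outputs)

    def fake_pick_best(prompt: str, candidates: List[str]) -> int:
        # Stand-in judge: prefers the longest output.
        return max(range(len(candidates)), key=lambda i: len(candidates[i]))

    best, everything = best_of_n(fake_generate, fake_pick_best, "draw a test image")
    print(best)
```

Keeping the judge as a separate callable means the same loop works whether the model grades its own attempts or a different model does the ranking.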


Replies

irthomasthomas, yesterday at 8:19 PM

Try llm-consortium with --judging-method rank

andriy_koval, yesterday at 8:14 PM

I think it would make the results way better and more representative of the models' abilities.
