It's generally one-shot only: whatever comes out the first time is what I go with.
I've been contemplating a fairer version where each model gets 3-5 attempts and can then select which rendered image is "best".
I think it would make results much better and more representative of model abilities.
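The best-of-N idea above could be sketched roughly like this. Everything here is hypothetical scaffolding: `generate_attempt` and `self_select` are stand-ins for real calls to the model being evaluated, with a fake "quality" score embedded in each attempt id so the example runs on its own.

```python
import random

# Hypothetical stand-in: in practice this would ask the model to render an image.
def generate_attempt(prompt: str, seed: int) -> str:
    """Pretend to produce one rendering; returns an identifier for the attempt."""
    random.seed(seed)
    return f"render-{seed}-quality-{random.random():.2f}"

# Hypothetical stand-in: in practice the model would judge its own outputs.
def self_select(attempts: list[str]) -> str:
    """Pick the attempt with the highest embedded 'quality' score."""
    return max(attempts, key=lambda a: float(a.rsplit("-", 1)[1]))

def best_of_n(prompt: str, n: int = 4) -> str:
    """Generate n attempts, then let the model choose its favorite."""
    attempts = [generate_attempt(prompt, seed) for seed in range(n)]
    return self_select(attempts)

print(best_of_n("example prompt"))
```

The key design question is whether self-selection actually tracks human judgment of "best", which is exactly what a separate judging step (as in the ranking suggestion below) would probe.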
Try llm-consortium with --judging-method rank