It makes perfect sense to use human completion times as the baseline: otherwise the test would be biased toward models with slower inference.
If model A generates 10 tokens per second and model B generates 100 tokens per second, then scoring by real LLM inference time hands A a massive 10x advantage, all else being equal: the same output takes A ten times as long to produce, so A appears to be completing tasks ten times as long.
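A toy calculation makes the 10x concrete (the token count and human time below are hypothetical numbers, chosen only for illustration): both models emit the same solution, but scoring by inference time inflates the slower model's apparent task length, while a human baseline scores them identically.

```python
TOKENS = 6000        # assumed length of the solution, identical for both models
HUMAN_TIME_S = 300   # assumed human completion time for the same task

for name, tokens_per_s in [("A", 10), ("B", 100)]:
    # Wall-clock time the model spends generating the solution
    inference_time_s = TOKENS / tokens_per_s
    print(f"model {name}: {inference_time_s:.0f}s of inference time, "
          f"vs. fixed human baseline of {HUMAN_TIME_S}s")
```

Model A spends 600 s and model B 60 s on identical work, so inference-time scoring credits A with a task 10x longer; the human baseline assigns both the same 300 s task.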