levmiseri • yesterday at 11:53 PM • 0 replies

For a loosely similar 'benchmark', I recently tried to test major LLMs on my coding game (models write code controlling their units in a 1v1 RTS) - https://yare.io/ai-arena