Hacker News

dezmou last Thursday at 5:39 PM

I love it, just purchased a pack. I've also found that it's a great tool for testing LLMs: take a screenshot of a half-solved game, feed it to ChatGPT along with the rules, and ask it to select the next target.


Replies

tikotus last Thursday at 7:04 PM

Thank you so much! Also, you might find this interesting regarding testing LLMs: https://www.nicksypteras.com/blog/cbs-benchmark.html

dezmou last Thursday at 5:47 PM

Turns out Claude Sonnet 4.5 is far better at solving those than ChatGPT 5.2.