Hacker News

ACCount37 · today at 2:27 PM

You can train them in a very similar way.

Modern LLMs typically start with "imitation learning" pre-training on web-scale data and continue with RLVR on specific verifiable tasks like coding. You can do the same with a chess transformer: pre-train it on human or engine games in "imitation learning" mode, then add RL against other engines or via self-play to iron out the deficiencies and improve performance.
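As a toy illustration of that two-stage recipe (a sketch, not any real engine's training code), here's a minimal version with a single softmax policy over four actions: maximum-likelihood updates on "expert" moves first, then REINFORCE fine-tuning against a reward where the truly best action differs from the expert's favorite:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4
logits = np.zeros(n_actions)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Stage 1: imitation learning -- fit the policy to "expert" demonstrations.
# This hypothetical expert mostly plays action 2, sometimes action 1.
expert_moves = rng.choice(n_actions, size=2000, p=[0.05, 0.25, 0.65, 0.05])
for move in expert_moves:
    p = softmax(logits)
    grad = -p
    grad[move] += 1.0            # gradient of the log-likelihood of the move
    logits += 0.1 * grad

pretrained_action = int(np.argmax(softmax(logits)))  # the expert's favorite

# Stage 2: RL fine-tuning -- REINFORCE against a reward signal in which
# the best action (3) differs from what imitation learned (2).
reward = np.array([0.0, 0.2, 0.5, 1.0])
for _ in range(3000):
    p = softmax(logits)
    a = rng.choice(n_actions, p=p)
    baseline = p @ reward        # simple variance-reduction baseline
    grad = -p
    grad[a] += 1.0
    logits += 0.05 * (reward[a] - baseline) * grad

final_action = int(np.argmax(softmax(logits)))
```

Imitation gets the policy "into the envelope" quickly, and RL then shifts it away from the expert's habits toward what actually wins; a real engine would do the same with a transformer over board states instead of a single logit vector.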

This has been used for a few game engines in practice. It's probably not worth it for chess unless you explicitly want humanlike moves, but games with larger state spaces and features like incomplete information benefit from the early "imitation learning" regime getting the policy into the envelope fast.