GistNoesis, yesterday at 11:34 AM (1 reply)

>"just train it until it is perfect"

Yes, that's exactly the problem with the current approach based on a "valuation" function.

They are not aiming for perfection, and therefore can no longer make progress.

To progress you must precisely define where the frontier is: an evaluation of 0.1 is not resolved to one of "white wins", "draw", or "white loses", which every position theoretically must be. They are not "committing" to anything.

To train such a network to perfection, you must not train your neural network only on the "average" game state; you must also train on hard-mined samples, the game states which define the frontier.

Find a candidate, find a violation, add it to the dataset of training examples, retrain to perfection on the growing dataset (or on a generator of hard positions) to get a new candidate, and loop.
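To make the loop concrete, here is a minimal sketch on a toy game rather than chess. Everything in it is an assumption for illustration: the game (remove 1 to 3 stones, taking the last stone wins), the "network" (a lookup table that memorizes its dataset and guesses "win" everywhere else), and the violation test (a committed value must agree with the rules applied one ply ahead). It is not meant as how anyone actually trains a chess engine.

```python
from typing import Dict, Optional

MOVES = (1, 2, 3)  # toy subtraction game: remove 1-3 stones, taking the last stone wins
MAX_N = 20         # largest pile size we care about

def predict(dataset: Dict[int, int], n: int) -> int:
    """Stand-in for a model retrained to perfection on `dataset`:
    it memorizes every example and commits to +1 (win for the mover) elsewhere."""
    return dataset.get(n, +1)

def one_ply_target(dataset: Dict[int, int], n: int) -> int:
    """Value the rules force on n, given the model's committed values one move ahead."""
    if n == 0:
        return -1  # no stones left: the player to move has already lost
    return max(-predict(dataset, n - m) for m in MOVES if m <= n)

def find_violation(dataset: Dict[int, int]) -> Optional[int]:
    """Smallest position where the candidate contradicts its own one-ply lookahead."""
    for n in range(MAX_N + 1):
        if predict(dataset, n) != one_ply_target(dataset, n):
            return n
    return None

def solve() -> Dict[int, int]:
    dataset: Dict[int, int] = {}
    while True:
        n = find_violation(dataset)              # find a violation of the candidate
        if n is None:
            return dataset                       # no violation left: candidate is consistent
        dataset[n] = one_ply_target(dataset, n)  # add the hard example, "retrain", loop

if __name__ == "__main__":
    print(sorted(solve()))
```

In this toy the loop ends up storing only the losing piles (the multiples of 4), which is the sense in which a few hard positions define the frontier. Taking the smallest violation first is a simplification that keeps every added label correct; a real search would have no such ordering for free.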


Replies

WJW, yesterday at 2:35 PM

So what makes you think it is possible to precisely define such a frontier? And why should such a thing, if it is possible at all, be 1. doable by humans and 2. doable with the amount of energy and computing power available to us within the coming couple of decades?