Topic: automated reversal detection in game history

IdahoEv (Arimaa player #1753)
automated reversal detection in game history
« on: Mar 24th, 2008, 9:17pm »
I have a stored database of all boards that have existed in the game (at least through the last time I ran the updater), meaning every step of every turn of every game saved independently as a board description. I am mining this database to work on the eval function of my bot.

What would be nice to have (but is of course impossible) is some a priori measure of which player is winning -- and ideally by how much -- that could be used to train an eval function to model that variable. Obviously this can't be done, since that's exactly what an eval function is in the first place: a measure of who is ahead. But what I'm wondering is to what extent one can look at a database of board positions and get such an estimate of who is ahead.

If you just randomly assume one player is ahead, you will be right 50% of the time. If you assume the player who won the game is ahead, you will be right more than 50% of the time. In fact, you would be right 100% of the time if it were not for reversals of fortune -- games where the balance of power shifted mid-game. So is there some way to iteratively find reversals of fortune through an automated process? Consider this sequence (a rough sketch of the loop is at the end of this post):

1) Apply a naive evaluator to all boards. Say, a simple material evaluator.
2) Find points in games where the measure of who is ahead according to that evaluator suddenly changed. (A capture happened!)
3) Assume that the player who was ahead after the apparent reversal was in fact already ahead one ply earlier. (They set up the capture through superior play, or the other player blundered.)
4) Train some new arbitrary evaluator function (think a complex curve fit/regression) to maximally model this new assumption.
5) Using this new function as the "naive evaluator", repeat the process until the fraction of boards which can be categorized no longer changes between iterations.

I see a whole host of problems with that approach, but something about it rings worthwhile to me. Here's another, only half-automated, approach:

1) Eval all boards with the best eval you've got.
2) Identify reversal points.
3) Feed these apparent reversal points to human experts who can look at the board and the previous few plies to determine if they agree with the eval's opinion of the reversal of fortunes, and if so, they can identify on which turn they feel the reversal actually occurred.
4) Use the aggregate of the human opinions to categorize these corner cases, and combine that with the results of the naive eval to build a table that categorizes who is winning at all points in history.
5) Use that table as the training set for a more complex eval function.

So, I just wanted to see if this triggered any thoughts for people. Can you come up with other ways, theoretically, of determining who was ahead at any particular point in a game?
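Here is roughly what I mean by the first loop, as a minimal Python sketch. evaluate(), train_regressor(), and the game/position structures are hypothetical placeholders, not anything that exists in my actual code:

Code:
# Rough sketch of the first loop above. evaluate(), train_regressor(), and
# the game/position structures are hypothetical placeholders.

def sign(x):
    return (x > 0) - (x < 0)

def label_reversals(games, evaluate):
    """Label each position with who we think is ahead (+1 Gold, -1 Silver)."""
    labels = {}
    for game in games:
        scores = [evaluate(pos) for pos in game.positions]
        leaders = [sign(s) for s in scores]
        # Step 3: wherever the leader flips, assume the new leader was
        # already ahead one ply earlier (they set up the swing).
        for i in range(1, len(leaders)):
            if leaders[i] != leaders[i - 1] and leaders[i] != 0:
                leaders[i - 1] = leaders[i]
        for pos, who in zip(game.positions, leaders):
            labels[pos] = who
    return labels

def bootstrap(games, evaluate, train_regressor, max_iters=20):
    prev_labels = None
    for _ in range(max_iters):
        labels = label_reversals(games, evaluate)
        if labels == prev_labels:            # Step 5: stop when the labeling stabilizes
            break
        evaluate = train_regressor(labels)   # Step 4: fit a new evaluator to the labels
        prev_labels = labels
    return evaluate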

Janzert (Arimaa player #247)
Re: automated reversal detection in game history
« Reply #1 on: Mar 24th, 2008, 9:37pm »
This seems very similar, if not equivalent, to the "credit assignment problem" (i.e. how do you decide to parcel out the credit to a sequence of decisions when there is only a consequence given at the end of the sequence). It is something that comes up and must be dealt with in many unsupervised learning situations.

Janzert

Fritzlein (Arimaa player #706)
Re: automated reversal detection in game history
« Reply #2 on: Mar 24th, 2008, 10:25pm »
I don't know anything about automated learning, but it seems that you would need at least a move generator to help you. You don't want your evaluation function to be trained to think it is a "reversal" just because the opponent found the best move. A reversal should only be the delta between the best move and the move actually played.

nbarriga (Almost retired Bot Developer)
Re: automated reversal detection in game history
« Reply #3 on: Mar 25th, 2008, 12:09am »
Why not use something like TD(lambda)? Haizhi and I already failed at doing so, but maybe the third time's a charm (I think that's the expression, isn't it?). There's a short introduction on Wikipedia: http://en.wikipedia.org/wiki/Temporal_difference_learning
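For reference, the core update is small. A minimal sketch for a linear eval -- features() and the weight vector are hypothetical, not from Haizhi's bot or mine:

Code:
# Minimal TD(lambda) weight update for a linear eval, to illustrate the idea.
# features() and the weight vector are hypothetical placeholders.

import numpy as np

def td_lambda_update(positions, outcome, weights, features, alpha=0.01, lam=0.7):
    """positions: the positions of one game, all scored from the same player's
    point of view; outcome: final result in [0, 1] for that player."""
    trace = np.zeros_like(weights)               # eligibility trace
    for t in range(len(positions)):
        x_t = features(positions[t])
        v_t = float(np.dot(weights, x_t))
        if t + 1 < len(positions):
            v_next = float(np.dot(weights, features(positions[t + 1])))
        else:
            v_next = outcome                     # ground the last step in the result
        delta = v_next - v_t                     # temporal-difference error
        trace = lam * trace + x_t                # decay old credit, add new
        weights = weights + alpha * delta * trace
    return weights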

IdahoEv (Arimaa player #1753)
Re: automated reversal detection in game history
« Reply #4 on: Mar 25th, 2008, 1:21am »
on Mar 24th, 2008, 10:25pm, Fritzlein wrote:
I don't know anything about automated learning, but it seems that you would need at least a move generator to help you. You don't want your evaluation function to be trained to think it is a "reversal" just because the opponent found the best move. A reversal should only be the delta between the best move and the move actually played.

I'm not understanding your reply, Fritz, and I think you may be using the word "reversal" differently than I am in this instance. To clarify, I am looking statically at boards (moves are not relevant) and, assuming there is some "true" measure of who is ahead/winning, doing my best to estimate that measure. A "reversal" in my meaning is a state where, for example, on some ply X, Gold is winning, but on ply X+1, Silver is winning. It doesn't matter who moved, what they moved, or what the best move was ... only a static measure of who is ahead.

Being able to identify those locations a priori would produce a better training set than "assume in all cases that the eventual winner is currently ahead". That simple assumption is the one I used when empirically training those material eval functions a while back. I think it was fine there, but a better way of generating a training set would be preferable when training an actual eval for use in a bot -- especially since the most critical states for the bot's eval to evaluate correctly are those near reversals of fortune. Once one player is insurmountably far ahead (the easy case), it's too late for the bot to make much difference.

You may have already completely understood what I was asking, in which case it's just me who didn't quite get your follow-up.
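For concreteness, the naive labeling I used before is about as simple as it gets. A sketch, with a hypothetical games structure:

Code:
# The naive labeling I describe above (what I used for the material-eval
# fitting): every position in a game gets labeled with the eventual winner.
# The `games` structure and its fields are hypothetical.

def naive_labels(games):
    labels = {}
    for game in games:
        who_won = +1 if game.winner == "gold" else -1
        for pos in game.positions:
            labels[pos] = who_won   # wrong exactly at and before reversals of fortune
    return labels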

lightvector (Arimaa player #2543)
Re: automated reversal detection in game history
« Reply #5 on: Mar 25th, 2008, 9:05am »
The main problem I see with any automated assignment is how to decide whether a swing in advantage was the result of good play or of one side blundering.

Say Silver loses a horse between positions A and B. If the loss was forced, then ideally you would want to assign a score of a horse loss to position A, because the horse is destined to be lost at that point (possibly as a result of a much earlier mistake by Silver). But what if Silver simply made a blunder, and the horse could easily have been saved at no cost? Then you would not want to reduce the value of position A, since the true value of position A does not include a horse loss; the horse was only lost because one player made a silly mistake.

The problem I see is that it is difficult to distinguish the two cases. In both cases, you have a horse present in A and gone in B, but the proper values to propagate back to position A could be very different.

It seems like you need a bot running alongside, actually evaluating everything with a search, and if the search result disagrees too much with the actual result, flagging a blunder rather than a forced loss/gain. Of course, this has its own problems, and I doubt bots are strong enough to do this well.
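Something like the following, as a very rough sketch -- search_eval() is a hypothetical eval backed by a shallow search, and the threshold is made up:

Code:
# Very rough sketch of the "bot running alongside" idea. search_eval() is a
# hypothetical eval backed by a shallow search; the threshold is made up.
# Both values are taken from the same player's point of view.

def classify_swing(pos_a, pos_b, search_eval, blunder_threshold=1.0):
    predicted = search_eval(pos_a)   # what A should be worth with best play
    actual = search_eval(pos_b)      # what the game actually became
    if predicted - actual > blunder_threshold:
        return "blunder"   # A didn't have to become B; don't propagate B's value back to A
    return "forced"        # the loss was already baked into A; label A accordingly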

Fritzlein (Arimaa player #706)
Re: automated reversal detection in game history
« Reply #6 on: Mar 25th, 2008, 10:24am »
Lightvector expressed my concern more clearly than I did, maybe because I am confused in my own mind. Let me try to expound a bit to see for myself if I have it straight.

You want a static evaluation that is predictive of the future. In a game where one side slowly builds up an advantage over the course of the game and then finally wins, you would want the static eval to start at equal and gradually get more favorable for the winning player as the game progresses. The eval function isn't very good if it doesn't see the win coming. For example, if the evaluation of every intermediate position was equal until right before the game ended, then the eval was useless, because it didn't perceive the gradually tipping balance. If the eval thought that the losing player was ahead until right before he lost, then the evaluation function is actually worse than useless, since it thinks a weak position is strong. Based on this philosophy, you can train your evaluation function to start at even and gradually approach a win for the side that actually won.

The trouble with this approach is that most games don't follow a straight line from start to finish. There are often dramatic turnarounds due to blunders. It could be that one player steadily built up a lead throughout the game, but goofed just once and allowed an immediate goal. Using that game to train the evaluation function will send the wrong message. Because the player with the stronger position lost, it will appear that it wasn't actually the stronger position.

Therefore it seems that the training process must have a blunder filter. If the evaluation function has a drastically different opinion of a certain game position and the game position one turn later, that doesn't necessarily mean the evaluation function stinks for not foreseeing the future. It could mean the player stinks for not seeing the right move. You can fix this by looking ahead a mere one ply: your static eval can compare the actual move to the move that it thinks is best, and thereby identify and discard blunders. Therefore it won't train itself on the basis of bad moves.

Does that make any sense?
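As a rough sketch of that one-ply filter -- generate_moves(), apply_move(), and evaluate() are hypothetical names, not anybody's actual bot code:

Code:
# Sketch of the one-ply blunder filter. generate_moves(), apply_move(), and
# evaluate() are hypothetical; evaluate() is assumed to score the resulting
# position from the point of view of the player who just moved.

def is_blunder(position, move_played, generate_moves, apply_move, evaluate, tolerance=0.5):
    best = max(evaluate(apply_move(position, m)) for m in generate_moves(position))
    played = evaluate(apply_move(position, move_played))
    return best - played > tolerance   # if True, discard this transition from training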

Fritzlein (Arimaa player #706)
Re: automated reversal detection in game history
« Reply #7 on: Mar 25th, 2008, 12:06pm »
OK, now I'm just going to go crazy with naive speculation about automated training. This could be humorous since I have done essentially no reading on the subject.

It seems that there is a tradeoff between evaluation and search. With perfect evaluation, only one ply of search is needed: generate all the moves and pick the one with the best eval. At the other extreme, with perfect search, only a won/lost eval is needed: just build out the search tree until one player or the other wins, and bubble up the results by minimax.

We can't do perfect search, so we want to improve our evaluation. To do this we need feedback from a better evaluation. Finding a source of better evaluation is tricky. Humans have better evaluation, but human input is available in very limited quantity and is unreliable. We would rather automate the learning process and bootstrap a decent evaluation function into a better one. If we can do this the sky is the limit, because the process can be repeated to turn the better evaluation into a still-better evaluation, and so on.

Search seems like a great way to bootstrap. We assume that eval+search is better than eval alone, and use eval+search to provide feedback for training eval. Naively, it seems that even one ply of search could make this effective, no matter how good eval is by itself. For example, if eval already included a decision tree to find all goals up to four steps, then one-ply-search-plus-eval will train eval to anticipate goals up to eight steps. One-ply-search-plus-new-eval would then train eval to anticipate goals up to twelve steps. And so on recursively.

The problem with the otherwise brilliant approach of training eval with eval+search is that if there are any bugs in eval, they will tend to be exaggerated. If eval thinks it is great to lose pieces, then eval+search will train eval to get better at losing pieces, e.g. train eval to dislike situations where the opponent can prevent you from committing suicide, etc. The computer can get caught in its own mind games in positions that are totally irrelevant to playing Arimaa well. A bot can develop superstitions very much like, "I broke a mirror the night before my mother died, so breaking mirrors is bad luck," and then you do anything you can to avoid breaking mirrors, and since you successfully avoid breaking mirrors, you never get input that contradicts your superstition.

There needs to be some sanity check that anchors everything and prevents the training from getting off on an insane tangent. The primary sanity check is, of course, the victory condition. Eval must always recognize wins and losses. This will propagate backwards in training at least as far as search depth. If we use one ply of search then we will at least train eval to see wins one ply further than it could on its own. My prediction is, unfortunately, that anchoring on the victory condition alone will do nothing more than train the eval to be excessively fond of advanced rabbits, since that is the only thing it can understand. Starting from scratch, your eval will quickly evolve into arimaa_score. This is bad.

We can do better than arimaa_score by hand. We are thus tempted to hand-code an eval (definitely one that includes material) and bootstrap from that starting point. But what is our anchor then? We can't anchor the eval to permanently include our assumptions about how much material is worth, because then we will defeat the purpose of the training. Sure, the bot will get better at anticipating material gain/loss ahead of time, but if the piece evaluations are wrong, the bot will just get good at doing the wrong thing. For example, it might get trained to excellent precision that a position is favorable if you have a camel capture in two moves that allows your opponent goal in three moves on the other side of the board. On the other hand, neither can we allow the training to modify our material assumptions without allowing it to drift off into superstition. The unanchored training would stand a fair chance of evolving towards something worse than what we hand-coded, or at least adding random noise to our starting point.

My completely uninformed hunch is that for automated learning we can't be discouraged and quit if we start from winning conditions and evolve arimaa_score. We have to move forward from there with further automated evolution. We have to let the learner figure out what is important on the next level based on its OWN knowledge of the previous level. We need to bootstrap from loving advanced rabbits, up to loving to block and kill opposing advanced rabbits, up to loving to control enemy traps, etc., up to loving to kill enemy pieces only as a byproduct of the victory conditions at the Nth degree. Unless the evaluation function has evolved to prioritize these things in the correct order, starting anywhere else and evolving might deteriorate into having to do all that hard work anyway. It might be too much to expect an evolutionary process to improve an eval that wasn't created by that same evolutionary process.

I have totally dodged the question of how to generate the training positions to be used for evolution. But on this score we might take a page from great teachers of chess. Good chess instructors hate to waste time on opening study. They want to teach the endgame first, work backwards from there to the middle game, and take up openings last. My hunch is that learning bots should similarly train first on endgame positions. This means not training on the entire database of human games, ever; that will only confuse them with positions they can't understand. If you must use human games, use only positions from the last few plies of every game.

Under that restriction on using human games, you will probably need machine-generated positions as well in order to get enough training positions. Self-play of one-ply bots (or self-play between mutated versions) would be a very quick way to get more positions, and even then I think it would still be essential to use only the positions almost at the end of those games as training positions. Train for a million iterations using only the position one ply before goal. When the eval is nearly perfect at spotting one-move goal, start training with positions one and two moves from goal. When that stabilizes, take the training back a third ply, etc. (see the sketch below). The anti-superstition feature is that your training positions will all be providing feedback near enough to the absolute anchor of win/loss to prevent the eval function's mind from wandering.

That was fun to write, but surely hopelessly naive. I look forward to being told what book I need to read before posting again.
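A bare-bones sketch of that backwards-from-the-end schedule, where train(), converged(), and the games structure are all hypothetical placeholders:

Code:
# Bare-bones sketch of the backwards-from-the-end schedule. train() and
# converged() are hypothetical placeholders, as is the `games` structure.

def endgame_first_training(games, evaluate, train, converged, max_distance=30):
    # plies_from_end = 1 means "one ply before the game ended"
    for plies_from_end in range(1, max_distance + 1):
        # Use only positions within `plies_from_end` plies of a win/loss, so
        # feedback stays anchored to the absolute victory condition.
        training_set = [pos
                        for game in games
                        for i, pos in enumerate(game.positions)
                        if len(game.positions) - i <= plies_from_end]
        while not converged(evaluate, training_set):
            evaluate = train(evaluate, training_set)
    return evaluate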

Fritzlein (Arimaa player #706)
Re: automated reversal detection in game history
« Reply #8 on: Mar 25th, 2008, 1:16pm »
on Mar 25th, 2008, 12:09am, nbarriga wrote:
Why not use something like TD(lambda)?

OK, I read a few paragraphs of Tesauro. I guess this clarifies for me what was confusing about my initial post in this thread. IdahoEv wants to train an evaluation function using human moves as a link from one position to the next. I don't trust this link between positions, and would rather replace it with exhaustive search. If I used human games for anything, it would be as a source of training positions, not a way to propagate feedback between positions.

If I understand Tesauro correctly, I'm proposing something radically different from TD(lambda), because in TD(lambda) perfect knowledge (i.e. the actual game result) is propagated backwards across imperfect barriers (i.e. the moves actually played). Because each move in an actual game changes who is winning, or at least changes the amount by which they are winning, the quality of the signal rapidly deteriorates as you go back a few moves in the game. In my blathering I instead propose propagating imperfect knowledge (minimax of eval on a search tree) across a perfect barrier (exhaustive search to some depth). To the depth that you can search, there is no error; all the error comes from doing eval on the leaf nodes.

Thus my concern of wanting to train only on positions where search+eval is likely to be pretty darn good. Initially this will be only positions almost at game end; no matter how stupid the eval is, if it at least recognizes the game-ending condition then search+eval is perfect. One hopes that eventually, as the eval gets better, there will be more and more positions on which search+eval is much better than eval, and therefore useful for training. But unlike TD(lambda), the midgame training isn't obscured by all the moves between the game result and the midgame position.

A possible measure of whether a set of training positions is worth using: how often does search+eval radically disagree with eval? Lots of disagreement is a good sign that training can help.
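That last measure is easy to state concretely. A minimal sketch, where evaluate() and search_evaluate() are hypothetical:

Code:
# The proposed measure at the end of this post, concretely. evaluate() and
# search_evaluate() are hypothetical; the threshold is made up.

def disagreement_rate(positions, evaluate, search_evaluate, threshold=1.0):
    disagreements = sum(1 for pos in positions
                        if abs(search_evaluate(pos) - evaluate(pos)) > threshold)
    return disagreements / len(positions)   # a high rate suggests training can help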

Fritzlein (Arimaa player #706)
Re: automated reversal detection in game history
« Reply #10 on: Mar 25th, 2008, 2:24pm »
on Mar 25th, 2008, 1:20pm, jdb wrote:

Available in HTML, even. Very cool!

lightvector (Arimaa player #2543)
Re: automated reversal detection in game history
« Reply #11 on: Mar 25th, 2008, 5:58pm »
There is another difficulty that has to be considered: the features and inputs you want for the function.

For instance, if you only feed your learning evaluation function the fixed inputs of a raw board position, a raw material count, and a game-over flag, then even with an arbitrary amount of training it is quite unlikely that you will produce a particularly strong evaluation function. Depending on how you code it, it will probably be incapable of computing anything much more complex than a linear combination of its inputs, which doesn't allow for much (see the tiny sketch at the end of this post). Automated learning can be wonderful for optimizing an evaluation function to make the most accurate predictions off of the inputs you already have, but it doesn't solve the problem of picking the inputs in the first place. And automated feature detection is still quite young and limited despite being a major research topic.

It is highly unlikely that a bot working on the inputs of a typical eval function could teach itself to statically detect 4-step goals without a LOT of extra hand-picked features to help it detect the different types of situations and configurations of pieces relevant to goaling. You probably either need to hardcode it (the current solution), or build a very well-designed pattern matcher that can perform automated pattern extraction, or else you need an absurdly generalized learning framework that will probably never converge with practical amounts of training.

Ideally, you want to train your eval to become a better and better predictor of eval + search, but for any given set of inputs, there is only so much you can do. How is the eval function supposed to tell the difference between a position where the opponent can force a capture in 3 moves, versus a slightly different position where a crucial piece is shifted just so that another piece is frozen or unfrozen, or blocking the way, or just plain interfering with some tactic, preventing the capture? And, say, do so correctly in even 10% of the hundreds of possible cases? Of course, you can add more inputs, or add support for pattern matching, or build your learning bot in a very general framework, but as you add more, it will get more complex, take ever longer to train properly, and run slower and slower.

Mainly for this reason, I tend to be skeptical of dreams of programs that could start from nothing more than win/loss detection and become strong. For Arimaa in particular, there's so much nonlinearity involved in how the precise location of different pieces can affect tactics and even global strategy. One piece out of place, and that elephant is no longer blockaded. Shift a trap defender one square over, and suddenly a big capture is possible. Add or remove a single rabbit, or shift a single strong piece one step in any direction during a goal race, and suddenly the outcome of the game swings wildly.

I presume this is why games such as Chess and Go have resisted efforts at automated learning reasonably well, in contrast to games like Backgammon, which are relatively more linear and have seen successful learning programs. Arimaa seems to fall into the former category.
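To illustrate the "linear combination of inputs" point, here is about all such a learner can express. The board methods and feature names below are made up for the example:

Code:
# All a linear learner can express is a weighted sum of whatever features you
# hand it. The board methods below are made up for the example.

import numpy as np

def extract_features(board):
    # Hypothetical hand-picked inputs; the learner cannot invent new ones.
    return np.array([
        board.material_difference(),
        board.advanced_rabbits("gold") - board.advanced_rabbits("silver"),
        board.trap_control("gold") - board.trap_control("silver"),
    ])

def linear_eval(board, weights):
    return float(np.dot(weights, extract_features(board)))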

nbarriga (Almost retired Bot Developer)
Re: automated reversal detection in game history
« Reply #12 on: Mar 25th, 2008, 7:32pm »
on Mar 25th, 2008, 5:58pm, lightvector wrote:
I presume this is why games such as Chess and Go have resisted efforts at automated learning reasonably well, in contrast to games like Backgammon, which are relatively more linear and have seen successful learning programs. Arimaa seems to fall into the former category.

Actually, I think there has been some moderate success in attempts at using neural networks as an eval function (e.g. see NeuroGo). I have high confidence that it can work quite well for Arimaa.
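For a sense of scale, a neural-net eval in that spirit can be tiny. A made-up sketch over a hand-picked feature vector (nothing here is from NeuroGo or any existing Arimaa bot):

Code:
# A bare-bones neural-net eval in that spirit: a tiny fixed two-layer net over
# a hand-picked feature vector. Everything here is a made-up illustration, not
# code from NeuroGo or any existing Arimaa bot.

import numpy as np

class TinyNetEval:
    def __init__(self, n_features, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_hidden, n_features))
        self.w2 = rng.normal(scale=0.1, size=n_hidden)

    def __call__(self, features):
        hidden = np.tanh(self.w1 @ features)      # one hidden layer
        return float(np.tanh(self.w2 @ hidden))   # score in (-1, 1)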

Fritzlein (Arimaa player #706)
Re: automated reversal detection in game history
« Reply #13 on: Mar 25th, 2008, 8:13pm »
on Mar 25th, 2008, 5:58pm, lightvector wrote:
There is another difficulty that has to be considered: the features and inputs you want for the function.

I don't expect highly selective search to beat out full-width search, and I don't expect Monte Carlo methods to be useful at all, and I don't expect learned eval to beat out a hand-coded eval. Furthermore, even in the case of learned eval, my hunch is that hand-chosen features as inputs for some neural net to learn on are more likely to succeed than training from scratch and hoping the features emerge.

That said, who knows what will turn out to work? We really have no idea what AI can't do, just because it hasn't been done yet. It's kind of fun that state-of-the-art brute force doesn't work very well for Arimaa, which leaves room for speculation about radically different approaches. Admittedly, for someone trying to win the computer championship, the odds are in favor of refining the existing alpha-beta search with new move-ordering or other heuristics. But for someone trying to make a quantum leap and surge past humans in one shot, the field is wide open for dreaming.

Fritzlein (Arimaa player #706)
Re: automated reversal detection in game history
« Reply #14 on: Mar 25th, 2008, 8:39pm »
on Mar 25th, 2008, 5:58pm, lightvector wrote:
Ideally, you want to train your eval to become a better and better predictor of eval + search, but for any given set of inputs, there is only so much you can do. How is the eval function supposed to tell the difference between a position where the opponent can force a capture in 3 moves, versus a slightly different position where a crucial piece is shifted just so that another piece is frozen or unfrozen, or blocking the way, or just plain interfering with some tactic, preventing the capture? And, say, do so correctly in even 10% of the hundreds of possible cases?

Actually, we humans also can't accurately forecast three-move captures in eval, but we have a pretty good sense of which lines need to be searched. In addition to the approaches of trying to get a better eval and trying to get a faster search, there is a third approach: having better pruning. Deciding whether a line is worth searching is not equivalent to deciding whether the move is good. Chess players spend a lot of time examining moves that are worse than average moves, but which demand attention because they are so forcing that if they are good they might be really good.

Suppose a bot had a sense that if it sacrificed its camel it would have a 10% chance of forcing goal in three, and it therefore pursued that line more aggressively (as opposed to an alpha-beta searcher, which would put the move last in search order due to the low intermediate eval). That vague sense of needing to examine the camel sacrifice, although not precise, could be very useful.
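As a crude sketch of what such an ordering/pruning heuristic might look like -- threat_of_goal(), evaluate(), and apply_move() are hypothetical:

Code:
# Crude sketch of an ordering heuristic that pushes forcing moves up the
# search order even when their static eval is poor. threat_of_goal(),
# evaluate(), and apply_move() are hypothetical.

def order_moves(position, moves, evaluate, threat_of_goal, apply_move, forcing_bonus=2.0):
    def priority(move):
        child = apply_move(position, move)
        score = evaluate(child)
        # A move that creates a serious goal threat gets examined early,
        # even if it gives up material (e.g. a camel sacrifice).
        if threat_of_goal(child):
            score += forcing_bonus
        return score
    return sorted(moves, key=priority, reverse=True)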