||||
Title: automated reversal detection in game history Post by IdahoEv on Mar 24th, 2008, 9:17pm

I have a stored database of all boards that have existed in the game (at least through the last time I ran the updater), meaning every step of every turn of every game saved independently as a board description. I am mining this database to work on the eval function of my bot.

What would be nice to have (but is of course impossible) is some a priori measure of which player is winning -- and ideally by how much -- that could be used to train an eval function to model that variable. Obviously, however, this can't be done, since that's what an eval function is in the first place: a measure of who is ahead.

But what I'm wondering is to what extent one can look at a database of board positions and get such an estimate of who is ahead. If you just randomly assume one player is ahead, you will be right 50% of the time. If you can assume the player who won the game is ahead, you will be right more than 50% of the time. In fact, you would be right 100% of the time if it were not for reversals of fortune - games where the balance of power shifted mid-game.

So is there some way to iteratively find reversals of fortune through an automated process? Consider this sequence:

1) Apply a naive evaluator to all boards. Say, a simple material evaluator.
2) Find points in games where the measure of who is ahead according to that evaluator suddenly changed. (A capture happened!)
3) Assume that means the player who was ahead after the apparent reversal was in fact actually ahead one ply earlier. (They set up the capture through superior play, or the other player blundered.)
4) Train some new arbitrary evaluator function (think a complex curve fit/regression) to maximally model this new assumption.
5) Using this new function as the "naive evaluator", repeat the process until the fraction of boards which can be categorized no longer changes between iterations.

I see a whole host of problems with that approach, but something about it rings worthwhile to me. Here's another, only half-automated approach:

1) Eval all boards with the best eval you've got.
2) Identify reversal points.
3) Feed these apparent reversal points to human experts who can look at the board and the previous few plies to determine if they agree with the eval's opinion of the reversal of fortunes, and if so, they can identify on which turn they feel the reversal actually occurred.
4) Use the aggregate of the human opinions to categorize these corner cases, and combine that with the results of the naive eval to build a table that categorizes who is winning at all points in history.
5) Use that table as the training set for a more complex eval function.

So, I just wanted to see if this triggered any thoughts for people. Can you come up with other ways, theoretically, of determining who was ahead at any particular point in a game? |
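A rough sketch of the fully automated loop above, assuming each game is stored as a list of board positions and that some `material_eval(board)` returns a signed score (positive = Gold ahead). The helper names, the `fit` regressor, and the fixed-point test are placeholders, not anything from an existing bot:

```python
def relabel(games, evaluate):
    """Label each position with who is 'ahead' (+1 Gold, -1 Silver), pushing
    each apparent reversal back one ply: whoever leads after the swing is
    assumed to have already been ahead just before it (step 3)."""
    labels = {}
    for g, positions in enumerate(games):
        scores = [evaluate(p) for p in positions]
        for ply, s in enumerate(scores):
            leader = 1 if s > 0 else -1
            if ply + 1 < len(scores) and s * scores[ply + 1] < 0:
                # sign flip ahead: credit the post-reversal leader now
                leader = 1 if scores[ply + 1] > 0 else -1
            labels[(g, ply)] = leader
    return labels

def bootstrap(games, evaluate, fit, max_iters=20):
    """Steps 1-5: relabel, refit a new evaluator, repeat until the labels
    no longer change between iterations."""
    prev_labels = None
    for _ in range(max_iters):
        labels = relabel(games, evaluate)
        if labels == prev_labels:
            break
        evaluate = fit(games, labels)   # any regression / curve fit will do here
        prev_labels = labels
    return evaluate
```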
||||
Title: Re: automated reversal detection in game history Post by Janzert on Mar 24th, 2008, 9:37pm This seems very similar, if not equivalent, to the "credit assignment problem" (i.e. how do you decide to parcel out the credit to a sequence of decisions when there is only a consequence given at the end of the sequence). It is something that comes up and must be dealt with in many unsupervised learning situations. Janzert |
||||
Title: Re: automated reversal detection in game history Post by Fritzlein on Mar 24th, 2008, 10:25pm I don't know anything about automated learning, but it seems that you would need at least a move generator to help you. You don't want your evaluation function to be trained to think it is a "reversal" just because the opponent found the best move. A reversal should only be the delta between the best move and the move actually played. |
||||
Title: Re: automated reversal detection in game history Post by nbarriga on Mar 25th, 2008, 12:09am Why not use something like TD(lambda)? Haizhi and I already failed at doing so, but maybe third time's a charm (I think that's the expression, isn't it?). There's a small introduction on Wikipedia: http://en.wikipedia.org/wiki/Temporal_difference_learning |
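For reference, a minimal sketch of the TD(lambda) weight update for a linear value function, following the textbook/Wikipedia formulation rather than any particular Arimaa bot; the step size, discount, and trace decay values below are arbitrary:

```python
import numpy as np

def td_lambda_update(weights, trace, features, reward, value, next_value,
                     alpha=0.01, gamma=1.0, lam=0.7):
    """One TD(lambda) step for a linear value function V(s) = w . phi(s).

    features   : phi(s), feature vector of the current position
    value      : V(s) under the current weights
    next_value : V(s') after the move (or the game result if s' is terminal)
    trace      : eligibility trace vector, same shape as the weights
    """
    delta = reward + gamma * next_value - value   # TD error
    trace = gamma * lam * trace + features        # accumulate eligibility
    weights = weights + alpha * delta * trace     # nudge weights toward target
    return weights, trace

# toy usage: 10 features, one transition with no intermediate reward
w = np.zeros(10)
e = np.zeros(10)
w, e = td_lambda_update(w, e, features=np.random.rand(10),
                        reward=0.0, value=0.0, next_value=0.5)
```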
||||
Title: Re: automated reversal detection in game history Post by IdahoEv on Mar 25th, 2008, 1:21am on 03/24/08 at 22:25:37, Fritzlein wrote:
I'm not understanding your reply, Fritz, and I think you may be using the word "reversal" differently than I am in this instance.

To clarify, I am looking statically at boards (moves not relevant) and, assuming there is some "true" measure of who is ahead/winning, doing my best to estimate that measure. "Reversal" in my meaning is a state where, for example, on some ply X, Gold is winning, but on ply X+1, Silver is winning. It doesn't matter who moved, what they moved, or what the best move was ... only a static measure of who is ahead.

Being able to identify those locations a priori would produce a better training set than "assume in all cases that the eventual winner is currently ahead". That simple assumption is the one I used when empirically training those material eval functions a while back. I think that was fine for eval, but a better way of generating a training set would be preferable when training an actual eval for use in a bot. Especially since the most critical states for the bot's eval to evaluate correctly are those near reversals of fortune. Once one player is insurmountably far ahead (the easy case), it's too late for the bot to make much difference.

You may have already completely understood what I was asking, in which case it's just me who didn't quite get your follow-up. |
||||
Title: Re: automated reversal detection in game history Post by lightvector on Mar 25th, 2008, 9:05am The main problem I see with any automated assignment is how to decide whether a swing in advantage was the result of good play or one side blundering.

Say Silver loses a horse between positions A and B. If the loss was forced, then ideally you would want to assign a score of a horse loss to position A, because the horse is destined to be lost at that point (possibly as a result of a much earlier mistake by Silver). But what if Silver simply made a blunder, and the horse could easily have been saved at no cost? Then, you would not want to reduce the value of position A, since the true value of position A does not include a horse loss, except that one player made a silly mistake.

The problem I see is that it is difficult to distinguish the two cases. In both cases, you have a horse present in A, and gone in B, but the proper values to propagate to position A could be very different. It seems like you need a bot running alongside actually evaluating everything with a search, and if the search result disagrees too much with the actual result, to flag a blunder and not a forced loss/gain. Of course, this has its own problems, and I doubt bots are strong enough to do this well. |
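One hedged way to phrase the "bot running alongside" test: if a depth-limited search from position A already predicts roughly the value that position B turned out to have, call the swing forced and propagate it back to A; otherwise call it a blunder and train A toward the search value instead. `search_value` and `static_value` are assumed helpers, and the margin is a made-up threshold:

```python
BLUNDER_MARGIN = 150   # assumed threshold in eval units; would need tuning

def classify_swing(pos_a, pos_b, search_value, static_value, depth=4):
    """Decide whether the eval swing from A to B was forced or a blunder.

    search_value(p, depth) : minimax value of p from a depth-limited search
    static_value(p)        : static eval of p
    Both are assumed to score from the same (say Gold's) point of view.
    Returns a label and the value to train position A toward.
    """
    predicted = search_value(pos_a, depth)   # what A is 'really' worth
    realized = static_value(pos_b)           # what actually happened
    if abs(predicted - realized) <= BLUNDER_MARGIN:
        # the search saw the swing coming: treat it as forced,
        # so it is fair to train position A toward the realized value
        return "forced", realized
    # the search thinks A was fine: blame the move, not the position,
    # and train position A toward the search value instead
    return "blunder", predicted
```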
||||
Title: Re: automated reversal detection in game history Post by Fritzlein on Mar 25th, 2008, 10:24am Lightvector expressed my concern more clearly than I did, maybe because I am confused in my own mind. Let me try to expound a bit to see for myself if I have it straight.

You want a static evaluation that is predictive of the future. In a game where one side slowly builds up an advantage over the course of the game and then finally wins, you would want the static eval to start at equal and gradually get more favorable for the winning player as the game progresses. The eval function isn't very good if it doesn't see the win coming. For example, if the evaluation of every intermediate position was equal until right before the game ended, then the eval was useless, because it didn't perceive the gradually tipping balance. If the eval thought that the losing player was ahead until right before he lost, then the evaluation function is actually worse than useless, since it thinks a weak position is strong. Based on this philosophy, you can train your evaluation function to start at even and gradually approach a win for the side that actually won.

The trouble with this approach is that most games don't follow a straight line from start to finish. There are often dramatic turnarounds due to blunders. It could be that one player steadily built up a lead throughout the game, but goofed just once and allowed an immediate goal. Using that game to train the evaluation function will send the wrong message. Because the player with the stronger position lost, it will appear that it wasn't actually the stronger position.

Therefore it seems that the training process must have a blunder filter. If the evaluation function has a drastically different opinion of a certain game position and the game position one turn later, that doesn't necessarily mean the evaluation function stinks for not foreseeing the future. It could mean the player stinks for not seeing the right move. You can fix this by looking ahead a mere one ply. Your static eval can compare the actual move to the move that it thinks is best, and thereby identify and discard blunders. Therefore it won't train itself on the basis of bad moves.

Does that make any sense? |
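A small sketch of that philosophy, under obvious assumptions: the training target drifts linearly from even to the actual winner, and plies are dropped when a one-ply lookahead says the played move was far worse than the best available move. `evaluate`, `legal_moves`, and `apply_move` stand in for whatever the bot already has, and `evaluate` is taken to score a position for the side that just moved:

```python
def game_targets(num_plies, winner):
    """Target score per ply: 0 at the start, winner (+1 Gold / -1 Silver)
    at the end, drifting linearly toward the side that actually won."""
    span = max(num_plies - 1, 1)
    return [winner * ply / span for ply in range(num_plies)]

def is_blunder(position, move_played, evaluate, legal_moves, apply_move,
               margin=0.3):
    """One-ply blunder filter: drop a training ply when the move actually
    played scores much worse than the best move the current eval can find."""
    best = max(evaluate(apply_move(position, m)) for m in legal_moves(position))
    played = evaluate(apply_move(position, move_played))
    return best - played > margin
```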
||||
Title: Re: automated reversal detection in game history Post by Fritzlein on Mar 25th, 2008, 12:06pm OK, now I'm just going to go crazy with naive speculation about automated training. This could be humorous since I have done essentially no reading on the subject.

It seems that there is a tradeoff between evaluation and search. With perfect evaluation, only one ply of search is needed: generate all the moves and pick the one with the best eval. On the other extreme, with perfect search, only won/lost eval is needed. Just build out the search tree until one player or the other wins, and bubble up the results by minimax.

We can't do perfect search, so we want to improve our evaluation. To do this we need feedback from a better evaluation. Finding a source of better evaluation is tricky. Humans have better evaluation, but human input is available in very limited quantity and is unreliable. We would rather automate the learning process, and bootstrap a decent evaluation function into a better one. If we can do this the sky is the limit, because the process can be repeated to turn the better evaluation into a still-better evaluation, etc.

Search seems like a great way to bootstrap. We assume that eval+search is better than eval alone, and use eval+search to provide feedback for training eval. Naively, it seems that even one ply of search could make this effective, no matter how good eval is by itself. For example, if eval already included a decision tree to find all goals up to four steps, then one-ply-search-plus-eval will train eval to anticipate goals up to eight steps. One-ply-search-plus-new-eval would then train eval to anticipate goals up to twelve steps. And so on recursively.

The problem with the otherwise brilliant approach of training eval with eval+search is that if there are any bugs in eval, they will tend to be exaggerated. If eval thinks it is great to lose pieces, then eval+search will train eval to get better at losing pieces, e.g. train eval to dislike situations where the opponent can prevent you from committing suicide, etc. The computer can get caught in its own mind games in positions that are totally irrelevant to playing Arimaa well. A bot can develop superstitions very much like, "I broke a mirror the night before my mother died, so breaking mirrors is bad luck," and then you do anything you can to avoid breaking mirrors, and since you successfully avoid breaking mirrors, you never get input that contradicts your superstition.

There needs to be some sanity check that anchors everything and prevents the training from getting off on an insane tangent. The primary sanity check is, of course, the victory condition. Eval must always recognize wins and losses. This will propagate backwards in training at least as far as search depth. If we use one ply of search then we will at least train eval to see wins one ply further than it could on its own. My prediction is, unfortunately, that anchoring on the victory condition only will do nothing more than train the eval to be excessively fond of advanced rabbits, since that is the only thing it can understand. Starting from scratch, your eval will quickly evolve into arimaa_score. This is bad. We can do better than arimaa_score by hand. We are thus tempted to hand-code an eval (definitely one that includes material) and bootstrap from that starting point. But what is our anchor then?

We can't anchor the eval to permanently include our assumptions about how much material is worth, because then we will defeat the purpose of the training. Sure, the bot will get better at anticipating material gain/loss ahead of time, but if the piece evaluations are wrong, the bot will just get good at doing the wrong thing. For example it might get trained to excellent precision that a position is favorable if you have a camel capture in two moves that allows your opponent goal in three moves on the other side of the board. On the other hand, neither can we allow the training to modify our material assumptions without allowing it to drift off into superstition. The unanchored training would stand a fair chance of evolving towards something worse than what we hand-coded, or at least adding random noise to our starting point.

My completely uninformed hunch is that for automated learning we can't be discouraged and quit if we start from winning conditions and evolve arimaa_score. We have to move forward from there with further automated evolution. We have to let the learner figure out what is important on the next level based on its OWN knowledge of the previous level. We need to bootstrap from loving advanced rabbits, up to loving to block and kill opposing advanced rabbits, up to loving to control enemy traps, etc., up to loving to kill enemy pieces only as a byproduct of the victory conditions at the Nth degree. Unless the evaluation function has evolved to prioritize these things in the correct order, starting anywhere else and evolving might deteriorate into having to do all that hard work anyway. It might be too much to expect an evolutionary process to improve an eval that wasn't created by that same evolutionary process.

I have totally dodged the question of how to generate the training positions to be used for evolution. But on this score we might take a page from great teachers of chess. Good chess instructors hate to waste time on opening study. They want to teach the endgame first, and work backwards from there to the middle game, and take up openings last. My hunch is that learning bots should similarly train first on endgame positions. This means not training on the entire database of human games ever played. That will only confuse them with positions they can't understand. If you must use human games, use only positions from the last few ply in every game. Under that restriction on using human games, you will probably need machine-generated positions as well in order to get enough training positions. Self-play of one-ply bots (or self-play between mutated versions) would be a very quick way to get more positions, and even then I think it would still be essential to use only the positions almost at the end of those games as training positions. Train for a million iterations using only the position one ply before goal. When the eval is nearly perfect at spotting one-move goal, start training with positions one and two moves from goal. When that stabilizes, take the training back a third ply, etc. The anti-superstition feature is that your training positions will all be providing feedback near enough to the absolute anchor of win/loss to prevent the eval function's mind from wandering.

That was fun to write, but surely hopelessly naive. I look forward to being told what book I need to read before posting again. :P |
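A sketch of the endgame-first curriculum described above, assuming each finished game is stored as `(positions, winner)` with winner = +1 for Gold and -1 for Silver, and assuming some external `train(eval_fn, batch)` routine; the thresholds and loop limits are invented for illustration:

```python
def endgame_positions(games, max_plies_from_end):
    """Positions at most `max_plies_from_end` plies before game end,
    labelled with the eventual result (+1 Gold win, -1 Silver win)."""
    batch = []
    for positions, winner in games:
        for p in positions[-max_plies_from_end:]:
            batch.append((p, winner))
    return batch

def backward_curriculum(games, eval_fn, train, max_depth=20,
                        accuracy_goal=0.95, max_rounds=50):
    """Train on positions one ply before the end; once the eval is nearly
    perfect there, add positions two plies out, and so on backwards."""
    for k in range(1, max_depth + 1):
        batch = endgame_positions(games, k)
        if not batch:
            continue
        for _ in range(max_rounds):
            eval_fn = train(eval_fn, batch)
            correct = sum((eval_fn(p) > 0) == (w > 0) for p, w in batch)
            if correct / len(batch) >= accuracy_goal:
                break   # this horizon is solid; step one ply further back
    return eval_fn
```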
||||
Title: Re: automated reversal detection in game history Post by Fritzlein on Mar 25th, 2008, 1:16pm on 03/25/08 at 00:09:09, nbarriga wrote:
OK, I read a few paragraphs of Tesauro. I guess this clarifies for me what was confusing about my initial post in this thread. IdahoEv wants to train an evaluation function using human moves as a link from one position to the next. I don't trust this link between positions, and would rather replace it with exhaustive search. If I used human games for anything, it would be as a source of training positions, not a way to propagate feedback between positions.

If I understand Tesauro correctly, I'm proposing something radically different from TD(lambda), because in TD(lambda) perfect knowledge (i.e. the actual game result) is propagated backwards across imperfect barriers (i.e. the moves actually played). Because each move in an actual game changes who is winning, or at least changes the amount by which they are winning, the quality of the signal rapidly deteriorates as you go back a few moves in the game.

In my blathering I instead propose propagating imperfect knowledge (minimax of eval on a search tree) across a perfect barrier (exhaustive search to some depth). To the depth that you can search, there is no error. All the error comes from doing eval on the nodes. Thus my concern of wanting to train only on positions where search+eval is likely to be pretty darn good. Initially this will be only positions almost at game end; no matter how stupid the eval is, if it at least recognizes the game-ending condition then search+eval is perfect. One hopes that eventually, as the eval gets better, there will be more and more positions on which search+eval is much better than eval, and therefore useful for training. But unlike TD(lambda), the midgame training isn't obscured by all the moves between game result and midgame position.

A possible measure of whether a set of training positions is worth using: How often does search+eval radically disagree with eval? Lots of disagreement is a good sign that training can help. |
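The proposed measure at the end could be computed roughly like this, with `static_value` and `search_value` again standing in for the actual eval and a depth-limited search over it; the disagreement threshold is arbitrary:

```python
def disagreement_rate(positions, static_value, search_value,
                      depth=2, threshold=0.5):
    """Fraction of positions where search+eval disagrees with the bare eval
    by more than `threshold` (in whatever units the eval uses).
    A high rate suggests the batch still has something to teach the eval."""
    if not positions:
        return 0.0
    disagreements = sum(
        abs(search_value(p, depth) - static_value(p)) > threshold
        for p in positions)
    return disagreements / len(positions)
```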
||||
Title: Re: automated reversal detection in game history Post by jdb on Mar 25th, 2008, 1:20pm on 03/25/08 at 12:06:54, Fritzlein wrote:
http://www.cs.ualberta.ca/~sutton/book/the-book.html :) |
||||
Title: Re: automated reversal detection in game history Post by Fritzlein on Mar 25th, 2008, 2:24pm on 03/25/08 at 13:20:57, jdb wrote:
Available in HTML, even. Very cool! |
||||
Title: Re: automated reversal detection in game history Post by lightvector on Mar 25th, 2008, 5:58pm There is another difficulty that has to be considered: the features and inputs you want for the function.

For instance, if you only feed your learning evaluation function the fixed inputs of a raw board position, a raw material count, and a game-over flag, even with an arbitrary amount of training, it is quite unlikely that you will produce a particularly strong evaluation function. Depending on how you code it, it will probably be incapable of computing anything too much more complex than a linear combination of its inputs, which doesn't allow for much. Automated learning can be wonderful for optimizing an evaluation function to make the most accurate predictions off of the inputs you already have, but it doesn't solve the problem of picking the inputs in the first place. And automated feature detection is still quite young and limited despite being a major research topic.

It is highly unlikely that a bot working on the inputs to a typical eval function could teach itself to statically detect 4-step goals, without a LOT of extra hand-picked features to help it detect different types of situations and configurations of pieces relevant to goaling. You probably either need to hardcode it (the current solution), or build a very well-designed pattern matcher that can perform automated pattern extraction, or else you need an absurdly generalized learning framework that will probably never converge with practical amounts of training.

Ideally, you want to train your eval to become a better and better predictor of eval + search, but for any given set of inputs, there is only so much you can do. How is the eval function supposed to tell the difference between a position where the opponent can force a capture in 3 moves, versus a slightly different position where a crucial piece is shifted just so that another piece is frozen or unfrozen, or blocking the way, or just plain interfering with some tactic, preventing the capture? And, say, do so correctly in even 10% of the hundreds of possible cases? Of course, you can add more inputs, or add support for pattern matching, or build your learning bot in a very general framework, but as you add more, it will get more complex, take ever longer to train properly, and run slower and slower.

Mainly for this reason, I tend to be skeptical of dreams of programs that could start from nothing more than win/loss detection, and become strong. For Arimaa in particular, there's so much nonlinearity involved in how the precise location of different pieces can affect tactics and even global strategy. One piece out of place, and that elephant is no longer blockaded. Shift that trap defender one square over, and suddenly a big capture is possible. Add or remove a single rabbit, or shift a single strong piece one step in any direction during a goal race, and suddenly the outcome of the game swings wildly. I presume this is why games such as Chess and Go have resisted efforts at automated learning reasonably well, in contrast to games like Backgammon, which are relatively more linear and have seen successful learning programs. Arimaa seems to fall into the former category. |
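To illustrate the ceiling lightvector describes, here is what a learned eval with hand-picked inputs typically reduces to: a weighted sum of whatever features it was given. The board methods named here are purely hypothetical; only the weights would be learned:

```python
import numpy as np

# Hand-picked features; each maps a board to a number (Gold minus Silver).
# The board API used here is invented for the sake of the example.
FEATURES = [
    lambda b: b.material_difference(),
    lambda b: b.rabbit_advancement_diff(),
    lambda b: b.trap_control_diff(),
    lambda b: b.frozen_piece_diff(),
]

def linear_eval(board, weights):
    """However the weights are trained, an eval of this shape can never
    express more than a weighted sum of the features it is handed."""
    phi = np.array([f(board) for f in FEATURES])
    return float(weights @ phi)
```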
||||
Title: Re: automated reversal detection in game history Post by nbarriga on Mar 25th, 2008, 7:32pm on 03/25/08 at 17:58:23, lightvector wrote:
Actually I think there has been some moderate success in attempts at using neural networks as an eval function (e.g. see NeuroGo). I have high confidence that it can work quite well for Arimaa. |
||||
Title: Re: automated reversal detection in game history Post by Fritzlein on Mar 25th, 2008, 8:13pm on 03/25/08 at 17:58:23, lightvector wrote:
I don't expect highly selective search to beat out full-width search, and I don't expect Monte Carlo methods to be useful at all, and I don't expect learned eval to beat out a hand-coded eval. Furthermore, even in the case of learned eval, my hunch is that hand-chosen features as inputs for some neural net to learn on are more likely to succeed than training from scratch and hoping the features emerge. That said, who knows what will turn out to work? We really have no idea what AI can't do, just because it hasn't been done yet. It's kind of fun that state-of-the-art brute force doesn't work very well for Arimaa, which leaves room for speculation about radically different approaches. Admittedly, for someone trying to win the computer championship, the odds are in favor of refining the existing alpha-beta search with new move-ordering or other heuristics. But for someone trying to make a quantum leap and surge past humans in one shot, the field is wide open for dreaming. |
||||
Title: Re: automated reversal detection in game history Post by Fritzlein on Mar 25th, 2008, 8:39pm on 03/25/08 at 17:58:23, lightvector wrote:
Actually, we humans also can't accurately forecast three-move capture in eval, but we have a pretty good sense of which lines need to be searched. In addition to the approaches of trying to get a better eval and trying to get a faster search, there is a third approach: having better pruning. Deciding whether a line is worth searching is not equivalent to deciding whether the move is good. Chess players spend a lot of time examining moves that are worse than average moves, but which demand attention because they are so forcing that if they are good they might be really good. Suppose a bot had a sense that if it sacrificed its camel it would have a 10% chance of forcing goal in three, and it therefore pursued that line more aggressively (as opposed to an alpha-beta searcher which would put the move last in search order due to the low intermediate eval). That vague sense of need to examine the camel sacrifice, although not precise, could be very useful. |
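A hedged sketch of what such pruning guidance might look like in move ordering: forcing moves get a bonus on top of their static eval so that a low-scoring camel sacrifice can still be examined early. The predicates `is_capture` and `threatens_goal` are assumed helpers, and the bonus sizes are made up:

```python
def ordered_moves(position, moves, static_value, apply_move,
                  is_capture, threatens_goal,
                  capture_bonus=1.0, goal_threat_bonus=2.0):
    """Order candidate moves by static eval plus a 'worth a look' bonus,
    so a forcing sacrifice that sets up a goal threat is searched early
    even though its raw eval is poor."""
    def priority(move):
        child = apply_move(position, move)
        score = static_value(child)
        if is_capture(move):
            score += capture_bonus
        if threatens_goal(child):
            score += goal_threat_bonus
        return score
    return sorted(moves, key=priority, reverse=True)
```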
||||
Title: Re: automated reversal detection in game history Post by clauchau on Mar 26th, 2008, 8:28am on 03/25/08 at 12:06:54, Fritzlein wrote:
I don't think so, and I've been planning to find out about it for some time now but haven't quite found the time yet. Here is how I think keeping rabbits on the back row may naturally emerge (provided we specifically look at that feature):

1) Do a 1-ply search on tons of uniformly-random positions with Gold moving, only knowing the value for gameover positions and taking it to be 0 otherwise, grouping positions by Gold's and Silver's material and their rabbits' advancement. Look at the average value. There are too many combinations of the material and advancement features, but guess an approximate formula to cover them all based on your sampled observations. At that stage I agree it should favor Gold's and Silver's advanced rabbits. But you now need to reverse your formula so that it actually evaluates positions when Silver is to move - which is the way you have already been evaluating the leaves in the search above.

2) Repeat this with the corrected evaluation function, i.e. positions reached which are not gameover don't get the value 0 any longer, they get the reversed approximate formula guessed in the previous stage.

As a result, Gold is going to be in favor of advanced rabbits only in positions where it finds a winning move. In every other position, non-advanced rabbits are going to be favored, because it doesn't want Silver to have a good position! |
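A rough rendering of the two stages, with `random_position`, `legal_moves`, `apply_move`, `features`, and `fit_formula` all assumed. For the first stage, `leaf_value` would be the gameover-only function (value for the side that just moved, 0 for non-terminal positions); the function returned by `stage` is then fed back in as the leaf value for the next stage:

```python
def one_ply_value(pos, leaf_value, legal_moves, apply_move):
    """Value of `pos` for the side to move, looking one ply ahead.
    `leaf_value(p)` scores a resulting position for the side that just moved."""
    return max(leaf_value(apply_move(pos, m)) for m in legal_moves(pos))

def stage(leaf_value, n_samples, random_position, legal_moves, apply_move,
          features, fit_formula):
    """Sample random positions, value them with a 1-ply search, and fit an
    approximate formula in the material / rabbit-advancement features."""
    xs, ys = [], []
    for _ in range(n_samples):
        pos = random_position()
        xs.append(features(pos))
        ys.append(one_ply_value(pos, leaf_value, legal_moves, apply_move))
    formula = fit_formula(xs, ys)   # e.g. a regression over the features
    # Reverse the formula so it scores a position for the side that just
    # moved, which is how the leaves were being valued in the search above.
    return lambda p: -formula(features(p))
```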
||||
Title: Re: automated reversal detection in game history Post by IdahoEv on Mar 26th, 2008, 9:47pm on 03/25/08 at 09:05:56, lightvector wrote:
Ahh, but see here you are letting the perfect be the enemy of the good enough. Let's suppose that the loss of a horse on ply X switched the lead from player w to player b, and b won the game. Until that point, w had been ahead. Then your quandary prevents us from knowing with full accuracy whether we should assign b the advantage from X to End or from X-1 to End. And you're right - we probably can't do that perfectly. But if we can do it at all we can produce a better training set than the alternative, which is to assign b as the winner for all positions in that game from ply 1 to the end - which is clearly wrong in this hypothetical case for all positions from 1 to X-2, and maybe wrong for X-1. The possibility of misclassifying ply X-1 shouldn't prevent us from making the attempt that might allow us to correctly classify all positions 1 through X-2. As for whether or not training is worthwhile (which is much of what this discussion has been about so far), I have strong reason to believe it is and will be trying it whether or not Fritz approves. :-) So to that end, I'm simply trying to generate the best training set I can. I'm leery of approaches that involve the eval itself in generating the training set, even though I suggested that possibility in the initial post, precisely for some of the reasons given in this discussion. |
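A sketch of the labelling this implies, assuming the reversal ply has already been detected by some eval: trust the game result from the reversal onward, trust the pre-reversal leader before it, and leave the one ambiguous ply unlabeled rather than guess:

```python
def label_game(eval_scores, winner, last_reversal_ply):
    """Per-ply 'who is ahead' labels for one game.

    eval_scores       : naive eval per ply (sign = current leader)
    winner            : +1 Gold, -1 Silver (the actual game result)
    last_reversal_ply : ply X of the last detected reversal of fortune
    """
    labels = []
    for ply, score in enumerate(eval_scores):
        if ply >= last_reversal_ply:
            labels.append(winner)            # from X onward, trust the result
        elif ply == last_reversal_ply - 1:
            labels.append(None)              # X-1 is ambiguous; skip it
        else:
            labels.append(1 if score > 0 else -1)   # 1..X-2: pre-reversal leader
    return labels
```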
||||
Title: Re: automated reversal detection in game history Post by IdahoEv on Mar 26th, 2008, 9:53pm on 03/25/08 at 19:32:06, nbarriga wrote:
Likewise, a neural network eval with fairly naive inputs achieved a very high rating in checkers only a few years ago. And I think the structure of Arimaa actually lends itself much better to an NN approach than does that of Go or checkers. (Whereas Chess looks almost like it was designed purposefully to foil neural networks). That's not what I'm working on, though, at least not this year. |
||||
Title: Re: automated reversal detection in game history Post by Fritzlein on Mar 27th, 2008, 10:26am on 03/26/08 at 21:47:28, IdahoEv wrote:
I thought at first you were just proposing to use sudden changes in eval to train eval. My intuition was that if you are using eval to train eval, then using search to link one position to the next is better than using the move the human actually played to link one position to the next. The perfect is not the enemy of the good in this substitution: you can have the perfect (search) in every instance where you have the good (a human-chosen move). It doesn't take away any of your ability to train to use search+eval rather than human_move+eval as the feedback.

But now I see that something else is going on. You want to use the information that Silver actually won the game many moves after the abrupt change in your eval. You want to use actual game results in order to train eval, and the reason for detecting big swings in eval is to break the chain that goes from game end to game beginning. You don't want to assign credit for the final result to a position twenty moves earlier if there was a big blunder ten moves from the end.

Do I understand your undertaking better now? If so, then we don't have such a big difference of opinion: philosophically I approve of cutting off the distance backwards that a game result should be used as feedback. Quote:
Don't you realize that since I am the World Champion of Arimaa that makes me the expert on computer science and artificial intelligence, not to mention microeconomics, nuclear physics, and fashion design? Trust me on this... ;D But seriously, I'm thrilled that you are trying an alternative to hand-coded evaluation functions, and I hope you succeed. It may have come across that I was discouraging automated learning, but I was really trying to put an alternative automated learning method on the table. Clearly, the odds are that my idea is wack. |
||||
Title: Re: automated reversal detection in game history Post by IdahoEv on Mar 28th, 2008, 1:59am on 03/27/08 at 10:26:06, Fritzlein wrote:
Bingo. Quote:
And you can see how this leads to the deepest problem: how to train an eval that can correctly score positions in the opening. The further back from the end we are, the less likely it is that knowledge of the eventual winner is a useful input to a learning function. "The eventual winner is ahead" is probably correct 99% of the time 3 turns before the end, but probably only correct 60% of the time 3 turns after the beginning. But bots have gotta be able to play in the opening, too.

So what I'd like to do is somehow increase the quality of the training set by correctly classifying earlier states ... which is equivalent to detecting reversals. The further problem, of course, is that classifying those states is exactly what an eval is supposed to do. So I need an eval to train an eval. Or I need human expertise. Or some combination. |
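The 99%-vs-60% intuition could be checked empirically with something like the following, using a simple material eval as the stand-in for who is "really" ahead; the game representation is again assumed to be `(positions, winner)`:

```python
from collections import defaultdict

def winner_ahead_by_distance(games, material_eval):
    """For each distance from game end (in plies), the fraction of positions
    where a simple material eval already agrees that the eventual winner is
    ahead. Plotting this against distance would test the 99%/60% guess."""
    hits, totals = defaultdict(int), defaultdict(int)
    for positions, winner in games:          # winner: +1 Gold, -1 Silver
        n = len(positions)
        for ply, pos in enumerate(positions):
            d = n - 1 - ply                  # plies remaining until game end
            totals[d] += 1
            if material_eval(pos) * winner > 0:
                hits[d] += 1
    return {d: hits[d] / totals[d] for d in totals}
```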
||||