|
||||
Title: Interesting MCTS variant, MCTS-Solver Post by Nazgand on Feb 20th, 2012, 6:45pm http://www.ru.is/faculty/yngvi/pdf/WinandsB10a.pdf I find this very interesting and would be interested in seeing a bot similar to this for Arimaa. As I discovered last time I tried programming a bot, I am too lazy/have too much homework, so I pass the work on to others. :P |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Hippo on Feb 20th, 2012, 7:02pm I surely should read it ;) thanks for the link. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Fritzlein on Feb 21st, 2012, 8:44am Thanks for the link. These guys are heavyweights, which would make it mandatory reading even if the abstract weren't in itself so compelling. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Fritzlein on Feb 21st, 2012, 10:24am Some initial impressions:

(1) Omar has long propounded the theory that new methods should not be rejected merely because they perform worse than established methods at first. New methods might have greater potential in the long run, but not show it immediately because they need to be enhanced and fine-tuned. This paper strongly supports Omar's theory. MC-LOA (Monte-Carlo Lines of Action) was initially much weaker than the established (alpha-beta) champion MIA, but MIA was a very mature and polished bot, whereas MC-LOA was a diamond in the rough. As enhancements and heuristics were added to MC-LOA it gradually caught up to MIA, eventually passed it, and MC-LOA shows every promise of being amenable to yet further improvement, although the authors paused to publish their exciting result before pushing their new bot as far as it could go.

(2) There is a fundamental division in my mind between Monte Carlo bots that use playouts up to game termination for evaluation, and ones that don't. The work on Arimaa culminating in bot_akimot (and the corresponding thesis) was interesting, but to me seemed intuitively unpromising because akimot didn't use random playouts to natural game termination. Instead the evaluation came from limited-depth playouts and a traditional Arimaa evaluator. If evaluation is going to be done in bot_akimot style, why would Monte Carlo formulas for exploration/exploitation handle things any better than, for example, late-move reductions, which are alpha-beta's version of exploration/exploitation?

(3) In the case of Go, random playouts give a reasonably fair evaluation of a position. Indeed, there was an important paper in which it was shown that semi-random playouts biased by selecting stronger moves can give a worse evaluation than random playouts. But for LOA, the authors apparently got strong position evaluation from biased semi-random playouts, much better than from purely random playouts. I am not sure how this applies to Arimaa, where random playouts give a terrible evaluation. Can biased random playouts in the style of MC-LOA possibly be adequate for Arimaa? Certainly the notion of a bias formula based on simple positional features and expert games is intriguing. Perhaps the feature set that lightvector used for sharp's root pruning could be used without modification.

(4) The important role of the tactical "solver" is non-intuitive to me. It seemed essential to MC-LOA's success, but antithetical to the Monte-Carlo approach. Maybe I am misinterpreting here, but the success of the fusion approach suggests that if there is a forced win, then the efficiency of alpha-beta is mandatory, but if there is no forced win, the position is mushy-strategic in a way that makes Monte Carlo more efficient than alpha-beta. I would not have expected such a dichotomy. On the contrary, I would have thought that even in the absence of a forced win, LOA might be highly tactical, with abrupt changes of score between the correct move and the almost-correct move, and abrupt changes in evaluation between lower search depth and higher search depth. I say I wouldn't have expected this dichotomy, but a sharp cliff between forced win and strategic mush seems reasonable when I think about it in light of my Arimaa experience. It seems to me that even Arimaa endgames are amenable to long-term planning and subtle tradeoffs when the goal threats are not immediate, and it is only when goals creep within the search horizon that tactics dominate strategy. I can imagine that being able to heavily prune with a solver might leave behind a situation, within the narrowed game tree, in which fuzzy reasoning outperforms exhaustive search.

(5) How can we get these guys interested in Arimaa?

Well, those are just my first, unorganized thoughts, so I may be totally missing the main points. The article is in any case essential reading for anyone interested in strategy game AI. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Hippo on Feb 21st, 2012, 12:35pm Don't invite them, they are stepping into my territory :) ... Yes, I like the paper; I had similar ideas in mind and their paper helped me crystallize them. I surely want to go this way. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by rbarreira on Feb 21st, 2012, 1:37pm on 02/21/12 at 10:24:09, Fritzlein wrote:
Nice observation. I haven't read the whole paper, but I guess it's possible that the forced-win scores are giving a big strength boost by themselves, while the fact that other abrupt score changes are not returned is not a big problem, even though handling them could be a further improvement. In other words, I think the dichotomy you talk about does not necessarily exist; they're just treating the two situations (forced wins / other tactical scores) differently despite them being similar. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Hippo on Feb 21st, 2012, 2:03pm There are other tweaks in MCTS (especially in go) not mentioned in the paper ... "history heuristics", RAVE, and dynamic komi. Surely they are especially important for go, and probably of no value for LOA. I have an impression that step-based RAVE could have some value for Arimaa (for strategic goals rather than tactical ones), and I do think dynamic komi is very important for somewhat "one-sided" positions. Without it ... sacrificing a piece to make the MCTS bot blind would be a viable strategy. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by clyring on Feb 21st, 2012, 5:25pm ^Dynamic komi is only applicable when your final score can be evaluated in terms of some currency or point value, as in go, or if one cuts a playout short before its natural end and uses an evaluation function. As for the main subject of this thread, my main thought on the matter is this: if there is a terminal node within your search tree, you are doing your bot a disservice not to make it use this information and avoid spending playouts on lines that have already been solved. This does not run contrary to MCTS at all, but instead furthers it by considering a terminal node to already, in effect, have an infinite number of playouts either in favor of or against it. |
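To make that concrete, here is a minimal sketch (my own, not the paper's code; the Node fields and the constant are assumptions) of how a proven terminal result can be backed up the tree, and how selection can then stop spending playouts on already-solved lines:

import math

WIN, LOSS, UNKNOWN = 1, -1, 0

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.wins = 0.0          # playout results from the viewpoint of the side to move at the parent
        self.proven = UNKNOWN    # exact game-theoretic value for the side to move here, if known

def backpropagate_proof(node):
    """Push a proven value up the tree in negamax fashion."""
    parent = node.parent
    while parent is not None:
        if node.proven == LOSS:
            # One reply is a proven loss for the opponent, so the parent is a proven win.
            parent.proven = WIN
        elif parent.children and all(c.proven == WIN for c in parent.children):
            # Every reply is a proven win for the opponent, so the parent is a proven loss.
            parent.proven = LOSS
        else:
            break   # nothing more can be proven; ordinary playout statistics take over again
        node, parent = parent, parent.parent

def select_child(node, c=1.4):
    """UCT selection that skips already-solved children, so no playout is wasted on them."""
    live = [ch for ch in node.children if ch.proven == UNKNOWN] or node.children
    return max(live, key=lambda ch: ch.wins / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))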
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Hippo on Feb 21st, 2012, 6:44pm on 02/21/12 at 17:25:04, clyring wrote:
I don't think so. With a camel advantage (and a score cutoff), most lines look promising, so losing a dog does not look like a problem ... with dynamic komi this would not be the case. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by clyring on Feb 21st, 2012, 7:04pm If you complete all playout games, then it is indeed the case that the bot will be relatively indifferent to changes in the amount of advantage when one side already has a significant advantage. (...although losing material for no good reason won't make the bot any easier to beat, as it still plays just as well if you get it back to even.) However, dynamic komi does not change this at all, because you cannot really say you have won by a certain 'margin' in Arimaa as in go, where a win by 20 points can be considered better than a win by 2 points. A rough equivalent of dynamic komi would require something more along the lines of cutting your playouts short after some number of moves or when some other condition is met, then applying an evaluation function to the result, and considering playouts in which the advantage becomes smaller to be losses instead of still being likely wins. |
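A small sketch of that rough equivalent (entirely my own assumptions: the helpers evaluate and play_random_move, the position methods, and the constant are placeholders, not from any actual bot):

def playout_result(position, evaluate, play_random_move, root_advantage, max_moves=30):
    """Return 1 for a 'win' and 0 for a 'loss' under a shifted winning threshold.

    evaluate(pos)         -> score of pos, positive meaning good for us (assumed helper)
    play_random_move(pos) -> next position in a pseudorandom playout (assumed helper)
    root_advantage        -> evaluation at the root, acting like a dynamic komi
    """
    for _ in range(max_moves):
        if position.is_terminal():
            return 1 if position.winner_is_us() else 0
        position = play_random_move(position)
    # Playout cut short: merely keeping the advantage we started with is not enough
    # to count as a win, so playouts in which the advantage shrinks become losses.
    return 1 if evaluate(position) >= root_advantage else 0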
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Fritzlein on Feb 21st, 2012, 8:27pm on 02/21/12 at 17:25:04, clyring wrote:
Here you are speaking of using information in the search tree more effectively by changing the move selection and back-propagation methods. That seems to me to be, as you say, merely like doing Monte Carlo more effectively. What is antithetical to Monte Carlo is doing a full-width check for immediate wins before doing any playouts. As they explain in the paper: Quote:
This is not more effective use of information that MCTS already had available; instead it is a concession that looking at every move is more valuable than selecting one to examine at random. If that improves playing strength, maybe it would improve playing strength more to do a 3-ply check for forced wins before doing any playouts, etc. Perhaps eventually alpha-beta searchers will have added so many ways to imbalance the tree and MCTS will have added so many ways to search full-width that it won't be clear what the difference between them is. |
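For concreteness, the full-width check being discussed amounts to something this simple before any playout is started (a toy sketch with an assumed position interface, not the paper's code):

def solve_one_ply(position):
    """Full-width, depth-1 'solver': return an immediately winning move if one exists, else None."""
    for move in position.legal_moves():            # assumed interface
        if position.apply(move).is_won_by_mover():  # assumed interface: did the move just win?
            return move
    return None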
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by clyring on Feb 21st, 2012, 9:36pm Checking for immediate wins also does not seem antithetical to me. Using heuristics (including the heuristic of taking an immediately winning move if one is available) both in the tree and in playouts can save time and yield better results. The same is also true of alpha-beta. To me, this is much the same idea as using goal detection in Arimaa to find and prune losing moves earlier. That said, such heuristics must be used with care: there still need to be enough moves that pass the heuristics that the playouts aren't effectively playing out the same game repeatedly; there must be fairly significant variation among the moves left in order for the playouts to be much more useful than a static evaluator. Also remember that nearly every rule of thumb has a long list of situations in which it can be misused, and believe me, a bot can and will find them all... ;) |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Fritzlein on Feb 21st, 2012, 9:48pm on 02/21/12 at 21:36:35, clyring wrote:
Well, if saving time and yielding better results is the measure of whether or not something is antithetical to MCTS, then maybe alpha-beta search is not antithetical to MCTS. After all, alpha-beta search saves time and yields better results. :D Quote:
Yes, indeed. You make my case well. As I alluded to above, alpha-beta search seems to be evolving toward search trees that are more and more imbalanced. In some sense, this is antithetical to the notion of full-width search, and rather like the MCTS notion that better moves should be "exploited" more while still "exploring" worse moves to a limited extent. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by clyring on Feb 21st, 2012, 10:42pm Since I left this out from my previous post... on 02/21/12 at 20:27:59, Fritzlein wrote:
I seem to recall reading somewhere that some havannah bots do this (3-ply search) in-tree. This is simply a matter of cost vs. benefit: searching for forced wins up to 13 plies will obviously yield better results in general than up to 11 plies for the same number of playouts. One problem with this is that searching deeply can take a long time. The other problem is that not doing this at all in a tactically-oriented game where these issues occur reasonably often can detract significantly from the quality of play as well as greatly increase the length of each playout. The balancing point between these tradeoffs depends much on the nature of the game and on how quickly such a check can be performed. In the case of havannah, the check for whether or not a given move leads to a 3-ply forced win is very fast until late in the game, and random moves without this heuristic can often leave game-winning formations in place during playouts for dozens of moves without doing anything about them. Perhaps I should write what I consider the main idea behind MCTS before I start arguing about its antithesis, then... MCTS is, to me, about one very simple thing: dealing with uncertainty when you do not (currently) have an accurate evaluation. A very cheap, easily repeated, and very rough evaluation such as running a pseudorandom playout from a position is a perfect example of such an evaluation and is exactly the sort of thing that MCTS handles best. A cheap heuristic reduction of this uncertainty, such as scoring a trivial forced win as +1, or encouraging examination of something very likely to be a good move, such as a rabbit advance on an empty wing in Arimaa, fits very well with my ideas about MCTS. Reductions in alpha-beta could indeed be said to stray away from the concept of a full-width search, but to say that performing fast, shallow full-width searches with minimal evaluation within MCTS is opposed to its central philosophy seems wrong to me. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Fritzlein on Feb 22nd, 2012, 11:14am on 02/21/12 at 22:42:21, clyring wrote:
If the central philosophy of MCTS is to get a good evaluation of a position by any means, of which pseudo-random playouts are only one example, then nothing that works to improve the evaluation is contrary to the approach. In particular, if we can get a better evaluation of a position with no playouts of any sort and no randomness of any sort, but rather a fully-deterministic alpha-beta search, or a quiescence search, or a straight-up hand-tuned evaluation formula with no search, you would say it is in keeping with MCTS. If that's the way you want to define it, you are being logically consistent, but I don't think you are using MCTS to mean what other people use it to mean. I am a total pragmatist. I think an approach to creating game-playing AI is a good one if it works. I'm not judging any bot by the conceptual purity of its algorithm. I'm only interested in understanding what works and why it works. Because some guys came up with a new approach to an LOA bot that was good enough to beat the old champion, I am curious what they did differently, and why it works. If I'm making a fuss over terminology, it is because it doesn't clarify things very much to say their new bot used MCTS. Different people use the same term in different ways. MC-LOA apparently does some things that other MCTS bots don't do, and those changes are essential. This is not to say that MC-LOA isn't "really" an MCTS bot, or that they shouldn't have made those changes. What is important is that we separate the essential from the optional and the effective from the ineffective. Defining MCTS as "doing whatever is effective to evaluate a position" renders the very term MCTS useless in distinguishing what works from what doesn't work. It holds back the discussion rather than advancing it.

All bots, whether alpha-beta or MCTS, use some combination of search and evaluation. All of them use search to improve imperfect evaluation. That is to say, they all expand the tree to include the children of what was formerly a terminal node, and then back-propagate the evaluation of the children to improve the former evaluation of the parent position. All the algorithms have tradeoffs between a slower/better evaluation, which reduces the amount of search that can be done, and a faster/worse evaluation, which increases it. Another thing that seems to be generally accepted both in alpha-beta bots and MCTS bots is that it pays to search some lines more deeply than others. Some heuristic determines how much more to expand the tree under more promising moves, based on how much more promising they are. With all the commonalities in philosophy, what are the essential differences?

(1) Usually (but not always) MCTS evaluates a position by playing it out, whereas alpha-beta uses static evaluation.

(2) Usually (but not always) MCTS propagates values from children to parents via a weighted average, whereas alpha-beta uses minimax.

(3) Usually (but not always) MCTS is willing to add nodes to the tree below a child that doesn't have all its siblings, i.e. a position need not be fully expanded before beginning to expand its children, whereas alpha-beta is full-width.

In all three of these points, the MC-LOA "solver" is doing something typical of alpha-beta programs, and is doing it to good effect. The fact that it worked is a caution that ideological purity is a handicap; i.e. the pragmatists who are willing to do whatever works are going to win. None of the three dichotomies is an absolute. Perhaps the best bots will be hybrids on all three points.

Changing gears, I think that something important may have slid under the radar of the authors. That is to say, they did something extremely clever but didn't sufficiently emphasize its importance. In alpha-beta searchers, late-move reductions are made on the basis of evaluation only. The decision to examine a move more/less depends entirely on whether it is a good/bad move. For MC-LOA, however, the decision whether to examine a move more or less depends on current evaluation plus "progressive bias". The progressive bias is described as how likely an expert is to play "that type of move". If you think about it carelessly, you could imagine the progressive bias as a mini-evaluation, added to the other evaluation to make it more accurate. In reality, however, progressive bias might be a better measure of how important it is to examine a move. There might be moves that are superficially like expert moves but absolutely terrible in a given situation. Nevertheless, even if they are terrible, they must be examined. In other words, the expansion of the search tree should be biased both in favor of "good" moves and in favor of "important" moves that might or might not be good. We know from lightvector's senior thesis that he did something very similar at the root of sharp. He trained sharp to identify Arimaa moves most like the moves played by experts, and search those more deeply than the others. But he, too, seemed to conceive of it as an alternate evaluation rather than understanding it as fundamentally a way to find moves that are important rather than moves that are good. Reading about MC-LOA has reinforced my strong suspicion that there are gains to be made, either in alpha-beta search or MCTS, via increasing the emphasis on searching moves that are important but not necessarily good. For the authors of MC-LOA I propose a trivial experiment: increase the initial importance of your progressive bias relative to move value (in the search tree portion, not in playouts) and see whether the playing strength increases. I will bet in the dark that this makes MC-LOA stronger. (Yes, I recognize it is outrageous to even suppose the authors haven't already tried this, never mind to predict the outcome.) |
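To show the distinction in code form, here is a hedged sketch of tree selection with a progressive-bias term of the general shape used in the MC-LOA papers (heuristic knowledge divided by the visit count, so its influence fades as real statistics accumulate); the helper expert_move_probability and the constants are my placeholders, not the authors' values:

import math

def select(node, c_uct=0.6, w_bias=10.0):
    """Tree-selection score = exploitation + exploration + progressive bias."""
    def score(child):
        if child.visits == 0:
            return float('inf')                   # examine every move at least once
        exploitation = child.wins / child.visits
        exploration = c_uct * math.sqrt(math.log(node.visits) / child.visits)
        # Early on, the bias makes the tree expand under moves an expert would
        # consider, whether or not the current evaluation already likes them.
        bias = w_bias * expert_move_probability(node.position, child.move) / (child.visits + 1)
        return exploitation + exploration + bias
    return max(node.children, key=score)

The point of the sketch is that the bias term measures how much a move deserves examination, which is not the same thing as the exploitation term measuring how good the move currently looks.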
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Fritzlein on Feb 22nd, 2012, 1:20pm on 02/22/12 at 11:14:48, Fritzlein wrote:
Oh, whoops, the authors did try this, and they reported it. Increasing the importance of the progressive bias relative to the evaluation of a node within the search tree was one of the differences between MC-LOA and MC-LOA-T, and it did make it stronger. I'm a day late and a dollar short, as usual. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by tize on Feb 22nd, 2012, 1:20pm on 02/22/12 at 11:14:48, Fritzlein wrote:
This is also true for alpha-beta to some extent, since forward prunings and extensions, like quiescence, make the search skip a lot of the continuations. One difference that I can think of is that a bad move will get a smaller and smaller share of processing time compared with the good move the longer the MCTS runs and builds its knowledge about the position. But in alpha-beta a bad move will more or less converge to a fixed factor times the time spent on the best move. This is still true for moves falling to null-move pruning, as that search depth grows at the same rate as the nominal search depth. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Hippo on Feb 22nd, 2012, 5:14pm I like your summary, Fritzlein :). Yes, MCTS is a paradigm allowing a lot of different interpretations. It initially got its name from the random playouts, but its selection method for driving exploration/exploitation seems to have gained even higher importance. Even in go, random playouts with equal move probabilities do not lead to the best results, and doing playouts with "trained" move probabilities helps. In highly tactical games the "expert-trained" move probabilities are much more important. It was discussed in Kozelek's thesis that random attack is much more effective than random defense in Arimaa ... and a general assumption is that random playouts will find the correct continuation in highly tactical positions very slowly. I am sure this is the case for equal move probabilities, but I hope "expert-trained" probabilities will speed up the convergence sufficiently. There are 2 essential steps in the mentioned paper. One is the cutoff-by-evaluation-function approach. And the other is speeding up the propagation of proved results. I do think the second idea could be developed further to change the backpropagation method. Usually MCTS weights all playouts by the same weight. (Parallel MCTS often adds 1 to the number of simulations first and backpropagates the win later ... to improve the chance of navigating other threads to other branches.) But it seems to me that the longer the playout is, the less accurate it is. What about weighting shorter playouts more (those which are terminated by the evaluation threshold earlier)? That would be compatible with a draw evaluation for playouts that run too long. (Just stop the playout when its weight gets too close to zero.) Actually I am trying to make the bot compute more the way I think in postal games. I actually do playouts choosing from a small set of interesting moves and play several moves further to gain some "impression". But the longer the playout is, the less important the final impression is for me. I am sure I think much closer to MCTS than to alpha-beta. |
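One way to phrase the weighting idea as code (my own sketch of Hippo's suggestion; the helper play_biased_move, the position methods, and the constants are placeholders):

def weighted_playout(position, play_biased_move, decay=0.97, min_weight=0.2):
    """Run a playout whose result is weighted less the longer it takes to finish."""
    weight = 1.0
    while not position.is_terminal():
        position = play_biased_move(position)   # assumed expert-biased pseudorandom step
        weight *= decay
        if weight < min_weight:
            return 0.5, min_weight              # playout ran too long: score it as an uninformative draw
    result = 1.0 if position.winner_is_us() else 0.0
    return result, weight

# Backpropagation would then add `weight` to the visit count and `result * weight` to the
# win total, so that short, decisive playouts count for more than long, murky ones.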
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Fritzlein on Feb 23rd, 2012, 12:09pm on 02/22/12 at 13:20:50, tize wrote:
I don't know the proof behind the correctness of UCT, but my intuitive understanding is that UCT is "correct" in that the probability of exploiting the best move converges to one. To satisfy that property, yes indeed, there can't be a fixed ratio of CPU time spent between better and worse moves. The time spent on inferior moves must go to zero for the exploitation of the best move to increase to one. But is that really better than a fixed ratio? If I am understanding UCT properly, I don't see why the property of being "correct" should correlate to being best in terms of game AI. Intuitively, I don't want my bot thinking more and more about the best move, to the exclusion of thinking about all other moves. On the contrary, assuming there is a fixed amount of variation in evaluation per extra ply of search depth, then the exploration/exploitation tradeoff that I want is precisely a fixed ratio of CPU share per unit of difference in current evaluation, i.e. CPU share being exponential in evaluation. And since search depth is proportional to the logarithm of CPU time, this is equivalent to making search depth linearly dependent on current evaluation. Surely this is possible within the exploration/exploitation paradigm, possibly by changing only one term in one place in the code. Intuitively (again, I'm guessing in the dark) this will make a stronger bot than an exploration/exploitation tradeoff that converges to spending all its CPU examining the best move, stronger because evaluation errors that can be corrected by greater search depth are corrected sooner. At an abstract level, we are not playing slots on a many-armed bandit, where we lose money every time we play the wrong machine. At the end of the day, we want our probability of determining the best move to go to one, which is a weaker demand than making the probability of searching the best move go to one. Of course the latter implies the former, but we can have the former without the latter, for example via alpha-beta search. It may be that the demand of "correctness" of UCT ultimately handicaps its efficiency. Thus I personally would relax the demands of UCT, and permit an asymptotically constant (non-zero) search share for weaker moves, with one example being the fixed-ratio method you refer to. |
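A sketch of what that relaxed rule might look like (my own reading of the idea, not anyone's actual code; the constant k and the interfaces are placeholders):

import math

def pick_move_to_simulate(moves, evals, visits, k=2.0):
    """Allocate the next simulation so that each move's long-run CPU share stays at a
    fixed ratio per unit of evaluation deficit, rather than decaying to zero.

    moves:  candidate moves at the node
    evals:  current value estimate per move (higher is better)
    visits: simulations already spent per move
    k:      the fixed ratio per unit of evaluation difference (illustrative constant)
    """
    best = max(evals)
    targets = [math.exp(-k * (best - e)) for e in evals]    # exponential in the deficit
    total = sum(targets)
    shares = [t / total for t in targets]
    spent = sum(visits) + 1
    # Simulate whichever move is currently furthest below its target share.
    deficits = [shares[i] - visits[i] / spent for i in range(len(moves))]
    return moves[deficits.index(max(deficits))]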
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Fritzlein on Feb 23rd, 2012, 12:52pm on 02/22/12 at 17:14:38, Hippo wrote:
I believe one aspect of expert human thinking that is under-represented in both alpha-beta bots and MCTS bots at present is that humans look at high-risk, high-reward moves with greater preference. If there is a camel sacrifice that might get me a forced goal, I spend time trying to precisely work out whether the forced win is real or illusory. An alpha-beta bot fails by quickly seeing the camel loss and throwing the move to the bottom of the heap until the forced goal comes within its search depth, when it suddenly the goal is proved and it bounces to the top. This takes too long to see, and would be seen earlier with greater emphasis on high-risk, high-reward moves. Meanwhile an MCTS bot sees the camel sacrifice as generating some wins and some losses depending on the random playouts, so it doesn't throw the move on the rubbish heap (yay!), but proving the forced goal or lack thereof is something that MCTS is inefficient at (boo!), so for a long time it returns mediocre mush instead of a proper evaluation which is in truth either very positive or very negative . Both kinds of bots need to bias their searches more towards moves with high variance, i.e. more towards moves that could turn out to be great or could turn out to be terrible. I'm not sure how to implement this, but I intuit that sharp's attempt to imitate expert moves based on the presence/absence of features was a huge step in the right direction. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Nazgand on Feb 23rd, 2012, 6:12pm on 02/23/12 at 12:52:50, Fritzlein wrote:
If I remember correctly, Houdini does something like this. My memory is fuzzy on this though. |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by tize on Feb 24th, 2012, 1:28pm on 02/23/12 at 12:09:58, Fritzlein wrote:
I agree, but I still would like the bot's time spent on bad moves to approach zero instead of a fixed ratio. Maybe the moves that should be investigated the most are the second-best moves; that is, just reduce the probability a lot for the best-looking move. It mustn't be ignored completely though, as it could have gotten a superficially high value. This idea fits your rephrasing of the question to solve. Quote:
|
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by Fritzlein on Feb 25th, 2012, 9:11am on 02/21/12 at 20:27:59, Fritzlein wrote:
Update: The authors experimented and discovered that a 3-ply search within playouts was too heavy and decreased strength, but a selective 2-ply search produced a substantial strength gain. http://www.personeel.unimaas.nl/m-winands/documents/Cig2011paper24.pdf It seems like a step toward the convergence of Monte-Carlo and alpha-beta philosophies. More experiments like this in various games should bring us closer to understanding what works and why it works. For example, are playouts really essential to MC-LOA-alpha-beta, or is the real magic the exploitation/exploration tradeoff, i.e. the manner in which the search tree is unbalanced? |
||||
Title: Re: Interesting MCTS variant, MCTS-Solver Post by ddyer on Feb 27th, 2012, 1:12pm We know alpha-beta wastes most of its time methodically investigating obviously worthless lines of play. MCTS is a different way of managing the overall search. In some cases MCTS can be combined with absurdly simple evaluation strategies with stellar results. A good case in point is "Cookie Disco", which I recently added at Boardspace.net, where a pure random-playout strategy is very strong. In other cases, additional tricks and game-specific knowledge have to be applied to get good results; this is the case for Arimaa. I experimented with MCTS for Arimaa at Boardspace, but the results were very disappointing. The paper presents a grab-bag of methods which, tuned for the game and combined, get to a very good result, which gives me hope that sufficiently clever programming may yet produce good results for Arimaa. |
||||