Topic: Interesting MCTS variant, MCTS-Solver (Read 6574 times)

Nazgand
Forum Guru
Arimaa player #6461
Posts: 87

Interesting MCTS variant, MCTS-Solver
« on: Feb 20th, 2012, 6:45pm »
http://www.ru.is/faculty/yngvi/pdf/WinandsB10a.pdf

I find this very interesting and would like to see a similar bot for Arimaa. As I discovered the last time I tried programming a bot, I am too lazy/have too much homework, so I pass the work on to others.

Hippo
Forum Guru
Arimaa player #4450
Posts: 883

Re: Interesting MCTS variant, MCTS-Solver
« Reply #1 on: Feb 20th, 2012, 7:02pm »
I surely should read it. Thanks for the link.

Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928

Re: Interesting MCTS variant, MCTS-Solver
« Reply #2 on: Feb 21st, 2012, 8:44am »
Thanks for the link. These guys are heavyweights, which would make it mandatory reading even if the abstract weren't in itself so compelling.

Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928

Re: Interesting MCTS variant, MCTS-Solver
« Reply #3 on: Feb 21st, 2012, 10:24am »
Some initial impressions:

(1) Omar has long propounded the theory that new methods should not be rejected merely because they perform worse than established methods at first. New methods might have greater potential in the long run, but not show it immediately, because they need to be enhanced and fine-tuned. This paper strongly supports Omar's theory. MC-LOA (Monte Carlo Lines of Action) was initially much weaker than the established (alpha-beta) champion MIA, but MIA was a very mature and polished bot, whereas MC-LOA was a diamond in the rough. As enhancements and heuristics were added to MC-LOA it gradually caught up to MIA and eventually passed it, and MC-LOA shows every promise of being amenable to yet further improvement, although the authors paused to publish their exciting result before pushing their new bot as far as it could go.

(2) There is a fundamental division in my mind between Monte Carlo bots that use playouts to game termination for evaluation, and ones that don't. The work on Arimaa culminating in bot_akimot (and the corresponding thesis) was interesting, but to me seemed intuitively unpromising because akimot didn't use random playouts to natural game termination. Instead the evaluation came from limited-depth playouts and a traditional Arimaa evaluator. If evaluation is going to be done in bot_akimot style, why would Monte Carlo formulas for exploration/exploitation handle things any better than, for example, late-move reductions, which are alpha-beta's version of exploration/exploitation?

(3) In the case of Go, random playouts give a reasonably fair evaluation of a position. Indeed, there was an important paper showing that semi-random playouts biased by selecting stronger moves can give a worse evaluation than purely random playouts. But for LOA, the authors apparently got strong position evaluation from biased semi-random playouts, much better than from purely random playouts. I am not sure how this applies to Arimaa, where random playouts give a terrible evaluation. Can biased random playouts in the style of MC-LOA possibly be adequate for Arimaa? Certainly the notion of a bias formula based on simple positional features and expert games is intriguing. Perhaps the feature set that lightvector used for sharp's root pruning could be used without modification. (A sketch of what such a biased playout might look like follows at the end of this post.)

(4) The important role of the tactical "solver" is non-intuitive to me. It seemed essential to MC-LOA's success, but antithetical to the Monte Carlo approach. Maybe I am misinterpreting here, but the success of the fusion approach suggests that if there is a forced win, then the efficiency of alpha-beta is mandatory, but if there is no forced win, the position is mushy-strategic in a way that makes Monte Carlo more efficient than alpha-beta. I would not have expected such a dichotomy. On the contrary, I would have thought that even in the absence of a forced win, LOA might be highly tactical, with abrupt changes of score between the correct move and the almost-correct move, and abrupt changes in evaluation between lower search depth and higher search depth. I say I wouldn't have expected this dichotomy, but a sharp cliff between forced win and strategic mush seems reasonable when I think about it in light of my Arimaa experience. It seems to me that even Arimaa endgames are amenable to long-term planning and subtle tradeoffs when the goal threats are not immediate, and it is only when goals creep within the search horizon that tactics dominate strategy. I can imagine that being able to prune heavily with a solver might leave behind a situation, within the narrowed game tree, in which fuzzy reasoning outperforms exhaustive search.

(5) How can we get these guys interested in Arimaa?

Well, those are just my first, unorganized thoughts, so I may be totally missing the main points. The article is in any case essential reading for anyone interested in strategy game AI.
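
Regarding point (3): the shape of a biased semi-random playout is easy to sketch. Below is a minimal Python sketch, not the paper's actual implementation; the `Position` interface, the `move_features` method, and the feature weights (imagined as trained from expert games) are all hypothetical.

```python
import math
import random

def biased_playout_move(position, feature_weights, temperature=1.0):
    """Pick a playout move with probability proportional to
    exp(score / temperature), where score is a cheap linear
    combination of simple positional move features."""
    moves = position.legal_moves()  # assumes at least one legal move
    scores = []
    for move in moves:
        # position.move_features(move) -> {feature_name: value};
        # both the method and the feature set are assumptions.
        feats = position.move_features(move)
        scores.append(sum(feature_weights.get(name, 0.0) * value
                          for name, value in feats.items()) / temperature)
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return random.choices(moves, weights=weights, k=1)[0]

def biased_playout(position, feature_weights, max_moves=300):
    """Play a biased semi-random game to natural termination (or a
    move cap) and score it for the side to move at the start."""
    side = position.side_to_move()
    for _ in range(max_moves):
        if position.is_terminal():
            return 1.0 if position.winner() == side else 0.0
        position = position.apply(
            biased_playout_move(position, feature_weights))
    return 0.5  # never terminated naturally; score as inconclusive
```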
« Last Edit: Feb 21st, 2012, 10:25am by Fritzlein »

Hippo
Forum Guru
Arimaa player #4450
Posts: 883

Re: Interesting MCTS variant, MCTS-Solver
« Reply #4 on: Feb 21st, 2012, 12:35pm »
Don't invite them; they are stepping into my territory ... Yes, I like the paper. I had similar ideas in mind, and their paper helped me to crystallize them. I surely want to go this way.

rbarreira
Forum Guru
Arimaa player #1621
Posts: 605

Re: Interesting MCTS variant, MCTS-Solver
« Reply #5 on: Feb 21st, 2012, 1:37pm »
on Feb 21st, 2012, 10:24am, Fritzlein wrote:
(4) The important role of the tactical "solver" is non-intuitive to me. It seemed essential to MC-LOA's success, but antithetical to the Monte Carlo approach. Maybe I am misinterpreting here, but the success of the fusion approach suggests that if there is a forced win, then the efficiency of alpha-beta is mandatory, but if there is no forced win, the position is mushy-strategic in a way that makes Monte Carlo more efficient than alpha-beta. I would not have expected such a dichotomy. On the contrary, I would have thought that even in the absence of a forced win, LOA might be highly tactical, with abrupt changes of score between the correct move and the almost-correct move, and abrupt changes in evaluation between lower search depth and higher search depth.

Nice observation. I haven't read the whole paper, but I guess it's possible that the forced-win scores give a big strength boost by themselves, while the failure to return other abrupt score changes is not a big problem, even though returning them would be a potential improvement. In other words, I think the dichotomy you talk about does not necessarily exist; they are just treating two similar situations (forced wins versus other tactical scores) differently.

Hippo
Forum Guru
Arimaa player #4450
Posts: 883

Re: Interesting MCTS variant, MCTS-Solver
« Reply #6 on: Feb 21st, 2012, 2:03pm »
There are other tweaks in MCTS (especially in go) not mentioned in the paper ... "history heuristics", RAVE, and dynamic komi. Surely they are especially important for go, and probably of no value for LOA. I have the impression that step-based RAVE could have some value for Arimaa (for strategic goals rather than tactical ones), and I do think dynamic komi is very important for somewhat one-sided positions. Without it ... sacrificing a piece to make the MCTS bot blind would be a viable strategy.
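
For readers who haven't met RAVE: the usual way it is folded in is a weighted blend that trusts the all-moves-as-first statistics early and the direct statistics later. A minimal sketch, assuming hypothetical node fields and an arbitrary schedule constant `k` (not anything from the paper):

```python
import math

def rave_value(node, k=1000.0):
    """Blend a node's direct Monte Carlo value with its RAVE
    (all-moves-as-first) value. beta is near 1 while direct visits
    are few and decays toward 0 as they accumulate."""
    q = node.wins / node.visits if node.visits else 0.5
    q_rave = node.rave_wins / node.rave_visits if node.rave_visits else 0.5
    beta = math.sqrt(k / (3.0 * node.visits + k))  # one common schedule
    return (1.0 - beta) * q + beta * q_rave
```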
« Last Edit: Feb 21st, 2012, 2:04pm by Hippo »

clyring
Forum Guru
Arimaa player #6218
Posts: 362

Re: Interesting MCTS variant, MCTS-Solver
« Reply #7 on: Feb 21st, 2012, 5:25pm »
^Dynamic komi is only applicable when your final score can be evaluated in terms of some currency or point value, as in go, or if one cuts a playout short before its natural end and uses an evaluation function. As for the main subject of this thread, my main thought is this: if there is a terminal node within your search tree, you are doing your bot a disservice not to make it use this information and avoid spending playouts on lines that have already been solved. That does not run counter to MCTS at all; it furthers it, by considering a terminal node to already have, in effect, an infinite number of playouts either in favor of or against it.
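
That "infinite number of playouts" view is essentially how the paper's solver behaves during backpropagation: proven values are backed up with minimax logic instead of being averaged. A rough Python sketch of the idea, with a hypothetical node structure (not the paper's code); `proven` is +1 for a proven win for the player to move at that node, -1 for a proven loss, None for unproven:

```python
def backpropagate_solved(node, result):
    """MCTS-Solver-style backpropagation (sketch). `result` is the
    playout outcome in [0, 1] for the player to move at `node`.
    Assumes node.children, when non-empty, already holds all legal
    children, so the all() check below can be trusted."""
    while node is not None:
        node.visits += 1
        node.wins += result
        if node.children:
            if any(child.proven == -1 for child in node.children):
                node.proven = +1   # some move leaves the opponent lost
            elif all(child.proven == +1 for child in node.children):
                node.proven = -1   # every move hands the opponent a win
        result = 1.0 - result      # flip perspective for the parent
        node = node.parent
```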
« Last Edit: Feb 21st, 2012, 5:42pm by clyring »

Hippo
Forum Guru
Arimaa player #4450
Posts: 883

Re: Interesting MCTS variant, MCTS-Solver
« Reply #8 on: Feb 21st, 2012, 6:44pm »
on Feb 21st, 2012, 5:25pm, clyring wrote:
^Dynamic komi is only applicable when your final score can be evaluated in terms of some currency or point value, as in go, or if one cuts a playout short before its natural end and uses an evaluation function.

I don't think so. With a camel advantage (and score cutoff), most lines look promising, so losing a Dog does not look like a problem ... with dynamic komi this would not be the case.

clyring
Forum Guru
Arimaa player #6218
Posts: 362

Re: Interesting MCTS variant, MCTS-Solver
« Reply #9 on: Feb 21st, 2012, 7:04pm »
If you complete all playout games, then indeed the bot will be relatively indifferent to changes in the size of its advantage once one side already has a significant advantage. (...although losing material for no good reason won't make the bot any easier to beat, since it still plays just as well if you get it back to even.) However, dynamic komi does not change this at all, because you cannot really say you have won by a certain 'margin' in Arimaa as you can in go, where a win by 20 points can be considered better than a win by 2 points. A rough equivalent of dynamic komi would require something more along the lines of cutting your playouts short after some number of moves, or when some other condition is met, applying an evaluation function to the result, and counting playouts in which the advantage becomes smaller as losses instead of as still-likely wins.
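
To make that "rough equivalent" concrete, here is a minimal sketch, with everything hypothetical (the evaluator, the cutoff length, the feedback constants): playouts are cut short, scored against a threshold, and the threshold drifts so the win rate stays informative.

```python
import random

def scored_playout(position, evaluate, threshold, cutoff=20):
    """Cut a pseudorandom playout short and score it against a moving
    threshold instead of playing to natural termination. `evaluate`
    is a hypothetical static evaluator returning a score for `side`."""
    side = position.side_to_move()
    for _ in range(cutoff):
        if position.is_terminal():
            return 1.0 if position.winner() == side else 0.0
        position = position.apply(random.choice(position.legal_moves()))
    # Merely preserving a big advantage only counts as a win if it
    # clears the current threshold; anything less is scored as a loss.
    return 1.0 if evaluate(position, side) >= threshold else 0.0

def adjust_threshold(threshold, recent_win_rate, target=0.5, step=0.1):
    """Dynamic-komi-style feedback: if nearly every playout looks like
    a win, demand more; if nearly every one looks like a loss, demand less."""
    if recent_win_rate > target + 0.05:
        return threshold + step
    if recent_win_rate < target - 0.05:
        return threshold - step
    return threshold
```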

Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928

Re: Interesting MCTS variant, MCTS-Solver
« Reply #10 on: Feb 21st, 2012, 8:27pm »
on Feb 21st, 2012, 5:25pm, clyring wrote:
If there is a terminal node within your search tree, you are doing your bot a disservice not to make it use this information and avoid spending playouts on lines that have already been solved. That does not run counter to MCTS at all; it furthers it, by considering a terminal node to already have, in effect, an infinite number of playouts either in favor of or against it.

Here you are speaking of using information in the search tree more effectively by changing the move-selection and backpropagation methods. That seems to me to be, as you say, merely doing Monte Carlo more effectively. What is antithetical to Monte Carlo is doing a full-width check for immediate wins before doing any playouts. As they explain in the paper:

Quote: One additional improvement is to perform a 1-ply lookahead at leaf nodes (i.e., where the visit count equals one) [17]. We check whether they lead to a direct win for the player to move. If there is such a move, we can skip the play-out, label the node as a win, and start the backpropagation step. If it were not for such a lookahead, it could take many simulations before a child leading to a mate-in-one is selected and the node proven.

This is not more effective use of information that MCTS already had available; instead it is a concession that looking at every move is more valuable than selecting one to examine at random. If that improves playing strength, maybe it would improve playing strength even more to do a 3-ply check for forced wins before doing any playouts, and so on. Perhaps eventually alpha-beta searchers will have added so many ways to imbalance the tree, and MCTS so many ways to search full-width, that it won't be clear what the difference between them is.
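
Concretely, the quoted lookahead amounts to something like this sketch (the `Position` methods and the `proven` field are the same hypothetical interface as in the earlier sketches; `run_playout` stands in for whatever playout routine the bot uses):

```python
def evaluate_leaf(node, run_playout):
    """The paper's 1-ply lookahead, sketched: at a first-visit leaf,
    scan every legal move once for a direct win before spending a
    playout. Returns a value for the player to move at `node`."""
    side = node.position.side_to_move()
    for move in node.position.legal_moves():
        if node.position.apply(move).is_won_by(side):
            node.proven = +1       # label the node a proven win...
            return 1.0             # ...and skip the playout entirely
    return run_playout(node.position)  # no mate-in-one: normal playout
```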

clyring
Forum Guru
Arimaa player #6218
Posts: 362

Re: Interesting MCTS variant, MCTS-Solver
« Reply #11 on: Feb 21st, 2012, 9:36pm »
Checking for immediate wins also does not seem antithetical to me. Using heuristics (including the heuristic of taking an immediately winning move if one is available), both in the tree and in playouts, can save time and yield better results. The same is also true of alpha-beta. To me, this is much the same idea as using goal detection in Arimaa to find and prune losing moves earlier.

That said, such heuristics must be used with care. There still need to be enough moves that pass the heuristics that the playouts aren't effectively playing out the same game repeatedly; there must be fairly significant variation among the remaining moves for the playouts to be much more useful than a static evaluator. Also remember that nearly every rule of thumb has a long list of situations in which it can be misused, and believe me, a bot can and will find them all...
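
In playout terms the idea looks like the sketch below (hypothetical interface again; `passes_filter` stands for whatever cheap heuristics one trusts). Note the fallback, which is exactly the "enough moves must survive" caution above:

```python
import random

def playout_move(position, passes_filter):
    """One move of a heuristic playout: take an immediately winning
    move if one exists, else choose pseudorandomly among moves that
    pass a cheap filter, falling back to the full list so the policy
    never narrows to nothing."""
    moves = position.legal_moves()
    side = position.side_to_move()
    for move in moves:
        if position.apply(move).is_won_by(side):
            return move  # the take-the-win heuristic
    filtered = [m for m in moves if passes_filter(position, m)]
    # Overly aggressive filters make every playout replay the same
    # game; keep enough candidates that playouts still vary.
    return random.choice(filtered if filtered else moves)
```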

Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928

Re: Interesting MCTS variant, MCTS-Solver
« Reply #12 on: Feb 21st, 2012, 9:48pm »
on Feb 21st, 2012, 9:36pm, clyring wrote:
Checking for immediate wins also does not seem antithetical to me. Using heuristics (including the heuristic of taking an immediately winning move if one is available), both in the tree and in playouts, can save time and yield better results.

Well, if saving time and yielding better results is the measure of whether or not something is antithetical to MCTS, then maybe alpha-beta search is not antithetical to MCTS. After all, alpha-beta search saves time and yields better results.

Quote: The same is also true of alpha-beta. To me, this is much the same idea as using goal detection in Arimaa to find and prune losing moves earlier.

Yes, indeed. You make my case well. As I alluded to above, alpha-beta search seems to be evolving toward trees that are more and more imbalanced. In some sense this is antithetical to the notion of full-width search, and rather like the MCTS notion that better moves should be "exploited" more while worse moves are still "explored" to a limited extent.
« Last Edit: Feb 21st, 2012, 9:54pm by Fritzlein »

clyring
Forum Guru
Arimaa player #6218
Posts: 362

Re: Interesting MCTS variant, MCTS-Solver
« Reply #13 on: Feb 21st, 2012, 10:42pm »
Since I left this out of my previous post...

on Feb 21st, 2012, 8:27pm, Fritzlein wrote:
If that improves playing strength, maybe it would improve playing strength even more to do a 3-ply check for forced wins before doing any playouts, and so on.

I seem to recall reading somewhere that some havannah bots do this (a 3-ply search) in-tree. This is simply a matter of cost versus benefit: searching for forced wins up to 13 plies will obviously yield better results in general than searching to 11 plies for the same number of playouts. One problem is that searching deeply can take a long time. The other problem is that not doing this at all, in a tactically oriented game where these issues occur reasonably often, can detract significantly from the quality of play as well as greatly increase the length of each playout. The balancing point between these tradeoffs depends greatly on the nature of the game and on how quickly such a check can be performed. In the case of havannah, the check for whether a given move leads to a 3-ply forced win is very fast until late in the game, and random moves without this heuristic can often leave game-winning formations in place during playouts for dozens of moves without doing anything about them. (A sketch of such an in-tree shallow win search appears at the end of this post.)

Perhaps I should write what I consider the main idea behind MCTS before I start arguing about its antithesis, then... MCTS is, to me, about one very simple thing: dealing with uncertainty when you do not (currently) have an accurate evaluation. A very cheap, easily repeated, and very rough evaluation, such as running a pseudorandom playout from a position, is a perfect example of such an evaluation and is exactly the sort of thing that MCTS handles best. A cheap heuristic reduction of this uncertainty, such as scoring a trivial forced win as +1, or encouraging examination of something very likely to be a good move, such as a rabbit advance on an empty wing in Arimaa, fits very well with my ideas about MCTS. Reductions in alpha-beta could indeed be said to stray from the concept of a full-width search, but to say that performing fast, shallow full-width searches with minimal evaluation within MCTS is opposed to its central philosophy seems wrong to me.
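
Here is a minimal sketch of the kind of shallow, full-width, in-tree win check being discussed, generalizing the 1-ply lookahead to a small fixed depth. The position interface is hypothetical, and `False` means only "not proven", never "lost":

```python
def forced_win(position, depth):
    """True iff the side to move at `position` has a forced win
    within `depth` plies. Sketch only; False just means 'not proven'."""
    if depth <= 0 or position.is_terminal():
        return False
    side = position.side_to_move()
    for move in position.legal_moves():
        nxt = position.apply(move)
        if nxt.is_won_by(side):
            return True  # immediate win
        # Deeper win: every opponent reply must still lose (full-width).
        if depth >= 3 and not opponent_escapes(nxt, depth - 1, side):
            return True
    return False

def opponent_escapes(position, depth, attacker):
    """True if the side to move (the defender) has at least one reply
    after which the attacker has no forced win in the remaining depth."""
    if position.is_terminal():
        return not position.is_won_by(attacker)
    for reply in position.legal_moves():
        if not forced_win(position.apply(reply), depth - 1):
            return True
    return False
```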
« Last Edit: Feb 21st, 2012, 10:45pm by clyring »

Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928

Re: Interesting MCTS variant, MCTS-Solver
« Reply #14 on: Feb 22nd, 2012, 11:14am »
on Feb 21st, 2012, 10:42pm, clyring wrote:
Perhaps I should write what I consider the main idea behind MCTS before I start arguing about its antithesis, then... MCTS is, to me, about one very simple thing: dealing with uncertainty when you do not (currently) have an accurate evaluation. A very cheap, easily repeated, and very rough evaluation, such as running a pseudorandom playout from a position, is a perfect example of such an evaluation and is exactly the sort of thing that MCTS handles best. A cheap heuristic reduction of this uncertainty, such as scoring a trivial forced win as +1, or encouraging examination of something very likely to be a good move, such as a rabbit advance on an empty wing in Arimaa, fits very well with my ideas about MCTS. Reductions in alpha-beta could indeed be said to stray from the concept of a full-width search, but to say that performing fast, shallow full-width searches with minimal evaluation within MCTS is opposed to its central philosophy seems wrong to me.

If the central philosophy of MCTS is to get a good evaluation of a position by any means, of which pseudorandom playouts are only one example, then nothing that works to improve the evaluation is contrary to the approach. In particular, if we can get a better evaluation of a position with no playouts of any sort and no randomness of any sort, but rather with a fully deterministic alpha-beta search, or a quiescence search, or a straight-up hand-tuned evaluation formula with no search, you would say it is in keeping with MCTS. If that's the way you want to define it, you are being logically consistent, but I don't think you are using MCTS to mean what other people use it to mean.

I am a total pragmatist. I think an approach to creating game-playing AI is a good one if it works. I'm not judging any bot by the conceptual purity of its algorithm; I'm only interested in understanding what works and why it works. Because some guys came up with a new approach to an LOA bot that was good enough to beat the old champion, I am curious what they did differently and why it works. If I'm making a fuss over terminology, it is because it doesn't clarify things very much to say their new bot used MCTS. Different people use the same term in different ways. MC-LOA apparently does some things that other MCTS bots don't do, and those changes are essential. This is not to say that MC-LOA isn't "really" an MCTS bot, or that they shouldn't have made those changes. What is important is that we separate the essential from the optional and the effective from the ineffective. Defining MCTS as "doing whatever is effective to evaluate a position" renders the very term MCTS useless in distinguishing what works from what doesn't. It holds back the discussion rather than advancing it.

All bots, whether alpha-beta or MCTS, use some combination of search and evaluation. All of them use search to improve imperfect evaluation. That is to say, they all expand the tree to include the children of what was formerly a terminal node, and then back-propagate the evaluation of the children to improve the former evaluation of the parent position. All the algorithms trade off between a slower/better evaluation, which reduces the amount of search that can be done, and a faster/worse evaluation, which increases it. Another thing that seems to be generally accepted in both alpha-beta bots and MCTS bots is that it pays to search some lines more deeply than others: some heuristic determines how much more to expand the tree under more promising moves, based on how much more promising they are.

With all these commonalities in philosophy, what are the essential differences? (1) Usually (but not always) MCTS evaluates a position by playing it out, whereas alpha-beta uses static evaluation. (2) Usually (but not always) MCTS propagates values from children to parents via a weighted average, whereas alpha-beta uses minimax. (3) Usually (but not always) MCTS is willing to add nodes to the tree below a child that doesn't yet have all its siblings, i.e. a position need not be fully expanded before its children begin to be expanded, whereas alpha-beta is full-width. On all three of these points, the MC-LOA "solver" is doing something typical of alpha-beta programs, and doing it to good effect. The fact that it worked is a caution that ideological purity is a handicap; the pragmatists who are willing to do whatever works are going to win. None of the three dichotomies is an absolute. Perhaps the best bots will be hybrids on all three points.

Changing gears, I think something important may have slid under the authors' radar. That is to say, they did something extremely clever but didn't sufficiently emphasize its importance. In alpha-beta searchers, late-move reductions are made on the basis of evaluation only: the decision to examine a move more or less depends entirely on whether it is a good or bad move. For MC-LOA, however, the decision whether to examine a move more or less depends on the current evaluation plus a "progressive bias". The progressive bias is described as how likely an expert is to play "that type of move". If you think about it carelessly, you could imagine the progressive bias as a mini-evaluation, added to the other evaluation to make it more accurate. In reality, however, progressive bias might be a better measure of how important it is to examine a move. There might be moves that are superficially like expert moves but absolutely terrible in a given situation. Nevertheless, even if they are terrible, they must be examined. In other words, the expansion of the search tree should be biased both in favor of "good" moves and in favor of "important" moves that might or might not be good. (A sketch of a selection formula with such a bias term follows at the end of this post.)

We know from lightvector's senior thesis that he did something very similar at the root of sharp. He trained sharp to identify the Arimaa moves most like those played by experts, and to search them more deeply than the others. But he, too, seemed to conceive of it as an alternate evaluation rather than as fundamentally a way to find moves that are important rather than moves that are good. Reading about MC-LOA has reinforced my strong suspicion that there are gains to be made, in either alpha-beta search or MCTS, by increasing the emphasis on searching moves that are important but not necessarily good. For the authors of MC-LOA I propose a trivial experiment: initially increase the importance of the progressive bias relative to the move value (in the search-tree portion, not in the playouts) and see whether playing strength increases. I will bet in the dark that this makes MC-LOA stronger. (Yes, I recognize it is outrageous even to suppose the authors haven't already tried this, never mind to predict the outcome.)
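
To pin down which knob I mean: in one common form of progressive-bias selection from the literature (the paper's exact formula may differ), the expert-likelihood prior enters the selection score with its own weight, and it is that weight, `w` below, that the experiment would turn up. Node fields are the same hypothetical interface as the earlier sketches.

```python
import math

def select_child(node, c=0.7, w=5.0):
    """UCT selection with a progressive-bias term. `child.prior` is a
    hypothetical expert-likelihood score for 'that type of move';
    its influence decays as the child accumulates visits."""
    def score(child):
        if child.visits == 0:
            return float("inf")  # always try unvisited children once
        q = child.wins / child.visits
        exploration = c * math.sqrt(math.log(node.visits) / child.visits)
        bias = w * child.prior / (child.visits + 1)  # progressive bias
        return q + exploration + bias
    return max(node.children, key=score)
```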
« Last Edit: Feb 22nd, 2012, 11:23am by Fritzlein »