Arimaa Forum « Interesting MCTS variant, MCTS-Solver »


Topic: Interesting MCTS variant, MCTS-Solver (Read 6425 times)
Fritzlein
Arimaa player #706
Re: Interesting MCTS variant, MCTS-Solver
« Reply #15 on: Feb 22nd, 2012, 1:20pm »

on Feb 22nd, 2012, 11:14am, Fritzlein wrote:
Reading about MC-LOA has reinforced my strong suspicion that there are gains to be made, either in alpha-beta search or MCTS, by increasing the emphasis on searching moves that are important but not necessarily good.  For the authors of MC-LOA I propose a trivial experiment: initially increase the importance of your progressive bias relative to move value (in the search-tree portion, not in playouts) and see whether the playing strength increases.  I will bet in the dark that this makes MC-LOA stronger.  (Yes, I recognize it is outrageous to even suppose the authors haven't already tried this, never mind to predict the outcome.)

Oh, whoops, the authors did try this, and they reported it.  Increasing the importance of the progressive bias relative to the evaluation of a node within the search tree was one of the differences between MC-LOA and MC-LOA-T, and it did make it stronger.  I'm a day late and a dollar short, as usual.
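For readers following along, the change being discussed can be sketched roughly like this: UCT selection with a progressive-bias term whose relative weight can be turned up. This is a hedged illustration, not the MC-LOA authors' code; `bias_weight`, the constants, and the node fields are all invented for the example.

```python
import math

def select_child(children, total_visits, c=1.4, bias_weight=10.0):
    """UCT selection with a progressive-bias term.

    Each child is a dict with 'wins', 'visits', and a heuristic move
    score 'h' in [0, 1].  The bias term bias_weight * h / (visits + 1)
    fades as a child accumulates visits; raising bias_weight relative
    to the node evaluation is the flavour of change that distinguished
    MC-LOA-T from MC-LOA.  All names and constants are illustrative.
    """
    def score(ch):
        if ch['visits'] == 0:
            return float('inf')  # expand unvisited children first
        exploit = ch['wins'] / ch['visits']
        explore = c * math.sqrt(math.log(total_visits) / ch['visits'])
        bias = bias_weight * ch['h'] / (ch['visits'] + 1)
        return exploit + explore + bias
    return max(children, key=score)
```

With a large `bias_weight`, two children with identical win statistics are separated by the heuristic alone, which is exactly the "important but not necessarily good" emphasis under discussion.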
tize
Arimaa player #3121
Re: Interesting MCTS variant, MCTS-Solver
« Reply #16 on: Feb 22nd, 2012, 1:20pm »

on Feb 22nd, 2012, 11:14am, Fritzlein wrote:

(3) Usually (but not always) MCTS is willing to add nodes to the tree below a child that doesn't have all its siblings, i.e. a position need not be fully expanded before beginning to expand its children, whereas alpha-beta is full-width.

This is also true for alpha-beta to some extent, since forward pruning and extensions, like quiescence search, make the search skip a lot of the continuations.
 
One difference that I can think of is that a bad move gets less and less processing time relative to the good move the longer MCTS runs and builds its knowledge of the position.
In alpha-beta, by contrast, the time spent on a bad move more or less closes in on a fixed factor times the time spent on the best move. This remains true for moves cut off by null-move pruning, since that reduced search depth grows at the same rate as the nominal search depth.
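The contrast between the two allocation behaviours is easy to demonstrate on a toy two-armed bandit. This is a hedged sketch with made-up win probabilities, showing that under UCB1 the weaker arm's share of the total pulls keeps shrinking as the search runs longer:

```python
import math
import random

def ucb1_visit_shares(p_best=0.6, p_bad=0.4, rounds=(1000, 10000)):
    """Two-armed bandit under UCB1.

    Returns the bad arm's share of all pulls at each checkpoint in
    'rounds'.  Because UCB1 pulls the inferior arm only O(log t) times,
    that share keeps falling, unlike alpha-beta's roughly fixed ratio
    of time between a bad move and the best move.  The probabilities
    are invented for illustration.
    """
    random.seed(1)
    probs = [p_best, p_bad]
    wins, visits, shares = [0, 0], [0, 0], []
    for t in range(1, max(rounds) + 1):
        if 0 in visits:
            arm = visits.index(0)  # try each arm once first
        else:
            arm = max((0, 1), key=lambda a: wins[a] / visits[a]
                      + math.sqrt(2 * math.log(t) / visits[a]))
        visits[arm] += 1
        wins[arm] += random.random() < probs[arm]
        if t in rounds:
            shares.append(visits[1] / t)
    return shares
```

Running it, the bad arm's visit share at 10,000 pulls is lower than at 1,000 pulls, which is exactly the vanishing time share described above.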
Hippo
Arimaa player #4450
Re: Interesting MCTS variant, MCTS-Solver
« Reply #17 on: Feb 22nd, 2012, 5:14pm »

I like your summary, Fritzlein. :)
Yes, MCTS is a paradigm that allows many different interpretations. It originally got its name from the random playouts, but its selection method for driving the exploration/exploitation tradeoff seems to have gained even greater importance.

Even in Go, random playouts with equal move probabilities do not lead to the best results, and doing playouts with "trained" move probabilities helps.
In highly tactical games, "trained expert" move probabilities are much more important.
Kozelek's thesis discussed how random attack is much more effective than random defense in Arimaa ... and a general assumption is that random playouts will find the correct continuation in highly tactical positions only very slowly. I am sure this is the case for equal move probabilities, but I hope "trained expert" probabilities will speed up the convergence sufficiently.

There are two essential steps in the mentioned paper. One is the cutoff-by-evaluation-function approach. The other is speeding up the propagation of proved results.
I think the second idea could be developed further into a changed backpropagation method. Usually MCTS gives all playouts the same weight. (Parallel MCTS often adds 1 to the number of simulations first and backpropagates the win later ... to improve the chance of steering other threads to other branches.) But it seems to me that the longer a playout is, the less accurate it is. What about weighting shorter playouts more (those that are terminated earlier by an evaluation threshold)? That would be compatible with scoring overly long playouts as draws (just stop a playout when its weight gets too close to zero).
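The weighted-backpropagation idea could be sketched like this. Purely illustrative: the `half_life` knob, the decay shape, and the node layout are invented for the example.

```python
def backpropagate(path, result, playout_len, half_life=20.0):
    """Length-weighted backpropagation along the path from leaf to root.

    Instead of adding 1 simulation per playout, weight each result by
    0.5 ** (playout_len / half_life): a playout two half-lives long
    counts a quarter as much as an immediate terminal result.  A playout
    whose weight would be negligible could simply be stopped and scored
    as a draw, as suggested above.  Nodes are dicts with 'weight' (sum
    of weights) and 'value' (weighted sum of results); half_life is an
    illustrative knob.
    """
    w = 0.5 ** (playout_len / half_life)
    for node in path:
        node['weight'] += w
        node['value'] += w * result
    return w
```

A node's estimate then becomes `value / weight`, so short, evaluation-terminated playouts dominate long, noisy ones.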
 
Actually, I am trying to make my bot compute more the way I think in postal games. I actually do playouts, choosing from a small set of interesting moves and playing several moves further to gain some "impression". But the longer the playout is, the less important the final impression is to me. I am sure I think much closer to MCTS than to alpha-beta.
Fritzlein
Arimaa player #706
Re: Interesting MCTS variant, MCTS-Solver
« Reply #18 on: Feb 23rd, 2012, 12:09pm »

on Feb 22nd, 2012, 1:20pm, tize wrote:
One difference that I can think of is that a bad move gets less and less processing time relative to the good move the longer MCTS runs and builds its knowledge of the position.
In alpha-beta, by contrast, the time spent on a bad move more or less closes in on a fixed factor times the time spent on the best move. This remains true for moves cut off by null-move pruning, since that reduced search depth grows at the same rate as the nominal search depth.

I don't know the proof behind the correctness of UCT, but my intuitive understanding is that UCT is "correct" in that the probability of exploiting the best move converges to one.  To satisfy that property, yes indeed, there can't be a fixed ratio of CPU time spent between better and worse moves.  The time spent on inferior moves must go to zero for the exploitation of the best move to increase to one.  But is that really better than a fixed ratio?
 
If I am understanding UCT properly, I don't see why the property of being "correct" should correlate with being best in terms of game AI.  Intuitively, I don't want my bot thinking more and more about the best move, to the exclusion of thinking about all other moves.  On the contrary, assuming there is a fixed amount of variation in evaluation per extra ply of search depth, the exploration/exploitation tradeoff that I want is precisely a fixed ratio of CPU share per unit of difference in current evaluation, i.e. CPU share falling off exponentially with the evaluation deficit.  And since search depth is proportional to the logarithm of CPU time, this is equivalent to making search depth linearly dependent on current evaluation.
 
Surely this is possible within the exploration/exploitation paradigm, possibly by changing only one term in one place in the code.  Intuitively (again, I'm guessing in the dark) this will make a stronger bot than an exploration/exploitation tradeoff that converges to spending all its CPU examining the best move, stronger because evaluation errors that can be corrected by greater search depth are corrected sooner.
 
At an abstract level, we are not playing slots on a many-armed bandit, where we lose money every time we play the wrong machine.  At the end of the day, we want our probability of determining the best move to go to one, which is a weaker demand than making the probability of  searching the best move go to one.  Of course the latter implies the former, but we can have the former without the latter, for example via alpha-beta search.  It may be that the demand of "correctness" of UCT ultimately handicaps its efficiency.
 
Thus I personally would relax the demands of UCT, and permit asymptotically constant (non-zero) search share of weaker moves, with one example being the fixed-ratio method you refer to.
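The fixed-ratio allocation described here amounts to a softmax over current evaluations. A hedged sketch, where `ratio_per_point` is an invented knob (the fixed CPU-share factor per point of evaluation difference):

```python
def cpu_shares(evals, ratio_per_point=2.0):
    """Allocate CPU shares so each full point of evaluation deficit
    costs a fixed factor of share (a softmax with base ratio_per_point).

    Shares are asymptotically constant and non-zero for every move, so
    no move's share decays to zero over time; and since depth grows
    with the logarithm of time, a move's search depth falls off
    linearly with its evaluation deficit.  ratio_per_point is purely
    illustrative.
    """
    best = max(evals)
    raw = [ratio_per_point ** (e - best) for e in evals]
    total = sum(raw)
    return [r / total for r in raw]
```

For example, with evaluations of 0, -1, and -2 points, each successive move gets exactly half the CPU share of the one above it, however long the search runs.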
Fritzlein
Arimaa player #706
Re: Interesting MCTS variant, MCTS-Solver
« Reply #19 on: Feb 23rd, 2012, 12:52pm »

on Feb 22nd, 2012, 5:14pm, Hippo wrote:
I am sure I think much closer to MCTS than to alpha-beta.

I believe one aspect of expert human thinking that is under-represented in both alpha-beta bots and MCTS bots at present is that humans look at high-risk, high-reward moves with greater preference.  If there is a camel sacrifice that might get me a forced goal, I spend time trying to work out precisely whether the forced win is real or illusory.  An alpha-beta bot fails by quickly seeing the camel loss and throwing the move to the bottom of the heap until the forced goal comes within its search depth, when suddenly the goal is proved and the move bounces to the top.  This takes too long, and the goal would be seen earlier with greater emphasis on high-risk, high-reward moves.  Meanwhile, an MCTS bot sees the camel sacrifice as generating some wins and some losses depending on the random playouts, so it doesn't throw the move on the rubbish heap (yay!), but proving the forced goal or the lack thereof is something MCTS is inefficient at (boo!), so for a long time it returns mediocre mush instead of a proper evaluation that is in truth either very positive or very negative.  Both kinds of bots need to bias their searches more toward moves with high variance, i.e. more toward moves that could turn out to be great or could turn out to be terrible.
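One hedged way a selection rule could favour such high-variance moves is to add an observed-variance bonus to the usual UCT score. This is an illustrative sketch, not any published method; `var_bonus` and the running-sum node layout are invented for the example.

```python
import math

def select_risky(children, total_visits, c=1.4, var_bonus=1.0):
    """UCT selection with an extra bonus for high-variance children.

    Each child is a dict carrying running sums ('visits', 'sum',
    'sum_sq' of playout results), so the sample variance is cheap to
    compute.  Between two children with equal means, the one whose
    results swing between great and terrible gets searched first.
    var_bonus is an illustrative knob.
    """
    def score(ch):
        n, s, s2 = ch['visits'], ch['sum'], ch['sum_sq']
        if n == 0:
            return float('inf')  # expand unvisited children first
        mean = s / n
        var = max(s2 / n - mean * mean, 0.0)
        explore = c * math.sqrt(math.log(total_visits) / n)
        return mean + explore + var_bonus * var
    return max(children, key=score)
```

A camel-sacrifice-like move, whose playouts return a mix of wins and losses, would outscore a quiet move with the same average result.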
 
I'm not sure how to implement this, but I intuit that sharp's attempt to imitate expert moves based on the presence/absence of features was a huge step in the right direction.
Nazgand
Arimaa player #6461
Re: Interesting MCTS variant, MCTS-Solver
« Reply #20 on: Feb 23rd, 2012, 6:12pm »

on Feb 23rd, 2012, 12:52pm, Fritzlein wrote:

 look at high-risk, high-reward moves with greater preference.

If I remember correctly, Houdini does something like this. My memory is fuzzy on this though.
tize
Arimaa player #3121
Re: Interesting MCTS variant, MCTS-Solver
« Reply #21 on: Feb 24th, 2012, 1:28pm »

on Feb 23rd, 2012, 12:09pm, Fritzlein wrote:

Intuitively, I don't want my bot thinking more and more about the best move, to the exclusion of thinking about all other moves.

I agree, but I would still like the bot's time spent on bad moves to approach zero instead of a fixed ratio. Maybe the moves that should be investigated the most are the second-best moves, that is, just reduce the probability a lot for the best-looking move. It mustn't be ignored completely, though, as it could have gotten a superficially high value. This idea fits your rephrasing of the question to solve.
Quote:
At the end of the day, we want our probability of determining the best move to go to one, which is a weaker demand than making the probability of  searching the best move go to one.
Fritzlein
Arimaa player #706
Re: Interesting MCTS variant, MCTS-Solver
« Reply #22 on: Feb 25th, 2012, 9:11am »

on Feb 21st, 2012, 8:27pm, Fritzlein wrote:
This is not more effective use of information that MCTS already had available; instead it is a concession that looking at every move is more valuable than selecting one to examine at random.  If that improves playing strength, maybe it would improve playing strength more to do a 3-ply check for forced wins before doing any playouts, etc.

Update: The authors experimented and discovered that a 3-ply search within playouts was too heavy and decreased strength, but a selective 2-ply search produced a substantial strength gain.
 
http://www.personeel.unimaas.nl/m-winands/documents/Cig2011paper24.pdf
 
It seems like a step toward the convergence of Monte-Carlo and alpha-beta philosophies.  More experiments like this in various games should bring us closer to understanding what works and why it works.  For example, are playouts really essential to MC-LOA-alpha-beta, or is the real magic the exploitation/exploration tradeoff, i.e. the manner in which the search tree is unbalanced?
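The flavour of those in-playout tactical checks can be illustrated with a much simpler one-ply version. This is a generic sketch, not the paper's actual selective 2-ply algorithm; the game-interface callables (`apply_move`, `is_win`) are assumptions for the example.

```python
import random

def playout_move(pos, legal_moves, apply_move, is_win,
                 random_policy=random.choice):
    """Playout policy with a shallow tactical check before sampling.

    Play an immediate winning move if one exists; otherwise fall back
    to the (possibly weighted) random policy.  This trades heavier
    playouts for fewer tactical blunders inside them, the tradeoff the
    paper's authors tuned between 2-ply (gain) and 3-ply (too heavy).
    pos, apply_move, and is_win are assumed game-interface objects.
    """
    for m in legal_moves:
        if is_win(apply_move(pos, m)):
            return m
    return random_policy(legal_moves)
```

On a toy "race to 10" game, for instance, the policy plays the winning step the moment one is available and samples randomly until then.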
ddyer
Re: Interesting MCTS variant, MCTS-Solver
« Reply #23 on: Feb 27th, 2012, 1:12pm »

We know alpha-beta wastes most of its time methodically investigating obviously worthless lines of play.  MCTS is a different way of managing the overall search.  In some cases MCTS can be combined with absurdly simple evaluation strategies to stellar results.  A good case in point is "Cookie Disco", which I recently added at Boardspace.net, where a pure random-playout strategy is very strong.  In other cases, additional tricks and game-specific knowledge have to be applied to get good results; this is the case for Arimaa.  I experimented with MCTS for Arimaa at Boardspace, but the results were very disappointing.

The paper presents a grab-bag of methods which, tuned for the game and combined, achieve a very good result.  That gives me hope that sufficiently clever programming may yet produce good results for Arimaa.

visit my game site: http://www.boardspace.net/
free online abstract strategy games