Arimaa Forum » Bot Development » Monte Carlo Search
fotland (Arimaa player #211)
Monte Carlo Search
« on: Aug 14th, 2006, 10:50pm »

Monte Carlo search seems to work well for computer Go, at least on small boards.  It's a deep, less-than-full-width search in which moves are chosen somewhat randomly and evaluated by their winning percentage over many playouts.
 
It might work well for Arimaa as well.  Google for "monte carlo go".
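A minimal sketch of the idea, in Python (legal_moves, apply, and winner here are hypothetical game-interface helpers, not part of any existing Arimaa code):
 
Code:

import random

# Hypothetical helpers assumed below:
#   legal_moves(pos) -> list of legal moves
#   apply(pos, move) -> resulting position
#   winner(pos)      -> "gold", "silver", or None if the game is not over

def random_playout(pos, max_plies=200):
    """Play random moves to the end (or give up), return the winner."""
    for _ in range(max_plies):
        w = winner(pos)
        if w is not None:
            return w
        pos = apply(pos, random.choice(legal_moves(pos)))
    return None

def monte_carlo_choice(pos, side, playouts_per_move=100):
    """Score each root move by its win rate over random playouts."""
    best_move, best_rate = None, -1.0
    for move in legal_moves(pos):
        child = apply(pos, move)
        wins = sum(random_playout(child) == side for _ in range(playouts_per_move))
        rate = wins / playouts_per_move
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move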
99of9 (Gnobby's creator, player #314)
Re: Monte Carlo Search
« Reply #1 on: Aug 15th, 2006, 2:57am »

Thanks for this post.  I often use Monte Carlo in my Chemistry simulations, so a year or so ago I thought about using it in Arimaa, but I couldn't find previous references to its use in non-probabilistic games.  Therefore I played around in a connect-4 testing ground, but didn't make much headway.  Now that you've pointed us to references on how it works in Go, I may take another look.
Fritzlein (Arimaa player #706)
Re: Monte Carlo Search
« Reply #2 on: Aug 15th, 2006, 2:23pm »

How quickly could a given Arimaa position be played out a hundred or a thousand times with random moves on both sides?  If it isn't ridiculously expensive, I'd be curious to see how strong a predictor it is for victory, let's say at the midpoint of the game, or ten moves from the end.
 
Janzert, I know you like a programming challenge.  What if you went through the game database, took every game ending in "g" or "m", and did many random playouts from the midpoint?  Then you could see whether FAME or random playout was more strongly correlated with the eventual winner.  Just a thought: it is of course too much work for me ;)
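A rough sketch of that experiment in Python (load_games, midpoint_position, fame_score, and random_playout are all hypothetical helpers, just to show the shape of it):
 
Code:

# For each finished game, take the midpoint position, estimate a win rate
# from random playouts, record the FAME score, and then see which of the
# two correlates better with the actual result.

def playout_win_rate(pos, side, k=200):
    return sum(random_playout(pos) == side for _ in range(k)) / k

rows = []
for game in load_games(ending_in=("g", "m")):
    pos = midpoint_position(game)
    rows.append((game.winner == "gold",          # actual outcome
                 playout_win_rate(pos, "gold"),  # playout predictor
                 fame_score(pos)))               # FAME predictor
# Compare how well the second and third columns correlate with the first.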
seanick
Re: Monte Carlo Search
« Reply #3 on: Aug 15th, 2006, 10:47pm »

Since many of my losses are due to blunders (and some of my wins, for that matter, at least against humans), I doubt that a random playout would have much bearing unless it was taken very close to the end of the game.  There is at least one exception: my longest human vs human game, which was down to one opponent piece at or near the halfway point (grumble, annoying phant, anyway, grumble).
99of9 (Gnobby's creator, player #314)
Re: Monte Carlo Search
« Reply #4 on: Aug 15th, 2006, 11:24pm »

Since my last post I've read more about MC Go.  I've always ruled out random playout ("Expectation Score") as a heuristic, for a few reasons:
 
1) in random vs random games, rabbits are more important than they should be, and pushing them forward is always beneficial (because pieces are rarely killed, they typically suicide!).
 
2) Material count is a fast way to get a good handle on who is ahead.  To get a similar accuracy (if possible) from random playouts, many games would have to be played - taking much longer than counting pieces.
 
3) Some other forms of knowledge aren't too hard to code either.
 
What I DO think would be useful is not random playout, but progressive pruning.  At first the program stochastically explores every option, but gradually some moves prove statistically less beneficial to the player who makes them, so they get tried less and less often, until they are effectively pruned.  I envisage using a normal evaluation function in cooperation with this MC, so that rather than taking averages over final outcomes, you simply take averages over the evaluation function at a particular depth (much deeper than alpha-beta can currently achieve).
 
I have some ideas on how this can be done, but am unlikely to think further about it until after my evaluation overhaul.
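A very rough sketch of the selection side of that idea, in Python (the temperature and pruning threshold are made-up numbers, and the running-eval bookkeeping is assumed to exist elsewhere):
 
Code:

import math, random

def pick_move(stats, temperature=50.0, prune_below=0.01):
    """stats: dict move -> (total_eval, visits).  Sample a move to try next;
    moves whose softmax weight has fallen below a threshold are effectively
    pruned out of the search."""
    means = {m: tot / max(v, 1) for m, (tot, v) in stats.items()}
    best = max(means.values())
    weights = {m: math.exp((e - best) / temperature) for m, e in means.items()}
    live = {m: w for m, w in weights.items() if w >= prune_below}  # progressive pruning
    r = random.uniform(0.0, sum(live.values()))
    for m, w in live.items():            # roulette-wheel choice among survivors
        r -= w
        if r <= 0:
            return m
    return next(iter(live))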
 
Janzert (Arimaa player #247)
Re: Monte Carlo Search
« Reply #5 on: Aug 16th, 2006, 12:39am »

Hmm, I don't really expect it would be a very good predictor. But it might be interesting to try out.
 
Also, it would be pretty trivially fast to do it using random steps.  Using truly random moves would be quite a bit slower, since I think you have to generate all the moves in order to pick a truly random one.  It may not make any difference in practice which method is used, though.
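To illustrate the difference (a Python sketch; legal_steps, legal_moves, and apply are hypothetical helpers):
 
Code:

import random

def random_move_by_steps(pos):
    # Cheap: pick up to four random steps one at a time.  Note this does
    # NOT sample uniformly over full moves.
    for _ in range(4):
        steps = legal_steps(pos)
        if not steps:
            break
        pos = apply(pos, random.choice(steps))
    return pos

def random_move_uniform(pos):
    # Truly uniform over moves, but requires generating every legal move
    # (often tens of thousands in Arimaa) just to pick one.
    return apply(pos, random.choice(legal_moves(pos)))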
 
Unfortunately work keeps me pretty busy for the summer so I  don't know that I'll get time to try it anytime soon.
 
Janzert
Fritzlein (Arimaa player #706)
Re: Monte Carlo Search
« Reply #6 on: Aug 16th, 2006, 10:43am »

on Aug 15th, 2006, 11:24pm, 99of9 wrote:
What I DO think would be useful is not random-playout, but progressive pruning.  At first the program stochastically explores every option, but gradually some become less statistically-beneficial to the player that makes that move, that they start getting tried less and less often, until they are effectively pruned.

I'm trying to get my mind around this.  Are some lines searched deeper than others, or is every line extended to a fixed depth?  If the search goes deeper in some lines than in others, I can't figure out how it would work.
 
I am envisioning a search tree that is only partially filled out and is deeper in some places than others.  The search function is semi-random, so a new branch can suddenly sprout in any unexplored place, not just at the deepest level.  A new branch can sprout somewhere in the middle or even at the root: anywhere you haven't looked at all the options yet, right?  So you are constantly jumping around the search tree making it grow, randomly at first, but eventually by looking at more promising lines while not extending less promising ones.
 
One question that springs to mind is how to weight the relative importance of extending the tree at different depths.  Is there some bias that says a new move at the root is worth more in the search than a new move four ply out?  Or is the bias of where to look based only on the evaluation of that node?
 
Quote:
I envisage using a normal evaluation function in cooperation with this MC, so that rather than taking averages over final outcomes, you simply take averages over the evaluation functions at a particular depth (much deeper than alpha-beta can currently achieve).

How would you compare nodes that have been explored to different depths?  It sounds like a node that hasn't been explored just gets a value from the eval function, but a node that has been explored 1 ply deeper gets the average value of its descendants.  That wouldn't be fair, because average response is a poor response, and it inflates the value of a move to expect the opponent to make a poor response.  So maybe I have it all wrong, and every line is explored to the same depth to eliminate that bias.
 
And another question: Why propagate average values back up the tree rather than minimax values?  Even if we are only looking at part of the tree, aren't we more interested in the best move rather than the average move?
 
Indeed, my intuition is to use neither average value nor minimax, but rather an exaggerated minimax.  Given that part of the tree below each node hasn't been explored, we have to expect that the best move probably hasn't yet been found, so we should extrapolate from the distribution of explored moves that a still better move would be discovered in a full-width search.  How much better than the best so far depends on what percentage of the space remains unexplored.
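One crude way to write such an extrapolation down (just a guess at a formula, in Python; none of this comes from the Go papers):
 
Code:

from statistics import pstdev

def extrapolated_max(explored_evals, num_explored, num_total):
    # Extrapolate beyond the best explored child using the spread of the
    # explored evals and the fraction of moves not yet looked at.
    best = max(explored_evals)
    spread = pstdev(explored_evals) if len(explored_evals) > 1 else 0.0
    unexplored_fraction = 1.0 - num_explored / num_total
    return best + spread * unexplored_fraction   # bonus shrinks as coverage grows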
 
Using an extrapolated minimax, if such a thing could be vaguely accurate, would potentially free us from bias caused by one part of the tree being deeper than another, or caused by one branch being bushier than another.  If such bias were removed, then I could see growing the search tree much more randomly than otherwise.
 
I'm still curious how one would assign the probability of trying something new versus trying something old and deviating later, but if the exaggerated minimax were somewhat sound, then maybe the distribution at each node is key.  If one move at the root seems way better than the rest, then search would be more likely to explore that move and deviate later, but if it seems reasonably likely that the best move has not yet been identified, search should sprout a new branch at the root, or spend more time on a move that is currently running second-best.
 
As usual, forgive me if I am spouting errant nonsense.  IANACS (I am not a computer scientist) ;)
99of9 (Gnobby's creator, player #314)
Re: Monte Carlo Search
« Reply #7 on: Aug 16th, 2006, 8:11pm »

Unfortunately this is not a well-defined concept, so I've had trouble answering your post.  We are obviously imagining quite different things.
 
Something vaguely like this:
 
Code:

evaluate(position p) {
  if (depth == D_MAX) {
    return (Static_Eval(p), std_deviation = 0);
  } else if (never searched p before) {
    return (Static_Eval(p), std_deviation = constant*(D_MAX - depth));
  } else if (only searched p ONCE before) {
    M = generate_all_possible_steps_from_here(p);
    for (m = 0; m < M; m++) {
      e[m], sd[m] = evaluate(p + move_m);
    }
    return (average(all e[m]), sample_std_deviation(all e[m]));
  } else {  // we have evaluated this position and its successors before
    e[M], sd[M] = recall_previous_evaluations_of_all_moves(p);
    e_p = recall_previous_evaluation_of_this_position(p);
    n = num_times_seen_this_position_before(p);
    n++;
    B = choose_branching_factor(???);
    for (b = 0; b < B; b++) {
      m[b] = choose_move(e[M] - e_p, sd[M], n, temperature);
      e[m[b]], sd[m[b]] = evaluate(p + move_m[b]);
    }
    return (average(all e[m[b]]), sample_std_deviation(all e[m[b]]));
  }
}

int choose_move(e_increase[M], sd[M], n, temp) {
  e_inc_max, m_max = find_maximum_increase(e_increase[M]);
  for (m = 0; m < M; m++) {
    weight[m] = exp((e_increase[m] - e_inc_max) / (temp*(sd[m_max] + sd[m])));
  }
  return ( roulette_wheel_choice(weight[M]) );
}

 
Of course this will have numerous illogicalities, because I just made it up.  But that's the rough outline of what I'm imagining.
 
Quote:
One question that springs to mind is how to weight the relative importance of extending the tree at different depths

Every iteration, a tree of size B^D is searched.  Except when some of the nodes had never been visited before... in that case we do less than B^D.  Let's take the example of B=2 at all depths (this is tuneable).  Then at the root node, we always explore 2 possible steps.  In fact if one is clearly above all others relative to the accuracy with which they are known (std_deviation), then we may explore the same step twice (although that would mess up the calculation of a new std_deviation, so some thought needs to go into this).  Near the base of the tree, we've generally explored them ALL before, we just don't know their values very accurately at first.
 
 
Quote:
How would you compare nodes that have been explored to different depths?

All depths of Static_Evaluations are treated equally.  If a move gets an "unfair advantage" from being evaluated at a lower depth, the search will focus on using that move, and hence it will soon be expanded, and its unfair advantage removed.
 
Quote:
unfair because average response is a poor response, and it inflates the value of a move to expect the opponent to make a poor response

If we make this move often enough in our subsequent iterations, the opponent will soon learn *which* responses are higher eval than others, and will start pruning his/her replies to nearly always respond with the best reply.
 
Quote:
Why propagate average values back up the tree rather than minimax values?

The average is equivalent to the minimax once every player always plays the perfect move at every node ;)
 
It's true there are other options here.  Another interesting one is:
MAX(x_i) ~ power( sum(x_i^p) , 1/p)
When p is large, this converges to the max function.  There is an equivalent for MIN.
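A tiny numerical illustration (Python; only meaningful for positive evals):
 
Code:

xs = [1.0, 2.0, 3.0]
for p in (1, 4, 16, 64):
    print(p, sum(x**p for x in xs) ** (1.0 / p))
# 1 -> 6.0,  4 -> ~3.15,  16 -> ~3.0003,  64 -> ~3.0 : approaches max(xs)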
 
Quote:
Given that part of the tree below each node hasn't been explored, we have to expect that the best move probably hasn't yet been found, so we should extrapolate from the distribution of explored moves that a still better move would be discovered in a full-width search.

But bear in mind that not all responses have been tried either.  Maybe you have hit upon the best move, but you haven't yet seen the best response.
 
If your extrapolation was: best+DELTA, then every second ply would cancel out the DELTA from the previous player.
 
I agree if you use a good statistical extrapolation rather than a fixed constant, your method has merit.
 
Quote:
If one move at the root seems way better than the rest, then search would be more likely to explore that move and deviate later, but if it seems reasonably likely that the best move has not yet been identified, search should sprout a new branch at the root, or spend more time on a move that is currently running second-best.

If the estimated value of the current_best root move starts to decline, then soon the low-accuracy second_best root move will start to overlap with the best, so it will sometimes be chosen to try out.  Its accuracy will then improve, and it will either show itself to be truly inferior, or will start to improve its rough eval, and will start getting chosen regularly.
Fritzlein (Arimaa player #706)
Re: Monte Carlo Search
« Reply #8 on: Aug 20th, 2006, 10:52am »

OK, I think I now understand a possible benefit of having some randomness in the search.  You would like to do forward pruning of some type.  Forward pruning can be useful if picking the best move relies more on exploring some lines more deeply, at the expense of exploring other lines more shallowly.  The trouble with forward pruning is that you can get stuck in bad lines even after you have discovered at greater depth that they are not so hot, right?
 
Full-width search fixes the problems of forward pruning.  However, suppose you want to get the benefits of forward pruning without the detriments.  Stochastic search could at least diminish the problems of forward pruning by revisiting previously discarded lines, and by ensuring that a greater variety of lines are explored to a greater depth.
 
Now that I understand (I hope) this concept, I have a different question.  Why not be more systematic about stochastic search, both in order to make it faster, and in order to make the engineering easier?  What I have in mind is something like this:
 
Pretend that you are doing a full-width alpha-beta search with iterative deepening.  In order to be faster, and to get greater depth in more promising lines, stochastically prune lines with unpromising evals.  That is to say, rather than having a fixed threshold for forward pruning, have a gradually-increasing probability of forward pruning as the local eval gets worse and worse, but always leave some chance of exploring a superficially rotten line.
 
Combine this with iterative deepening, e.g. on the first pass prune about 75% at the root, 90% at depth 1, then 100% (i.e. stop searching) at depth three.  For the next iteration, prune about 50% at the root, 75% at depth 1, 90% at depth three, and stop at depth 4.  I have no idea what the optimal values for the aggressiveness of the pruning should be, but you get the idea.
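Something like the following sketch (Python; static_eval, legal_moves, apply, and root_position are hypothetical helpers, and the pruning numbers are just the made-up ones above):
 
Code:

import random

def search(pos, depth, prune_chance, max_depth):
    """Negamax with stochastic forward pruning; static_eval is assumed to be
    from the side-to-move's point of view."""
    if depth >= max_depth:
        return static_eval(pos)
    children = [(move, apply(pos, move)) for move in legal_moves(pos)]
    # Order children from most to least promising for the side to move
    # (child evals are from the opponent's view, so lower is better for us).
    children.sort(key=lambda mc: static_eval(mc[1]))
    best = float("-inf")
    for rank, (move, child) in enumerate(children):
        # Prune with a probability that grows with the base rate for this depth
        # and with how unpromising the move looks (its rank), but never reaches
        # certainty, so even rotten-looking lines are occasionally searched.
        p = prune_chance.get(depth, 0.0) * (rank + 1) / len(children)
        if random.random() < min(p, 0.95):
            continue
        best = max(best, -search(child, depth + 1, prune_chance, max_depth))
    return best if best != float("-inf") else static_eval(pos)

# Iterative deepening with a pruning schedule that relaxes on each pass.
for prune_chance, max_depth in [({0: 0.75, 1: 0.90}, 3),
                                ({0: 0.50, 1: 0.75, 2: 0.90}, 4)]:
    value = search(root_position, 0, prune_chance, max_depth)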
 
It seems stochastic pruning leaves you less chance of getting stuck than does regular forward pruning.  If there's a huge benefit to be found at greater depth in a line requiring an initial sacrifice, you have some probability of discovering that, because you don't automatically get rid of the lines with low static eval in the first place.  Likewise, if there is something rather nasty about to happen in all the initially promising lines, you will discover that at some iteration (sooner than full-width search would), and then re-introduce, with greater probability, lines that otherwise would be explored only with low probability.
 
In short, I think this achieves what you are trying to achieve via stochastic search without introducing the same engineering headaches you were outlining above.  But as usual I may be missing something subtle (or obvious!) about what you were planning to do, as well as problems with what I'm suggesting.
jdb (Arimaa player #214)
Re: Monte Carlo Search
« Reply #9 on: Aug 20th, 2006, 11:14am »

Quote:
Combine this with iterative deepening, e.g. on the first pass prune about 75% at the root, 90% at depth 1, then 100% (i.e. stop searching) at depth three.  For the next iteration, prune about 50% at the root, 75% at depth 1, 90% at depth three, and stop at depth 4.  I have no idea what the optimal values for the aggressiveness of the pruning should be, but you get the idea.

 
This idea is fine, but the trick is how to decide which moves to prune.  AlphaBeta tells you the exact score of one move and that all the other moves are worse.  It does not tell you how much worse they are.
Fritzlein (Arimaa player #706)
Re: Monte Carlo Search
« Reply #10 on: Aug 20th, 2006, 1:52pm »

on Aug 20th, 2006, 11:14am, jdb wrote:
AlphaBeta tells you the exact score of one move and that all the other moves are worse. It does not tell you how much worse they are.

Ah, good point.  I forgot that alpha-beta pruning doesn't tell you exactly how bad a bad move is when it reports back up the tree, because it doesn't look for replies that are even more crushing.  Maybe the alpha-beta cutoffs could also be stochastic, for example by adding a small random amount to the amount needed to produce a cutoff.
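A tiny sketch of what that might look like inside a plain alpha-beta search (Python; static_eval, legal_moves, and apply are hypothetical helpers, and the noise scale is a made-up parameter):
 
Code:

import random

def alphabeta(pos, depth, alpha, beta, noise=0.05):
    if depth == 0:
        return static_eval(pos)
    # Require a bit more than beta to cut off, by a small random margin,
    # so barely-refuted moves are occasionally searched further.
    cutoff = beta + random.uniform(0.0, noise)
    for move in legal_moves(pos):
        score = -alphabeta(apply(pos, move), depth - 1, -beta, -alpha, noise)
        if score >= cutoff:
            return score
        alpha = max(alpha, score)
    return alpha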