Arimaa Forum « Shannon Type B »


Ciribot
« on: Mar 15th, 2009, 9:58pm »

Are there any engines out there that implement a selective search? By that I mean searching only a selection of probably 'good' moves, and ignoring the rest. In chess this type of algorithm had a large problem with tactical strength, and the need for it was overcome by better Type A algorithms running on much better hardware.
 
The problem, of course, becomes: how do you compress 4000 possibilities into a more manageable number?

Fritzlein
« Reply #1 on: Mar 16th, 2009, 5:59am »

Rat does a very selective search.  Rat scored some impressively human-looking victories over Bomb in testing, but in the 2009 Computer Championship, Rat placed eighth out of eight.  So for now Type B is a runner-up to Type A.
 
OpFor does late-move reductions.  That is to say, OpFor searches full-width to a shallow depth, but it doesn't bother to deeply search the moves that score poorly at shallow depth.  This hybrid approach blurs the line between Type A and Type B, does it not?
 
I think most of our bots are what you would call Type A, but every technique to search the tree more deeply in some places and less deeply in other places makes the definition less clear.  For example, several bots use null-move pruning to search unpromising moves less deeply, and quiescence search to pursue promising moves more deeply.  How imbalanced does the search tree need to become before we call such a bot Type B?
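
To illustrate for anyone unfamiliar with the technique, here is a minimal sketch of late-move reductions grafted onto a plain alpha-beta search. This is not OpFor's actual code; the move generator, evaluator, and reduction thresholds below are placeholders.

Code:
def lmr_search(pos, depth, alpha, beta, gen_moves, make, unmake, evaluate):
    """Negamax alpha-beta with a crude late-move reduction."""
    if depth == 0:
        return evaluate(pos)
    best = -float('inf')
    for i, move in enumerate(gen_moves(pos)):   # moves assumed ordered best-first
        make(pos, move)
        if i < 4 or depth < 3:
            # Early (promising) moves get a full-depth search.
            score = -lmr_search(pos, depth - 1, -beta, -alpha, gen_moves, make, unmake, evaluate)
        else:
            # Late moves get a reduced-depth look first...
            score = -lmr_search(pos, depth - 2, -beta, -alpha, gen_moves, make, unmake, evaluate)
            if score > alpha:
                # ...and a full re-search only if they turn out to be interesting.
                score = -lmr_search(pos, depth - 1, -beta, -alpha, gen_moves, make, unmake, evaluate)
        unmake(pos, move)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                                # beta cutoff
    return best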

Ciribot
« Reply #2 on: Mar 16th, 2009, 8:07am »

I may attempt an Arimaa bot sometime soon; I'm currently working on an AI for a board game of my own design (just for practice, really).
 
LMR indeed blurs the line a bit. The initial idea of Type B play was to heuristically determine the moves that could possibly be good, and to search only those, to a great depth.
 
What if a Monte Carlo tree search, with some Arimaa specific enhancements, decided which moves were 'probably good', and it searched only those to a greater depth, applying the Monte Carlo tree search recursively?

Fritzlein
« Reply #3 on: Mar 16th, 2009, 8:29am »

on Mar 16th, 2009, 8:07am, Ciribot wrote:
What if a Monte Carlo tree search, with some Arimaa specific enhancements, decided which moves were 'probably good', and it searched only those to a greater depth, applying the Monte Carlo tree search recursively?

Adding randomization to the tree-building is certainly very tempting given how UCT has revolutionized Go software lately.  Apparently random playouts of Go positions give a rough estimate of who is winning.
 
Unfortunately, random playouts in Arimaa positions give a terrible estimate of who is winning.  Randomness vastly overvalues rabbits, particularly advanced rabbits, because a random opponent might be dumb enough to let a rabbit through.  We have confirmed that if one random player starts without an elephant and the other random player starts with one less rabbit, the elephant-less player is more likely to win by virtue of having eight rabbits to seven.  This is totally divorced from the reality of high-level play, not to mention beginner play.  Take someone else new to Arimaa, play them a few games with an elephant-for-rabbit handicap, and you'll see what I mean.  In short, random playouts are useless for Arimaa evaluation.
 
If random playouts are not used for evaluation, the remnant of Monte Carlo seems to be playing out randomly to a fixed depth and then using a hard-coded evaluation at that point.  If the evaluation is hard-coded, it is not clear to me what is gained by the randomness.  OpFor's late-move reductions are a deterministic, full-width application of the exploration/exploitation balance.  If you are going to look at better moves more closely and worse moves less closely, and if the determination of what is "better" must be a hard-coded evaluation, then why should randomness improve over a systematic approach?
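
For concreteness, here is roughly what a pure random-playout estimate of "who is winning" would look like; a sketch only, with `legal_moves`, `apply`, and `winner` standing in for a real Arimaa implementation. This is exactly the procedure I am claiming gives a terrible estimate for Arimaa.

Code:
import random

def random_playout_winner(pos, legal_moves, apply, winner, max_plies=300):
    """Play uniformly random moves until someone wins (or a ply cap is hit)."""
    p = pos
    for _ in range(max_plies):
        w = winner(p)
        if w is not None:
            return w                  # 'gold' or 'silver'
        p = apply(p, random.choice(legal_moves(p)))
    return None                       # no result within the cap

def win_rate(pos, legal_moves, apply, winner, samples=100):
    """Fraction of random playouts won by gold, used as the (dubious) estimate."""
    wins = sum(1 for _ in range(samples)
               if random_playout_winner(pos, legal_moves, apply, winner) == 'gold')
    return wins / samples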

Hannoskaj
« Reply #4 on: Mar 17th, 2009, 12:15am »

Well, let us first state an important point: Monte Carlo is essentially an evaluation function. There was no good, not even remotely good, evaluation function in Go, so there was a major improvement from using it.
 
Second, uniformly random Monte Carlo does not yield a good player either, even if the results are less frightening than in Arimaa. The interesting part comes when giving a probability law (a distribution) to the playouts. If you start by forbidding suicide and very frequently taking hanging pieces, you should get a huge improvement. Find a simple idea to control and chase down rabbits, and it should be enough. If the tree is well-built (next point), you can get away with fairly light playouts. The real point is playing out positions in a way that does not completely destroy them...
 
Third, the way the tree is built looks fundamental, and MC helps shape it. It is very important to order moves, using the good evaluation functions that we already have. The easiest is to just look at them in that order. You could also give a number of «free victories» to nodes with a good evaluation, though that might require some tuning.
 
Since we do have good enough static evaluation functions, an alternative is using them after a given number of random moves after the leaf in the tree. (Almost) Best is probably to just use the sign of the evaluation function. The whole procedure is only devising a new evaluation function, and as already mentioned, that's essentially what MC is. That's what is done in InvaderMC, which crushes Invader, at Amazons. They play six random moves and call the evaluation function. With eight, they start getting worse results, and with four even worse. By the way, they would usually make it «five or six», so that the evaluation is called with the same player to move.
  Why does it work? A first remark is that they get much farther in the tree, and I do not count the random part as part of the tree. Selectivity is better. A second point is that it probably makes the evaluation function more robust. By looking at twenty non-stupid continuations a few moves ahead, you get more resilient moves... Lastly, I do not know the subject well, and I wonder if they could devise a quiescence search in Amazons. If not, the random part might be playing only that role. By the way, when I speak of calling the evaluation function, I *do* include the quiescence search in it, unless it is very slow; otherwise, it could just be used to weight the playouts.
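
To make the idea concrete, here is a minimal sketch of the «random prefix, then the sign of a static evaluation» scheme; `legal_moves`, `apply`, and `static_eval` are placeholders, and InvaderMC's real implementation (for Amazons) certainly differs.

Code:
import random

def mc_sign_eval(pos, legal_moves, apply, static_eval, random_plies=6, samples=50):
    """Average the *sign* of a static evaluation reached after a few random
    moves.  Returns a value in [-1, 1]."""
    total = 0
    for _ in range(samples):
        p = pos
        for _ in range(random_plies):        # e.g. 5 or 6, so the same side is to move
            moves = legal_moves(p)
            if not moves:
                break
            p = apply(p, random.choice(moves))
        score = static_eval(p)
        total += (score > 0) - (score < 0)   # keep only the sign
    return total / samples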
 
In a nutshell, I'd say we might get improvement from these techniques, but it will require much more tuning than for Go.

Fritzlein
« Reply #5 on: Mar 17th, 2009, 9:26am »

I would very much like to see diverse methods of creating Arimaa software.  If I understand correctly, clueless is essentially a better bomb than bomb was, rather than a new conception of how a bot should be written.  So while clueless has raised the bar, and I am grateful for that, I would be even more excited by the success of some Monte Carlo method.
 
on Mar 17th, 2009, 12:15am, Hannoskaj wrote:
Second, uniformly random Monte Carlo does not yield a good player either, even if the results are less frightening than in Arimaa. The interesting part comes when giving a probability law (a distribution) to the playouts. If you start by forbidding suicide and very frequently taking hanging pieces, you should get a huge improvement. Find a simple idea to control and chase down rabbits, and it should be enough. If the tree is well-built (next point), you can get away with fairly light playouts. The real point is playing out positions in a way that does not completely destroy them...

But it seems that playing out positions in a way that does not completely destroy them is an immense difficulty.  You are correct that preventing suicide and favoring captures greatly improves over randomness: according to this thread a bot which moves randomly except for maximizing piece count (called S+K-K in the thread) is 311 Elo points better than a random bot.  However, I expect the evaluation from playouts by such a bot to still be abysmal.  The playouts could be improved further still, but the more sophisticated the move generation becomes, the more time it takes as well.  You also need, as you say, a "simple idea to control and chase down rabbits".  I'm all ears.
 
Quote:
Third, the way the tree is built looks fundamental, and MC helps shape it. It is very important to order moves, using the good evaluation functions that we already have. The easiest is to just look at them in that order. You could also give a number of «free victories» to nodes with a good evaluation, though that might require some tuning.

I am confused.  If we rely on our hand-tuned evaluation to determine which moves deserve to be looked at, how does randomness give us a better idea what to look at?  Why not just use late move reductions?
 
Quote:
Since we do have good enough static evaluation functions, an alternative is using them after a given number of random moves after the leaf in the tree. (Almost) Best is probably to just use the sign of the evaluation function.

Let me see if I understand.  We have a position we wish to statically evaluate.  We have a static evaluation function.  But we think our static evaluation function is bad.  Therefore we improve on the static evaluation by semi-random playouts (guided by application of the static evaluation function to tell us which lines to play out more), terminating the playouts at some depth and retaining the sign of the static evaluation function as a component of our new evaluation.  This procedure relies on the existing static evaluation both in the search and in the new evaluation.  Nevertheless it will be better and faster than conventional methods which use the static evaluation in search for late-move reductions and use the static evaluation (not just the sign) at the leaves?  It seems that the improvement can only possibly come from evaluating positions at greater depth, which throws us right back on the question of why the positions at greater depth aren't "wrecked" by the intervening moves.  If we prevent the "wreckage" deterministically (with our static evaluation function), then why shouldn't we instead get just as much greater depth deterministically (with late-move reductions)?
 
Quote:
The whole procedure is only devising a new evaluation function, and as already mentioned, that's essentially what MC is. That's what is done in InvaderMC, which crushes Invader, at Amazons. They play six random moves and call the evaluation function. With eight, they start getting worse results, and with four even worse. By the way, they would usually make it «five or six», so that the evaluation is called with the same player to move.
  Why does it work? A first remark is that they get much farther in the tree, and I do not count the random part as part of the tree. Selectivity is better. A second point is that it probably makes the evaluation function more robust. By looking at twenty non-stupid continuations a few moves ahead, you get more resilient moves

But it seems from your InvaderMC description that they don't have the chicken-and-egg problem.  They don't need the evaluation function to guide the random search; they only need to apply it at the end of the search, and bubble up the feedback as to which lines need to be searched more deeply.  Or do I misunderstand the description?
 
Please note, I am not saying that I think Monte Carlo methods can't work.  I am saying that I don't understand how they will work for Arimaa.  That is less a commentary on Monte Carlo and Arimaa, and more a commentary on my own ignorance.
« Last Edit: Mar 17th, 2009, 9:36am by Fritzlein »

jdb
« Reply #6 on: Mar 17th, 2009, 10:52am »

on Mar 17th, 2009, 9:26am, Fritzlein wrote:

 
I am confused.  If we rely on our hand-tuned evaluation to determine which moves deserve to be looked at, how does randomness give us a better idea what to look at?  Why not just use late move reductions?
 

 
Randomness provides a measure of how robust the position is. After playing some random moves (maybe only for one side), one can look at how much the score changes and decide how robust the score is.
 
Humans do a similar thing when playing games too. They play a reasonable line, say a horse exchange. Then they try a bunch of different plans from that position to see what might happen. If they all turn out as expected, then they conclude the horse exchange is reasonable.  
 
Trying a bunch of random plans can be better than selecting deterministically (i.e. late-move reductions) because it avoids systematic blind spots. In an Arimaa context, in a goal race there can be unusual-looking moves that turn out to be good. A deterministic approach could miss them, where a random one may find them.
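
As a rough sketch of the robustness idea (with placeholder `legal_moves`, `apply`, and `evaluate` functions):

Code:
import random, statistics

def robustness(pos, legal_moves, apply, evaluate, perturb_plies=2, samples=20):
    """Return (mean score after random perturbations, standard deviation).
    A small deviation suggests the evaluation of `pos` is robust; a large
    one suggests it hinges on precise play."""
    scores = []
    for _ in range(samples):
        p = pos
        for _ in range(perturb_plies):
            moves = legal_moves(p)
            if not moves:
                break
            p = apply(p, random.choice(moves))
        scores.append(evaluate(p))
    return statistics.mean(scores), statistics.pstdev(scores)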

Fritzlein
« Reply #7 on: Mar 17th, 2009, 3:27pm »

on Mar 17th, 2009, 10:52am, jdb wrote:
Trying a bunch of random plans can be better than selecting deterministically (i.e. late-move reductions) because it avoids systematic blind spots. In an Arimaa context, in a goal race there can be unusual-looking moves that turn out to be good. A deterministic approach could miss them, where a random one may find them.

OK, I can see how randomness avoids blind spots compared to deterministic late-move reductions.  But I'm not quite sure how we can claim both the benefit of robustness from randomness and that we aren't going to examine a bunch of trash moves that wreck the position, because we have put in the smarts to look only at good moves.
 
Quote:
Randomness provides a measure of how robust the position is. After playing some random moves (maybe only for one side), one can look at how much the score changes and decide how robust the score is.

OK, but doesn't that run into the same problem that random moves may be very biased towards the side with more rabbits and/or more advanced rabbits?  It might be that if you play three random moves for each side in a typical Go position, you haven't changed the score much on average.  But if you play three random moves for each side in a typical Arimaa position, you have helped the attacker and hurt the defender on average.
 
In other words, maybe Arimaa is inherently not robust.  Maybe the set of all fifteen thousand moves looks radically, systematically different from the subset of the best fifteen moves.  Maybe the right way to play Arimaa is constantly dancing along a knife-edge with a position that appears vulnerable, but in reality luring the opponent with false hopes to compromise his position until ultimately all his attacks are shut down and all his pieces are tied in knots.
 
Quote:
Humans do a similar thing when playing games too. They play a reasonable line, say a horse exchange. Then they try a bunch of different plans from that position to see what might happen. If they all turn out as expected, then they conclude the horse exchange is reasonable.

Very true.  That's how I analyze.  I play forward a few lines, making sure to try out several different approaches to get a broader feel for how the initial move works out against a variety of reactions.  So the concept of robustness makes sense to me.
 
But even so, I am selecting from a very small subset of reasonable-looking moves, and hopefully not favoring either myself or the other player.  For the playouts to provide robustness, they must somehow be equally weighted to both players, and not biased to one side or the other.  Since random moves in Arimaa are biased, it could be that any robustness produced by variety is overwhelmed by the inaccuracy of that bias.
« Last Edit: Mar 17th, 2009, 3:31pm by Fritzlein »

jdb
« Reply #8 on: Mar 17th, 2009, 3:58pm »

I'll try my best to explain how the "random playouts" work in the better Monte Carlo Go programs, and maybe it will help to clear things up. I'm not an expert on this by any means, so anyone should feel free to correct this.
 
Let's say we have reached a position at the end of the search tree, and it is time to do a Monte Carlo evaluation of the position.
 
An early attempt was to play uniformly distributed random moves, with some simple rules like not filling eyes.
 
One refinement was to use non-uniformly distributed random moves. This produced an improvement in the quality of the evaluation.
 
Another refinement was to play a few randomly distributed moves, then switch to a deterministic playout. The deterministic playout just tries to select a reasonable move for each side. It doesn't have to be the best, just decent. This provided an increase in playing strength over a strictly deterministic playout.
 
If I understand the post about Invader correctly, it does a similar thing. It plays some moves randomly (in its case 6), and then calls a deterministic eval function, which, according to the post, improved playing strength.
 
There are a lot of different ways to make use of randomness, but I don't know which methods will work best for Arimaa.
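
To illustrate the last two refinements (non-uniform randomness for the first few moves, then a cheap deterministic policy), here is a sketch; `legal_moves`, `apply`, `move_weight`, and `policy_move` are placeholders, and this is not any particular program's code.

Code:
import random

def refined_playout(pos, legal_moves, apply, move_weight, policy_move,
                    random_plies=3, max_plies=100, is_terminal=lambda p: False):
    """Non-uniform random moves for a few plies, then a deterministic policy."""
    p = pos
    for ply in range(max_plies):
        if is_terminal(p):
            break
        moves = legal_moves(p)
        if not moves:
            break
        if ply < random_plies:
            # Weighted (non-uniform) random choice: better-looking moves are likelier.
            weights = [move_weight(p, m) for m in moves]
            p = apply(p, random.choices(moves, weights=weights, k=1)[0])
        else:
            # Deterministic tail: any reasonable move will do.
            p = apply(p, policy_move(p, moves))
    return p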
« Last Edit: Mar 17th, 2009, 3:59pm by jdb »

Hannoskaj
« Reply #9 on: Mar 17th, 2009, 5:03pm »

Quote:

But it seems from your InvaderMC description that they don't have the chicken-and-egg problem.  They don't need the evaluation function to guide the random search; they only need to apply it at the end of the search, and bubble up the feedback as to which lines need to be searched more deeply.  Or do I misunderstand the description?

 
Yes, it does. They use it to order the moves, exactly in the way I've described; they just give an order, not any estimate in victories.
 
Before musing further on what you can gain, I'll add a few remarks on the evaluation function. It would probably pay off to use less accurate and faster evaluation functions: you go deeper in the tree, but part of the evaluation is in MC, so you don't lose too much. In the case of InvaderMC, they did get improvements with this trick, whereas the corresponding alpha-beta searcher is weaker than its counterpart with a heavier evaluation function.
 
By the way, I said that the sign would probably be enough information to keep, but I do think that we could ideally get an improvement by estimating the winning percentage associated with a given evaluation; but for many reasons, ranging from differences between stable and unstable positions to differences between the beginning and end of the game, it would be very hard to tune.
 
 
About the wreckage of positions, I don't think we would get huge bias on just a few moves. On the other hand, given the quality and speed of current evaluation functions, I seriously doubt that a pure MC-tree search player could accomplish much. If we are to get some improvement, I really expect it from the hybrid approach.
 
An important point is that you go much deeper in your main line(s), much deeper than with late-move reductions, which come too late.
Imagine you add a node after twenty visits. When you have a move that is looking bad, it won't be visited often unless it "wins" its playouts; if it loses them, maybe we end up exploring the move ten times. That's ten evaluation-function calls. Compare that with the 17000 you would get from the next stage in the tree. Those evaluations are kept for the interesting lines. MCTS trees are very skewed. Still, it's safer than an equally skewed alpha-beta tree, as it can always come back to any line; there is no hard pruning.
 
It should be noticed that the best Go programs explore very little nowadays: the trend is to plough through the main variation. More explicitly, the first tree searches were based on the UCT formula with an exploration coefficient of 2. Now they probably use something like 0.05. My guess is that this discrepancy stems from the UCB formula having been established only for one level, not for a tree. It was later applied to trees, as UCT. But you also get information by testing deeper in the tree, so that's also somewhat exploration, and through the main line it's automatically exploitation.
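
For reference, the UCT selection value in question is usually computed as below; the exploration coefficient c is the number that has shrunk from around 2 to around 0.05 in practice.

Code:
import math

def uct_value(node_wins, node_visits, parent_visits, c=0.05):
    """Standard UCT score: exploitation term plus c times an exploration term."""
    if node_visits == 0:
        return float('inf')          # unvisited children are tried first
    win_rate = node_wins / node_visits
    exploration = math.sqrt(math.log(parent_visits) / node_visits)
    return win_rate + c * exploration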
 
Once a node is inside the tree, MC will converge to minimax. But on a shorter timescale it will "prefer" robust moves, unlike alpha-beta. If at the leaves you have only one answer with a good evaluation, alpha-beta will play the corresponding first move, rather than a move with many leaves almost as good, but not quite. That could be excellent if it's an "only move". But if it is a mistake in evaluation, because of some horizon effect, for example, you are doomed. MC will play the safe good move, unless it has enough time to explore the good-looking only move.
 
Notice also the human behaviour coming from MCTS: playing calmly if ahead, and trying to stage an upset, even a risky one, if behind.
 
 
With regard to bias, we might get some. Just look at the incredibly central play of MC Go bots. But it's wearing off as they get stronger. It could be tuned away either through the playouts or the tree, or both, but I think that a real answer to that problem cannot be given before this is tried out.
 
 
Jdb is probably right when equating deterministic final playouts to a call to a static evaluation function.
By the way, I think that the situation in Arimaa looks more like that in Amazons than that in Go, because we have fast and relatively strong evaluation functions.
 
On that note, good night...

Fritzlein
« Reply #10 on: Mar 17th, 2009, 6:32pm »

Thanks for the explanations, guys.  I look forward to seeing how this approach pans out for Arimaa!

Ciribot
« Reply #11 on: Mar 17th, 2009, 7:25pm »

I have a few questions concerning the MC approach. You said a non-uniform randomness was best; do you therefore mean that 'better'-seeming moves get better chances of being played? Also, I assume an Arimaa random-mover would ignore the position-repeat rules for the sake of simplicity. Would the randomness therefore be step-wise or move-wise? Also, would it be good to use pseudo-randomness based on e.g. the hash of the current position, so that there is some degree of search stability?

tize
« Reply #12 on: Mar 18th, 2009, 8:04am »

I am trying this approach with carlos.
 
I use the evaluator from marwin to bias the randomness of the playouts and then evaluate after a playout of 4 plies. While it improves over random playouts to an end state, I've yet to see carlos play stronger than marwin, even though they are using the same evaluator.
 
I'm not saying that Monte Carlo is the wrong way to build a strong Arimaa bot, just that I haven't found the right ingredients for that recipe.
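
In heavily simplified pseudocode, one way such an evaluation-biased playout can be structured is sketched below; this is only the general shape of the idea, not the actual carlos code, and `legal_moves`, `apply`, and `evaluate` stand in for the real engine.

Code:
import math, random

def biased_playout(pos, legal_moves, apply, evaluate, plies=4, temperature=100.0):
    """Play `plies` moves, choosing each one at random with probability
    proportional to exp(eval / temperature), then return the evaluation
    of the resulting position."""
    p = pos
    for _ in range(plies):
        moves = legal_moves(p)
        if not moves:
            break
        scores = [evaluate(apply(p, m)) for m in moves]
        top = max(scores)
        weights = [math.exp((s - top) / temperature) for s in scores]  # numerically safe softmax
        p = apply(p, random.choices(moves, weights=weights, k=1)[0])
    return evaluate(p)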
 
Quote:
Also, would it be good to use pseudo-randomness based on e.g. the hash of the current position, so that there is some degree of search stability?

I think this will make every playout from a given state play out identically, as a position will always have the same hash key.

Hannoskaj
« Reply #13 on: Mar 21st, 2009, 1:51pm »

Quote:

I have a few questions concerning the MC approach. You said a non-uniform randomness was best; do you therefore mean that 'better'-seeming moves get better chances of being played?

Yes. In fact, the main point seems rather to be avoiding catastrophic moves. There is not much information in a loss by elephant suicide. (Thinking aloud: it's almost as if we test the robustness of the position by checking that the evaluation stays good if there are small variations around it.)
 
Quote:

Also, I assume an Arimaa random-mover would ignore the position-repeat rules for the sake of simplicity.

Clearly. You gain speed, and with random play a repeated position should be very rare anyway.
 
Quote:

Would the randomness therefore be step-wise or move-wise?

Quote:

I use the evaluator from marwin to bias the randomness of the playouts and then evaluate after a playout of 4 plies. While it improves over random playouts to an end state, I've yet to see carlos play stronger than marwin, even though they are using the same evaluator.

OK, let us enter the realm of black magic, blind guesses and dark pitfalls.
Tize, when you say that you use marwin's evaluation function for biasing playouts, do you mean you use the information from the function at the node to bias all the playouts, or that you use it at each ply (maybe evaluating all possibilities)? In the latter case, playouts must be horribly slow! And you would anyhow see fewer nodes than with alpha-beta...
About playouts and trees:
* Playouts should be fast. Even the evaluation function used in the tree, or at the end of a playout, can be made simpler and faster for some gain.
We should also make a clear distinction between the playouts' random generation (including its bias) and the a priori shaping of the tree. The latter may be much heavier, since node expansion is less frequent.
That is especially true if nodes are expanded only after a given number of visits; if node expansion is a full ply, I'd guess the number should be rather high, maybe 30. If node expansion is by step, of course this number will be lower.
* Move or step? In the tree I would say move; in the playouts, step. Move in the tree because using the evaluation function to order possibilities is likely more reliable at the end of the ply. Steps in the playouts because that's much faster, and we probably do not lose much, especially with a small suggestion I give below.
* In Go, one of the main heuristics associated with Monte Carlo is All Moves As First, or more generally Rapid Action Value Estimation (RAVE). The idea is to count victories/defeats associated with a move played anywhere during the playouts, on a separate counter. We then have a lot of data, so little variance; on the other hand the bias is big. This estimate is therefore the main estimate when a node has not been visited much, and becomes negligible when the node has really been explored.
It relies on the idea that a good move at any moment in a random playout may be a good move now. Since stones stay where they are in Go, and each point can be claimed by either player, it may (and does) work. However, I really doubt its efficiency for Arimaa.
On the other hand, we expect moves that look like a good move to also be good. Hence I think a modification of that idea could be effective. We could still use a virtual estimation part, with a lot of information, but biased information, and use the same formulas as in http://computer-go.org/pipermail/computer-go/attachments/20080208/6519e9c5/rave.pdf. To be clear, I emphasize that the following applies in the tree, not the playouts.
Two moves that have common steps are similar. I suggest giving one fourth of a virtual victory/defeat to each node *with the same parent* that makes the same step (order not taken into account, I think). Add to that 1/20 of a victory/defeat (to be tuned) for each piece that is moved as in the move we have simulated, but with a different arrival square. For example, if the move tested was Ess Dn re, the move Cn Esen would get .25 + .05 = .3 virtual result.
By the way, it's probably a very bad idea to use this correspondence on grandfather moves: in Go it made things worse, and the explanation is probably that previous good moves block a threat, so that a good move not played in our path would block the same threat as our previous move. A loss of time.
Another part that can be added to this fast estimation would be the killer heuristic. Give, for example, half a victory/defeat for each corresponding result with the same step in a son of an uncle (a cousin node).
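
A sketch of the virtual-result rule just proposed, treating a move as a set of (piece, from, to) steps; the constants (.25 per shared step, .05 per shared piece with a different arrival) are the suggestion above, while the move representation and the example squares are made up for illustration.

Code:
def virtual_result(simulated_move, sibling_move, result):
    """Credit a sibling node with a fraction of the simulated move's result,
    based on how similar the two moves are.  Each move is a list of
    (piece, from_square, to_square) steps; `result` is +1 for a win, -1 for a loss."""
    sim_steps = set(simulated_move)
    sib_steps = set(sibling_move)
    weight = 0.25 * len(sim_steps & sib_steps)            # identical steps, order ignored
    sim_pieces = {piece for piece, _, _ in sim_steps - sib_steps}
    sib_pieces = {piece for piece, _, _ in sib_steps - sim_steps}
    weight += 0.05 * len(sim_pieces & sib_pieces)          # same piece moved, different arrival
    return weight * result

# Example from above: tested move Ess Dn re versus sibling Cn Esen (squares invented).
sim = [('E', 'd4', 'd3'), ('E', 'd3', 'd2'), ('D', 'c2', 'c3'), ('r', 'a2', 'b2')]
sib = [('C', 'b1', 'b2'), ('E', 'd4', 'd3'), ('E', 'd3', 'e3'), ('E', 'e3', 'e4')]
print(virtual_result(sim, sib, +1))   # one shared step (.25) + E moved differently (.05) = 0.3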
* I now concentrate more on the playouts.  
For speed, step by step. To have something almost uniformly move-random, it is enough to multiply the weight of each step with a piece already moved once by two, of each step with a piece moved twice by 2*3, and of a piece that has made all the former steps by 2*3*4. It might be interesting, and not too slow, to forbid undoing the last step (beware with push-pull).
As a first remark, I think that Fritz is right that it is easy to bias against defence. I think that whatever the way we weight playouts, the most sophisticated level must be defence. I think that at the lowest level, simply forbidding suicide would give good results. On the other hand, if we encourage captures (on that point I think it should only be an incentive, like capturing half the time if you can) and no more, the results could be terrible. We should also strongly discourage allowing a new capture. This might make things slow. I think that the best would be to identify capture possibilities with pattern matching, as Fotland does in Bomb, notably for quiescence search; if after a move in the playout a new capture possibility is given to the opponent, roll back that move. I am not a big fan of greedy strategies, so I would rather make both these captures and the avoidance of putting pieces in capture only optional: before generating the next move in a playout, throw four unbalanced coins: one corresponds to enforcing capture if possible without giving a new capture, one to enforcing capture if possible anyhow, one to not permitting a new enemy capture except if we can also make a new capture the move after, and one to not permitting a new capture anyhow. If moves in the quiescence search are still relevant, they could be given some automatic probability of being followed, at least for the player who gained from it.
About rabbits: if the evaluation function has a way to mark dangerous rabbits, we could multiply the probability of moving them in the step generation of the playouts. In defense, multiply the weight of steps that put a rabbit/piece on the same column or the one to its right/left, or a piece in a nearby column; give a high bonus for pushing the rabbit back. Some other simple ideas could be tried that give a decent feel for imperfect goal-threat handling before the evaluation function is called at the end of the playout.
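
A sketch of the step-by-step weighting described above, with placeholder `legal_steps` and `apply_step` functions; the factorial-style multipliers are the ones suggested, everything else is illustrative.

Code:
import random

def weighted_random_move(pos, legal_steps, apply_step, steps_per_move=4):
    """Build one playout move step by step.  A step by a piece already moved
    k times this turn gets weight (k+1)!, which roughly evens out the bias
    against moves that spend all their steps on a single piece."""
    FACT = [1, 2, 6, 24]
    moved = {}                                   # piece id -> steps taken this turn
    p, chosen = pos, []
    for _ in range(steps_per_move):
        steps = legal_steps(p)                   # each step is (piece_id, from_sq, to_sq)
        if not steps:
            break
        weights = [FACT[min(moved.get(s[0], 0), 3)] for s in steps]
        step = random.choices(steps, weights=weights, k=1)[0]
        moved[step[0]] = moved.get(step[0], 0) + 1
        p = apply_step(p, step)
        chosen.append(step)
    return p, chosen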
 
 
 
As a summary, if I wanted to write an MCTS bot, I would first try to get a fast evaluation function with quiescence.
I would build the tree with the following (simple progressive widening) algorithm; a rough code sketch follows after the playout notes below.
0) Expand the root.
For each playout:
1) Follow the tree by taking nodes with the highest UCT+RAVE value. At a leaf, play out for a few moves, then evaluate and propagate the +/-1 back up the tree (also the RAVE part).
2) When updating nodes on the path that has been followed, check the number of times they have been visited. If more than E_0, do a first expansion; if more than E_0 + k*S, do a new partial expansion if possible (maybe limit the number of partial expansions, with very unsafe pruning).
First expansion:
Use the evaluation on all sons of the node. Sort them according to it. Put the F best ones in the pool available to UCT. Only nodes in the pool can be followed.
Partial expansion:
Add the next F moves to the pool available to UCT (do not forget what RAVE information we already have; it's probably faster to add it then than when the node is first expanded).

My guess would be E_0 of order 40, S of order 400, and F of 5 or 10. To get better ideas I should at least see statistics on how often the move finally played is among the first in the evaluation ordering.
 
In RAVE: same steps or pieces moved in a brother node; same move in a cousin node.
 
Playouts: random, step by step, with weights on pieces. No suicide. Maybe encourage captures (detected by pattern matching) and defense (reject a playout move that puts a new piece in danger, less strongly if it attacks another piece). Maybe incentives for moving dangerous rabbits and for defending against them. Call the evaluation function after three/four, four/five or five/six plies, for example (always with the same player to move).
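
Tying the summary together, here is a rough skeleton of the progressive-widening loop described above. It is not a working bot: `order_moves_by_eval`, `apply_move`, and `playout_value` are placeholders, the RAVE bookkeeping is omitted, and the constants are only the guesses given above.

Code:
import math

E0, S, F = 40, 400, 5                     # expansion threshold, widening interval, pool increment

class Node:
    def __init__(self, pos):
        self.pos = pos
        self.visits = 0
        self.score = 0.0                  # sum of +/-1 playout results
        self.ordered_moves = None         # all moves, best-eval first (set at first expansion)
        self.pool = []                    # (move, child) pairs currently open to UCT

def uct(parent, child, c=0.05):
    if child.visits == 0:
        return float('inf')
    return (child.score / child.visits) + c * math.sqrt(math.log(parent.visits) / child.visits)

def one_playout(root, order_moves_by_eval, apply_move, playout_value):
    # 1) Descend through the pools by UCT value (RAVE term omitted here).
    path, node = [root], root
    while node.pool:
        move, child = max(node.pool, key=lambda mc: uct(node, mc[1]))
        node = child
        path.append(node)
    # Leaf: a few biased random moves, then an evaluation mapped to +/-1.
    result = playout_value(node.pos)
    # 2) Back up the result; expand or widen nodes that have enough visits.
    #    (Sign conventions for whose viewpoint `score` stores are glossed over.)
    for n in reversed(path):
        n.visits += 1
        n.score += result
        if n.ordered_moves is None and n.visits >= E0:
            n.ordered_moves = order_moves_by_eval(n.pos)       # first expansion
        if n.ordered_moves is not None:
            target = F * (1 + max(0, (n.visits - E0) // S))    # partial (progressive) widening
            for move in n.ordered_moves[len(n.pool):target]:
                n.pool.append((move, Node(apply_move(n.pos, move))))
        result = -result

# Usage sketch: expand the root immediately, then loop until time runs out.
#   root = Node(start_pos)
#   root.ordered_moves = order_moves_by_eval(start_pos)
#   while time_left(): one_playout(root, order_moves_by_eval, apply_move, playout_value)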
 
Quote:

Also, would it be good to use pseudo-randomness based on e.g. the hash of the current position, so that there is some degree of search stability?

If you want reproducibility for testing results, you might fix the seed and the number of simulations, I guess, with a switch.
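
For example (a trivial sketch: seed one RNG for the whole search rather than hashing each position, which, as tize notes, would make every playout identical; `playout` is a placeholder):

Code:
import random

def run_test(root_position, playout, num_simulations=10000, seed=None):
    """Run a fixed number of playouts from one position.  Passing a seed makes
    the whole test reproducible without making individual playouts identical."""
    rng = random.Random(seed)                  # one seeded RNG shared by all playouts
    return [playout(root_position, rng) for _ in range(num_simulations)]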
 
PS: Good contender for the longest post contest...  
 
 
 

Fritzlein
« Reply #14 on: Mar 21st, 2009, 3:44pm »

on Mar 21st, 2009, 1:51pm, Hannoskaj wrote:
PS: Good contender for the longest post contest...

At some level of detail it would be easier for you to write code to prove your theories than it is to explain them in words!  ;)