Arimaa Forum « lightvector's Thesis »


Topic: lightvector's Thesis  (Read 5107 times)
lightvector (Forum Guru, Arimaa player #2543)
« on: May 15th, 2011, 10:46pm »

I hope this is of interest to others. Let me know if there are any problems accessing it.
 
http://icosahedral.net/downloads/djwuthesis.pdf
 
Enjoy!
Swynndla (Forum Guru, Arimaa player #1821)
« Reply #1 on: May 15th, 2011, 10:50pm »

Thanks!!!
UruramTururam (Forum Guru, Arimaa player #2537)
« Reply #2 on: May 16th, 2011, 7:32am »

Great! I've printed it out to read in my spare time.  Cheesy
jdb (Forum Guru, Arimaa player #214)
« Reply #3 on: May 16th, 2011, 12:49pm »

I look forward to reading it.
 
Fritzlein (Forum Guru, Arimaa player #706)
« Reply #4 on: May 17th, 2011, 1:02pm »

Thanks very much for making this public, lightvector.  Your writing is very clear and readable.  I'm sure I will have many questions as I try to understand it, but here's a quick, easy one.  On page 9 you say, "the average game length was about 92 turns, or 41 complete turn alternations."  That reminds me of the Rocky and Bullwinkle episode in which they said, "after forty days and thirty nights..."  Smiley
Hippo (Forum Guru, Arimaa player #4450)
« Reply #5 on: May 17th, 2011, 1:30pm »

Nice reading. I would concentrate on fast approximations of the move ordering (and on computing it incrementally), and on iterative deepening with heavy pruning deep in the tree.
It seems to me that pruning deeper in the tree is more important than pruning at the root.
I would also think about remembering the pruned lists for reuse at the next level of iterative deepening.
 
It's a pity you tested pruning only on 5s games. It would be interesting to see how the results change at longer time controls.
 
But Omar must be happy with your approach ... using automated learning to improve itself. Maybe in the future it will be difficult to learn from humans ... Smiley.
« Last Edit: May 17th, 2011, 1:37pm by Hippo »
Fritzlein (Forum Guru, Arimaa player #706)
« Reply #6 on: May 17th, 2011, 2:27pm »

After my first pass reading of how expert move prediction was done, I am stunned that it could work at all.  Let me check my understanding: You had over 3600 features, and those features contributed additively (in the generalized Bradley-Terry model) to the evaluation of the move.  For example, the feature "pushes horse to c5" made a contribution to the value of the move independent of the feature "enemy elephant defends c6 trap"?  It seems that all the features that are present contribute equally to the strength of the "team of features"; pairs or groups of features have no meaning.  Or did I misunderstand the math?  Furthermore, you trained these features on part of the data from only 4121 games?  That's a whale of a training method to extract meaningful results from so little data.
 
From Haizhi's failed attempt at training an evaluation function containing tens of thousands of features, I must have drawn incorrect conclusions.  I thought that the presence of so many features, with so little data to go on, would inevitably result in overfitting, or what might be called "superstitions" whereby the AI homes in on features that just happened to have been associated with winning moves in a few instances.  Like I was listening to Bruce Springsteen just before I got word of my grandfather's death, so I think Springsteen is bad luck.
 
Exact move prediction 12% of the time is astounding; for comparison, in the 2009 Mob game, I predicted the Mob's exact move nine times out of forty, or 22.5%.  Has any developer done statistics on how often his bot correctly anticipates the opponent's move on the basis of full search?
 
I am extremely curious as to the distribution of the trained weights.  Were the weights so different as to form a near hierarchy, e.g. goal >> capture >> trap control >> piece-square-location?  If so, then perhaps the fact that the value of the features is merely additive is less of a handicap than I would have thought.
 
Indeed, perhaps a hand-tuned expert move predictor could have outperformed the machine-learned one.  Did you try comparing hand-tuned pruning with machine-learned pruning, as you later compared hand-tuned evaluation with machine-learned evaluation?  If I were your competitor in the Computer Championship, I might be tempted to imitate you in forward pruning all but 5% of the moves at the root, given the obvious success of such a result, but diverge by making my own pruning function by hand, and, if I could make the pruning function sufficiently lightweight, applying it recursively as Hippo suggests.
« Last Edit: May 17th, 2011, 3:10pm by Fritzlein »
Fritzlein (Forum Guru, Arimaa player #706)
« Reply #7 on: May 17th, 2011, 2:55pm »

In your second pruning round-robin, where you pruned moves without using the expert move predictor, did you use static evaluation to decide which moves to prune, or just prune at random?  If you pruned unordered moves (i.e. pruned at random), the result seems pretty meaningless, but if you used static evaluation then you have confirmed earlier results (including by 99of9 with GnoBot) that forward pruning on the basis of evaluation results in weaker play.
 
Intuitively, this is a very important result.  Conventional wisdom says that forward pruning is a bad idea, but what if it is a great idea that has simply been badly implemented?  What if we need to do forward pruning on a completely different basis than we do evaluation?  If that turns out to be a good idea, then it could be a landmark insight in the evolution of alpha-beta searchers.
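For concreteness, forward pruning in an alpha-beta searcher might look like the following negamax sketch. Everything here is hypothetical (the Position interface, the 5% keep fraction); the key point is that ordering_fn is a separate move-ranking function, not the static evaluation:

```python
def negamax_forward_pruned(pos, depth, alpha, beta, ordering_fn, keep_fraction=0.05):
    """Negamax with forward pruning: rank moves with ordering_fn and
    search only the top keep_fraction of them, discarding the rest."""
    if depth == 0 or pos.is_terminal():
        return pos.evaluate()  # static eval, from the side to move
    moves = sorted(pos.legal_moves(), key=ordering_fn, reverse=True)
    keep = max(1, int(len(moves) * keep_fraction))
    best = -float("inf")
    for move in moves[:keep]:  # forward-prune everything below the cutoff
        score = -negamax_forward_pruned(pos.play(move), depth - 1,
                                        -beta, -alpha, ordering_fn, keep_fraction)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # the ordinary alpha-beta cutoff still applies
            break
    return best
```

Applying this at every node, rather than only at the root, is essentially Hippo's recursive-pruning suggestion.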
 
Overall, I felt that I was reading (at least) a Master's thesis.  It is a bit scary to think what you will come up with next, should you continue to assail the Arimaa Challenge.  Congratulations, and thanks again for sharing!
« Last Edit: May 17th, 2011, 4:10pm by Fritzlein »
99of9 (Forum Guru, Gnobby's creator, player #314)
« Reply #8 on: May 17th, 2011, 7:45pm »

A minor typo fix for page 26 - under TRAP_STATUS, it is rare to have 4 players in a game.
99of9 (Forum Guru, Gnobby's creator, player #314)
« Reply #9 on: May 17th, 2011, 7:53pm »

on May 17th, 2011, 2:55pm, Fritzlein wrote:
if you used static evaluation then you have confirmed earlier results (including by 99of9 with GnoBot) that forward pruning on the basis of evaluation results in weaker play.
 
Intuitively, this is a very important result.  Conventional wisdom says that forward pruning is a bad idea, but what if it is a great idea that has simply been badly implemented?  What if we need to do forward pruning on a completely different basis than we do evaluation?  If that turns out to be a good idea, then it could be a landmark insight in the evolution of alpha-beta searchers.

 
Please bear in mind that the 2004 version of Gnobot was my first try at a bot, and I really didn't know what I was doing.  I don't think it can be taken as much evidence for any particular "result".  My pruning was savage (only 10 moves were examined at every node... out of all 17000).  And I don't recall doing any direct comparison to a non-pruning version of the bot (because my algorithms were so slow that I don't think I could have made it to 8 steps without pruning!).
99of9 (Forum Guru, Gnobby's creator, player #314)
« Reply #10 on: May 17th, 2011, 11:10pm »

Here's something I don't understand.  Surely "players" on a team can have a negative strength?
 
For example, what is the strength of the ALLOWS_GOAL feature?
 
Suppose I have two possible moves that look similar in every other way (i.e. all their other features are identical), but one of the two has ALLOWS_GOAL = 1.  Surely it doesn't help to have that extra player!
 
Maybe some of the features need to be turned around so that they are always helpful?
Fritzlein (Forum Guru, Arimaa player #706)
« Reply #11 on: May 17th, 2011, 11:26pm »

on May 17th, 2011, 11:10pm, 99of9 wrote:
Here's something I don't understand.  Surely "players" on a team can have a negative strength?

Good question.  Here the math appears to be done before taking logarithms, so instead of the formulas looking like
 
1/(1+exp(b-a))
 
or equivalently  
 
exp(a)/(exp(a) + exp(b))
 
as we are used to, the formulas instead look like
 
1/(1 + B/A)
 
or equivalently
 
A/(A + B)
 
where exp(a) = A; a = log(A); exp(b) = B; b = log(B).
 
Now it doesn't make sense to have negative factors.  If A is positive and B is negative, then A/(A + B) is greater than one or negative, whereas probabilities must be between zero and one.  Also you can't take logarithms of negative numbers.  So I think the answer is that features can't have negative strengths, or else whole teams of features could have negative strengths, resulting in impossible win percentages.
 
However, the A and B are themselves computed as the product of all the features that are present in move A and move B respectively.  If a feature isn't present in move A, it doesn't contribute, which is the same as saying it contributes a factor of 1.  A bad feature which is present could contribute a factor of 1/2, or 1/1,000,000, severely dragging down the product compared to move B which doesn't have the feature.
 
So the strength of a bad feature isn't a negative number, but after taking logarithms it is.  A fraction being multiplied into a team (instead of a non-factor of 1) is the equivalent of a negative elo rating being added into a team (instead of a non-rating of 0).  If that makes sense.
 
By the way, the feature "allows opposing goal" shouldn't be infinitely bad.  In particular, it had better contribute less than the feature "moves friendly rabbit to goal".  Smiley
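As a numeric sketch of the above (the feature names and log-strengths here are made up for illustration, not taken from the thesis): a move's multiplicative strength A is the product of exp(weight) over its present features, and the model prefers move A over move B with probability A/(A + B):

```python
import math

def team_strength(log_weights, features):
    """Product of exp(weight) over the features present in a move."""
    return math.exp(sum(log_weights[f] for f in features))

# Illustrative log-strengths only.
log_weights = {"captures_camel": 4.6, "allows_goal": -6.8}

A = team_strength(log_weights, ["captures_camel"])                 # good move
B = team_strength(log_weights, ["captures_camel", "allows_goal"])  # same, but allows goal
p_prefer_A = A / (A + B)  # equals 1 / (1 + exp(-6.8)): close to 1, but finite
```

The "bad" feature contributes a factor exp(-6.8) < 1 to B's product, which is exactly the negative log-strength described above; no factor is ever negative, so every probability stays between 0 and 1.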
« Last Edit: May 17th, 2011, 11:58pm by Fritzlein »
99of9 (Forum Guru, Gnobby's creator, player #314)
« Reply #12 on: May 18th, 2011, 1:12am »

Ok good, I'm glad it's a problem with my reading rather than the method.
lightvector (Forum Guru, Arimaa player #2543)
« Reply #13 on: May 18th, 2011, 1:48am »

Yes, Fritzlein is spot-on regarding "negative values" for features.
 
on May 17th, 2011, 2:27pm, Fritzlein wrote:
After my first pass reading of how expert move prediction was done, I am stunned that it could work at all.  Let me check my understanding: You had over 3600 features, and those features contributed additively (in the generalized Bradley-Terry model) to the evaluation of the move.  For example, the feature "pushes horse to c5" made a contribution to the value of the move independent of the feature "enemy elephant defends c6 trap"?  It seems that all the features that are present contribute equally to the strength of the "team of features"; pairs or groups of features have no meaning.  Or did I misunderstand the math?  Furthermore, you trained these features on part of the data from only 4121 games?  That's a whale of a training method to extract meaningful results from so little data.

 
Fritzlein: Yes, I was very pleased to see that it was so effective. But I also strongly suspected from the beginning it would work at least with some good degree of success. Open up a typical Arimaa game and forget about higher-level strategy, just look at what basic "good" features each move has. Almost all moves will have at least one easily recognizable good feature, and often many. And for moves that aren't so obvious, things like Haizhi's step combo (dependency components) as well as closeness to the previous two moves will help distinguish them from random piece shuffling.
 
It's a decent amount of data. About 100000 training instances, where in each one the learner is taught to distinguish the game move from about 100 random moves. Most features get plenty of positive and negative instances.
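One such training instance can be sketched under the Bradley-Terry model like this (feature names, weights, and candidate moves are all hypothetical): the model's probability of the expert move is its strength over the pooled strength of the whole candidate set, and training adjusts the weights to push that probability up:

```python
import math

def strength(weights, features):
    """Multiplicative move strength; absent features contribute a factor of 1."""
    return math.exp(sum(weights.get(f, 0.0) for f in features))

def expert_move_probability(weights, expert_features, other_moves):
    """P(expert move) = its strength divided by the total strength of the
    pool (the expert move plus the ~100 sampled alternatives)."""
    s = strength(weights, expert_features)
    total = s + sum(strength(weights, f) for f in other_moves)
    return s / total

# Toy instance: the game move captures a piece; two sampled alternatives don't.
weights = {"captures_piece": 2.0, "random_shuffle": -0.5}
p = expert_move_probability(weights, ["captures_piece"],
                            [["random_shuffle"], []])
```

With ~100000 such instances, each feature typically appears in both the taught move and many sampled alternatives, which is why most features get plenty of positive and negative evidence.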
 
on May 17th, 2011, 2:55pm, Fritzlein wrote:
In your second pruning round-robin, where you pruned moves without using the expert move predictor, did you use static evaluation to decide which moves to prune, or just prune at random?  If you pruned unordered moves (i.e. pruned at random), the result seems pretty meaningless, but if you used static evaluation then you have confirmed earlier results (including by 99of9 with GnoBot) that forward pruning on the basis of evaluation results in weaker play.
 
Intuitively, this is a very important result.  Conventional wisdom says that forward pruning is a bad idea, but what if it is a great idea that has simply been badly implemented?  What if we need to do forward pruning on a completely different basis than we do evaluation?  If that turns out to be a good idea, then it could be a landmark insight in the evolution of alpha-beta searchers.

 
Pruned at random. This was just a sanity check, basically.  
 
I hadn't tested the eval function for ordering/prediction. I expect it to do pretty well at predicting the exact move a lot of the time, perhaps better than either learned predictor, but to have much worse tail behavior.
 
Let's run it now!  Here's the prediction stats on a little less than 1/3 of the testing set (127 games, 10500 instances):
 
Top X (game move within the top X moves, %):
   1      10.0371
   2      14.4367
   5      20.4933
  10      24.7786
  50      40.52
 100      48.2716
 500      68.9172
1000      77.8497
 
Top % (game move within the top X% of moves, %):
   5      71.0885
  10      80.2685
  20      88.6297
  40      95.8575
  60      98.8573
  80      99.5905
 
So apparently, it's better than Naive Bayes at getting the exact move, or getting it within the top several moves. But as you go further, it gets worse. Also, throughout, it's uniformly worse at predicting than Bradley-Terry.
 
This makes sense. Here's my guess at one reason why something like this might happen:  
 
In forward pruning, you're trying to do something a little different than in static evaluation. You want the moves that capture as much as possible of the probability mass for being the best, even if they have a good chance of also backfiring. A lot of difficult tactical moves can fall into this category. The evaluation function may score such moves poorly because if a move opens up a lot of tactics, a static evaluation has no sense of what the true value ultimately is and will misevaluate. However, the move-prediction algorithms might learn that tactical moves that do things like create threats or trade pieces, even if they appear initially questionable, will also contain a decent part of the probability mass for being the best move, larger than one might expect from the static evaluation.
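For reference, Top-X / Top-% tables like the one above can be computed from per-instance ranks with a sketch like this (the interface is hypothetical; rank 1 means the game move was predicted exactly):

```python
def prediction_stats(ranks, move_counts, xs=(1, 2, 5, 10), pcts=(5, 10, 20, 40)):
    """ranks[i]: rank of the actual game move among all candidates in test
    instance i (1 = exact prediction); move_counts[i]: number of legal moves
    in that instance.  Returns the percentage of instances whose game move
    falls within the top X moves, and within the top X% of moves."""
    n = len(ranks)
    top_x = {x: 100.0 * sum(r <= x for r in ranks) / n for x in xs}
    top_pct = {p: 100.0 * sum(r <= max(1, c * p // 100)
                              for r, c in zip(ranks, move_counts)) / n
               for p in pcts}
    return top_x, top_pct
```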
 
on May 17th, 2011, 7:45pm, 99of9 wrote:
A minor typo fix for page 26 - under TRAP_STATUS, it is rare to have 4 players in a game.

 
Thanks!
 
on May 17th, 2011, 2:55pm, Fritzlein wrote:

Overall, I felt that I was reading (at least) a Master's thesis.  It is a bit scary to think what you will come up with next, should you continue to assail the Arimaa Challenge.  Congratulations, and thanks again for sharing!

 
Thanks! I'm not out of ideas yet!
 
« Last Edit: May 18th, 2011, 10:50am by lightvector »
lightvector (Forum Guru, Arimaa player #2543)
« Reply #14 on: May 18th, 2011, 2:46am »

Here's a selection of log-scale feature values. All are from gold's perspective. The most common piece for a given piece type is assumed (0 stronger = elephant, 1 stronger = camel, etc.).
 
Keep in mind that there are tons and tons of correlated features, which will affect things a LOT. In the extreme case, if two features were perfectly correlated, then each would receive only part of the value, so that their sum gives the full value. So a feature's value may be deceptive, depending on which other features frequently occur with it.  
 
Movement:
 
Move Elephant off c3: 1.3942674695377
Move Elephant off d5: 0.32809975663243
Move Elephant off d6: -0.32646524296478
 
Push/pull rabbit at a3 0.95719375501162
Push/pull rabbit at c7 1.4152185138243
 
Traps:
 
Change # of c3 trap defenders from 1 to 2: 0.2447351795347
Change # of c3 trap defenders from 2 to 1: -0.20043139578781
 
Capture:
 
Suicide Elephant -4.0458695575478
Suicide Rabbit -3.6913757061848
Capture Camel 4.6433340488424
Capture Rabbit 3.4709199670508
 
Freezing:
 
Freeze opponent's Camel 0.89834689641162
Freeze opponent's Rabbit (7-8 stronger pieces) 0.060348724742266
Freeze opponent's Rabbit (5-6 stronger pieces) 0.20640037768827
 
Goal threat:
 
Threaten Goal 1-step goal next turn: 2.6907457101836
Threaten Goal 4-step goal next turn: 1.9281538285566
 
Goal:
 
Score Goal: 2.503911619406
Move Rabbit to c8: 4.9692384474976
Allow Goal: -6.7643656178349
 
Dependency structure of move:
 
All steps dependent: 0.66186058142058
Two independent step combos: -0.38533435579265
Four independent steps: -1.7993513726477
 
 