Topic: lightvector's Thesis
Swynndla
Forum Guru
Arimaa player #1821
Posts: 235
Re: lightvector's Thesis
« Reply #1 on: May 15th, 2011, 10:50pm » |
Thanks!!!
UruramTururam
Forum Guru
Arimaa player #2537
Posts: 319
Re: lightvector's Thesis
« Reply #2 on: May 16th, 2011, 7:32am » |
Great! I've printed it out to read in my spare time.
jdb
Forum Guru
Arimaa player #214
Posts: 682
Re: lightvector's Thesis
« Reply #3 on: May 16th, 2011, 12:49pm » |
I look forward to reading it.
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
Re: lightvector's Thesis
« Reply #4 on: May 17th, 2011, 1:02pm » |
Thanks very much for making this public, lightvector. Your writing is very clear and readable. I'm sure I will have many questions as I try to understand it, but here's a quick, easy one. On page 9 you say, "the average game length was about 92 turns, or 41 complete turn alternations." That reminds me of the Rocky and Bullwinkle episode in which they said, "after forty days and thirty nights..."
Hippo
Forum Guru
Arimaa player #4450
Posts: 883
Re: lightvector's Thesis
« Reply #5 on: May 17th, 2011, 1:30pm » |
Nice reading. I would concentrate on fast approximations of the ordering (and on incremental computation of it), and on iterative deepening with heavy pruning deep in the tree. It seems to me that pruning deeper in the tree is more important than pruning at the root. I would also think about the option of remembering the pruned lists for use in the next level of iterative deepening.

It's a pity you tested pruning only on 5s games. It would be interesting to see how the results change at longer time controls. But Omar must be happy with your approach ... using automated learning to improve itself. Maybe in the future it will be difficult to learn from humans ...
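Roughly the kind of search loop I have in mind, as a sketch only; the names, the move cache, and the 5% keep fraction are placeholders of mine, not anything taken from the thesis:

[code]
# Sketch: iterative deepening where every node (not only the root) keeps
# just a fraction of its ordered moves, and the ordering is remembered per
# position so the next, deeper iteration can reuse it instead of re-ordering.
# order_moves() and evaluate() are stand-ins, not real functions from the bot.

KEEP_FRACTION = 0.05        # keep only the top 5% of moves at each node
move_cache = {}             # position hash -> ordered move list from earlier iterations

def search(pos, depth, alpha, beta):
    if depth == 0 or pos.is_terminal():
        return evaluate(pos)                 # score from the side to move's perspective
    key = pos.hash()
    if key not in move_cache:
        move_cache[key] = order_moves(pos)   # expensive ordering done once per position
    ordered = move_cache[key]
    keep = max(1, int(len(ordered) * KEEP_FRACTION))
    best = float('-inf')
    for move in ordered[:keep]:              # forward pruning deep in the tree
        score = -search(pos.make(move), depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best

def iterative_deepening(pos, max_depth):
    for depth in range(1, max_depth + 1):    # each deeper iteration reuses move_cache
        score = search(pos, depth, float('-inf'), float('inf'))
    return score
[/code]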
« Last Edit: May 17th, 2011, 1:37pm by Hippo » |
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
Re: lightvector's Thesis
« Reply #6 on: May 17th, 2011, 2:27pm » |
After my first pass reading of how expert move prediction was done, I am stunned that it could work at all. Let me check my understanding: you had over 3600 features, and those features contributed additively (in the generalized Bradley-Terry model) to the evaluation of the move. For example, the feature "pushes horse to c5" made a contribution to the value of the move independent of the feature "enemy elephant defends c6 trap"? It seems that all the features that are present contribute equally to the strength of the "team of features"; pairs or groups of features have no meaning. Or did I misunderstand the math?

Furthermore, you trained these features on part of the data from only 4121 games? That's a whale of a training method to extract meaningful results from so little data. From Haizhi's failed attempt at training an evaluation function containing tens of thousands of features, I must have drawn incorrect conclusions. I thought that the presence of so many features, with so little data to go on, would inevitably result in overfitting, or what might be called "superstitions", whereby the AI homes in on features that just happened to have been associated with winning moves in a few instances. Like I was listening to Bruce Springsteen just before I got word of my grandfather's death, so I think Springsteen is bad luck.

Exact move prediction 12% of the time is astounding; for comparison, in the 2009 Mob game, I predicted the Mob's exact move nine times out of forty, or 22.5%. Has any developer done statistics on how often his bot correctly anticipates the opponent's move on the basis of full search?

I am extremely curious as to the distribution of the trained weights. Were the weights so different as to make a near hierarchy, e.g. goal >> capture >> trap control >> piece-square-location? If so, then perhaps the fact that the value of the features was merely additive is less of a handicap than I would have thought. Indeed, perhaps a hand-tuned expert move predictor could have outperformed the machine-learned expert move predictor. Did you compare hand-tuned pruning with machine-learned pruning, as you later compared hand-tuned evaluation with machine-learned evaluation?

If I were your competitor in the Computer Championship, I might be tempted to imitate you in point of forward pruning all but 5% of the moves at the root, given the obvious success of such a result, but diverge in point of making my own pruning function by hand, and, if I could make the pruning function sufficiently lightweight, apply it recursively as Hippo suggests.
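If I have the model right, a toy version in code would look something like the sketch below. The feature names and weights are invented by me purely for illustration, not taken from the thesis: each move's strength is just the product of the strengths of the features it exhibits (equivalently, the sum of their logarithms), with no interaction between features.

[code]
import math

# Toy "team of features" move predictor in the generalized Bradley-Terry
# style: each feature has a strength, a move's strength is the product of
# the strengths of its features, and the predicted probability that a move
# is the expert's choice is its strength divided by the sum over all moves.

log_strength = {
    "capture_camel":             4.6,
    "push_horse_to_c5":          0.8,
    "enemy_elephant_defends_c6": -0.3,
    "allows_goal":               -6.8,
}

def move_strength(features):
    # Features combine independently: the log-strengths simply add up.
    return math.exp(sum(log_strength.get(f, 0.0) for f in features))

def predict(moves):
    # moves: dict mapping a move name to the list of features it exhibits.
    strengths = {m: move_strength(fs) for m, fs in moves.items()}
    total = sum(strengths.values())
    return {m: s / total for m, s in strengths.items()}

probs = predict({
    "a": ["capture_camel", "enemy_elephant_defends_c6"],
    "b": ["push_horse_to_c5"],
    "c": ["push_horse_to_c5", "allows_goal"],
})
# Move "a" takes almost all of the probability mass; move "c" is dragged far
# down by the allows_goal factor even though it shares a feature with "b".
[/code]

In particular there is no term for "capture_camel AND allows_goal together", which is exactly the independence I am asking about.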
« Last Edit: May 17th, 2011, 3:10pm by Fritzlein » |
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
Re: lightvector's Thesis
« Reply #7 on: May 17th, 2011, 2:55pm » |
In your second pruning round-robin, where you pruned moves without using the expert move predictor, did you use static evaluation to decide which moves to prune, or just prune at random? If you pruned unordered moves (i.e. pruned at random), the result seems pretty meaningless, but if you used static evaluation then you have confirmed earlier results (including by 99of9 with GnoBot) that forward pruning on the basis of evaluation results in weaker play.

Intuitively, this is a very important result. Conventional wisdom says that forward pruning is a bad idea, but what if it is a great idea that has simply been badly implemented? What if we need to do forward pruning on a completely different basis than we do evaluation? If that turns out to be a good idea, then it could be a landmark insight in the evolution of alpha-beta searchers.

Overall, I felt that I was reading (at least) a Master's thesis. It is a bit scary to think what you will come up with next, should you continue to assail the Arimaa Challenge. Congratulations, and thanks again for sharing!
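To make concrete what I mean by pruning on a completely different basis, here is a rough sketch of three ways to cut a node's move list down to the same fraction; static_eval() and predictor_score() are stand-ins of my own, not functions from the thesis:

[code]
import random

# Three forward-pruning policies that keep the same fraction of moves at a
# node but rank them differently: at random, by a conventional static
# evaluation of the resulting position, or by a learned expert-move predictor.

def prune_random(moves, keep_fraction):
    keep = max(1, int(len(moves) * keep_fraction))
    return random.sample(moves, keep)

def prune_by_eval(pos, moves, keep_fraction):
    # static_eval(pos, m): score of the position after m, from the mover's side
    keep = max(1, int(len(moves) * keep_fraction))
    return sorted(moves, key=lambda m: static_eval(pos, m), reverse=True)[:keep]

def prune_by_predictor(pos, moves, keep_fraction):
    # predictor_score(pos, m): how likely an expert is to play m in pos
    keep = max(1, int(len(moves) * keep_fraction))
    return sorted(moves, key=lambda m: predictor_score(pos, m), reverse=True)[:keep]
[/code]

The question is whether the third policy wins not because it is a better evaluation, but because it is estimating something different: the chance that a move turns out to be best, rather than how good the position looks immediately afterward.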
« Last Edit: May 17th, 2011, 4:10pm by Fritzlein » |
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
Re: lightvector's Thesis
« Reply #8 on: May 17th, 2011, 7:45pm » |
A minor typo fix for page 26 - under TRAP_STATUS, it is rare to have 4 players in a game.
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
Re: lightvector's Thesis
« Reply #9 on: May 17th, 2011, 7:53pm » |
on May 17th, 2011, 2:55pm, Fritzlein wrote: if you used static evaluation then you have confirmed earlier results (including by 99of9 with GnoBot) that forward pruning on the basis of evaluation results in weaker play. [...]

Please bear in mind that the 2004 version of Gnobot was my first try at a bot, and I really didn't know what I was doing. I don't think it can be taken as much evidence for any particular "result". My pruning was savage (only 10 moves were examined at every node... out of all 17000), and I don't recall doing any direct comparison to a non-pruning version of the bot, because my algorithms were so slow that I don't think I could have made it to 8 steps without pruning!
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
Re: lightvector's Thesis
« Reply #10 on: May 17th, 2011, 11:10pm » |
Here's something I don't understand. Surely "players" on a team can have a negative strength? For example, what is the strength of the ALLOWS_GOAL feature? Suppose I have two possible moves that look similar in every other way (i.e. all their other features are identical), but one of the two has ALLOWS_GOAL = 1. Surely it doesn't help to have that extra player! Maybe some of the features need to be turned around so that they are always helpful?
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
Re: lightvector's Thesis
« Reply #11 on: May 17th, 2011, 11:26pm » |
on May 17th, 2011, 11:10pm, 99of9 wrote: Here's something I don't understand. Surely "players" on a team can have a negative strength?

Good question. Here the math appears to be done before taking logarithms, so instead of the formulas looking like 1/(1 + exp(b-a)), or equivalently exp(a)/(exp(a) + exp(b)), as we are used to, they look like 1/(1 + B/A), or equivalently A/(A + B), where A = exp(a) and B = exp(b), i.e. a = log(A) and b = log(B). Now it doesn't make sense to have negative factors: if A is positive and B is negative, then A/(A + B) is greater than one or negative, whereas probabilities must be between zero and one. Also, you can't take logarithms of negative numbers.

So I think the answer is that features can't have negative strengths, or else whole teams of features could have negative strengths, resulting in impossible win percentages. However, A and B are themselves computed as the product of all the features that are present in move A and move B respectively. If a feature isn't present in move A, it doesn't contribute, which is the same as saying it contributes a factor of 1. A bad feature which is present could contribute a factor of 1/2, or 1/1,000,000, severely dragging down the product compared to move B, which doesn't have the feature. So the strength of a bad feature isn't a negative number, but after taking logarithms it is. A fraction being multiplied into a team (instead of a non-factor of 1) is the equivalent of a negative Elo rating being added into a team (instead of a non-rating of 0). If that makes sense.

By the way, the feature "allows opposing goal" shouldn't be infinitely bad. In particular, it had better contribute less than the feature "moves friendly rabbit to goal".
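A toy numeric illustration of the same point, with all numbers invented by me rather than taken from the thesis:

[code]
# Two moves identical except that move B also exhibits one "bad" feature.
# In the multiplicative (pre-logarithm) form every strength is positive;
# a bad feature is just a factor smaller than 1, never a negative number.

shared = 8.0                    # product of the strengths of the shared features
bad_factor = 0.001              # an invented "bad" feature; log(0.001) is about -6.9

A = shared                      # move A: shared features only
B = shared * bad_factor         # move B: shared features plus the bad one

p_A = A / (A + B)               # probability the model prefers move A
print(p_A)                      # ~0.999, still strictly between zero and one

# After taking logs the same computation looks additive, and the bad
# feature shows up as a negative term: log(B) = log(shared) + log(bad_factor).
[/code]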
« Last Edit: May 17th, 2011, 11:58pm by Fritzlein » |
99of9
Forum Guru
Gnobby's creator (player #314)
Posts: 1413
Re: lightvector's Thesis
« Reply #12 on: May 18th, 2011, 1:12am » |
Ok good, I'm glad it's a problem with my reading rather than the method.
lightvector
Forum Guru
Arimaa player #2543
Posts: 197
Re: lightvector's Thesis
« Reply #13 on: May 18th, 2011, 1:48am » |
Yes, Fritzlein is spot-on regarding "negative values" for features.

on May 17th, 2011, 2:27pm, Fritzlein wrote: After my first pass reading of how expert move prediction was done, I am stunned that it could work at all. [...] That's a whale of a training method to extract meaningful results from so little data.

Fritzlein: Yes, I was very pleased to see that it was so effective, but I also strongly suspected from the beginning that it would work with at least some good degree of success. Open up a typical Arimaa game and forget about higher-level strategy; just look at what basic "good" features each move has. Almost all moves will have at least one easily recognizable good feature, and often many. And for moves that aren't so obvious, things like Haizhi's step combos (dependency components) as well as closeness to the previous two moves will help distinguish them from random piece shuffling.

It's a decent amount of data: about 100,000 training instances, where in each one the learner is taught to distinguish the game move from about 100 random moves. Most features get plenty of positive and negative instances.
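Schematically, building the training instances looks something like the sketch below (a simplification of mine, not the actual code; move generation and feature extraction are stubbed out):

[code]
import random

# Each training instance pits the move actually played in a game against a
# sample of about 100 other legal moves from the same position.  The learner
# then adjusts feature strengths so that the played move's "team of features"
# tends to beat the sampled teams.
# generate_legal_moves() and extract_features() are stand-ins for real code.

SAMPLE_SIZE = 100

def build_instances(games):
    instances = []
    for game in games:
        for pos, played in game.positions_and_moves():
            others = [m for m in generate_legal_moves(pos) if m != played]
            sample = random.sample(others, min(SAMPLE_SIZE, len(others)))
            instances.append({
                "winner": extract_features(pos, played),           # the game move
                "losers": [extract_features(pos, m) for m in sample],
            })
    return instances

# A few thousand games with a few dozen positions each gives on the order of
# 100,000 instances, so most features see plenty of wins and losses.
[/code]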
on May 17th, 2011, 2:55pm, Fritzlein wrote: In your second pruning round-robin, where you pruned moves without using the expert move predictor, did you use static evaluation to decide which moves to prune, or just prune at random? [...]

Pruned at random. This was just a sanity check, basically. I hadn't tested the eval function for ordering/prediction. I expect it to do pretty well at predicting the exact move a lot of the time, perhaps better than either learned predictor, but to have much worse tail behavior. Let's run it now! Here are the prediction stats on a little less than 1/3 of the testing set (127 games, 10500 instances):

Top X moves     % of game moves ranked within top X
1               10.0371
2               14.4367
5               20.4933
10              24.7786
50              40.52
100             48.2716
500             68.9172
1000            77.8497

Top % of moves  % of game moves ranked within top X%
5               71.0885
10              80.2685
20              88.6297
40              95.8575
60              98.8573
80              99.5905

So apparently it's better than Naive Bayes at getting the exact move, or getting it within the top several moves, but as you go further down the list, it gets worse. Also, throughout, it's uniformly worse at predicting than Bradley-Terry. This makes sense.

Here's my guess at one reason why something like this might happen: in forward pruning, you're trying to do something a little different than in static evaluation. You want the moves that capture as much as possible of the probability mass for being the best, even if they have a good chance of also backfiring. A lot of difficult tactical moves can fall into this category. The evaluation function may score such moves poorly, because if a move opens up a lot of tactics, a static evaluation is going to have no sense of what the true value ultimately is and will misevaluate. However, the move-prediction algorithms might learn that tactical moves that do things like create threats or trade pieces, even if they appear initially questionable, will also contain a decent part of the probability mass for being the best move, larger than one might expect from the static evaluation.
on May 17th, 2011, 7:45pm, 99of9 wrote: A minor typo fix for page 26 - under TRAP_STATUS, it is rare to have 4 players in a game.

Thanks!

on May 17th, 2011, 2:55pm, Fritzlein wrote: Overall, I felt that I was reading (at least) a Master's thesis. It is a bit scary to think what you will come up with next, should you continue to assail the Arimaa Challenge. Congratulations, and thanks again for sharing!

Thanks! I'm not out of ideas yet!
« Last Edit: May 18th, 2011, 10:50am by lightvector » |
lightvector
Forum Guru
Arimaa player #2543
Posts: 197
Re: lightvector's Thesis
« Reply #14 on: May 18th, 2011, 2:46am » |
Here's a selection of logarithm'ed feature values, all from gold's perspective, with the most common piece for a given piece type assumed (0 stronger = elephant, 1 stronger = camel, etc). Keep in mind that there are tons and tons of correlated features, which will affect things a LOT. In the extreme case, if two features were perfectly correlated, then each would receive only part of the value, so that their sum gives the full value. So a feature's value may be deceptive, depending on which other features frequently occur with it.

Movement:
Move Elephant off c3: 1.3942674695377
Move Elephant off d5: 0.32809975663243
Move Elephant off d6: -0.32646524296478
Push/pull rabbit at a3: 0.95719375501162
Push/pull rabbit at c7: 1.4152185138243

Traps:
Change # of c3 trap defenders from 1 to 2: 0.2447351795347
Change # of c3 trap defenders from 2 to 1: -0.20043139578781

Capture:
Suicide Elephant: -4.0458695575478
Suicide Rabbit: -3.6913757061848
Capture Camel: 4.6433340488424
Capture Rabbit: 3.4709199670508

Freezing:
Freeze opponent's Camel: 0.89834689641162
Freeze opponent's Rabbit (7-8 stronger pieces): 0.060348724742266
Freeze opponent's Rabbit (5-6 stronger pieces): 0.20640037768827

Goal threat:
Threaten 1-step goal next turn: 2.6907457101836
Threaten 4-step goal next turn: 1.9281538285566

Goal:
Score Goal: 2.503911619406
Move Rabbit to c8: 4.9692384474976
Allow Goal: -6.7643656178349

Dependency structure of move:
All steps dependent: 0.66186058142058
Two independent step combos: -0.38533435579265
Four independent steps: -1.7993513726477