Arimaa Forum « lightvector's Thesis »


Topic: lightvector's Thesis  (Read 5107 times)
lightvector (Forum Guru, Arimaa player #2543)
« on: May 15th, 2011, 10:46pm »

I hope this is of interest to others. Let me know if there are any problems accessing it.
 
http://icosahedral.net/downloads/djwuthesis.pdf
 
Enjoy!
Swynndla (Forum Guru, Arimaa player #1821)
« Reply #1 on: May 15th, 2011, 10:50pm »

Thanks!!!
UruramTururam (Forum Guru, Arimaa player #2537)
« Reply #2 on: May 16th, 2011, 7:32am »

Great! I've printed it out to read in my spare time.  Cheesy
jdb (Forum Guru, Arimaa player #214)
« Reply #3 on: May 16th, 2011, 12:49pm »

I look forward to reading it.
 
Fritzlein (Forum Guru, Arimaa player #706)
« Reply #4 on: May 17th, 2011, 1:02pm »

Thanks very much for making this public, lightvector.  Your writing is very clear and readable.  I'm sure I will have many questions as I try to understand it, but here's a quick, easy one.  On page 9 you say, "the average game length was about 92 turns, or 41 complete turn alternations."  That reminds me of the Rocky and Bullwinkle episode in which they said, "after forty days and thirty nights..."  Smiley
Hippo (Forum Guru, Arimaa player #4450)
« Reply #5 on: May 17th, 2011, 1:30pm »

Nice reading. I would concentrate on fast approximations of the move ordering (and on computing it incrementally), and on iterative deepening with heavy pruning deep in the tree.
It seems to me that pruning deeper in the tree is more important than pruning at the root.
I would also think about remembering the pruned lists for reuse at the next level of iterative deepening.
 
It's a pity you tested pruning only on 5s games. It would be interesting to see how the results change at longer time controls.
 
But Omar must be happy with your approach ... using automated learning to improve itself. Maybe in the future it will be difficult to learn from humans ... Smiley.
« Last Edit: May 17th, 2011, 1:37pm by Hippo »
Fritzlein (Forum Guru, Arimaa player #706)
« Reply #6 on: May 17th, 2011, 2:27pm »

After my first pass reading of how expert move prediction was done, I am stunned that it could work at all.  Let me check my understanding: You had over 3600 features, and those features contributed additively (in the generalized Bradley-Terry model) to the evaluation of the move.  For example, the feature "pushes horse to c5" made a contribution to the value of the move independent of the feature "enemy elephant defends c6 trap"?  It seems that all the features that are present contribute equally to the strength of the "team of features"; pairs or groups of features have no meaning.  Or did I misunderstand the math?  Furthermore, you trained these features on part of the data from only 4121 games?  That's a whale of a training method to extract meaningful results from so little data.
 
From Haizhi's failed attempt at training an evaluation function containing tens of thousands of features, I must have drawn incorrect conclusions.  I thought that the presence of so many features, with so little data to go on, would inevitably result in overfitting, or what might be called "superstitions" whereby the AI homes in on features that just happened to have been associated with winning moves in a few instances.  Like I was listening to Bruce Springsteen just before I got word of my grandfather's death, so I think Springsteen is bad luck.
 
Exact move prediction 12% of the time is astounding; for comparison, in the 2009 Mob game, I predicted the Mob's exact move nine times out of forty, or 22.5%.  Has any developer done statistics on how often his bot correctly anticipates the opponent's move on the basis of full search?
 
I am extremely curious as to the distribution of the trained weights.  Were the weights so different as to form a near hierarchy, e.g. goal >> capture >> trap control >> piece-square-location?  If so, then perhaps the fact that the value of the features is merely additive is less of a handicap than I would have thought.
 
Indeed, perhaps a hand-tuned expert move predictor could have outperformed the machine-learned one.  Did you try comparing hand-tuned pruning with machine-learned pruning, as you later compared hand-tuned evaluation with machine-learned evaluation?  If I were your competitor in the Computer Championship, I might be tempted to imitate you in forward pruning all but 5% of the moves at the root, given the obvious success of such a result, but diverge by making my own pruning function by hand, and, if I could make the pruning function sufficiently lightweight, applying it recursively as Hippo suggests.
« Last Edit: May 17th, 2011, 3:10pm by Fritzlein »
Fritzlein (Forum Guru, Arimaa player #706)
« Reply #7 on: May 17th, 2011, 2:55pm »

In your second pruning round-robin, where you pruned moves without using the expert move predictor, did you use static evaluation to decide which moves to prune, or just prune at random?  If you pruned unordered moves (i.e. pruned at random), the result seems pretty meaningless, but if you used static evaluation then you have confirmed earlier results (including by 99of9 with GnoBot) that forward pruning on the basis of evaluation results in weaker play.
 
Intuitively, this is a very important result.  Conventional wisdom says that forward pruning is a bad idea, but what if it is a great idea that has simply been badly implemented?  What if we need to do forward pruning on a completely different basis than we do evaluation?  If that turns out to be a good idea, then it could be a landmark insight in the evolution of alpha-beta searchers.
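For concreteness, forward pruning in an alpha-beta searcher might look like the following negamax sketch. Everything here is hypothetical (the Position interface, the 5% keep fraction); the key point is that ordering_fn is a separate move-ranking function, not the static evaluation:

```python
def negamax_forward_pruned(pos, depth, alpha, beta, ordering_fn, keep_fraction=0.05):
    """Negamax with forward pruning: rank moves with ordering_fn and
    search only the top keep_fraction of them, discarding the rest."""
    if depth == 0 or pos.is_terminal():
        return pos.evaluate()  # static eval, from the side to move
    moves = sorted(pos.legal_moves(), key=ordering_fn, reverse=True)
    keep = max(1, int(len(moves) * keep_fraction))
    best = -float("inf")
    for move in moves[:keep]:  # forward-prune everything below the cutoff
        score = -negamax_forward_pruned(pos.play(move), depth - 1,
                                        -beta, -alpha, ordering_fn, keep_fraction)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # the ordinary alpha-beta cutoff still applies
            break
    return best
```

Applying this at every node, rather than only at the root, is essentially Hippo's recursive-pruning suggestion.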
 
Overall, I felt that I was reading (at least) a Master's thesis.  It is a bit scary to think what you will come up with next, should you continue to assail the Arimaa Challenge.  Congratulations, and thanks again for sharing!
« Last Edit: May 17th, 2011, 4:10pm by Fritzlein »
99of9 (Forum Guru, Gnobby's creator, player #314)
« Reply #8 on: May 17th, 2011, 7:45pm »

A minor typo fix for page 26 - under TRAP_STATUS, it is rare to have 4 players in a game.
99of9 (Forum Guru, Gnobby's creator, player #314)
« Reply #9 on: May 17th, 2011, 7:53pm »

on May 17th, 2011, 2:55pm, Fritzlein wrote:
if you used static evaluation then you have confirmed earlier results (including by 99of9 with GnoBot) that forward pruning on the basis of evaluation results in weaker play.
 
Intuitively, this is a very important result.  Conventional wisdom says that forward pruning is a bad idea, but what if it is a great idea that has simply been badly implemented?  What if we need to do forward pruning on a completely different basis than we do evaluation?  If that turns out to be a good idea, then it could be a landmark insight in the evolution of alpha-beta searchers.

 
Please bear in mind that the 2004 version of Gnobot was my first try at a bot, and I really didn't know what I was doing.  I don't think it can be taken as much evidence for any particular "result".  My pruning was savage (only 10 moves were examined at every node... out of all 17000).  And I don't recall doing any direct comparison to a non-pruning version of the bot (because my algorithms were so slow that I don't think I could have made it to 8 steps without pruning!).
99of9 (Forum Guru, Gnobby's creator, player #314)
« Reply #10 on: May 17th, 2011, 11:10pm »

Here's something I don't understand.  Surely "players" on a team can have a negative strength?
 
For example, what is the strength of the ALLOWS_GOAL feature?
 
Suppose I have two possible moves that look similar in every other way (i.e. all their other features are identical), but one of the two has ALLOWS_GOAL = 1.  Surely it doesn't help to have that extra player!
 
Maybe some of the features need to be turned around so that they are always helpful?
Fritzlein (Forum Guru, Arimaa player #706)
« Reply #11 on: May 17th, 2011, 11:26pm »

on May 17th, 2011, 11:10pm, 99of9 wrote:
Here's something I don't understand.  Surely "players" on a team can have a negative strength?

Good question.  Here the math appears to be done before taking logarithms, so instead of the formulas looking like
 
1/(1+exp(b-a))
 
or equivalently  
 
exp(a)/(exp(a) + exp(b))
 
as we are used to, the formulas instead look like
 
1/(1 + B/A)
 
or equivalently
 
A/(A + B)
 
where exp(a) = A; a = log(A); exp(b) = B; b = log(B).
 
Now it doesn't make sense to have negative factors.  If A is positive and B is negative, then A/(A + B) is greater than one or negative, whereas probabilities must be between zero and one.  Also you can't take logarithms of negative numbers.  So I think the answer is that features can't have negative strengths, or else whole teams of features could have negative strengths, resulting in impossible win percentages.
 
However, the A and B are themselves computed as the product of all the features that are present in move A and move B respectively.  If a feature isn't present in move A, it doesn't contribute, which is the same as saying it contributes a factor of 1.  A bad feature which is present could contribute a factor of 1/2, or 1/1,000,000, severely dragging down the product compared to move B which doesn't have the feature.
 
So the strength of a bad feature isn't a negative number, but after taking logarithms it is.  A fraction being multiplied into a team (instead of a non-factor of 1) is the equivalent of a negative elo rating being added into a team (instead of a non-rating of 0).  If that makes sense.
 
By the way, the feature "allows opposing goal" shouldn't be infinitely bad.  In particular, it had better contribute less than the feature "moves friendly rabbit to goal".  Smiley
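As a numeric sketch of the above (the feature names and log-strengths here are made up for illustration, not taken from the thesis): a move's multiplicative strength A is the product of exp(weight) over its present features, and the model prefers move A over move B with probability A/(A + B):

```python
import math

def team_strength(log_weights, features):
    """Product of exp(weight) over the features present in a move."""
    return math.exp(sum(log_weights[f] for f in features))

# Illustrative log-strengths only.
log_weights = {"captures_camel": 4.6, "allows_goal": -6.8}

A = team_strength(log_weights, ["captures_camel"])                 # good move
B = team_strength(log_weights, ["captures_camel", "allows_goal"])  # same, but allows goal
p_prefer_A = A / (A + B)  # equals 1 / (1 + exp(-6.8)): close to 1, but finite
```

The "bad" feature contributes a factor exp(-6.8) < 1 to B's product, which is exactly the negative log-strength described above; no factor is ever negative, so every probability stays between 0 and 1.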
« Last Edit: May 17th, 2011, 11:58pm by Fritzlein »
99of9 (Forum Guru, Gnobby's creator, player #314)
« Reply #12 on: May 18th, 2011, 1:12am »

Ok good, I'm glad it's a problem with my reading rather than the method.
lightvector (Forum Guru, Arimaa player #2543)
« Reply #13 on: May 18th, 2011, 1:48am »

Yes, Fritzlein is spot-on regarding "negative values" for features.
 
on May 17th, 2011, 2:27pm, Fritzlein wrote:
After my first pass reading of how expert move prediction was done, I am stunned that it could work at all.  Let me check my understanding: You had over 3600 features, and those features contributed additively (in the generalized Bradley-Terry model) to the evaluation of the move.  For example, the feature "pushes horse to c5" made a contribution to the value of the move independent of the feature "enemy elephant defends c6 trap"?  It seems that all the features that are present contribute equally to the strength of the "team of features"; pairs or groups of features have no meaning.  Or did I misunderstand the math?  Furthermore, you trained these features on part of the data from only 4121 games?  That's a whale of a training method to extract meaningful results from so little data.

 
Fritzlein: Yes, I was very pleased to see that it was so effective. But I also strongly suspected from the beginning it would work at least with some good degree of success. Open up a typical Arimaa game and forget about higher-level strategy, just look at what basic "good" features each move has. Almost all moves will have at least one easily recognizable good feature, and often many. And for moves that aren't so obvious, things like Haizhi's step combo (dependency components) as well as closeness to the previous two moves will help distinguish them from random piece shuffling.
 
It's a decent amount of data. About 100000 training instances, where in each one the learner is taught to distinguish the game move from about 100 random moves. Most features get plenty of positive and negative instances.
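One such training instance can be sketched under the Bradley-Terry model like this (feature names, weights, and candidate moves are all hypothetical): the model's probability of the expert move is its strength over the pooled strength of the whole candidate set, and training adjusts the weights to push that probability up:

```python
import math

def strength(weights, features):
    """Multiplicative move strength; absent features contribute a factor of 1."""
    return math.exp(sum(weights.get(f, 0.0) for f in features))

def expert_move_probability(weights, expert_features, other_moves):
    """P(expert move) = its strength divided by the total strength of the
    pool (the expert move plus the ~100 sampled alternatives)."""
    s = strength(weights, expert_features)
    total = s + sum(strength(weights, f) for f in other_moves)
    return s / total

# Toy instance: the game move captures a piece; two sampled alternatives don't.
weights = {"captures_piece": 2.0, "random_shuffle": -0.5}
p = expert_move_probability(weights, ["captures_piece"],
                            [["random_shuffle"], []])
```

With ~100000 such instances, each feature typically appears in both the taught move and many sampled alternatives, which is why most features get plenty of positive and negative evidence.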
 
on May 17th, 2011, 2:55pm, Fritzlein wrote:
In your second pruning round-robin, where you pruned moves without using the expert move predictor, did you use static evaluation to decide which moves to prune, or just prune at random?  If you pruned unordered moves (i.e. pruned at random), the result seems pretty meaningless, but if you used static evaluation then you have confirmed earlier results (including by 99of9 with GnoBot) that forward pruning on the basis of evaluation results in weaker play.
 
Intuitively, this is a very important result.  Conventional wisdom says that forward pruning is a bad idea, but what if it is a great idea that has simply been badly implemented?  What if we need to do forward pruning on a completely different basis than we do evaluation?  If that turns out to be a good idea, then it could be a landmark insight in the evolution of alpha-beta searchers.

 
Pruned at random. This was just a sanity check, basically.  
 
I hadn't tested the eval function for ordering/prediction. I expect it to do pretty well at predicting the exact move a lot of the time, perhaps better than either learned predictor, but to have much worse tail behavior.
 
Let's run it now!  Here's the prediction stats on a little less than 1/3 of the testing set (127 games, 10500 instances):
 
Top X (game move within the top X moves, %):
   1      10.0371
   2      14.4367
   5      20.4933
  10      24.7786
  50      40.52
 100      48.2716
 500      68.9172
1000      77.8497
 
Top % (game move within the top X% of moves, %):
   5      71.0885
  10      80.2685
  20      88.6297
  40      95.8575
  60      98.8573
  80      99.5905
 
So apparently, it's better than Naive Bayes at getting the exact move, or getting it within the top several moves. But as you go further, it gets worse. Also, throughout, it's uniformly worse at predicting than Bradley-Terry.
 
This makes sense. Here's my guess at one reason why something like this might happen:  
 
In forward pruning, you're trying to do something a little different than in static evaluation. You want the moves that capture as much as possible of the probability mass for being the best, even if they have a good chance of also backfiring. A lot of difficult tactical moves can fall into this category. The evaluation function may score such moves poorly because if a move opens up a lot of tactics, a static evaluation has no sense of what the true value ultimately is and will misevaluate. However, the move-prediction algorithms might learn that tactical moves that do things like create threats or trade pieces, even if they appear initially questionable, will also contain a decent part of the probability mass for being the best move, larger than one might expect from the static evaluation.
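For reference, Top-X / Top-% tables like the one above can be computed from per-instance ranks with a sketch like this (the interface is hypothetical; rank 1 means the game move was predicted exactly):

```python
def prediction_stats(ranks, move_counts, xs=(1, 2, 5, 10), pcts=(5, 10, 20, 40)):
    """ranks[i]: rank of the actual game move among all candidates in test
    instance i (1 = exact prediction); move_counts[i]: number of legal moves
    in that instance.  Returns the percentage of instances whose game move
    falls within the top X moves, and within the top X% of moves."""
    n = len(ranks)
    top_x = {x: 100.0 * sum(r <= x for r in ranks) / n for x in xs}
    top_pct = {p: 100.0 * sum(r <= max(1, c * p // 100)
                              for r, c in zip(ranks, move_counts)) / n
               for p in pcts}
    return top_x, top_pct
```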
 
on May 17th, 2011, 7:45pm, 99of9 wrote:
A minor typo fix for page 26 - under TRAP_STATUS, it is rare to have 4 players in a game.

 
Thanks!
 
on May 17th, 2011, 2:55pm, Fritzlein wrote:

Overall, I felt that I was reading (at least) a Master's thesis.  It is a bit scary to think what you will come up with next, should you continue to assail the Arimaa Challenge.  Congratulations, and thanks again for sharing!

 
Thanks! I'm not out of ideas yet!
 
« Last Edit: May 18th, 2011, 10:50am by lightvector »
lightvector (Forum Guru, Arimaa player #2543)
« Reply #14 on: May 18th, 2011, 2:46am »

Here's a selection of log-scale feature values. All are from gold's perspective. The most common piece for a given piece type is assumed (0 stronger = elephant, 1 stronger = camel, etc.).
 
Keep in mind that there are tons and tons of correlated features, which will affect things a LOT. In the extreme case, if two features were perfectly correlated, then each would receive only part of the value, so that their sum gives the full value. So a feature's value may be deceptive, depending on which other features frequently occur with it.  
 
Movement:
 
Move Elephant off c3: 1.3942674695377
Move Elephant off d5: 0.32809975663243
Move Elephant off d6: -0.32646524296478
 
Push/pull rabbit at a3 0.95719375501162
Push/pull rabbit at c7 1.4152185138243
 
Traps:
 
Change # of c3 trap defenders from 1 to 2: 0.2447351795347
Change # of c3 trap defenders from 2 to 1: -0.20043139578781
 
Capture:
 
Suicide Elephant -4.0458695575478
Suicide Rabbit -3.6913757061848
Capture Camel 4.6433340488424
Capture Rabbit 3.4709199670508
 
Freezing:
 
Freeze opponent's Camel 0.89834689641162
Freeze opponent's Rabbit (7-8 stronger pieces) 0.060348724742266
Freeze opponent's Rabbit (5-6 stronger pieces) 0.20640037768827
 
Goal threat:
 
Threaten Goal 1-step goal next turn: 2.6907457101836
Threaten Goal 4-step goal next turn: 1.9281538285566
 
Goal:
 
Score Goal: 2.503911619406
Move Rabbit to c8: 4.9692384474976
Allow Goal: -6.7643656178349
 
Dependency structure of move:
 
All steps dependent: 0.66186058142058
Two independent step combos: -0.38533435579265
Four independent steps: -1.7993513726477
 
 