Arimaa Forum » Arimaa » Bot Development » lightvector's Thesis (Read 5105 times)
Hippo (Arimaa player #4450)
Re: lightvector's Thesis « Reply #15 on: May 18th, 2011, 4:15am »

on May 18th, 2011, 1:48am, lightvector wrote:

 
In forward pruning, you're trying to do something a little different than in static evaluation. You want the moves that capture as much as possible of the probability mass for being the best, even if they have a good chance of also backfiring. A lot of difficult tactical moves can fall into this category. The evaluation function may score such moves poorly because, if a move opens up a lot of tactics, a static evaluation has no sense of what the true value ultimately is and will misevaluate. However, the move-prediction algorithms might learn that tactical moves that do things like create threats or trade pieces, even if they appear initially questionable, will also contain a decent part of the probability mass for being the best move, larger than one might expect from the static evaluation.

 
Good explanation. Maybe a similar effect could be had from a static evaluation that gives a range rather than a single value ...

Fritzlein (Arimaa player #706)
Re: lightvector's Thesis « Reply #16 on: May 18th, 2011, 12:40pm »

on May 18th, 2011, 1:48am, lightvector wrote:
In forward pruning, you're trying to do something a little different than in static evaluation. You want the moves that capture as much as possible of the probability mass for being the best, even if they have a good chance of also backfiring. A lot of difficult tactical moves can fall into this category.

Very lucidly stated!  There are many sharp tactical moves which, if they fail, will have a very negative evaluation.  For example, I might consider sacrificing my camel for a strong goal attack.  If the attack fails, I am down a camel and will lose the game, so giving up my camel is absolutely horrible, one of the worst moves on the board.  Yet in spite of the high probability of the move backfiring, it is the type of move that must be examined, because it is unclear in a way that only search will clarify.  The go-for-broke attack has a higher probability of being the best move than some random safe move with a higher static evaluation.  The safe move, although it has a low probability of being catastrophic, also has a low probability of being brilliant, and thus probably isn't the move you want to play.
 
Static evaluation simply can't be expected to properly handle sharp moves such as sacrifices.  Even if you miraculously managed to give a static weight to the goal attack that roughly balanced the static weight given to losing a camel, your summed evaluation would still be wrong.  The exchange is certainly not an equal one, certainly not near a neutral move in value.  It is either disastrous or brilliant, not in between, and it must be searched to know which.
 
Intuitively, late-move reductions might be better at handling this problem than static evaluation, but still poor for the same reason.  There will be critical lines which fare poorly on shallow search, but are still critical lines deserving of being searched at least as deeply as the main line.  Late move reductions will not give them due attention.
 
This line of thought reinforces the plausibility of having a completely different pruning function than evaluation function.  Both functions should like and dislike some of the same features, such as trap control and greater material, but the pruning function should be asymmetrical.  And, voila, your pruning function, by virtue of looking at moves rather than positions, is highly asymmetrical.  It emphasizes what you are doing and de-emphasizes what the opponent will do in response.
 
The pruning function should give bonuses for being sharp, unstable, purposeful, i.e. something that might work even if it might also go horribly wrong.  It should also want to examine "extreme" moves more than "average" moves, i.e. moves where a player puts all his eggs in one basket are more important than moves where a player does a little of this and a little of that.  The pruning function should want to examine races above all, where both sides are slugging it out.  By the same token it should be very eager to prune tame positions where the static evaluation is likely to be nearly as good as the evaluation after searching, and to prune tame moves which, even if they are good, are likely to be dominated by a more committed, purposeful version of the move.
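To illustrate with invented weights, the asymmetry might look something like the sketch below: a move that launches a mutual goal race looks tame to a symmetric eval but urgent to the pruner.

Code:
  # Invented weights: a symmetric eval cancels equal threats for both
  # sides; an asymmetric pruning score weights offense above defense.
  def symmetric_eval(my_goal_threat, opp_goal_threat):
      return 1.0 * my_goal_threat - 1.0 * opp_goal_threat

  def pruning_score(my_goal_threat, opp_goal_threat):
      return 1.0 * my_goal_threat - 0.3 * opp_goal_threat

  # A move that launches a mutual goal race:
  print(symmetric_eval(1, 1))  # 0.0 -> looks tame, a pruning candidate
  print(pruning_score(1, 1))   # 0.7 -> flagged as worth examining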
 
With this strategy in mind it becomes very plausible that aggressive forward pruning is a good idea.  Indeed, the only reason I know of to avoid forward pruning is its poor track record (including in chess, not just for the early GnoBot).  But if that poor track record is because people couldn't get away from using eval to do pruning, the conceptual obstacle disappears.
 
In the long run it might turn out that your current, learned pruning function is relatively mediocre compared to what is possible, and the brilliance was merely to do pruning completely independently of how you do evaluation.  Does it also matter that you gave moves different features than positions?  Does it also matter that you trained the pruning function instead of hand-coding it?  This could be analogous to the breakthrough in Go-playing bots which at first appeared to be about UCT exploration/exploitation tradeoff, but later turned out to be all about random playouts being decent at evaluation, quite apart from the other features of UCT, such that now one refers more generally to MCTS instead.  One has to experiment with variations on the new theme before one knows what parts were essential and what parts extraneous.
« Last Edit: May 18th, 2011, 1:56pm by Fritzlein »

Fritzlein (Arimaa player #706)
Re: lightvector's Thesis « Reply #17 on: May 18th, 2011, 1:47pm »

on May 18th, 2011, 1:48am, lightvector wrote:
Almost all moves will have at least one easily recognizable good feature, and often many. And for moves that aren't so obvious, things like Haizhi's step combo (dependency components) as well as closeness to the previous two moves will help distinguish them from random piece shuffling.

OK, you are helping make it more plausible.  It's like the chess adage that a bad plan is better than no plan.  The features are a sort of ranking of what things are there to be done, and by picking the highest value thing to be done, you are at least doing something.
 
I have to get used to understanding that this is pruning not evaluation.  A move can be purposeful and still quite weak.  The pruning function doesn't need to care if the move is weak; search/evaluation will take care of that later.  The pruning function just needs to select which moves are worth looking at, and for that purpose looking for independent features seems entirely reasonable.  Indeed, I do it myself all the time, for example looking at a move because it pulls a rabbit and only later rejecting it because of what it allows the opponent to do in response.
 
Quote:
It's a decent amount of data. About 100000 training instances, where in each one the learner is taught to distinguish the game move from about 100 random moves. Most features get plenty of positive and negative instances.

Hmm, how much data is in one move being preferred to a hundred others?  Surely less data than knowing for fifty pairs of moves which move of the pair is preferred.  But I take your point that you have somewhat more data than just 100,000 paired comparisons.
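For concreteness, my mental model of one such training instance is something like the following sketch. Everything here beyond "game move versus ~100 random moves" is my guess; the actual fitting procedure in the thesis may well differ.

Code:
  import numpy as np

  # One Bradley-Terry-style training instance: the game move competes
  # against ~100 sampled moves. Feature extraction, learning rate, and
  # the SGD loop are all assumptions for illustration.
  def log_likelihood_grad(w, feats, played_idx):
      # feats: (n_moves, n_features) binary move features;
      # w: log-weights (logs of the Bradley-Terry gammas).
      scores = feats @ w
      p = np.exp(scores - scores.max())
      p /= p.sum()
      # Gradient of log P(played move): its features minus the
      # probability-weighted average features of all candidates.
      return feats[played_idx] - p @ feats

  def train(instances, n_features, lr=0.01, epochs=3):
      w = np.zeros(n_features)
      for _ in range(epochs):
          for feats, played_idx in instances:
              w += lr * log_likelihood_grad(w, feats, played_idx)
      return w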
 
Quote:
Pruned at random. This was just a sanity check, basically.

Ah, too bad.  Hopefully I can help persuade you to compare different types of pruning functions to your learned pruning function, starting with pruning by static eval and/or pruning by static eval plus quiescence search.
 
Quote:
Also, throughout, it's uniformly worse at predicting than Bradley-Terry.

OK, since static evaluation is worse at predicting human moves, it would presumably give you less strength gain to prune on the basis of static evaluation than to prune on the basis of trained features.  It might even give you a strength loss.  Still worth testing to create a baseline.  
 
Quote:
Here's a selection of logarithm'ed feature values.

Thanks for this sampling of actual trained values.
 
Quote:
Keep in mind that there are tons and tons of correlated features, which will affect things a LOT. In the extreme case, if two features were perfectly correlated, then each would receive only part of the value, so that their sum gives the full value. So a feature's value may be deceptive, depending on which other features frequently occur with it.

Got it.  In particular it would look odd that scoring goal is less positive than allowing goal is negative, if not for the fact that moving a rabbit to a given square gets a separate bonus, which is high if that square is on the eighth rank!
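A toy illustration with invented numbers:

Code:
  # Two features that always fire together are only identifiable
  # through their sum; training can split the total between them
  # arbitrarily without changing any prediction.
  split_a = 2.0 + 1.5  # rabbit-to-8th-rank square + score-goal
  split_b = 3.0 + 0.5  # a different split, identical move scores
  assert split_a == split_b == 3.5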
 
By magnitude of features, the hierarchy seems to be
 
1) Score goal
2) Don't allow goal
3) Capture a piece
4) Don't suicide a piece
5) Threaten immediate goal
6) Haizhi pruning (not four independent steps)
7) Dislodge advanced enemy rabbit
8) Get your elephant off the trap
9) Pull a rabbit
10) Freeze the enemy camel
11) Reposition elephant
12) Protect your home traps
13) Freeze a rabbit
 
Also from the magnitudes, it doesn't seem likely that several small features will outweigh one big one, i.e. most of the time this function will find the biggest thing that can be done and do it.  Or no, again, I have to remember that this function is only used to recommend which moves to look at.  It isn't eval.  Thus this function finds the biggest thing that can be done and demands that it be examined further.  A completely different function decides what to do after that examination.  Very plausible, if I can keep it straight in my head.
« Last Edit: May 18th, 2011, 2:01pm by Fritzlein »

lightvector (Arimaa player #2543)
Re: lightvector's Thesis « Reply #18 on: May 18th, 2011, 2:20pm »

Actually, it is quite possible for lots of small features to outweigh big ones, because there are a lot of them. It's not uncommon for a move to have > 20 features.
 
For example, a typical move where the elephant pushes a piece and then two more pieces are advanced a step might have the following features:
 
3 moved-own-piece-from-square
3 moved-own-piece-to-square
1 moved-opponent-piece-from-square
1 moved-opponent-piece-to-square
2 changed opponent trap status at traps X and Y
2 changed own trap status at traps X and Z
1 froze opponent's piece of type T
1 froze opponent's piece at location L
2 advanced piece with certain influence
2 features indicating changes in distance of the advanced pieces from the nearest stronger piece
1 threatens capture of opponent piece
1 defends capture of own piece
1 closeness to last move
1 closeness to last last move
1 haizhi move structure (3 indep components)
 
That's a lot of features, and depending on how positive or negative they add up to be overall, there is a lot of potential for outweighing large-weighted features in other moves.
 
Even moves with all 4 steps independent can still get good scores, because if all or most of the 4 steps do things that are very good, then their sum will easily outweigh the penalty.
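As a toy illustration with made-up weights (the real weights are the logarithm'ed values from before):

Code:
  # Made-up log-weights; a move's score is the sum over all features
  # it fires, so many small positives can outweigh one big feature.
  weights = {
      "capture_piece": 2.0,
      "threaten_capture": 0.4, "defend_capture": 0.35,
      "freeze_piece": 0.3, "trap_status_change": 0.25,
      "advance_influence": 0.2,
      "close_to_last_move": 0.15, "close_to_last_last_move": 0.15,
  }

  def move_score(fired):
      return sum(weights[f] for f in fired)

  quiet_capture = ["capture_piece"]
  busy_move = ["threaten_capture", "defend_capture", "freeze_piece",
               "trap_status_change", "trap_status_change",
               "advance_influence", "close_to_last_move",
               "close_to_last_last_move"]
  print(move_score(quiet_capture))  # 2.0
  print(move_score(busy_move))      # 2.05 -> edges out the capture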
 
Other major correlations to keep in mind:
Increase trap defense + move piece to one of the squares adjacent to traps
Approach stronger piece + freeze own piece
Push opponent's piece + make capture threat
Increase trap defense + defend capture threat
Unfreeze piece + defend capture threat
Advance rabbit near opponent's goal + make goal threat
Push opponent's piece adjacent to trap square + decrease opponent trap defense.
Having only 1 feature for each of move-from-square and move-to-square and having only 1 dependency component (because the move was of a single piece for 4 steps)
 
Etc. These are just some of the obvious ones. There are many many more.
 
The presence of all of these interwoven correlating features would make it quite a pain to weight everything by hand correctly.
« Last Edit: May 18th, 2011, 2:29pm by lightvector »
Fritzlein (Arimaa player #706)
Re: lightvector's Thesis « Reply #19 on: May 18th, 2011, 3:01pm »

on May 18th, 2011, 2:20pm, lightvector wrote:
The presence of all of these interwoven correlating features would make it quite a pain to weight everything by hand correctly.

I take your point about the abundance of features and the many correlations between them.  It won't necessarily be easy to read off human-comprehensible lessons from a set of thousands of interrelated values.  It could be that only a trained algorithm will be able to make sense of the underlying complexity and thereby make good pruning recommendations.  
 
Nevertheless, your conclusion is deeply undermined by the analogy to the evaluation function.  A priori, you could make exactly the same case for a machine-learned evaluation function and against a hand-tuned evaluation function (i.e. thousands of interrelated weights which would be bewildering to set properly), but you know for a fact that your hand-tuned evaluation function out-performed your machine-learned evaluation function.  Contrariwise, you never tested the efficacy of a hand-crafted pruning function with fewer, more human-comprehensible features.  I am wholly unpersuaded that it couldn't be done, or that it would perform worse if done by hand.
 
I congratulate you heartily on having made a breakthrough in playing strength by novel means.  But you most definitely didn't apply machine learning to something that had previously been done by hand, thereby showing that the machine-learned function surpassed the hand-tuned function.  Rather you had already innovated merely by using a pruning function which is asymmetric to the evaluation function, with different features, including features capable of recommending sharp, purposeful moves rather than recommending merely good moves.  Since nobody had created such a pruning function before in any manner whatsoever, it is entirely unclear whether machine learning is necessary to the task.  You introduced multiple innovations at the same time, and my intuition as to which was the most critical part of the innovation obviously differs from yours.  Smiley
 
At the end of the day, the intuitions that get pursued are the intuitions of the hard workers who actually put in the elbow grease needed to try things out.  By no means let me dissuade you from the research that seems most interesting to you, and most likely to bear fruit.  But maybe I can make a strong enough case for my own intuition that you will bend your research direction slightly, if only enough to make a case that I'm looking at the problem all wrong.  Wink
« Last Edit: May 18th, 2011, 3:05pm by Fritzlein »

lightvector (Arimaa player #2543)
Re: lightvector's Thesis « Reply #20 on: May 18th, 2011, 3:45pm »

Sure, a hand-tuned function might be able to do better, I agree. I just don't want to deal with it right now, when I'm still rapidly changing things, such as adding new features (more trap-control, multiple-piece relations, piece patterns) as well as removing or simplifying features (to train alternate, lighter versions for use in the tree). The advantage of the learning is that when I add a new category with 128 different features, I don't have to meticulously tune and test the values myself. I just toss them in, and then wait 2 hours, and then I have some excellent weights and can immediately see whether they might be any good.
 
My feeling is also that this method is limited only by the amount of data and by the features themselves. The model itself is reasonably robust given enough data. And with enough data, the math guarantees that you will get the true "optimal" linear function, and moreover the particular definition of "optimal" here happens to correspond decently well to the practical goal of pruning without losing the best move.
 
Some informal testing (I forget the numbers) suggests that accuracy does degenerate a little when you further lower the amount of data. But not too much. So if I can mitigate the immediate issues (which in the short-term might just be "buy a few more GB of RAM"), and just train on a somewhat larger proportion of the data, I think the amount of data is okay. There are also hundreds of new high-quality games from the last year that I haven't used, which is a substantial boost, given that I'm basically using only 4000/3 ~= 1300 games of data right now.
 
If data is no longer a problem, and the algorithm is giving something close to an optimal linear function on that data, then this suggests that the major limitations are the features themselves (which are a bit crude and only include the basics) and the fact that the ordering function itself is linear. The weights themselves not being hand-tuned should not be that big of a deal compared to these two issues.
 
And you can try to fix the linearity by hand-coding a fundamentally nonlinear evaluation, but that would take a lot of testing. While far from a perfect substitute, it's often easier and far faster to code your nonlinear computation separately and then just feed the result in as a new set of features!
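For example, something like this sketch (the bucketing scheme and the feature itself are just for illustration):

Code:
  # Compute a hand-coded nonlinear value, then bucket it into one-hot
  # binary features the linear learner can weight separately.
  def trap_control_buckets(value, edges=(-2.0, -0.5, 0.5, 2.0)):
      bucket = sum(value > e for e in edges)
      feats = [0] * (len(edges) + 1)
      feats[bucket] = 1
      return feats

  print(trap_control_buckets(1.3))  # [0, 0, 0, 1, 0]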
« Last Edit: May 18th, 2011, 4:36pm by lightvector »
Fritzlein (Arimaa player #706)
Re: lightvector's Thesis « Reply #21 on: May 18th, 2011, 4:48pm »

on May 18th, 2011, 3:45pm, lightvector wrote:
My feeling is also that this method is limited only by the amount of data [...]

And limited by the quality of the data.  IdahoEv created a machine-learned material evaluation function, and concluded from his research that an initial cat is worth less than an initial rabbit, based both on the damage done by losing each piece in isolation, and based directly on games in which cat-for-rabbit was the initial trade.  We have proof that rabbits are more valuable than cats.  Despite this evidence based squarely on the available data, I am unaware of any developer that uses a material evaluation that values a rabbit more highly than a cat as the first capture.  In this particular instance, we have collectively decided to trust human intuition more than we trust optimized parameters.  This can be justified only in that we mistrust the quality of the data.
 
Of course, in your case you are explicitly trying to mimic strong players.  Since almost everyone values an initial cat more than an initial rabbit, I'm sure your optimized weights do so as well.  Even so, there is the potential that your own insights are closer to optimal than insights derived, however optimally, from imperfect data.
 
Quote:
If data is no longer a problem, and the algorithm is giving something close to an optimal linear function on that data, then this suggests that the major limitations are the features themselves [...] And you can try to fix the linearity by hand-coding non-linear things [...] removing or simplifying features (to train alternate, lighter versions for use in the tree)

OK, so if you are hard-coding the most important inputs to the feature set, the line between a hand-coded function and a machine learned function starts to blur.  I guess that is already apparent in the "allows goal" feature; does any other feature have such a high weight as this hand-coded feature which can't be assembled from any combination of more primitive features?
 
If you additionally try to pare the full feature set, by hand, down to the most important few features for the sake of speed, the line blurs further.  This process of speeding up the function also tacitly hard-codes human insights for which the machine learning can't then correct.  Ultimately you may end up with a lightweight function where you understand the meaning of all the machine-learned values and can read off insights relatively easily.  At that point, it might become tempting to say, "Hmm, I see what the machine is thinking, but my intuition is different.  Let me tweak the actual values to be closer to my intuition and see whether that helps playing strength."  If ever you succumb to that temptation, even briefly to see what happens, I carry my argument.  Wink
 
But as long as you remain committed to a large and generic set of features, I do take your point about the value of quickly re-optimizing after each change to the feature set.  This can otherwise be a serious problem, for example when adding a new feature that is partially, but not entirely, correlated to existing features.  How much to weaken the old weights so as to avoid double-counting?  And how important is the new feature compared to existing features?  I can imagine how the difficulty of changing the feature set is a major drawback of hand-tuned functions, and that having an automated optimization of relative weights is a big plus in that it allows for a large and rapidly-changing feature set.
« Last Edit: May 18th, 2011, 5:47pm by Fritzlein »

lightvector (Arimaa player #2543)
Re: lightvector's Thesis « Reply #22 on: May 18th, 2011, 7:16pm »

on May 18th, 2011, 4:48pm, Fritzlein wrote:

At that point, it might become tempting to say, "Hmm, I see what the machine is thinking, but my intuition is different.  Let me tweak the actual values to be closer to my intuition and see whether that helps playing strength."  If ever you succumb to that temptation, even briefly to see what happens, I carry my argument.  Wink

 
Of course I intend to do that! As long as I can get the computer to do most of the slow and tedious work, I'm happy to tweak things where I think it got it wrong. Grin  
« Last Edit: May 18th, 2011, 7:58pm by lightvector »
99of9 (Gnobby's creator, player #314)
Re: lightvector's Thesis « Reply #23 on: May 18th, 2011, 9:12pm »

on May 18th, 2011, 3:01pm, Fritzlein wrote:

you most definitely didn't apply machine learning to something that had previously been done by hand, thereby showing that the machine-learned function surpassed the hand-tuned function.  Rather you had already innovated merely by using a pruning function which is asymmetric to the evaluation function, with different features, including features capable of recommending sharp, purposeful moves rather than recommending merely good moves.  Since nobody had created such a pruning function before in any manner whatsoever, it is entirely unclear whether machine learning is necessary to the task.

 
This doesn't prove the issue either way, because it was never compared like for like, but...
 
Gnobot, and I'm sure many of the other bots, has a "hand-written" move-ordering function.  Although in Gnobot it does not actually do any pruning beyond Haizhi-pruning (it may for other bots), it still affects speed.  And obviously it's not equivalent to the eval.  Anyway, I agree with the issue you raise, Fritz: hand-writing may end up better...
 
Ok, so here's another idea for you lightvector.  Could you try adding into your features something like EVAL_TOP_10, EVAL_TOP_100, etc?  I'd be interested to know how much of the final weightings they bleed away from the move-centric terms.
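Something like this sketch, say (static_eval and the move representation are placeholders):

Code:
  # Sketch of the EVAL_TOP_10 / EVAL_TOP_100 idea: rank all legal moves
  # by the static eval and give each move binary rank-bucket features.
  def eval_rank_features(moves, static_eval, position):
      ranked = sorted(moves, key=lambda m: static_eval(position, m),
                      reverse=True)
      rank = {m: i for i, m in enumerate(ranked)}
      return {m: {"EVAL_TOP_10": rank[m] < 10,
                  "EVAL_TOP_100": rank[m] < 100}
              for m in moves}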
« Last Edit: May 18th, 2011, 10:33pm by 99of9 »
Swynndla (Arimaa player #1821)
Re: lightvector's Thesis « Reply #24 on: May 18th, 2011, 9:32pm »

You are kind, lightvector, sharing your interesting thoughts for the sake of AI advancement, and in a way that's easy to follow.
 
on May 18th, 2011, 12:40pm, Fritzlein wrote:
This line of thought reinforces the plausibility of having a completely different pruning function than evaluation function.  Both functions should like and dislike some of the same features, such as trap control and greater material, but the pruning function should be asymmetrical.

Fritz, I'm enjoying the clear explanations you are giving too, and thanks to you I now have an understanding of the difference between evaluating a position & evaluating moves (for ordering & pruning) (I hope I'm not abusing those terms).  I can follow everything except "the pruning function should be asymmetrical" ... I don't understand what it means for the pruning function (as opposed to the evaluation function) to be asymmetrical in this context?
Fritzlein (Arimaa player #706)
Re: lightvector's Thesis « Reply #25 on: May 19th, 2011, 12:46pm »

on May 18th, 2011, 9:32pm, Swynndla wrote:
I can follow everything except "the pruning function should be asymmetrical" ... I don't understand what it means for the pruning function (as opposed to the evaluation function) to be asymmetrical in this context?

I can see why the term isn't very clear; intuitively there is as much justification for calling it "symmetric" as "asymmetric" Tongue.
 
At the time that computer chess engines started to challenge strong humans, humans took a growing interest in anti-computer play.  This turned out to be a temporary phenomenon.  Normally one is interested in playing the "objectively best" move regardless of the opponent.  When computers were weak, and playing objectively good chess was enough to beat them, there was no reason to play anti-computer chess.  And similarly, now that computers are better than any of us, there is again no reason to play anti-computer chess.  Instead we use computers to help figure out the objectively best move, the move we would like to play against our fellow humans.
 
For a time, however, anti-computer play was a hot topic.  It emerged that humans could play moves that were objectively poor, or at least questionable, and win anyway because the confused computers couldn't see how to take advantage of the strategic characteristics of the position.  Some programmers started changing their programs to avoid certain types of positions, not because the positions were bad per se, but because they knew their programs would not be able to understand how to play in those positions.
 
I remember watching a grandmaster game on the Internet Chess Club.  Bob Hyatt was generously giving us lines suggested by his program Crafty.  But something interesting emerged.  Tacked on to those lines was Crafty's evaluation.  It appeared that Crafty thought white was losing by about a pawn, and also that Crafty thought black was losing by about a pawn.  When Hyatt was questioned about it, he explained that the central pawn structure was locked, and Crafty knew that it played such positions poorly.  Crafty would have been unhappy to play white, and also unhappy to play black, because either way the position had a feature that would put Crafty at a disadvantage.
 
Normally, if an engine thinks a position is -1.0 for white, it must be +1.0 for black.  Objectively it must be so.  It's a zero-sum game; both players can't be losing.  But Crafty was using what they called asymmetric evaluation, where the evaluations don't have to sum to zero, so white was -1.0 and black was also -1.0.
 
Maybe I shouldn't have borrowed the term asymmetric from that context to this one.  My excuse is that it came to my mind because it applied to bots wanting to avoid certain types of positions, regardless of which side was objectively leading, and also to seek out certain types of positions, regardless of which side was objectively leading.  The same now goes for moves.  Instead of wanting only to examine objectively good moves, we want to examine also sharp moves that may be objectively bad.  Even the direction of the effect is the same as in Crafty's asymmetric evaluation. Crafty avoided locked, slow-changing positions and favored open, fast-changing positions, which is what I now claim pruning should do, even as evaluation remains "symmetric".
 
Typically in search, you want to spend your time looking at the best moves.  Why should I waste time looking at what the opponent will do if I suicide my camel?  I'm not going to suicide my camel.  It's a waste of time to explore that thought.  And recursively I only want to look at the opponent's best replies, etc.  I would like to prune out all the bad moves.  Pruning out all the bad moves will help me, because I look more deeply at the good moves, on the way to finding the best move.
 
Unfortunately for pruning, we might accidentally throw out the best move because it superficially looks bad.  I might have a forced goal in three that involves suiciding my camel on the first move.  If I prune out all camel suicides, then I will throw away the best move without even looking at it, which will mean pruning has weakened the bot.  This is why lightvector is going on about capturing the greatest "probability mass" of the best move.  We can't know in advance which move will be best, so we have to retain any moves with a significant probability of being best.
 
One way of guessing which move will turn out best is using static evaluation.  But if your static evaluation is symmetric (as I think it should be), then it thinks both players getting a goal attack is the same as neither player getting a goal attack.  It cancels out.  According to objective evaluation, an equal exchange is just a mediocre, middle-of-the-road move which is probably not best, and therefore can be safely pruned.  But this is a lunatic pruning policy: a move which launches a goal race is highly likely to be the best move.  Never mind that it is also highly likely to be the worst move; accidentally keeping in the worst move hurts only a tiny bit in lost time.  In contrast, throwing away the best move hurts a lot because it creates a blind spot.
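In pseudo-code, the pruning policy lightvector is describing is something like this sketch (the 0.95 target and the predictor itself are placeholders):

Code:
  # Keep the smallest set of moves whose predicted probability of
  # being best reaches a target mass; prune the rest.
  def prune_by_probability_mass(moves, predicted_prob, target=0.95):
      ranked = sorted(moves, key=predicted_prob, reverse=True)
      kept, mass = [], 0.0
      for m in ranked:
          kept.append(m)
          mass += predicted_prob(m)
          if mass >= target:
              break
      return kept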
 
Lightvector's features for move prediction allow for asymmetry.  They permit offense to be ranked higher than defense.  Capturing a horse can be a bigger plus than losing a horse is a minus.  Threatening goal can be a bigger plus than allowing a goal threat is a minus.  Gaining control of an enemy trap can be a bigger plus than losing control of a home trap is a minus.  Etc.
 
If any of those asymmetries were part of evaluation, the bot would play like a moron, always trying to do something to you while never stopping you from doing something to it.  It would play speculatively and unsoundly and be punished for it.  But for the purposes of making sure the best move isn't pruned, the asymmetry of favoring offense over defense is brilliant.  Speculative lines must be examined.  A move that trades horses is far more important to examine than a move that neither captures nor concedes material, even though they may be evaluated the same.
 
Given this explanation, Swynndla, would you suggest a different term than "asymmetric" to refer to what I am trying to get at?  The main point is to throw away as many moves as possible while still capturing all moves that might be best.  To do this, one needs to get away from using a function that tries to objectively say who is winning.  We can't use symmetric evaluation.  But what we are really after is not an evaluation function that is asymmetric in the sense that Crafty's was.  We want something else that is a combination of evaluation per se and uncertainty of evaluation.  Instead of putting +0.2 ahead of -0.2, we want to put -0.2 plus or minus 7 ahead of +0.2 plus or minus 0.5.  The one with the greater uncertainty is more likely to prove best after search.  But how does one capture this uncertainty?  And what should one name the technique?  Offense-first?  Sharpness metric?  Fog of war barometer?
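In code, the ordering I'm gesturing at might look like this (how to estimate the uncertainty is exactly the open question):

Code:
  # A sharp move at -0.2 +/- 7 should be examined before a tame move
  # at +0.2 +/- 0.5.
  def examination_priority(eval_mean, eval_uncertainty, k=1.0):
      return eval_mean + k * eval_uncertainty  # optimistic bound

  print(examination_priority(-0.2, 7.0))  # 6.8 -> search this first
  print(examination_priority(0.2, 0.5))   # 0.7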
« Last Edit: May 19th, 2011, 12:56pm by Fritzlein »

Fritzlein (Arimaa player #706)
Re: lightvector's Thesis « Reply #26 on: May 19th, 2011, 1:06pm »

There's another subtle point I didn't think of at first.  Throwing away the best move isn't a big deal if we have retained a similar move that is nearly as good.  It might be that Haizhi pruning routinely throws out the best move, but we don't care, because it generally leaves behind a nearly-identical move with two dependent steps, and that can be played instead without much loss.
 
I think the important metric is not the probability of retaining the best move, but rather how much worse the best move we retained is than the best move overall.  Probably this is a harder metric to use, but it could be good to keep it in mind in any case.
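As a sketch (the oracle true_value is of course the hard part to obtain in practice):

Code:
  # The metric: not "did we keep the best move?" but "how much worse
  # is the best kept move than the best move overall?"
  def pruning_regret(all_moves, kept_moves, true_value):
      best_overall = max(true_value(m) for m in all_moves)
      best_kept = max(true_value(m) for m in kept_moves)
      return best_overall - best_kept  # 0 means pruning lost nothing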

lightvector (Arimaa player #2543)
Re: lightvector's Thesis « Reply #27 on: May 19th, 2011, 2:06pm »

on May 18th, 2011, 9:12pm, 99of9 wrote:
This doesn't prove the issue either way, because it was never compared like for like, but...

Gnobot, and I'm sure many of the other bots, has a "hand-written" move-ordering function.  Although in Gnobot it does not actually do any pruning beyond Haizhi-pruning (it may for other bots), it still affects speed.  And obviously it's not equivalent to the eval.  Anyway, I agree with the issue you raise, Fritz: hand-writing may end up better...

Ok, so here's another idea for you lightvector.  Could you try adding into your features something like EVAL_TOP_10, EVAL_TOP_100, etc?  I'd be interested to know how much of the final weightings they bleed away from the move-centric terms.

 
I could do this when I have time. But given that the eval is a big sum of different terms for different features, and we suspect that the ordering function may want to weight each feature a little differently, it seems more interesting to me to add each term separately. This amounts to adding "trap control" as a move feature (currently only counts elephant and # of other defenders), as well as some of the eval's terms for things like hostages, blockades, frames, and some more features for things like advanced rabbits, all of which the move features lack right now.
 
It might be interesting to extend the model to non-binary features, to allow "continuous" inputs whose values come from the evaluation.
 
Anyways, regarding hand-coding, one can compare it to doing a multidimensional linear regression - you have a model, and an algorithm, like least-squares, that does a great job of finding the best fit. Likely you can out-do it by hand if you spend enough time, but the biggest work/improvement ratio will come from other things. One would prefer to do things like find ways to remove the outliers in the data (filter for blunders, missed goals, lemming captures), and transform the data to eliminate nonlinearities (adding new features). Unless you think the model is untrustworthy and a bad approximation to the task (which you might Smiley ), reproducing all of the numeric optimization by hand is a poor work/improvement ratio so long as there are other things to be done.
 
Indeed, there's no way that the existing, very-basic feature set I'm using gives enough information to distinguish the best move that much more accurately than it already does now. I'd spend a lot of time adding and testing new features anyways when hand-coding, so might as well just do it with the learning algorithm in-place, because it does a pretty good job and testing is much much easier.
lightvector (Arimaa player #2543)
Re: lightvector's Thesis « Reply #28 on: May 19th, 2011, 2:15pm »

on May 19th, 2011, 1:06pm, Fritzlein wrote:
There's another subtle point I didn't think of at first.  Throwing away the best move isn't a big deal if we have retained a similar move that is nearly as good.  It might be that Haizhi pruning routinely throws out the best move, but we don't care, because it generally leaves behind a nearly-identical move with two dependent steps, and that can be played instead without much loss.
 
I think the important metric is not the probability of retaining the best move, but rather how much worse the best move we retained is than the best move overall.  Probably this is a harder metric to use, but it could be good to keep it in mind in any case.

 
Yep, definitely.  
 
You sort-of get this as a byproduct too if you're using game records. Because human players certainly don't play optimally, and strong players will disagree in many positions about the best move. So to do well, you have to rank a lot of the good moves highly, because there's no telling which one the human player will pick. Unfortunately, this doesn't stop you from consistently throwing out a certain category of good moves.
 
I wonder if you could detect different "playing styles" for different players. It would be hard to get enough data, but if possible, could be cool.   Smiley
 
By the way, it would be interesting to try what Fritz suggests regarding asymmetric pruning. The current learning algorithm does this in some ways. For example, passive piece-shuffling moves on one's own side of the board get slightly lower weights because people don't usually play them, even if they get safe evaluations. But it doesn't do it in other ways. If players regularly defend as often as they trade or race, then the learning algorithm would learn the same, and not favor exploring racing lines more highly.
 
I'm enjoying this discussion. I'm glad my senior thesis could spawn such a thread, and I'm grateful to everyone for providing input.
« Last Edit: May 19th, 2011, 2:27pm by lightvector »
Fritzlein (Arimaa player #706)
Re: lightvector's Thesis « Reply #29 on: May 19th, 2011, 5:08pm »

on May 19th, 2011, 2:15pm, lightvector wrote:
If players regularly defend as often as they trade or race, then the learning algorithm would learn the same, and not favor exploring racing lines more highly.

I am not quite sure how this would play out.  Your feature set is asymmetric.  You have a feature for "push enemy camel to b3" but no feature for "allow my camel to be pushed to b6", so the defensive feature can't be given any weight, never mind a negative weight equal and opposite to the weight of the offensive feature.  Similarly you have a feature "advance rabbit to a7" but no feature "prevent opposing rabbit from advancing to a2".  The piece-square features by nature look only at what you can do and not at what the opponent can do back to you.
 
For capture you included defensive features as well as offensive features, but it still isn't symmetrical.  You have the feature of immediately capturing, allowing a capture next ply, and threatening a capture on the ply after that.  The offense gets one more ply than the defense in a capture race, i.e. a move can give me credit for being able to capture in a way that threatens another capture, but my opponent can't get this double credit, even if the race is entirely equal.
 
Again for goals the offense gets an extra ply.  I can get credit for immediate goal, my opponent can get credit for goal the ply after that to cancel it out, but only I can get credit for a goal threat the following ply.  There is no feature, "allows opponent to make goal threat".
 
Won't the inherent asymmetry of the features naturally bias the move selection to sharp moves?  If the move selector can't see the counter-attack, it will at least greedily grab the attack, thus generally overweighting sharp moves compared to tame ones.
 
Quote:
I'm enjoying this discussion. I'm glad my senior thesis could spawn such a thread, and I'm grateful to everyone for providing input.

I'm grateful to you for sharing the thesis per se, and for participating in ongoing discussion about it.  I hope my ideas are somewhat useful to you, since I am having great fun cranking them out.
« Last Edit: May 19th, 2011, 5:44pm by Fritzlein »
