Arimaa Forum « automated reversal detection in game history »
clauchau
« Reply #15 on: Mar 26th, 2008, 8:28am »

on Mar 25th, 2008, 12:06pm, Fritzlein wrote:
My prediction is, unfortunately, that anchoring on victory condition only will do nothing more than train the eval to be excessively fond of advanced rabbits, since that is the only thing it can understand.  Starting from scratch, your eval will quickly evolve into arimaa_score.

 
I don't think so, and I've been planning to find out for some time now, but haven't quite found the time yet. Here is how I think keeping rabbits on the back row may naturally emerge (provided we specifically look at that feature):
 
1) Run a 1-ply search on tons of uniformly random positions with Gold to move, knowing only the value of game-over positions and taking it to be 0 otherwise, while recording Gold's and Silver's material and their rabbits' advancement. Look at the average value. There are too many combinations of the material and advancement features to tabulate, so guess an approximate formula that covers them all, based on your sampled observations.
 
At that stage I agree it should favor Gold's and Silver's advanced rabbits. But you now need to reverse your formula so that it actually evaluates positions when Silver is to move - which is the way you have already been evaluating the leaves in the search above.  
 
2) Repeat this with the corrected evaluation function, i.e. non-game-over leaf positions no longer get the value 0; they get the reversed approximate formula guessed in the previous stage. As a result, Gold will favor advanced rabbits only in positions where it finds a winning move. In every other position, non-advanced rabbits will be favored, because Gold doesn't want Silver to have a good position!
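 
In Python-ish terms the two stages would look something like the sketch below. Everything in it is hypothetical scaffolding: random_position(), legal_moves(), is_gameover(), material(), rabbit_advancement(), fit_formula(), pos.apply() and pos.flip_sides() stand in for whatever your program actually provides.
 
Code:
def one_ply_value(pos, leaf_eval):
    # 1-ply search with Gold to move: score every Gold move, keep the best.
    best = -float('inf')
    for move in legal_moves(pos, 'gold'):
        child = pos.apply(move)
        if is_gameover(child):
            v = 1.0 if child.winner == 'gold' else -1.0
        else:
            v = leaf_eval(child)  # stage 1: always 0; stage 2: reversed formula
        best = max(best, v)
    return best

def run_stage(leaf_eval, n_samples=100000):
    samples = []
    for _ in range(n_samples):
        pos = random_position()  # uniformly random position, Gold to move
        feats = (material(pos), rabbit_advancement(pos))
        samples.append((feats, one_ply_value(pos, leaf_eval)))
    return fit_formula(samples)  # guess an approximate formula over the features

# Stage 1: non-game-over leaves are worth 0.
formula1 = run_stage(lambda child: 0.0)

# Reverse the formula so it evaluates positions with Silver to move,
# which is how the leaves above are actually reached.
def reversed_formula1(pos):
    flipped = pos.flip_sides()
    return -formula1((material(flipped), rabbit_advancement(flipped)))

# Stage 2: repeat with the corrected leaf evaluation.
formula2 = run_stage(reversed_formula1)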
« Last Edit: Mar 26th, 2008, 10:16am by clauchau »
IdahoEv
« Reply #16 on: Mar 26th, 2008, 9:47pm »
on Mar 25th, 2008, 9:05am, lightvector wrote:
The main problem I see with any automated assignment is how to decide whether a swing in advantage was the result of good play or one side blundering.
 
Say Silver loses a horse between positions A and B. If the loss was forced, then ideally you would want to assign a score of a horse loss to position A, because the horse is destined to be lost at that point (possibly as a result of a much earlier mistake by Silver).

 
Ahh, but see, here you are letting the perfect be the enemy of the good enough. Let's suppose that the loss of a horse on ply X switched the lead from player w to player b, and b won the game. Until that point, w had been ahead. Then your quandary prevents us from knowing with full accuracy whether we should assign b the advantage from X to End or from X-1 to End.
 
And you're right - we probably can't do that perfectly. But if we can do it at all, we can produce a better training set than the alternative, which is to assign b as the winner for all positions in that game from ply 1 to the end - which is clearly wrong in this hypothetical case for all positions from 1 to X-2, and maybe wrong for X-1. The possibility of misclassifying ply X-1 shouldn't prevent us from making the attempt that might allow us to correctly classify all positions 1 through X-2.
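 
In code, the labeling I have in mind would look something like this (just a sketch; find_last_reversal() stands for whatever reversal detector we end up with):
 
Code:
def label_game(positions, winner, loser):
    # positions[i-1] is the position after ply i; winner is b, loser is w.
    x = find_last_reversal(positions)  # ply X where the lead last changed hands
    labeled = []
    for i, pos in enumerate(positions, start=1):
        if i >= x:
            labeled.append((pos, winner))  # b is ahead from X to the end
        elif i <= x - 2:
            labeled.append((pos, loser))   # w was ahead from ply 1 through X-2
        # ply X-1 is ambiguous; omit it rather than risk misclassifying it
    return labeled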
 
As for whether or not training is worthwhile (which is much of what this discussion has been about so far), I have strong reason to believe it is and will be trying it whether or not Fritz approves.  Smiley  So to that end, I'm simply trying to generate the best training set I can.    
 
I'm leery of approaches that involve the eval itself in generating the training set, even though I suggested that possibility in the initial post, precisely for some of the reasons given in this discussion.  
 
 
 
IdahoEv
« Reply #17 on: Mar 26th, 2008, 9:53pm »

on Mar 25th, 2008, 7:32pm, nbarriga wrote:
Actually I think there has been some moderate success in attempts at using neural networks as an eval function (e.g. see NeuroGo). I have high confidence that it can work quite well for Arimaa.

 
Likewise, a neural network eval with fairly naive inputs achieved a very high rating in checkers only a few years ago. And I think the structure of Arimaa actually lends itself much better to an NN approach than does that of Go or checkers. (Whereas Chess looks almost like it was designed purposefully to foil neural networks.)
 
That's not what I'm working on, though, at least not this year.
Fritzlein
« Reply #18 on: Mar 27th, 2008, 10:26am »

on Mar 26th, 2008, 9:47pm, IdahoEv wrote:
Ahh, but see here you are letting the perfect be the enemy of the good enough.    Let's suppose that the loss of a horse on ply X switched the lead from player w to player b, and b won the game.  Until that point, w had been ahead.   Then your quandary prevents us from knowing with full accuracy whether we should assign b the advantage from X to End or from X-1 to End.

I thought at first you were just proposing to use sudden changes in eval to train eval. My intuition was that if you are using eval to train eval, then using search to link one position to the next is better than using the move the human actually played to link one position to the next. The perfect is not the enemy of the good in this substitution: you can have the perfect (search) in every instance where you have the good (a human-chosen move). Using search+eval rather than human_move+eval as the feedback doesn't take away any of your ability to train.
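 
Something like this TD-style sketch is what I mean, where evaluate(), search_best_move(), update(), and pos.apply() are hypothetical stand-ins, not anyone's actual code:
 
Code:
def feedback_target(pos, human_next_pos, params, use_search=True):
    if use_search:
        # "the perfect": link pos to the position that search+eval prefers
        best = search_best_move(pos, params, depth=4)
        return evaluate(pos.apply(best), params)
    # "the good": link pos to the position the human actually reached
    return evaluate(human_next_pos, params)

def train_step(pos, human_next_pos, params):
    target = feedback_target(pos, human_next_pos, params)
    error = target - evaluate(pos, params)
    return update(params, pos, error)  # e.g. a small gradient step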
 
But now I see that something else is going on. You want to use the information that Silver actually won the game many moves after the abrupt change in your eval. You want to use actual game results in order to train eval, and the reason for detecting big swings in eval is to break the chain that goes from game end to game beginning. You don't want to assign credit for the final result to a position twenty moves earlier if there was a big blunder ten moves from the end. Do I understand your undertaking better now? If so, then we don't have such a big difference of opinion: philosophically I approve of cutting off the distance backwards that a game result should be used as feedback.
 
Quote:
As for whether or not training is worthwhile (which is much of what this discussion has been about so far), I have strong reason to believe it is and will be trying it whether or not Fritz approves.  Smiley

Don't you realize that being the World Champion of Arimaa makes me the expert on computer science and artificial intelligence, not to mention microeconomics, nuclear physics, and fashion design? Trust me on this... Grin
 
But seriously, I'm thrilled that you are trying an alternative to hand-coded evaluation functions, and I hope you succeed.  It may have come across that I was discouraging automated learning, but I was really trying to put an alternative automated learning method on the table.  Clearly, the odds are that my idea is wack.

IdahoEv
« Reply #19 on: Mar 28th, 2008, 1:59am »

on Mar 27th, 2008, 10:26am, Fritzlein wrote:
You want to use actual game results in order to train eval, and the reason for detecting big swings in eval is to break the chain that goes from game end to game beginning. You don't want to assign credit for the final result to a position twenty moves earlier if there was a big blunder ten moves from the end. Do I understand your undertaking better now?

 
Bingo.
 
Quote:
If so, then we don't have such a big difference of opinion: philosophically I approve of cutting off the distance backwards a game result should be used as feedback.

 
And you can see how this leads to the deepest problem:  how to train an eval that can correctly score positions in the opening.  The further back from the end we are, the less likely it is that knowledge of the eventual winner is a useful input to a learning function.   "The eventual winner is ahead" is probably correct 99% of the time 3 turns before the end, but probably only correct 60% of the time 3 turns after the beginning.  
 
But bots have gotta be able to play in the opening, too. So what I'd like to do is somehow increase the quality of the training set by correctly classifying earlier states ... which is equivalent to detecting reversals.
 
The further problem, of course, is that classifying those states is exactly what an eval is supposed to do.  So I need an eval to train an eval.   Or I need human expertise.  Or some combination.
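 
One stopgap I might try in the meantime: weight each training example by an assumed label reliability that climbs from roughly 60% at the opening to 99% at the end. A sketch, with purely illustrative numbers:
 
Code:
def label_confidence(ply, game_length, c_start=0.60, c_end=0.99):
    # Assume reliability rises roughly linearly from opening to end.
    frac = ply / float(game_length)
    return c_start + (c_end - c_start) * frac

def weighted_training_set(positions, winner):
    n = len(positions)
    return [(pos, winner, label_confidence(i + 1, n))
            for i, pos in enumerate(positions)]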