Arimaa Forum « (no) absolute score values for pieces? »


Fritzlein
Forum Guru
*****



Arimaa player #706

   
Email

Gender: male
Posts: 5928
Re: (no) absolute score values for pieces?
« Reply #105 on: Aug 2nd, 2010, 7:28pm »

Very interesting, JDB.  Thanks for sharing.  I recall that I once proposed that two rabbits would always be worth more than a cat.  Aaaa suggested EC6R vs E8R as a possible counter-example, and it seems that he was correct.  It doesn't flip to the two rabbits being more valuable until three more rabbits have been exchanged.
IP Logged

jdb
Forum Guru
*****



Arimaa player #214

   


Gender: male
Posts: 682
Re: (no) absolute score values for pieces?
« Reply #106 on: Aug 5th, 2010, 7:20pm »

I added dogs to the handicap matches. It will take a couple weeks to get enough games to cover all the cases.
IP Logged
jdb
« Reply #107 on: Aug 9th, 2010, 3:40pm »

Another round of testing. This time DC vs D vs CC.
 
The relative value of each pair depends greatly on the number of rabbits remaining.  
 
Code:

Rank Name             Elo    +    - games score oppo. draws
   1 Clueless_EDC8R   794   44   41   774   91%     9    0%  
   2 Clueless_ECC8R   730   34   32  1227   91%   -45    0%  
   3 Clueless_EDC7R   635   38   36   758   83%    14    0%  
   4 Clueless_EDC6R   597   37   36   750   81%    10    0%  
   5 Clueless_ECC7R   542   28   28  1223   81%   -33    0%  
   6 Clueless_ED8R    485   29   29  1171   80%   -95    0%  
   7 Clueless_ECC6R   438   27   27  1199   74%   -26    0%  
   8 Clueless_EDC5R   348   34   34   750   66%    24    0%  
   9 Clueless_ECC5R   296   27   27  1162   65%   -16    0%  
  10 Clueless_ED7R    286   27   27  1160   68%   -82    0%  
  11 Clueless_EDC4R   171   35   35   734   54%    47    0%  
  12 Clueless_ED6R    151   28   28  1156   60%   -75    0%  
  13 Clueless_ECC4R   111   27   28  1162   53%    -4    0%  
  14 Clueless_ED5R    -77   28   29  1156   45%   -60    0%  
  15 Clueless_EDC3R   -92   37   38   733   38%    60    0%  
  16 Clueless_ECC3R  -187   31   31  1159   36%    15    0%  
  17 Clueless_ED4R   -230   30   31  1145   36%   -48    0%  
  18 Clueless_EDC2R  -387   42   44   729   24%    72    0%  
  19 Clueless_ECC2R  -400   34   35  1159   25%    28    0%  
  20 Clueless_ED3R   -440   34   34  1145   26%   -34    0%  
  21 Clueless_ED2R   -738   42   44  1129   14%   -10    0%  
  22 Clueless_EDC1R  -887   60   64   729    7%    98    0%  
  23 Clueless_ECC1R  -925   49   51  1159    7%    62    0%  
  24 Clueless_ED1R  -1220   67   80  1113    2%    14    0%  
 
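For readers less used to Elo tables, a rating gap can be turned into an approximate expected score. The sketch below assumes the plain logistic Elo formula; bayeselo's own model may include extra terms, so the numbers are only a rough guide. The two ratings are copied from the table above, and the function name is made up for illustration.

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    """Expected score of A against B under the plain logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


# Ratings copied from the table above.
ED8R, ECC7R = 485, 542

p = expected_score(ED8R, ECC7R)
print(f"ED8R vs ECC7R: expected score {p:.2f}")
```

A gap of this size is well under 100 Elo, so the expected score stays fairly close to 50%.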

IP Logged
Fritzlein
« Reply #108 on: Aug 9th, 2010, 4:50pm »

Thanks for sharing, JDB.  This shows I don't know much about endgames.  I would have expected DR to be worth more than CC, but it isn't until the CC player is down to his last rabbit.  Also I would have expected that when dogs are still on the board, C is worth less than RR, but the C is worth more as long as both players still have at least 4 rabbits.
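Both comparisons can be checked mechanically against jdb's table. The snippet below is only a sketch: the Elo values are copied from the DC vs D vs CC table in reply #107, the helper names are invented, and it compares point estimates without looking at the error margins.

```python
# Elo values copied from jdb's DC vs D vs CC table (reply #107),
# keyed by the number of rabbits remaining.
elo = {
    "ED": {8: 485, 7: 286, 6: 151, 5: -77, 4: -230, 3: -440, 2: -738, 1: -1220},
    "ECC": {8: 730, 7: 542, 6: 438, 5: 296, 4: 111, 3: -187, 2: -400, 1: -925},
    "EDC": {8: 794, 7: 635, 6: 597, 5: 348, 4: 171, 3: -92, 2: -387, 1: -887},
}


def dr_beats_cc(cc_rabbits: int) -> bool:
    """True if ED with one extra rabbit outrates ECC with cc_rabbits rabbits."""
    return elo["ED"][cc_rabbits + 1] > elo["ECC"][cc_rabbits]


def c_beats_rr(c_rabbits: int) -> bool:
    """True if EDC with c_rabbits rabbits outrates ED with two more rabbits."""
    return elo["EDC"][c_rabbits] > elo["ED"][c_rabbits + 2]


for n in range(1, 8):
    print(f"ECC{n}R vs ED{n + 1}R: DR ahead = {dr_beats_cc(n)}")
for n in range(1, 7):
    print(f"EDC{n}R vs ED{n + 2}R: C ahead = {c_beats_rr(n)}")
```

Several of these gaps (EDC4R vs ED6R, for instance) are smaller than the error margins quoted in the table, so the exact crossover point should be read with caution.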
 
If I'm wrong about both of these things, I am probably at least correct that C is worth more than R as an initial trade, contrary to statistics from the game database suggesting otherwise.  Although who knows what results you would get from having clueless play itself head-to-head with C handicap versus R handicap?
 
A serious student of Arimaa (i.e. not me) would surely benefit from playing out some of these unbalanced endgames against a strong bot, both for general understanding of endgames, and in particular for understanding the value of material in endgames.
IP Logged

jdb
« Reply #109 on: Aug 9th, 2010, 7:09pm »

on Aug 9th, 2010, 4:50pm, Fritzlein wrote:

 
If I'm wrong about both of these things, I am probably at least correct that C is worth more than R as an initial trade, contrary to statistics from the game database suggesting otherwise.  Although who knows what results you would get from having clueless play itself head-to-head with C handicap versus R handicap?

 
I could play some games with an initial C vs R handicap, but I am not sure how good the results would be. In these lower material situations the bots are ruthless in exploiting the advantage. That is, they know how to convert the win. With so much material remaining, the bot doesn't really know how to play either side. This leaves more room for gaps in knowledge to cloud the results.
 
 
 
 
IP Logged
Fritzlein
« Reply #110 on: Aug 9th, 2010, 11:58pm »

Good point.  The results are only as significant as the player is strong, so endgames are the only realm in which computers can speak with authority.  I recall that bot random proved that an elephant is worth less than a rabbit as an initial trade.  :D
IP Logged

jdb
« Reply #111 on: Sep 1st, 2010, 3:24pm »

Another round of testing.  
 
This set includes every material combination using only D, C and R (always with the E).
 
The results towards the top of the table are probably suspect. I'll rerun the tournament from the beginning when I'm done tuning with bot_nomhh. Towards the bottom of the table, the results should be a lot more reliable.
 
 
 
Code:

Rank Name               Elo    +    - games score oppo. draws
   1 Clueless_EDDCC8R  1168  102  102    98   88%   506    0%
   2 Clueless_EDDCC7R  1106   91   91   111   85%   526    0%
   3 Clueless_EDDC8R   1025   88   88   123   83%   487    0%
   4 Clueless_EDDCC6R  1013   80   80   128   80%   527    0%
   5 Clueless_EDCC8R    949   70   70   219   86%   318    0%
   6 Clueless_EDCC7R    823   61   61   224   77%   351    0%
   7 Clueless_EDDC7R    820   71   71   138   70%   509    0%
   8 Clueless_EDDCC5R   814   75   75   125   67%   519    0%
   9 Clueless_EDDC6R    802   71   71   148   70%   495    0%
  10 Clueless_EDD8R     746   69   69   148   67%   490    0%
  11 Clueless_EDCC6R    688   55   55   246   69%   363    0%
  12 Clueless_EDC8R     679   46   46   469   79%   145    0%
  13 Clueless_ECC8R     645   41   41   732   86%     7    0%
  14 Clueless_EDDCC4R   608   63   63   162   61%   433    0%
  15 Clueless_EDC7R     559   43   43   469   73%   134    0%
  16 Clueless_EDDC5R    547   63   63   159   61%   385    0%
  17 Clueless_EDD7R     512   64   64   158   56%   410    0%
  18 Clueless_EDCC5R    510   53   53   249   60%   320    0%
  19 Clueless_ECC7R     486   35   35   743   79%     5    0%
  20 Clueless_EDDCC3R   475   62   62   169   60%   322    0%
  21 Clueless_EDD6R     465   62   62   170   61%   321    0%
  22 Clueless_EDDC4R    455   61   61   163   58%   339    0%
  23 Clueless_EDC6R     431   42   42   470   66%   117    0%
  24 Clueless_ED8R      384   44   44   455   73%   -44    0%
  25 Clueless_EDCC4R    356   54   54   246   52%   277    0%
  26 Clueless_ECC6R     350   33   33   740   71%     0    0%
  27 Clueless_EC8R      328   40   40   436   66%    77    0%
  28 Clueless_EDC5R     258   41   41   469   58%    89    0%
  29 Clueless_EDD5R     219   59   59   177   53%   182    0%
  30 Clueless_ED7R      219   42   42   471   63%   -45    0%
  31 Clueless_ECC5R     192   31   31   762   59%     5    0%
  32 Clueless_EDDCC2R   179   62   62   169   58%    84    0%
  33 Clueless_EC7R      156   38   38   458   54%    61    0%
  34 Clueless_EDDC3R    147   59   59   173   54%   108    0%
  35 Clueless_EDCC3R    104   56   56   252   42%   201    0%
  36 Clueless_ED6R       72   42   42   471   56%   -74    0%
  37 Clueless_EDD4R      40   59   59   173   46%   104    0%
  38 Clueless_EDC4R      29   43   43   488   45%    75    0%
  39 Clueless_ECC4R       9   28   28   895   51%   -34    0%
  40 Clueless_EC6R       -4   27   27   779   53%   -48    0%
  41 Clueless_E8R       -63   24   24   957   50%   -73    0%
  42 Clueless_EDD3R    -138   61   61   161   46%   -66    0%
  43 Clueless_ED5R     -167   43   43   469   44%  -111    0%
  44 Clueless_EDC3R    -172   44   44   491   36%    43    0%
  45 Clueless_EC5R     -179   33   33   580   36%   -26    0%
  46 Clueless_EDDC2R   -180   59   59   168   44%   -86    0%
  47 Clueless_EDCC2R   -193   57   57   272   32%   110    0%
  48 Clueless_ECC3R    -217   30   30   892   35%   -42    0%
  49 Clueless_E7R      -248   26   26   927   48%  -305    0%
  50 Clueless_EDDCC1R  -262   62   62   157   41%  -127    0%
  51 Clueless_ED4R     -315   45   45   454   37%  -122    0%
  52 Clueless_EC4R     -366   46   46   394   64%  -608    0%
  53 Clueless_E6R      -421   42   42   449   59%  -597    0%
  54 Clueless_EDD2R    -439   66   66   143   34%  -210    0%
  55 Clueless_EDC2R    -464   50   50   468   23%    24    0%
  56 Clueless_ECC2R    -484   38   38   707   43%  -371    0%
  57 Clueless_EDDC1R   -532   70   70   136   30%  -240    0%
  58 Clueless_ED3R     -537   50   50   440   26%  -136    0%
  59 Clueless_EDCC1R   -545   68   68   251   18%    65    0%
  60 Clueless_EC3R     -573   46   46   362   56%  -702    0%
  61 Clueless_E5R      -605   44   44   384   54%  -702    0%
  62 Clueless_ED2R     -768   61   61   391   18%  -163    0%
  63 Clueless_E4R      -801   48   48   367   42%  -716    0%
  64 Clueless_EDD1R    -842   93   93   100   20%  -322    0%
  65 Clueless_EDC1R    -859   70   70   413    9%    21    0%
  66 Clueless_ECC1R    -963   48   48   635   21%  -388    0%
  67 Clueless_EC2R     -986   55   55   322   34%  -745    0%
  68 Clueless_E3R     -1021   56   56   339   32%  -734    0%
  69 Clueless_ED1R    -1280  111  111   357    3%  -120    0%
  70 Clueless_E2R     -1377   75   75   318   15%  -703    0%
  71 Clueless_EC1R    -1493   84   84   299   12%  -720    0%
  72 Clueless_E1R     -1875  141  141   304    1%  -652    0%
 
IP Logged
pago
Forum Guru
*****



Arimaa player #5439

   
Email

Gender: male
Posts: 69
Re: (no) absolute score values for pieces?
« Reply #112 on: Sep 15th, 2010, 3:28pm »

Quote:
The results towards the top of the table are probably suspect. I'll rerun the tournament from the beginning when I'm done tuning with bot_nomhh. Towards the bottom of the table, the results should be a lot more reliable.

 
Hello jdb,
I find your tests very useful for comparison with evaluator behavior.
Indeed, some results in this last batch are suspect (EDD7R < EDC7R, for example), so I look forward to your next tournament.
 
I wonder whether some of the inconsistency could be linked to a kind of non-linearity (or even intransitivity, although I am aware that this is controversial).
Imagine that setup1<setup2<setup3, and that setup3 has more difficulty beating setup1 than setup2.
In that case, I feel that you would need to play every duel a great number of times to get an accurate result.
 
I would also be interested in getting the results of the duels that you have performed. They would be a very good reference for verifying the consistency of evaluators.
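One way to look for the intransitivity mentioned above, once pairwise results are available, is to search for three-setup cycles. This is a minimal sketch under stated assumptions: the `wins` dictionary and its numbers are made up purely for illustration, and a "win" between two setups is decided by a simple majority of games.

```python
from itertools import permutations


def intransitive_triples(wins):
    """Return cycles (a, b, c) where a beats b, b beats c and c beats a.

    wins[(x, y)] is the number of games x won against y.
    """
    players = sorted({p for pair in wins for p in pair})

    def beats(x, y):
        return wins.get((x, y), 0) > wins.get((y, x), 0)

    cycles = []
    for a, b, c in permutations(players, 3):
        # Report each cycle once, starting from its smallest name.
        if a < b and a < c and beats(a, b) and beats(b, c) and beats(c, a):
            cycles.append((a, b, c))
    return cycles


# Made-up results, for illustration only.
wins = {
    ("setup1", "setup2"): 3, ("setup2", "setup1"): 7,
    ("setup2", "setup3"): 4, ("setup3", "setup2"): 6,
    ("setup3", "setup1"): 4, ("setup1", "setup3"): 6,
}
print(intransitive_triples(wins))
```

With few games per pairing, a cycle found this way may simply be noise, so it is a prompt for more games rather than proof of intransitivity.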
IP Logged
aaaa
Forum Guru
*****



Arimaa player #958

   


Posts: 768
Re: (no) absolute score values for pieces?
« Reply #113 on: Sep 20th, 2010, 7:53am »

jdb, would you be willing to give the pairwise outcome matrices from now on as well? Thanks.
IP Logged
jdb
« Reply #114 on: Sep 21st, 2010, 8:43am »

on Sep 20th, 2010, 7:53am, aaaa wrote:
jdb, would you be willing to give the pairwise outcome matrices from now on as well? Thanks.

 
Janzert kindly put the pgn file for the tournament on his website. It is compatible with bayeselo.
 
http://arimaa.janzert.com/jdb/reduced_material_result.zip
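For anyone who wants the pairwise outcome matrix directly, the tallies can be rebuilt from the PGN tag headers. The sketch below assumes the file uses the usual `White`, `Black` and `Result` tags; the actual contents of the zip were not checked, and the sample text is invented for illustration.

```python
import re
from collections import Counter

# Matches a PGN tag pair such as [White "Clueless_ED8R"].
TAG = re.compile(r'\[(\w+)\s+"([^"]*)"\]')


def pairwise_results(pgn_text):
    """Count wins per ordered pair (winner, loser) from PGN tag headers."""
    wins = Counter()
    game = {}
    for name, value in TAG.findall(pgn_text):
        if name in ("White", "Black", "Result"):
            game[name] = value
        if len(game) == 3:  # all three tags seen: close out this game
            if game["Result"] == "1-0":
                wins[(game["White"], game["Black"])] += 1
            elif game["Result"] == "0-1":
                wins[(game["Black"], game["White"])] += 1
            game = {}
    return wins


# Invented sample, not taken from the real file.
sample = '''[White "Clueless_ED8R"]
[Black "Clueless_EDC6R"]
[Result "0-1"]

[White "Clueless_EDC6R"]
[Black "Clueless_ED8R"]
[Result "1-0"]
'''
for (winner, loser), n in pairwise_results(sample).items():
    print(f"{winner} beat {loser} {n} times")
```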
IP Logged
pago
« Reply #115 on: Sep 28th, 2010, 3:02pm »


I would like to suggest one way to use jdb's work.
 
If we assume that jdb's results are as close as possible to the real results, we can use them to calculate error indicators for the results predicted by an evaluator.
 
1) Calculate all the results predicted by the evaluator for the tournament. For example, in the DCR tournament, calculate the evaluation of the 72*72 duels.
2) Calculate the average evaluation of each setup.
3) Get the rank estimated by the evaluator for each setup according to that average.
4) Calculate error indicators, taking jdb's results as the real observations. I suggest the following (I do not have a clear idea which is the most pertinent):
- Root Mean Square Error (RMSE)
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)
 
This could be a means of running some preliminary tests and getting a preliminary "objective" performance measurement before implementing the evaluator in a bot.
Of course it would not be perfect, because it depends on the accuracy of jdb's results (in particular, some results of the DCR tournament should be improved).
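Step 4 above can be sketched as follows. The observed ranks are copied from jdb's DCR table; the predicted ranks are invented stand-ins for a hypothetical evaluator, and the function name is only illustrative.

```python
import math


def error_indicators(observed, predicted):
    """Return (RMSE, MAE, MAPE in percent) for two equal-length sequences.

    MAPE skips entries whose observed value is zero.
    """
    diffs = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    ratios = [abs(d / o) for o, d in zip(observed, diffs) if o != 0]
    mape = 100.0 * sum(ratios) / len(ratios)
    return rmse, mae, mape


# Observed ranks from jdb's DCR table: E8R, EC6R, ECC4R, ECC3R.
observed = [41, 40, 39, 48]
# Hypothetical evaluator ranks for the same setups (made up).
predicted = [40, 41, 39, 46]

rmse, mae, mape = error_indicators(observed, predicted)
print(f"RMSE={rmse:.3f} MAE={mae:.3f} MAPE={mape:.2f}%")
```

Ranks are used here rather than Elo values because MAPE behaves badly when the observed value is near zero, which happens in the middle of the Elo table.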
IP Logged
pago
« Reply #116 on: Sep 30th, 2010, 4:54am »


Quote:
Janzert kindly put the pgn file for the tournament on his website. It is compatible with bayeselo.  
 
http://arimaa.janzert.com/jdb/reduced_material_result.zip

 
I would like to thank you for sharing your results. Personally, I find them very interesting and useful.
 
Quote:
The results towards the top of the table are probably suspect. I'll rerun the tournament from the beginning when I'm done tuning with bot_nomhh. Towards the bottom of the table, the results should be a lot more reliable.

 
I found some weird results when I used your pgn file.
For example, clueless got the following results:
ED8R / EDD6R : +2 / -2
ED8R / EDC6R : +0 / -12
 
The second result might suggest that EDC6R >> ED8R. However, the first one seems to show that ED8R and EDD6R are about equal (although it is not statistically significant).
 
Do you have an explanation for these results?
Could they be the result of an inefficient positional parameter? (For example, ED8R would not play properly against the cat.)
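To put a number on "statistically significant" for these two scorelines, an exact binomial test is one option. The sketch assumes independent games, no draws, and equal strength as the null hypothesis; the function name is made up.

```python
from math import comb


def two_sided_p(wins: int, losses: int) -> float:
    """Exact two-sided binomial p-value under the hypothesis of equal strength."""
    n = wins + losses
    k = min(wins, losses)
    # Probability of a result at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(two_sided_p(2, 2))    # ED8R vs EDD6R
print(two_sided_p(0, 12))   # ED8R vs EDC6R
```

The 2-2 result carries essentially no information, while a 0-12 sweep would be very unlikely between equally strong sides.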
IP Logged
jdb
« Reply #117 on: Sep 30th, 2010, 8:42am »

Any of the games involving dogs are somewhat suspect. The eval did not handle them well. If there is a cycle involving only cats and rabbits, I would say it was a reliable result.
IP Logged
Weirdo87
Forum Junior Member
**



Arimaa player #3347

   


Gender: male
Posts: 6
Re: (no) absolute score values for pieces?
« Reply #118 on: Oct 10th, 2010, 3:38am »

on Aug 9th, 2010, 4:50pm, Fritzlein wrote:
A serious student of Arimaa (i.e. not me)

If Fritzlein isn't a serious student of Arimaa, who the hell is?
IP Logged
Fritzlein
« Reply #119 on: Oct 10th, 2010, 9:06am »

on Oct 10th, 2010, 3:38am, Weirdo87 wrote:
If Fritzlein isn't a serious student of Arimaa, who the hell is?

Chessandgo. ;)
 
Truly, I spend a lot of time on Arimaa, but not in a disciplined way.
« Last Edit: Oct 10th, 2010, 9:07am by Fritzlein » IP Logged
