Author |
Topic: (no) absolute score values for pieces? (Read 40255 times) |
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #105 on: Aug 2nd, 2010, 7:28pm » |
|
Very interesting, JDB. Thanks for sharing. I recall that I once proposed that two rabbits would always be worth more than a cat. Aaaa suggested EC6R vs E8R as a possible counter-example, and it seems that he was correct. It doesn't flip to the two rabbits being more valuable until three more rabbits have been exchanged.
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
|
|
Re: (no) absolute score values for pieces?
« Reply #106 on: Aug 5th, 2010, 7:20pm » |
|
I added dogs to the handicap matches. It will take a couple weeks to get enough games to cover all the cases.
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
|
|
Re: (no) absolute score values for pieces?
« Reply #107 on: Aug 9th, 2010, 3:40pm » |
|
Another round of testing. This time DC vs D vs CC. The relative value of each pair depends greatly on the number of rabbits remaining.
Code:
Rank Name             Elo   +   - games score oppo. draws
   1 Clueless_EDC8R   794  44  41   774   91%     9    0%
   2 Clueless_ECC8R   730  34  32  1227   91%   -45    0%
   3 Clueless_EDC7R   635  38  36   758   83%    14    0%
   4 Clueless_EDC6R   597  37  36   750   81%    10    0%
   5 Clueless_ECC7R   542  28  28  1223   81%   -33    0%
   6 Clueless_ED8R    485  29  29  1171   80%   -95    0%
   7 Clueless_ECC6R   438  27  27  1199   74%   -26    0%
   8 Clueless_EDC5R   348  34  34   750   66%    24    0%
   9 Clueless_ECC5R   296  27  27  1162   65%   -16    0%
  10 Clueless_ED7R    286  27  27  1160   68%   -82    0%
  11 Clueless_EDC4R   171  35  35   734   54%    47    0%
  12 Clueless_ED6R    151  28  28  1156   60%   -75    0%
  13 Clueless_ECC4R   111  27  28  1162   53%    -4    0%
  14 Clueless_ED5R    -77  28  29  1156   45%   -60    0%
  15 Clueless_EDC3R   -92  37  38   733   38%    60    0%
  16 Clueless_ECC3R  -187  31  31  1159   36%    15    0%
  17 Clueless_ED4R   -230  30  31  1145   36%   -48    0%
  18 Clueless_EDC2R  -387  42  44   729   24%    72    0%
  19 Clueless_ECC2R  -400  34  35  1159   25%    28    0%
  20 Clueless_ED3R   -440  34  34  1145   26%   -34    0%
  21 Clueless_ED2R   -738  42  44  1129   14%   -10    0%
  22 Clueless_EDC1R  -887  60  64   729    7%    98    0%
  23 Clueless_ECC1R  -925  49  51  1159    7%    62    0%
  24 Clueless_ED1R  -1220  67  80  1113    2%    14    0%
|
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #108 on: Aug 9th, 2010, 4:50pm » |
|
Thanks for sharing, JDB. This shows I don't know much about endgames. I would have expected DR to be worth more than CC, but it isn't until the CC player is down to his last rabbit. Also I would have expected that when dogs are still on the board, C is worth less than RR, but the C is worth more as long as both players still have at least 4 rabbits. If I'm wrong about both of these things, I am probably at least correct that C is worth more than R as an initial trade, contrary to statistics from the game database suggesting otherwise. Although who knows what results you would get from having clueless play itself head-to-head with C handicap versus R handicap? A serious student of Arimaa (i.e. not me) would surely benefit from playing out some of these unbalanced endgames against a strong bot, both for general understanding of endgames, and in particular for understanding the value of material in endgames.
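The two comparisons above can be checked mechanically against the Elo column of jdb's table in Reply #107. The following is only a sketch: the dictionary layout and the helper name `compare` are illustrative choices, not anything jdb used, and the ratings are copied by hand from that table.

```python
# Elo ratings copied from jdb's table in Reply #107 (bot Clueless, elephant
# always present). Keys are the number of rabbits remaining.
ELO = {
    "EDC": {8: 794, 7: 635, 6: 597, 5: 348, 4: 171, 3: -92, 2: -387, 1: -887},
    "ECC": {8: 730, 7: 542, 6: 438, 5: 296, 4: 111, 3: -187, 2: -400, 1: -925},
    "ED":  {8: 485, 7: 286, 6: 151, 5: -77, 4: -230, 3: -440, 2: -738, 1: -1220},
}


def compare(side_a, side_b, extra_rabbits_b):
    """Elo gap between side_a with n rabbits and side_b with
    n + extra_rabbits_b rabbits, for every n where both entries exist.
    A positive gap means side_a was rated higher."""
    gaps = {}
    for n, elo_a in sorted(ELO[side_a].items()):
        m = n + extra_rabbits_b
        if m in ELO[side_b]:
            gaps[n] = elo_a - ELO[side_b][m]
    return gaps


# CC versus DR: ECC with n rabbits against ED with n + 1 rabbits.
cc_vs_dr = compare("ECC", "ED", 1)
# C versus RR with a dog on the board: EDC with n rabbits against ED with n + 2.
c_vs_rr = compare("EDC", "ED", 2)

for label, gaps in (("CC - DR", cc_vs_dr), ("C - RR", c_vs_rr)):
    for n, gap in gaps.items():
        print(f"{label}, n={n}: {gap:+d}")
```

On this table the CC side is ahead for every n from 2 upward and behind only at n = 1, and the C side is ahead of RR only from n = 4 upward. Several of the gaps near the crossover are smaller than the +/- margins reported in the table, so the exact flip point should be read with caution.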
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
|
|
Re: (no) absolute score values for pieces?
« Reply #109 on: Aug 9th, 2010, 7:09pm » |
|
on Aug 9th, 2010, 4:50pm, Fritzlein wrote: If I'm wrong about both of these things, I am probably at least correct that C is worth more than R as an initial trade, contrary to statistics from the game database suggesting otherwise. Although who knows what results you would get from having clueless play itself head-to-head with C handicap versus R handicap? |
| I could play some games with an initial C vs r handicap, but I am not sure how good the results would be. In these lower material situations the bots are ruthless in exploiting the advantage. That is, they know how to convert the win. With so much material remaining, the bot doesn't really know how to play either side. This leaves more room for gaps in knowledge to cloud the results.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #110 on: Aug 9th, 2010, 11:58pm » |
|
Good point. The results are only as significant as the player is strong, so endgames are the only realm in which computers can speak with authority. I recall that bot random proved that an elephant is worth less than a rabbit as an initial trade.
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
|
|
Re: (no) absolute score values for pieces?
« Reply #111 on: Sep 1st, 2010, 3:24pm » |
|
Another round of testing. This set includes every material combination using only DCR (and always with the E). The results towards the top of the table are probably suspect. I'll rerun the tournament from the beginning when I'm done tuning with bot_nomhh. Towards the bottom of the table, the results should be a lot more reliable.
Code:
Rank Name               Elo    +    - games score oppo. draws
   1 Clueless_EDDCC8R  1168  102  102    98   88%   506    0%
   2 Clueless_EDDCC7R  1106   91   91   111   85%   526    0%
   3 Clueless_EDDC8R   1025   88   88   123   83%   487    0%
   4 Clueless_EDDCC6R  1013   80   80   128   80%   527    0%
   5 Clueless_EDCC8R    949   70   70   219   86%   318    0%
   6 Clueless_EDCC7R    823   61   61   224   77%   351    0%
   7 Clueless_EDDC7R    820   71   71   138   70%   509    0%
   8 Clueless_EDDCC5R   814   75   75   125   67%   519    0%
   9 Clueless_EDDC6R    802   71   71   148   70%   495    0%
  10 Clueless_EDD8R     746   69   69   148   67%   490    0%
  11 Clueless_EDCC6R    688   55   55   246   69%   363    0%
  12 Clueless_EDC8R     679   46   46   469   79%   145    0%
  13 Clueless_ECC8R     645   41   41   732   86%     7    0%
  14 Clueless_EDDCC4R   608   63   63   162   61%   433    0%
  15 Clueless_EDC7R     559   43   43   469   73%   134    0%
  16 Clueless_EDDC5R    547   63   63   159   61%   385    0%
  17 Clueless_EDD7R     512   64   64   158   56%   410    0%
  18 Clueless_EDCC5R    510   53   53   249   60%   320    0%
  19 Clueless_ECC7R     486   35   35   743   79%     5    0%
  20 Clueless_EDDCC3R   475   62   62   169   60%   322    0%
  21 Clueless_EDD6R     465   62   62   170   61%   321    0%
  22 Clueless_EDDC4R    455   61   61   163   58%   339    0%
  23 Clueless_EDC6R     431   42   42   470   66%   117    0%
  24 Clueless_ED8R      384   44   44   455   73%   -44    0%
  25 Clueless_EDCC4R    356   54   54   246   52%   277    0%
  26 Clueless_ECC6R     350   33   33   740   71%     0    0%
  27 Clueless_EC8R      328   40   40   436   66%    77    0%
  28 Clueless_EDC5R     258   41   41   469   58%    89    0%
  29 Clueless_EDD5R     219   59   59   177   53%   182    0%
  30 Clueless_ED7R      219   42   42   471   63%   -45    0%
  31 Clueless_ECC5R     192   31   31   762   59%     5    0%
  32 Clueless_EDDCC2R   179   62   62   169   58%    84    0%
  33 Clueless_EC7R      156   38   38   458   54%    61    0%
  34 Clueless_EDDC3R    147   59   59   173   54%   108    0%
  35 Clueless_EDCC3R    104   56   56   252   42%   201    0%
  36 Clueless_ED6R       72   42   42   471   56%   -74    0%
  37 Clueless_EDD4R      40   59   59   173   46%   104    0%
  38 Clueless_EDC4R      29   43   43   488   45%    75    0%
  39 Clueless_ECC4R       9   28   28   895   51%   -34    0%
  40 Clueless_EC6R       -4   27   27   779   53%   -48    0%
  41 Clueless_E8R       -63   24   24   957   50%   -73    0%
  42 Clueless_EDD3R    -138   61   61   161   46%   -66    0%
  43 Clueless_ED5R     -167   43   43   469   44%  -111    0%
  44 Clueless_EDC3R    -172   44   44   491   36%    43    0%
  45 Clueless_EC5R     -179   33   33   580   36%   -26    0%
  46 Clueless_EDDC2R   -180   59   59   168   44%   -86    0%
  47 Clueless_EDCC2R   -193   57   57   272   32%   110    0%
  48 Clueless_ECC3R    -217   30   30   892   35%   -42    0%
  49 Clueless_E7R      -248   26   26   927   48%  -305    0%
  50 Clueless_EDDCC1R  -262   62   62   157   41%  -127    0%
  51 Clueless_ED4R     -315   45   45   454   37%  -122    0%
  52 Clueless_EC4R     -366   46   46   394   64%  -608    0%
  53 Clueless_E6R      -421   42   42   449   59%  -597    0%
  54 Clueless_EDD2R    -439   66   66   143   34%  -210    0%
  55 Clueless_EDC2R    -464   50   50   468   23%    24    0%
  56 Clueless_ECC2R    -484   38   38   707   43%  -371    0%
  57 Clueless_EDDC1R   -532   70   70   136   30%  -240    0%
  58 Clueless_ED3R     -537   50   50   440   26%  -136    0%
  59 Clueless_EDCC1R   -545   68   68   251   18%    65    0%
  60 Clueless_EC3R     -573   46   46   362   56%  -702    0%
  61 Clueless_E5R      -605   44   44   384   54%  -702    0%
  62 Clueless_ED2R     -768   61   61   391   18%  -163    0%
  63 Clueless_E4R      -801   48   48   367   42%  -716    0%
  64 Clueless_EDD1R    -842   93   93   100   20%  -322    0%
  65 Clueless_EDC1R    -859   70   70   413    9%    21    0%
  66 Clueless_ECC1R    -963   48   48   635   21%  -388    0%
  67 Clueless_EC2R     -986   55   55   322   34%  -745    0%
  68 Clueless_E3R     -1021   56   56   339   32%  -734    0%
  69 Clueless_ED1R    -1280  111  111   357    3%  -120    0%
  70 Clueless_E2R     -1377   75   75   318   15%  -703    0%
  71 Clueless_EC1R    -1493   84   84   299   12%  -720    0%
  72 Clueless_E1R     -1875  141  141   304    1%  -652    0%
|
|
|
|
|
|
pago
Forum Guru
Arimaa player #5439
Posts: 69
|
|
Re: (no) absolute score values for pieces?
« Reply #112 on: Sep 15th, 2010, 3:28pm » |
|
Quote:The results towards the top of the table are probably suspect. I'll rerun the tournament from the beginning when I'm done tuning with bot_nomhh. Towards the bottom of the table, the results should be a lot more reliable. |
| Hello jdb, I find your tests very interesting for comparison with evaluator behavior. Indeed, some results in this last batch are suspect (e.g. EDD7R < EDC7R), so I am waiting for your next tournament. I wonder whether some of the inconsistency could be linked to a kind of non-linearity (or even intransitivity, although I am aware that this is controversial). Imagine that setup1 < setup2 < setup3, yet setup3 has more difficulty beating setup1 than setup2. In that case, I feel that you would need to play every pairing a great number of times to get an accurate result. I would also be interested in seeing the results of the individual duels that you have played. They would be a very good reference for verifying the consistency of evaluators.
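One way to look for the stronger of the two effects mentioned above (outright intransitivity) is to scan head-to-head results for cycles of three. The sketch below is an illustration only: the function name, the data layout, and above all the win counts are hypothetical and are not taken from jdb's tournament.

```python
from itertools import permutations


def intransitive_triples(wins):
    """wins[(a, b)] is the number of games a won against b.
    Return every triple (a, b, c) in which a beats b, b beats c and
    c beats a on head-to-head majority, listed once per cycle."""
    players = sorted({p for pair in wins for p in pair})

    def beats(x, y):
        return wins.get((x, y), 0) > wins.get((y, x), 0)

    found = []
    for a, b, c in permutations(players, 3):
        # Requiring a to be the smallest name keeps one rotation per cycle.
        if a == min(a, b, c) and beats(a, b) and beats(b, c) and beats(c, a):
            found.append((a, b, c))
    return found


# Hypothetical win counts, invented purely to exercise the function.
example = {
    ("setup2", "setup1"): 7, ("setup1", "setup2"): 3,
    ("setup3", "setup2"): 8, ("setup2", "setup3"): 2,
    ("setup1", "setup3"): 6, ("setup3", "setup1"): 4,
}
cycles = intransitive_triples(example)
print(cycles)
```

A majority count like this says nothing about statistical significance; with only a handful of games per pairing, an apparent cycle may well be noise.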
|
|
|
|
|
aaaa
Forum Guru
Arimaa player #958
Posts: 768
|
|
Re: (no) absolute score values for pieces?
« Reply #113 on: Sep 20th, 2010, 7:53am » |
|
jdb, would you be willing to give the pairwise outcome matrices from now on as well? Thanks.
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
|
|
Re: (no) absolute score values for pieces?
« Reply #114 on: Sep 21st, 2010, 8:43am » |
|
on Sep 20th, 2010, 7:53am, aaaa wrote:jdb, would you be willing to give the pairwise outcome matrices from now on as well? Thanks. |
| Janzert kindly put the pgn file for the tournament on his website. It is compatible with bayeselo. http://arimaa.janzert.com/jdb/reduced_material_result.zip
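For anyone who wants the pairwise outcome matrix from such a file without going through bayeselo, a small script can tally it from the PGN tag pairs. This is a sketch under the assumption that the file uses the standard `White`, `Black` and `Result` tags; the contents of the actual zip have not been checked here, and the two sample games below are made up for illustration.

```python
import re
from collections import defaultdict

# Matches a PGN tag pair such as: [White "Clueless_ED8R"]
TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]$')


def pairwise_from_pgn(text):
    """Return {(white, black): [white_wins, black_wins, draws]}."""
    table = defaultdict(lambda: [0, 0, 0])
    tags = {}

    def flush():
        # Record the game collected so far, if it has the tags we need.
        if all(k in tags for k in ("White", "Black", "Result")):
            idx = {"1-0": 0, "0-1": 1, "1/2-1/2": 2}.get(tags["Result"])
            if idx is not None:
                table[(tags["White"], tags["Black"])][idx] += 1
        tags.clear()

    for line in text.splitlines():
        m = TAG.match(line.strip())
        if m:
            key, value = m.groups()
            if key in tags:  # a repeated tag name means a new game started
                flush()
            tags[key] = value
    flush()
    return dict(table)


# Hypothetical sample, not taken from the real tournament file.
SAMPLE = """[White "Clueless_ED8R"]
[Black "Clueless_EDC6R"]
[Result "0-1"]

0-1

[White "Clueless_ED8R"]
[Black "Clueless_EDC6R"]
[Result "0-1"]

0-1
"""
matrix = pairwise_from_pgn(SAMPLE)
print(matrix)
```

To use it on the real data, read the unzipped PGN file into a string and pass it to `pairwise_from_pgn`.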
|
|
|
|
|
pago
Forum Guru
Arimaa player #5439
Posts: 69
|
|
Re: (no) absolute score values for pieces?
« Reply #115 on: Sep 28th, 2010, 3:02pm » |
|
I would like to suggest one way to use jdb's work. If we assume that jdb's results are as close as possible to the real results, we can use them to calculate error indicators for the results predicted by an evaluator.
1) Calculate all the results predicted by the evaluator for the tournament. For example, in the DCR tournament, calculate the evaluation of the 72*72 duels.
2) Calculate the average evaluation of each setup.
3) Get the rank estimated by the evaluator for each setup according to that average.
4) Calculate error indicators, taking jdb's results as the real observations. I suggest the following (I do not have a clear idea of which is the most pertinent):
- Root Mean Square Error (RMSE)
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)
This could be a means of running some preliminary tests and getting a preliminary "objective" performance measurement before implementing the evaluator in a bot. Of course it would not be perfect, because it depends on the accuracy of jdb's results (in particular, some results of the DCR tournament should be improved).
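The three indicators in step 4 could be computed along the following lines. This is a minimal sketch: the observed ranks are copied from jdb's table in Reply #111, while the evaluator ranks are hypothetical placeholders, since no evaluator output appears in this thread.

```python
import math


def error_indicators(observed, predicted):
    """Return (RMSE, MAE, MAPE in percent) over the setups present in both
    dictionaries. observed and predicted map setup name -> rank."""
    pairs = [(observed[k], predicted[k]) for k in observed if k in predicted]
    n = len(pairs)
    errors = [p - o for o, p in pairs]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so zero observations are skipped.
    nonzero = [(o, p) for o, p in pairs if o != 0]
    if nonzero:
        mape = 100 * sum(abs((p - o) / o) for o, p in nonzero) / len(nonzero)
    else:
        mape = float("nan")
    return rmse, mae, mape


# Ranks observed in jdb's DCR tournament (Reply #111).
jdb_rank = {"EDC7R": 15, "EDD7R": 17, "ED8R": 24, "E8R": 41}
# Hypothetical ranks from some evaluator, invented for illustration.
evaluator_rank = {"EDC7R": 18, "EDD7R": 14, "ED8R": 24, "E8R": 44}

rmse, mae, mape = error_indicators(jdb_rank, evaluator_rank)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  MAPE={mape:.2f}%")
```

Working on ranks rather than Elo values keeps MAPE well defined, because ranks are never zero, whereas several Elo ratings in the table sit close to zero and would inflate a percentage error.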
|
|
|
|
|
pago
Forum Guru
Arimaa player #5439
Posts: 69
|
|
Re: (no) absolute score values for pieces?
« Reply #116 on: Sep 30th, 2010, 4:54am » |
|
Quote: I would like to thank you for sharing your results. Personally I find them very interesting and useful. Quote:The results towards the top of the table are probably suspect. I'll rerun the tournament from the beginning when I'm done tuning with bot_nomhh. Towards the bottom of the table, the results should be a lot more reliable. |
| I found some weird results when I used your pgn file. For example, Clueless got the following results: ED8R / EDD6R: +2 / -2; ED8R / EDC6R: +0 / -12. The second result might suggest that EDC6R >> ED8R. However, the first one seems to show that ED8R and EDD6R are about equal (although it is not statistically significant). Do you have an explanation for these results? Could they be the result of an inefficient positional parameter? (For example, ED8R might not play properly against the cat.)
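The remark about statistical significance can be made concrete with a simple two-sided binomial (sign) test on the two head-to-head counts quoted above. This is a sketch that assumes the games are independent and that, under the null hypothesis, each side wins any given game with probability 1/2; the function name is an illustrative choice.

```python
from math import comb


def two_sided_binomial_p(wins, losses):
    """Two-sided p-value for a wins/losses record under the null hypothesis
    that every game is an independent fair coin flip."""
    n = wins + losses
    k = min(wins, losses)
    # Probability of a result at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_even = two_sided_binomial_p(2, 2)      # ED8R vs EDD6R: +2 / -2
p_sweep = two_sided_binomial_p(0, 12)    # ED8R vs EDC6R: +0 / -12
print(p_even, p_sweep)
```

Under these assumptions a 2-2 split is fully compatible with equal strength but, with only four games, equally compatible with a sizeable difference, whereas a 0-12 sweep would be very unlikely between equal opponents.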
|
|
|
|
|
jdb
Forum Guru
Arimaa player #214
Posts: 682
|
|
Re: (no) absolute score values for pieces?
« Reply #117 on: Sep 30th, 2010, 8:42am » |
|
Any of the games involving dogs are somewhat suspect. The eval did not handle them well. If there is a cycle involving only cats and rabbits, I would say it was a reliable result.
|
|
|
|
|
Weirdo87
Forum Junior Member
Arimaa player #3347
Posts: 6
|
|
Re: (no) absolute score values for pieces?
« Reply #118 on: Oct 10th, 2010, 3:38am » |
|
on Aug 9th, 2010, 4:50pm, Fritzlein wrote:A serious student of Arimaa (i.e. not me) |
| If Fritzlein isn't a serious student of Arimaa, who the hell is?
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Posts: 5928
|
|
Re: (no) absolute score values for pieces?
« Reply #119 on: Oct 10th, 2010, 9:06am » |
|
on Oct 10th, 2010, 3:38am, Weirdo87 wrote:If Fritzlein isn't a serious student of Arimaa, who the hell is? |
| Chessandgo. Truly, I spend a lot of time on Arimaa, but not in a disciplined way.
|
« Last Edit: Oct 10th, 2010, 9:07am by Fritzlein » |
|
|
|
|