Arimaa Forum « 2013 Arimaa Challenge »


Topic: 2013 Arimaa Challenge
tize
Arimaa player #3121
Re: 2013 Arimaa Challenge
« Reply #15 on: Mar 26th, 2013, 5:50pm »

That would be even stranger, as one bot would then get two games to earn one point while the other bot got only one. Winning all of one's games and winning half of them are not equivalent.
 
Matching by color is just a simple way to enforce pairs of games in the score; it's not the color that matters, it's the order...
 
But I do agree that unfinished pairs make the scoring look strange.
Fritzlein
Arimaa player #706
Re: 2013 Arimaa Challenge
« Reply #16 on: Mar 26th, 2013, 6:44pm »

on Mar 26th, 2013, 10:35am, Boo wrote:
What if some players end up having played only 3 screening games? I think the results are calculated in a weird way. E.g. both aaaa and arimaa_master have played 3 games, 2 against ziltoid and 1 against marwin. Both won 1 game against ziltoid and lost the other 2 games. However, the score is 1-1 for aaaa and 0-1 for arimaa_master. Why does the colour of a game have such a big impact on the final result? I think the same number of points should be assigned to marwin and ziltoid in such a case.

For maximum fairness, screening games should always be played in pairs.  The "play bots" page tells people not to play one game of a pair unless they can play both.  Nevertheless, in real life it isn't always possible for people to know whether they will have time to complete every pair, so every year there are several uncompleted pairs.
 
The only year in which this has been an issue was 2011, when marwin won by half a point but the uncompleted pairs favored sharp.  If all of those pairs had been finished, there was a good chance sharp would have won the Screening.  But what can we do about it?  Throwing away an incomplete pair is unfair, but counting an incomplete pair seems even more unfair, since the other bot didn't have the same chance.
 
http://arimaa.com/arimaa/forum/cgi/YaBB.cgi?board=events;action=display;num=1299781791;start=60
 
I expect that at some point the Challenge Screening will move away from the current open format to an invitation-only format with, say, 15 people hand-picked by Omar who each commit to play all 4 games.  The main reason for this change would be to prevent abuse by sock-puppet accounts, but a secondary reason would be to increase the chances that every pair that gets started also gets completed.
« Last Edit: Mar 26th, 2013, 6:49pm by Fritzlein »

Boo
Arimaa player #6466
Re: 2013 Arimaa Challenge
« Reply #17 on: Mar 27th, 2013, 4:27am »

Quote:
But what can we do about it?

1) You can count only those players who have played all 4 games.
2) You can change the bot strength evaluation method into calculating their performance instead. Something like:
Quote:
the bots are inching up from their dismal opening to currently weigh in at 2034 and 1892 respectively.
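Boo's second option, ranking the bots by performance rating, can be sketched as follows. This is a minimal sketch assuming the plain Elo expected-score formula on a 400-point scale; the Arimaa gameroom's own rating system may differ, and the function names are hypothetical. A bot's performance rating is the rating at which its expected total score against the humans it faced equals the score it actually made, found here by bisection.

```python
from typing import Sequence


def expected_score(rating: float, opponent: float) -> float:
    """Elo expected score (0..1) of a player rated `rating` against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(opponents: Sequence[float], score: float,
                       lo: float = 0.0, hi: float = 4000.0) -> float:
    """Rating whose expected total score against `opponents` equals `score`.

    Found by bisection on [lo, hi]; expected score grows monotonically
    with rating, so the bracket always shrinks toward the single root.
    """
    if not 0 < score < len(opponents):
        # A perfect or zero score has no finite performance rating.
        raise ValueError("performance rating is unbounded for a 0% or 100% score")
    for _ in range(100):
        mid = (lo + hi) / 2
        expected = sum(expected_score(mid, opp) for opp in opponents)
        if expected < score:
            lo = mid  # rating too low: the bot scored more than expected
        else:
            hi = mid  # rating too high (or exact)
    return (lo + hi) / 2


# Hypothetical bot that scored 1.5/3 against humans rated 1800, 2000 and 2200.
print(round(performance_rating([1800, 2000, 2200], 1.5)))  # 2000
```

Because the expected score is symmetric around each opponent's rating, scoring 1.5 out of 3 against 1800, 2000 and 2200 lands exactly on 2000. The search bracket [0, 4000] is an arbitrary assumption; a true value outside it would be clamped to the edge.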

 
It is now a weird system. One game played: too little data. 2 games played: OK, enough data for strength evaluation. 3 games played: too little data again (huh?). But logically, the more games are played, the more exact the strength evaluation should be. More games played (3) should not make the strength evaluation more obscure than fewer games (2).

Fritzlein
Arimaa player #706
Re: 2013 Arimaa Challenge
« Reply #18 on: Mar 27th, 2013, 7:39am »

on Mar 27th, 2013, 4:27am, Boo wrote:
2) You can change the bot strength evaluation method into calculating their performance instead.

The case against using performance rating rather than raw results is that performance rating relies on the gameroom ratings of the human players, which are notoriously inaccurate and can even change significantly between a player's game against one bot and their game against the other.
 
Quote:
It is now a weird system. One game played - too little data. 2 games played - ok, enough data for strength evaluation. 3 games played - too little data again (huh?).

With 3 games played, the first two are still used.  But you suggested above that even two is still too few, and only players with all four games should count?

Boo
Arimaa player #6466
Re: 2013 Arimaa Challenge
« Reply #19 on: Mar 27th, 2013, 9:52am »

Quote:
The case against using performance rating rather than raw results is that performance rating relies on the gameroom ratings of the human players, and the gameroom ratings of the human players are notoriously inaccurate, and can even change significantly between when they play one bot to when they play the other.

 
Rating inaccuracies will cancel each other out as the number of games played (and opponents faced) increases.
'Can even change significantly': yes (for one game), but again this change is unsystematic, and its influence decreases as the number of games played increases. I think performance is the best way to compare bots, as it takes every game into account; however, both bots should play as close to the same number of games as possible. (Right now all new players start with ziltoid, if it is idle.)
 
Quote:
But you suggested above that even two is still too few, and only players with all four games should count?

 
Yes, I suggested that as an alternative.
 
Quote:
With 3 games played, the first two are still used.

 
Yes, but the 3rd game is not used at all. Yet as I understand "Games where only one of the bots were played are not counted.", the 3rd game should be counted, since both bots were played.
 
EDIT - there is a 3rd alternative - compare total win% of each bot. E.g. now ziltoid has 8/21 = 38.1% and marwin has 5/13 = 38.5%.
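Boo's percentages can be checked with a quick calculation. The sketch below just divides each bot's wins by its games played, using the tallies exactly as Boo quotes them at the time of the post; the helper name is hypothetical.

```python
def win_percentage(wins: int, games: int) -> float:
    """Share of games won, as a percentage."""
    return 100.0 * wins / games


# Each bot's overall record as quoted in the post: (wins, games played).
records = {"ziltoid": (8, 21), "marwin": (5, 13)}

for bot, (wins, games) in records.items():
    print(f"{bot}: {wins}/{games} = {win_percentage(wins, games):.1f}%")
# ziltoid: 8/21 = 38.1%
# marwin: 5/13 = 38.5%
```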
« Last Edit: Mar 27th, 2013, 10:01am by Boo »

tize
Arimaa player #3121
Re: 2013 Arimaa Challenge
« Reply #20 on: Mar 27th, 2013, 1:57pm »

That would treat a win against a weak player the same as a win against a strong player, without trying to account for the difference between those wins.
 
By only counting game pairs it's ok to just count wins (or win %). But if all games should be counted then a more advanced system must be used, like a normal rating.
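The difference between counting wins and accounting for opponent strength can be illustrated with the common linear "rule of 400" approximation to performance rating: the mean opponent rating plus 400 * (wins - losses) / games. This is a swapped-in textbook approximation, not the formula used by the screening or the gameroom, and the opponent ratings below are hypothetical.

```python
from typing import Sequence


def approx_performance(opponent_ratings: Sequence[float], wins: int) -> float:
    """Linear "rule of 400" estimate of performance rating.

    Mean opponent rating plus 400 * (wins - losses) / games;
    this sketch counts only wins and losses.
    """
    games = len(opponent_ratings)
    losses = games - wins
    mean_opponent = sum(opponent_ratings) / games
    return mean_opponent + 400.0 * (wins - losses) / games


# Two hypothetical bots, each winning exactly one of two games (50%).
vs_weaker = approx_performance([1600, 1700], wins=1)
vs_stronger = approx_performance([2100, 2200], wins=1)
print(vs_weaker, vs_stronger)  # 1650.0 2150.0
```

The two records have the same 50% win rate, yet they come out 500 points apart once the strength of the opposition is taken into account.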
browni3141
Arimaa player #7014
Re: 2013 Arimaa Challenge
« Reply #21 on: Mar 27th, 2013, 3:01pm »

on Mar 27th, 2013, 1:57pm, tize wrote:
That will compare a win against a weak with a win against a strong player without trying to account for the difference in those wins.
 
By only counting game pairs it's ok to just count wins (or win %). But if all games should be counted then a more advanced system must be used, like a normal rating.

I'd be wary of using ratings in the Screening. Improving players and bot-bashers will have very inaccurate ratings. Is it fair to consider ratings when a rapidly improving player's rating of 1830 is not reflective of his current strength and he plays in the screening? Or when a 2300 bot-basher whose true strength is closer to 1800 plays?

Boo
Arimaa player #6466
Re: 2013 Arimaa Challenge
« Reply #22 on: Mar 27th, 2013, 3:34pm »

Yes, using ratings involves a luck factor. But I think its impact is much smaller than under the current system, where a bot gets a point in a 3-game series essentially based on a coin flip. The current result is 2-5 for marwin, and 2 of marwin's 5 points were won on a coin flip. The result could easily be 2-3 if ziltoid had guessed the winning colour. Isn't that too much luck? And how many players in the screening gain or lose 300 points in a month?

browni3141
Arimaa player #7014
Re: 2013 Arimaa Challenge
« Reply #23 on: Mar 27th, 2013, 5:40pm »

on Mar 27th, 2013, 3:34pm, Boo wrote:
Yes, using ratings has the luck factor involved. But I think it is of much lesser impact, than it is now, when a bot gets a point in a 3 game series essentially based on a coin flip. The current result is 2-5 for marwin, and 2 points out of 5 for marwin are won on a coin flip. The result could easily be 2-3, if ziltoid had guessed the winning colour.  Isn't it too much luck? And how many players who lose/win 300pts in a month are in the screening?

I don't understand what you mean, Boo. How were any games "won on a coin flip?"

browni3141
Arimaa player #7014
Re: 2013 Arimaa Challenge
« Reply #24 on: Mar 28th, 2013, 2:09am »

Poor marwin has lost 6 of its last 7 games, bringing its gameroom rating down to 2133, below ziltoid's. Ziltoid, meanwhile, has been faring much better in recent match-ups, losing only 2 of its last 7. Of course I chose the number 7 in an unfair way, but anyway it is still looking like a close race!
It's funny how novacat has now won both of his two games by elimination, and he's the only one to win a screening game by elimination so far. Maybe it has something to do with his style?

Boo
Arimaa player #6466
Re: 2013 Arimaa Challenge
« Reply #25 on: Mar 28th, 2013, 2:31am »

Quote:
I don't understand what you mean, Boo. How were any games "won on a coin flip?"

 
I am talking about a 3-game series, not a single game.
E.g. against arimaa_master (the same applies to RmznA): ziltoid won with silver and lost with gold, and thus marwin leads, because it won with gold. If ziltoid had won with gold and lost with silver, it would be 1-1 as opposed to the current 0-1. What is the difference between those two scenarios?
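The effect Boo describes can be reproduced with a small model of pair scoring. This is a minimal sketch assuming, as tize explains, that games are paired by matching color, and that a human's score line counts each bot's wins only within completed pairs. The data layout and the `paired_score` function are hypothetical, not the screening's actual code, but they do reproduce the 0-1 and 1-1 figures from the two scenarios above.

```python
from typing import Dict, Tuple

# Hypothetical layout: one human's screening games, keyed by (bot, color),
# with True meaning the bot won that game.
Games = Dict[Tuple[str, str], bool]


def paired_score(games: Games,
                 bots: Tuple[str, str] = ("ziltoid", "marwin")) -> Tuple[int, int]:
    """Count each bot's wins, using only colors in which the human
    played both bots, so every counted game has a partner game."""
    first, second = bots
    score_first = score_second = 0
    for color in ("gold", "silver"):
        if (first, color) in games and (second, color) in games:
            score_first += int(games[(first, color)])
            score_second += int(games[(second, color)])
    return score_first, score_second


# The series as Boo describes it: ziltoid won with silver and lost with
# gold; marwin won its only game, with gold.
actual = {("ziltoid", "silver"): True, ("ziltoid", "gold"): False,
          ("marwin", "gold"): True}
# Boo's alternative, with ziltoid's two results swapped.
swapped = {("ziltoid", "gold"): True, ("ziltoid", "silver"): False,
           ("marwin", "gold"): True}
print(paired_score(actual), paired_score(swapped))  # (0, 1) (1, 1)
```

Only the gold games form a complete pair in both scenarios, so ziltoid's silver game is ignored either way; which of ziltoid's two results happens to sit in the counted pair decides the score line.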

novacat
Arimaa player #751
Re: 2013 Arimaa Challenge
« Reply #26 on: Mar 28th, 2013, 7:27am »

on Mar 28th, 2013, 2:31am, Boo wrote:
E.g. against arimaa_master (The same applies to RmznA). Now ziltoid has won with silver and lost with gold and thus marwin leads, because it won with gold. If ziltoid had won with gold and lost with silver, it would be 1-1 as opposed to the current 0-1. What is the difference between those two scenarios?

The difference is that when bot_ziltoid played silver, it was the first game ever between the player and the bot on the current hardware.  The game with bot_ziltoid as gold was the second encounter.  Bot_ziltoid may have learned from its first game.  We would certainly assume that for the human if the results were reversed.  
 
Also, the human player may have decided to try a more risky strategy the second time around since they already beat the bot the first time.  It is not unprecedented for someone to play the bots in less than ideal conditions just for fun, and these people are typically conscientious enough to play the two bots in the same manner.

Fritzlein
Arimaa player #706
Re: 2013 Arimaa Challenge
« Reply #27 on: Mar 28th, 2013, 9:58am »

Since my last update, ziltoid went 2-1 while marwin went 1-4.  This includes ziltoid's first point of the screening to pull within 1.5 of marwin.  Ziltoid also pulls closer in performance rating, 1908 to 1993.  It would be quite an unexpected coup for humanity to beat down both bots to have a sub-2000 performance rating by the end of the screening.

Fritzlein
Arimaa player #706
Re: 2013 Arimaa Challenge
« Reply #28 on: Mar 30th, 2013, 7:25pm »

A late flurry of activity brings the number of humans completing the full screening to eight: Thiagor, Max, aaaa, arimaa_master, Fritzlein, gthreepwood, RmznA, and supersamu.  The total number of games played probably won't match that of last year, but it is great to see so many people finishing what they start.  Indeed, there are presently only two incomplete pairs: 722caasi and mightyfez.
 
After supersamu completed his sweep, the bots went on a five-game winning streak, pushing their ratings up to 2069 and 1978 respectively.  Ziltoid picked up a point, but marwin got it right back, to keep its lead at 1.5 points with just under 24 hours remaining.  Things are looking bleak for ziltoid unless 722caasi and mightyfez each finish their pairs by beating marwin.
« Last Edit: Mar 30th, 2013, 7:26pm by Fritzlein »

browni3141
Arimaa player #7014
Re: 2013 Arimaa Challenge
« Reply #29 on: Mar 30th, 2013, 11:58pm »

on Mar 30th, 2013, 7:25pm, Fritzlein wrote:
Things are looking bleak for ziltoid unless 722caasi and mightyfez each finish their pairs by beating marwin.

That's too bad, I have a formula for ziltoid which I think is pretty much infallible half the time.  Even a beginner could do it, although some of the experts might have difficulty.
I haven't tested it at 2 minutes/move, and I don't plan to.
Can you figure it out without looking at my game history? (or the chat archive)
Consider it a riddle! You've got three clues :)
« Last Edit: Mar 31st, 2013, 12:07am by browni3141 »
