Arimaa Forum « Challenge Screening Rules »


Topic: Challenge Screening Rules (Read 6270 times)
Fritzlein (Arimaa player #706)
« Reply #15 on: Mar 19th, 2015, 10:37am »

Janzert, what does bayesElo say about the likelihood of superiority of the computer champion based just on the Computer Championship games in 2009-2014?

Janzert (Arimaa player #247)
« Reply #16 on: Mar 20th, 2015, 8:05pm »

OK, here's the WCC LoS (likelihood of superiority) between first and second place each year:
 
Code:
2009: 83% bot_clueless > bot_Gnobot
2010: 67% bot_marwin > bot_clueless
2011: 58% bot_sharp > bot_marwin
2012: 60% bot_marwin > bot_briareus
2013: 69% bot_ziltoid > bot_marwin
2014: 77% bot_sharp > bot_ziltoid
2015: 97% bot_sharp > bot_Z

 
So the WCC is barely any better at confidently separating the top two spots. The difference, to me, is that the WCC is straightforward to change if better differentiation is desired.
 
Also for grins I ran a combined LoS using both the Screening and WCC games.
 
Code:
2009: 90% bot_clueless > bot_Gnobot
2010: 73% bot_marwin > bot_clueless
2011: 53% bot_marwin > bot_sharp
2012: 66% bot_briareus > bot_marwin
2013: 50% bot_ziltoid > bot_marwin
2014: 57% bot_sharp > bot_ziltoid

 
Janzert
 
Edit: And since I now have the scripts to do this fairly easily for any event in the database, here's the same for past WCs (the human World Championship) as well:
 
Code:
2009: 66% chessandgo > Fritzlein
2010: 67% chessandgo > Fritzlein
2011: 83% chessandgo > Adanac
2012: 73% hanzack > chessandgo
2013: 79% chessandgo > Boo
2014: 94% chessandgo > browni3141
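 
For reference, the idea behind these LoS numbers can be sketched in a few lines. bayeselo fits ratings to all the games in an event, so this toy stdlib version, which looks only at a head-to-head record, won't reproduce the figures above exactly; the function name head_to_head_los is just illustrative.
 
Code:
import random

def head_to_head_los(wins_a, wins_b, samples=200_000):
    """Rough likelihood that A is stronger than B, from the head-to-head
    record alone: with a uniform prior, A's per-game win probability has a
    Beta(wins_a + 1, wins_b + 1) posterior; LoS is P(that probability > 0.5)."""
    hits = sum(
        random.betavariate(wins_a + 1, wins_b + 1) > 0.5
        for _ in range(samples)
    )
    return hits / samples

# A 5-3 head-to-head score works out to roughly a 75% likelihood of superiority.
print(round(head_to_head_los(5, 3), 2))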
« Last Edit: Mar 20th, 2015, 9:38pm by Janzert »
Janzert (Arimaa player #247)
« Reply #17 on: Mar 21st, 2015, 2:42pm »

Here's probably the situation that makes me think I wouldn't really like the screening almost no matter how it is set up: bot_a wins the WCC, bot_a and bot_b go to the screening, and then bot_b wins the screening and goes on to win the Challenge.
 
As either bot's author I wouldn't be terribly happy (less so as bot_a's, though ;)). And I think it would open up quite a bit of dissatisfaction all around.
 
Janzert
rbarreira (Arimaa player #1621)
« Reply #18 on: Mar 21st, 2015, 7:11pm »

on Mar 20th, 2015, 8:05pm, Janzert wrote:
[LoS tables and commentary snipped; see Reply #16 above.]

 
This is why I wish the WCC could have many more games in it. 2015 was the only recent year in which we can say with any certainty that the best bot won the WCC. The screening seems like a great idea in theory, but not so much in practice. Frankly, this is part of the reason why I find it hard to justify pouring a lot of time into bot development. It's disheartening to work a lot only to see the bot play just a few important games per year.
 
To put it more harshly, the current format of the tournaments does not respect the hard work developers put into Arimaa.
 
If the servers / infrastructure were reliable enough, I would be asking for a WCC with many more lives per bot, but unfortunately this does not seem possible since there are way too many games with server / lag / zombie process problems. While there's a reasonably easy way to solve this last problem (a script to kill processes / reboot the servers after each game), the first two seem harder to tackle without significant work from Omar.
« Last Edit: Mar 21st, 2015, 7:17pm by rbarreira »
Janzert (Arimaa player #247)
« Reply #19 on: Mar 21st, 2015, 11:20pm »

If I could wish for anything, I would love it if the CC were 1-2 weeks of solid, 24-hour-a-day, back-to-back games, maybe with increasing TCs like the WC has in order to get even more games. Then 2-3 weeks of human exposure for all the CC bots on the CC hardware, with the Challenge being played against the CC winner after that.
 
The CC winner could still have the defenders banned from playing it. I was also initially going to propose a maximum game cap. But given 3 weeks' time, a maximum of 2 concurrent games, and an average of 2-hour games, it's actually impossible to get more than 504 games. That's about 12 times the number of games any single bot got in the most popular screening year so far, 2011, which had 85 games total: 45 for sharp and 40 for marwin. So in practice the increase in human exposure would almost certainly stay well within an order of magnitude.
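 
As a sanity check, here is a minimal sketch of the arithmetic behind that 504-game ceiling, assuming exactly the figures above (3 weeks, 2 concurrent games, 2-hour average games):
 
Code:
# Ceiling on games in the proposed exposure period (assumed figures from above).
weeks = 3
hours_available = weeks * 7 * 24      # 504 hours of wall-clock time
concurrent_games = 2                  # at most 2 games running at once
hours_per_game = 2                    # average game length
max_games = hours_available * concurrent_games // hours_per_game
print(max_games)                      # 504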
 
Janzert
« Last Edit: Mar 21st, 2015, 11:20pm by Janzert »
rbarreira (Arimaa player #1621)
« Reply #20 on: Mar 22nd, 2015, 4:47am »

on Mar 21st, 2015, 11:20pm, Janzert wrote:
[proposal for a longer CC and extended human exposure snipped; see Reply #19 above.]

 
I agree 100% with this.
deep_blue (Arimaa player #9854)
« Reply #21 on: Mar 22nd, 2015, 8:39am »

I completely agree with rbarreira that the WCC should have many more games. Also, I can understand very well that a programmer would be more motivated by a longer WCC.
But then I don't see why we should stop the screening. What if instead the WCC were, say, a ten-loss elimination (10 losses and you are out) and the winner's remaining lives were added as a bonus to the screening points, so that a clear WCC win is rewarded?
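 
To make the bonus idea concrete, here is a tiny sketch; the one-point-per-remaining-life rule and the function name are just assumptions for illustration:
 
Code:
MAX_LOSSES = 10  # ten-loss elimination: a bot is out after 10 losses

def screening_total(screening_points, wcc_losses):
    """Screening score plus one bonus point per WCC life the winner kept."""
    lives_left = MAX_LOSSES - wcc_losses
    return screening_points + lives_left

# A WCC winner who lost only 3 games carries 7 bonus points into the screening.
print(screening_total(12, 3))  # 19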
lightvector (Arimaa player #2543)
« Reply #22 on: Mar 22nd, 2015, 9:20am »

on Mar 21st, 2015, 11:20pm, Janzert wrote:
[proposal for a longer CC and extended human exposure snipped; see Reply #19 above.]

on Mar 22nd, 2015, 4:47am, rbarreira wrote:

I agree 100% with this.

 
I support this for future years as well.
 
If people think the screening should still be used to discriminate between the bots and select one, I think it's important to choose a format that encourages many, many more games than currently. That, I think, means among other things offering faster time controls for the screening games (e.g. the user has a choice between 30s/move, 60s/move, and 2m/move for each game pair, and the limit on pairs is more than 2).
 
 
deep_blue (Arimaa player #9854)
« Reply #23 on: Mar 22nd, 2015, 10:05am »

Well, if even the bot programmer with the best chance of winning the Challenge agrees with my suggestion to play more games, then I think that should be done. Also, many screening games could be a good opportunity to fix a bot's weaknesses.
And it would be an increase in serious games, as rbarreira wanted. The only question is whether there would really be that many more games, but that is of course no reason not to try it out. In the case of shorter time controls, though, I think those games should count less, which would also be an incentive to play the slower time control when one has the time, since that is the Challenge time control.
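 
To make that concrete, here is one hypothetical way such a weighting could work; nothing here is an official rule, and the 2-minutes-per-move Challenge time control and the proportional-weight rule are just assumptions for illustration:
 
Code:
CHALLENGE_TC_SECONDS = 120  # assumed Challenge time control: 2 minutes per move

def weighted_score(games):
    """games: list of (bot_won, tc_seconds) pairs, bot_won being 1.0 or 0.0.
    Each game counts in proportion to its time control relative to the
    Challenge time control, capped at full weight."""
    points = 0.0
    total_weight = 0.0
    for bot_won, tc_seconds in games:
        weight = min(tc_seconds / CHALLENGE_TC_SECONDS, 1.0)
        points += weight * bot_won
        total_weight += weight
    return points, total_weight

# Two bot wins at 2m/move count fully; a win at 30s/move counts only a quarter.
print(weighted_score([(1.0, 120), (1.0, 120), (1.0, 30), (0.0, 60)]))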
supersamu (Forum Moderator, Arimaa player #7523)
« Reply #24 on: Mar 22nd, 2015, 11:00am »

It would also be possible to let either the two finalists or the WCC winner decide which screening format to use.
If the two top developers can't agree on one of the formats, the default format (the one we use right now) is used.
Then there is no danger of Omar being accused of moving the goalposts.
It could also be an additional bonus for the WCC winner to be able to choose the screening format.
I also would like to see a WCC with more games.

browni3141 (Arimaa player #7014)
« Reply #25 on: Mar 22nd, 2015, 7:45pm »

on Mar 22nd, 2015, 11:00am, supersamu wrote:
It would also be possible to let either the two finalists or the WCC winner decide which screening format to use.
If the two top developers can't agree on one of the formats, the default format (the one we use right now) is used.
Then there is no danger of Omar being accused of moving the goalposts.
It could also be an additional bonus for the WCC winner to be able to choose the screening format.
I also would like to see a WCC with more games.

 
I think this creates a conflict of interest.

Janzert (Arimaa player #247)
« Reply #26 on: Mar 23rd, 2015, 2:04am »

I finally had the probably obvious idea to take a look at how the various bots have fared in the gameroom since being added to the server each year. Using only rated HvB games, here is what it looks like (number of games played in parentheses):
Code:
Rated HvB games against CC server bot:
2009: ? bot_clueless (125) ? bot_gnobot (2)
2010: ? bot_marwin (303) ? bot_clueless (0)
2011: 56% bot_marwin (75) > bot_sharp (78)
2012: 58% bot_marwin (35) > bot_briareus (60)
2013: 98% bot_ziltoid (51) > bot_marwin (20)
2014: 52% bot_ziltoid (11) > bot_sharp (14)

 
Not surprisingly, there isn't much appetite to play the bots at the CC time control. This might also suggest that simply extending the screening period would do little good.
 
Next I took all the games played by all the various time-control versions available on the server. This is of course the dirtiest data yet, but it does provide a large number of games. To expand on that thought a bit: the WCC provides the best structure for determining strength, in that it is a fully structured tournament including head-to-head games. The screening has a self-selecting population of opponents who are allowed to stop early. The gameroom has self-selecting opponents who also choose how many games to play, over a long period in which their strength is likely to vary significantly, which bayeselo does nothing to account for.
 
Anyway here is a combined table, also listing the number of games played by each bot.
 
Code:
Year  Screening                        | WCC                              | Gameroom
2009: 79% clueless (27) > gnobot (26)  | 83% clueless (8) > gnobot (8)    | 99% gnobot (864) > clueless (1642)
2010: 65% marwin (30) > clueless (25)  | 67% marwin (10) > clueless (9)   | 100% marwin (3545) > clueless (338)
2011: 59% marwin (40) > sharp (45)     | 58% sharp (9) > marwin (8)       | 99% sharp (3007) > marwin (1291)
2012: 75% briareus (41) > marwin (39)  | 60% marwin (10) > briareus (11)  | 93% briareus (1040) > marwin (901)
2013: 65% marwin (25) > ziltoid (28)   | 69% ziltoid (10) > marwin (9)    | 95% ziltoid (957) > marwin (789)
2014: 60% ziltoid (36) > sharp (35)    | 77% sharp (9) > ziltoid (10)     | 99% sharp (710) > ziltoid (280)

 
Janzert
Janzert (Arimaa player #247)
« Reply #27 on: Mar 23rd, 2015, 2:47am »

Now to back up to a higher level. First, let me note that in this and most of my posts, when I talk about the screening I mean primarily the process of taking the top two WCC finishers and playing them against humans to determine which of the two plays in the Challenge. I am not arguing against, or really even talking about, exposing the eventual challenger to human opponents before the actual Challenge.
 
Why do we have a screening? It seems the purpose is to keep an 'anti-bot' bot from winning the WCC and thus depriving a more deserving challenger of the chance. So it should relieve the bot developers of worrying about that scenario and allow them to concentrate simply on making the bot play as well as possible, to give the best chance of winning the Challenge. The screening, if working as intended, is supposed to make the Challenge harder for humanity to defend. In that case the bot developers should be the primary proponents of the screening. Yet that seems pretty clearly not to be the case. Although only a fairly small number of us have chimed in here, we seem to be the ones most interested in getting rid of it completely. If the group that the screening is supposed to benefit is opposed to it, what are the benefits of having it?
 
That's putting aside whether we can even get a screening in place that does a good job of picking the stronger anti-human bot. To me the screening feels like double jeopardy for the WCC winner. If a bot was able to win the WCC, it seems it should automatically get the privilege of facing the Challenge.
 
Janzert
rbarreira (Arimaa player #1621)
« Reply #28 on: Mar 23rd, 2015, 5:36pm »

on Mar 23rd, 2015, 2:47am, Janzert wrote:

Why do we have a screening? It seems the purpose is to keep an 'anti-bot' bot from winning the WCC and thus depriving a more deserving challenger of the chance.

 
Besides all the good points you made, allow me to pile on here. Even this supposed benefit of the screening is very tenuous when you consider that it no longer works as soon as there are two or more "anti-bot" bots participating in the WCC.
 
But the more important point is that "perfect is the enemy of good": even if we accept that finding the best "anti-human" bot is an important thing to focus on, as you said there's little prospect of the screening actually doing that (due to the lack of enough games against strong enough humans). Furthermore, the idea of allowing shorter time controls to increase the number of screening games might solve this, but then again it might make it worse as well: humans have a harder time beating top bots at shorter TCs, making the games less informative if there are few human wins.
 
To sum up: given all the data we have, I agree that the best practical chance of finding the best bot for the Challenge would come from having more WCC games.
« Last Edit: Mar 23rd, 2015, 6:05pm by rbarreira »
Fritzlein (Arimaa player #706)
« Reply #29 on: Mar 24th, 2015, 1:28am »

on Mar 21st, 2015, 11:20pm, Janzert wrote:
The CC winner could still have the defenders banned from playing it. I was also initially going to propose a maximum game cap.

If individuals are not limited in the number of Screening games they are allowed to play, should we worry that a small number of individuals will monopolize the two servers, such that many people never find the bot available at a convenient time?
 
Quote:
But given 3 weeks' time, a maximum of 2 concurrent games, and an average of 2-hour games, it's actually impossible to get more than 504 games. That's about 12 times the number of games any single bot got in the most popular screening year so far, 2011, which had 85 games total: 45 for sharp and 40 for marwin. So in practice the increase in human exposure would almost certainly stay well within an order of magnitude.

Wait, are you saying that as long as we don't let humans play ten times as many games as before, you aren't worried about moving the goalposts? Say, tripling the number of Screening games against the eventual Challenger would not be a problem? (Note that having a single bot available for play on both servers, instead of two bots, in itself doubles the exposure of that bot.)
 
I have always wanted as many different humans as possible to participate in the Screening, but somehow that feels less hard on the bots to me than allowing a few individuals to play over and over and over. Not only does the former make the Screening more of a community event than the latter; allowing the latter would also, I think, significantly increase humans' ability to defend the Challenge, thanks to the efforts of dedicated bot-bashers.
 
One can make a case that the bots' current exposure is too little, but it seems clear-cut that increasing it now would create bad publicity. Surely there is a way to change the format to address your concerns without also making the Challenge harder to win?
« Last Edit: Mar 24th, 2015, 1:30am by Fritzlein »
