Arimaa Forum « Challenge Screening Rules »


(Moderator: supersamu)
   Author  Topic: Challenge Screening Rules  (Read 6244 times)
Fritzlein
Forum Guru
*****



Arimaa player #706

   

Gender: male
Posts: 5928
Challenge Screening Rules
« on: Mar 15th, 2015, 2:26pm »

Several informal discussions of the rules for the Challenge Screening phase have popped up recently.  It has been a long time since the current format was chosen; since then we have gained a lot of experience with it, and also circumstances have changed.  It seems time to revisit the reasons behind the rules thoroughly and formally.
 
Note this discussion will not be purely theoretical.  Omar has expressed a willingness to overhaul the rules, and an eagerness to hear what the community thinks.
 
The current rules seek to meet two objectives simultaneously.  One is to discourage developers from trying to win the Computer Championship instead of targeting the Arimaa Challenge.  It might be possible to create an anti-bot bot that would do well in the Computer Championship but be easily exploitable by humans.  Taking the top two finishers from the Computer Championship and selecting the one that scores better against humans in the Screening is intended to give the opportunity to win the Challenge to the bot that is most geared towards doing so.
 
The second objective is for bots to have a "fair" amount of exposure before the Challenge.  Man vs. machine matches are held under very different circumstances, which can significantly tilt the odds towards one side or the other.  For example, in Deep Blue vs. Kasparov, the computer had no public record whatsoever.  One can argue that Deep Blue eked out a narrow victory in part due to the surprise factor.  Kasparov tried to win with some anti-computer play that backfired.  If he had known what to expect, he might have been able to win with regular chess.  On the other extreme, in the Fritz vs. Kramnik match and the most recent shogi man vs. machine match, the humans got an exact copy of the software to train against.  They could have won merely by discovering a single strategic weakness and exploiting that specifically.
 
The Screening tries to find a middle ground.  Forcing the bots to be exposed to the community means that humans can't be taken completely by surprise.  On the other hand, limiting each player to one game with each color against each bot (and barring the defenders themselves from playing) protects the bots somewhat from being beaten by bot-bashers playing obsessively until they work out some narrow winning formula.
 
Criticisms of the current system include (1) the difficulty of enforcing the rules when duplicate accounts are so easy to create, (2) the perception that the results can be skewed by players not taking the Screening seriously, (3) the perception that the results can be skewed by someone who has a motive for one bot to win rather than the other, (4) the perception that the best anti-human bot is not necessarily selected anyway due to pure randomness, and (5) a desire to play the bots more than the Screening permits.
 
The primary argument for changing nothing is that most suggested changes tilt the playing field towards a successful defense of the Challenge.  The current rules have already been criticized in some quarters as being too favorable to the humans, and making them even more so would rightly be criticized as "moving the goalposts", fostering a perception that we are changing the rules because we are scared of losing now.
 
That said, any suggested changes will be considered.  Let the ideas flow!
« Last Edit: Mar 15th, 2015, 6:50pm by Fritzlein »

kzb52
Forum Guru
*****



Arimaa player #8454

   


Gender: male
Posts: 71
Re: Challenge Screening Rules
« Reply #1 on: Mar 15th, 2015, 2:57pm »

I'll start by mentioning a suggestion you have put forward a few times, and that I quite like as well.  
 
The idea is to move from a totally open screening process (where players with questionable backgrounds and motives can participate) to one where the screening participants are a list of players, approved by the challenge TD, who have all agreed to play 2 or 4 games.
 
Since the screening games are quite a commitment (up to 8 hours of fun!) I think we should make an effort to be as inclusive and easygoing on players as possible.  Let players volunteer (or even un-volunteer) on short notice and at any point during the screening period.  Ideally, the TD would reach out to some players who would be excellent screening participants - but may not be following arimaa.com events too closely - as well.  Very few players who ask to participate should be denied.  We do want participants who represent a wide variety of styles and skill levels.  This also has the benefit of (probably) reducing the number of uncompleted game pairs that we just have to ignore under the current format.
 
There are a couple of ways to implement this; I'll suggest a few here:
-the TD maintains a list of players allowed to start screening games, and you have to contact him and be approved on the front end before starting
-anyone can start screening games as under the current system, but players the TD doesn't approve of are removed from the official screening list after the game(s) are completed
-there is a registration process, but to avoid involving the TD every single time a player signs up, it is automated and really straightforward.  The TD can then remove people who shouldn't be playing at any time, either before or after screening games are played
 
My two biggest concerns about this change are that a) it's going to be more work for the director and b) it may limit the number of games played in the screening, which could make the results less reliable (concern #4 above).  I have two other ideas I'll share that might work better.
 
1.  Allow anyone who wants to play the bots to do so, as many times as they want.  Then let the TD pull out pairs of games (there would have to be some rules about how this process goes) which will actually count toward the screening. More games means a (theoretically) more reliable result, and if the TD can remove games played under questionable circumstances then foul play is much less likely to skew the verdict.  However, this may be perceived as "moving the goalposts" as it does allow humans more tries to work out the bot's weaknesses.
 
2.  Raise the minimum number of games needed to play in the screening to a higher number (I think 50 is about right).  Right now, I believe the cutoff is at 10 games, and it exists to prevent players from creating another account to play more screening games.  Raising the limit would make this harder to do.  Virtually all (or possibly all) the current participants have played far more than 50 games, and virtually everybody who's played fewer won't have the experience necessary to provide decisive wins in the screening (in my opinion).  This is very easy to implement and I think it'll help make the screening at least a little more secure.
 
So my first alternative idea is to make things more inclusive, and the second is to make it less so.  Both directions have their pros and cons.
 
Nice post Fritz, I'm curious about what others think :)
« Last Edit: Mar 15th, 2015, 3:09pm by kzb52 »
Belteshazzar
Forum Guru
*****



Arimaa player #5094

   


Gender: male
Posts: 108
Re: Challenge Screening Rules
« Reply #2 on: Mar 15th, 2015, 4:08pm »

on Mar 15th, 2015, 2:26pm, Fritzlein wrote:
The current rules seek to meet two objectives simultaneously.  One is to discourage developers from trying to win the Computer Championship instead of targeting the Arimaa Challenge.  It might be possible to create an anti-bot bot that would do well in the Computer Championship but be easily exploitable by humans.  Taking the top two finishers from the Computer Championship and selecting the one that scores better against humans in the Screening is intended to give the opportunity to win the Challenge to the bot that is most geared towards doing so.

 
I remarked earlier that if sharp lost the screening, the screening process would have to be reconsidered.  Although not very likely, it is still perhaps possible that Z could win the screening, even though it's clearly weaker than sharp.  What about simply putting the Computer Championship winner up for screening by itself (especially if it decisively won the CC)?  If it does poorly, then the second place finisher gets a chance.
« Last Edit: Mar 15th, 2015, 4:10pm by Belteshazzar »
browni3141
Forum Guru
*****



Arimaa player #7014

   


Gender: male
Posts: 384
Re: Challenge Screening Rules
« Reply #3 on: Mar 15th, 2015, 10:57pm »

I definitely think that the screening games should have a delay. I don't like that active spectating is discouraged because the games are not delayed. Even though I'm not discouraged, some people are and I miss their comments.

deep_blue
Forum Guru
*****



Arimaa player #9854

   


Posts: 212
Re: Challenge Screening Rules
« Reply #4 on: Mar 16th, 2015, 12:04am »

I might agree with browni. Then I could also be in the chatroom during my games without anyone being able to comment on that game by mistake. ;)
Boo
Forum Guru
*****



Arimaa player #6466

   


Gender: male
Posts: 118
Re: Challenge Screening Rules
« Reply #5 on: Mar 16th, 2015, 1:13pm »

Quote:
I definitely think that the screening games should have a delay.

 
+1
 
Personally I would like an option to play screening games with 1min per move. Devoting 8 hours is problematic.

Janzert
Forum Guru
*****



Arimaa player #247

   


Gender: male
Posts: 1016
Re: Challenge Screening Rules
« Reply #6 on: Mar 16th, 2015, 2:07pm »

Regarding screening delay; I can't think of any disadvantages and there seem to be a few advantages to having screening games delayed.
 
For the overall screening process; I would love it if someone knows how to calculate a likelihood of superiority for the two bots from each of the past screenings. My feeling is that the screening is only marginally better than a coin toss at choosing the better "anti-human" bot. Further, given the constraints of time, community size and involvement, etc., I think it is simply impossible to design a screening that consistently discriminates well between the two bots. Also, I think winning the Computer Championship should give a much bigger reward towards a shot at the challenge than just a tie break. In summary, I think the current screening method is quite unfair to the CC winner. Also, any change that further limits the number of participants is, in general, most likely going to make it worse. Given the constraints we have to work with, it seems much better to simply let the CC winner directly have a shot at the challenge.
 
I do think it's a good idea to expose the challenger to human play before the challenge takes place. It also seems that there would be a general interest in playing all the CC bots on the championship hardware and settings. So I would propose replacing the screening with a period where all the CC bots are available to play. It would be nice if there were some controls put into place to only allow one bot running on a given server at a time. But if something went wrong and multiple bots ran at a time, or a game ended due to a server error, there wouldn't be a need for any corrective action on the game (except for a possible unrating). In order to keep from moving the goalposts too much, the challenge defenders should probably be restricted from playing the CC winner. The possibility of extra games played against the challenger is somewhat counterbalanced by the assurance of being the challenger if you win the CC, and I expect that the number of games will probably remain within the same order of magnitude anyway.
 
Janzert
browni3141
Forum Guru
*****



Arimaa player #7014

   


Gender: male
Posts: 384
Re: Challenge Screening Rules
« Reply #7 on: Mar 16th, 2015, 2:26pm »

on Mar 16th, 2015, 2:07pm, Janzert wrote:
Regarding screening delay; I can't think of any disadvantages and there seem to be a few advantages to having screening games delayed.
 
For the overall screening process; I would love it if someone knows how to calculate a likelihood of superiority for the two bots from each of the past screenings. My feeling is that the screening is only marginally better than a coin toss at choosing the better "anti-human" bot. Further, given the constraints of time, community size and involvement, etc., I think it is simply impossible to design a screening that consistently discriminates well between the two bots. Also, I think winning the Computer Championship should give a much bigger reward towards a shot at the challenge than just a tie break. In summary, I think the current screening method is quite unfair to the CC winner. Also, any change that further limits the number of participants is, in general, most likely going to make it worse. Given the constraints we have to work with, it seems much better to simply let the CC winner directly have a shot at the challenge.
 
Janzert

 
There are also a few things we can do to encourage more screening games. Lowering the time control to 1m/move like Boo suggested or 45s/move would certainly help, and I know there are objections to this because it's not the same as the Challenge time control, but right now we just don't get enough games for a fair contest, so I think it is a good sacrifice.
 
Also, we might be able to increase the number of pairs allowed per player if we shorten the tc. Players that don't want to play even one game now might be willing to play six at 45s/move. Six short games can be less of a commitment than one long one. It's a bit of a gambit given the possibility that only a few people will play all of their pairs and the rest will not change behavior, but again I think it's worth it to try to get more games.
 
Another thing that could be done is offering an incentive for players who complete all four games, such as a free digital copy of chessandgo's book (given his approval, of course), a discount on next year's WC entry, a free postal mixer entry, etc.
 
If we feel that the screening isn't fair to the WCC winner, we could either consider the WCC a qualifier for the main event, or increase the advantage of the WCC winner in the screening.
« Last Edit: Mar 16th, 2015, 2:36pm by browni3141 »

odin73
Forum Guru
*****






   


Gender: male
Posts: 65
Re: Challenge Screening Rules
« Reply #8 on: Mar 16th, 2015, 3:18pm »

Quote:
Several informal discussions of the rules for the Challenge Screening phase have popped up recently.  It has been a long time since the current format was chosen; since then we have gained a lot of experience with it, and also circumstances have changed.  It seems time to revisit the reasons behind the rules thoroughly and formally..
That said, any suggested changes will be considered...... Let the ideas flow!

Let me state some short comments:
 
1. I think the screening with two bots is obsolete. It's a holdover from antique times when bot bashing gave human players easy and obvious opportunities to win. Nowadays that time is over. Let's just go for the strongest CC bot.

2. When only one bot is played, a further issue becomes more important: time. For players with a real life, it's rather unlikely that they will find the time for 4 long-lasting games. Just go with one bot, i.e. 2 games per set. If anybody wants to play a bot several times, let him/her do so. The more games the defenders are able to check for the bot's behavior the better.

3. Don't restrict the screening to a few players (as proposed by kzb52). In recent years even weaker players contributed some games to the process of learning how to win against a bot. Remember aurelian, who managed to win against Ziltoid despite a rating difference of 900 points.

4. I think we don't need a delay for a screening game. It should be handled like any other normal HvB game, where comments in the chat are common and welcome (also absolutely no issue when a single bot is played). Screening is for bot bashing, for nothing else.

5.  Time issue #2: Think twice about the time control. 1min/move or 90s per move may be more attractive for some players. Let the player choose (as stated by Boo recently) at least between 1min and 2min/move. 2min/move for the Challenge is ok.
Fritzlein, you may find it fun to play 7-8 hours non-stop; most players and I would rather not, imo.

6. Collateral issue: This year the old 2014 bots were made available for general play rather late. This may be an issue, since there wasn't much time for everybody (including the defenders) to adapt to the 2014 bots' playing strength, which may make it more difficult to be prepared for the new 2015 bots.
 
Enjoy!  
 8)

Belteshazzar
Forum Guru
*****



Arimaa player #5094

   


Gender: male
Posts: 108
Re: Challenge Screening Rules
« Reply #9 on: Mar 16th, 2015, 8:12pm »

on Mar 16th, 2015, 3:18pm, odin73 wrote:
1. I think the screening with two bots is obsolete. It's a holdover from antique times when bot bashing gave human players easy and obvious opportunities to win. Nowadays that time is over. Let's just go for the strongest CC bot.

2. When only one bot is played, a further issue becomes more important: time. For players with a real life, it's rather unlikely that they will find the time for 4 long-lasting games. Just go with one bot, i.e. 2 games per set. If anybody wants to play a bot several times, let him/her do so. The more games the defenders are able to check for the bot's behavior the better.

 
Good points.  However, Fritz expressed concern above that if the CC winner were automatically the challenger, that might cause bot developers to focus too much on performance against other bots.  So we should still have some kind of safeguard in place.  I would suggest that the CC winner be screened alone, and if it turns out to be susceptible to bot-bashing, the second place finisher could then be screened.
browni3141
Forum Guru
*****



Arimaa player #7014

   


Gender: male
Posts: 384
Re: Challenge Screening Rules
« Reply #10 on: Mar 16th, 2015, 9:41pm »

on Mar 16th, 2015, 8:12pm, Belteshazzar wrote:

 
Good points.  However, Fritz expressed concern above that if the CC winner were automatically the challenger, that might cause bot developers to focus too much on performance against other bots.  So we should still have some kind of safeguard in place.  I would suggest that the CC winner be screened alone, and if it turns out to be susceptible to bot-bashing, the second place finisher could then be screened.

 
How could this be done in a fair way? Who gets to decide whether the CC winner is more bashable than the runner-up would be, without anyone having played them both?

supersamu
Forum Moderator
Forum Guru
*****



Arimaa player #7523

   


Gender: male
Posts: 140
Re: Challenge Screening Rules
« Reply #11 on: Mar 17th, 2015, 5:42pm »

I have read the previous forum posts but want to make a summary of what is important to me.
 
I myself am very much opposed to anything that could be argued to disadvantage the challenger bot. I would really dislike having to hear that we are moving the goalposts.
 
How can we get a better likelihood of choosing the strongest bot for the challenge and simultaneously not disadvantage the bots compared to the old screening rules?
 
- Encourage Players to play more games, for example by giving a monetary incentive
 
- Let Players be able to play games at shorter time controls, possibly let games with shorter time controls not count as one full pair for purposes of determining which bot is better, but definitely don't allow Players to play more than 4 games in the screening. (moving goalposts)
 
- Not finishing a screening pair could cost Arimaa points
 
- I don't like letting Players be able to play games at shorter time controls without them counting towards the screening, because then Players could try weird bot-bashing techniques where even if the bot manages to win, it gains nothing towards the screening. In that light, even letting shorter games count for less could be seen as encouraging bot-bashing in games with shorter time control.

clyring
Forum Guru
*****



Arimaa player #6218

   


Gender: female
Posts: 359
Re: Challenge Screening Rules
« Reply #12 on: Mar 17th, 2015, 6:21pm »

on Mar 17th, 2015, 5:42pm, supersamu wrote:
- Not finishing a screening pair could cost Arimaa points
I could also see this scaring me away from starting a pair if I'm not totally sure I will be able to finish it. In that light, maybe not the best way to boost participation.

I administer the Endless Endgame Event (EEE). Players welcome!
Janzert
Forum Guru
*****



Arimaa player #247

   


Gender: male
Posts: 1016
Re: Challenge Screening Rules
« Reply #13 on: Mar 18th, 2015, 8:22pm »

on Mar 16th, 2015, 2:07pm, Janzert wrote:
I would love it if someone knows how to calculate a likelihood of superiority for the two bots from each of the past screenings.

 
I realized I could simply use bayesElo, although I'm not completely sure its calculation is correct for this situation. But here's what it comes up with for all the past screenings:
 
Code:
2009: 79% bot_clueless > bot_Gnobot
2010: 65% bot_marwin > bot_clueless
2011: 59% bot_marwin > bot_sharp
2012: 75% bot_briareus > bot_marwin
2013: 65% bot_marwin > bot_ziltoid
2014: 60% bot_ziltoid > bot_sharp

 
Those numbers certainly reinforce my dislike for the screening.
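
For anyone who wants to sanity-check figures like these without bayesElo, here is a minimal Python sketch of one way to estimate a likelihood of superiority. It is not what bayesElo does internally: it simply assumes each bot's win rate against humans is an independent unknown with a uniform prior, and estimates by sampling how often one bot's rate exceeds the other's. The function name and the win/loss counts in the usage lines are made up for illustration and are not taken from any real screening.

```python
import random


def likelihood_of_superiority(wins_a, losses_a, wins_b, losses_b,
                              samples=20000, seed=0):
    """Estimate P(bot A's true win rate vs. humans > bot B's).

    Assumption (not bayesElo's model): each win rate gets an independent
    uniform prior, so its posterior is Beta(wins + 1, losses + 1).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    a_better = 0
    for _ in range(samples):
        # Draw one plausible win rate for each bot from its posterior.
        rate_a = rng.betavariate(wins_a + 1, losses_a + 1)
        rate_b = rng.betavariate(wins_b + 1, losses_b + 1)
        if rate_a > rate_b:
            a_better += 1
    return a_better / samples


if __name__ == "__main__":
    # Hypothetical records: bot A went 12-8 against humans, bot B went 10-10.
    estimate = likelihood_of_superiority(12, 8, 10, 10)
    print(f"P(A better than B) is roughly {estimate:.0%}")
```

With only a couple of dozen games per bot the estimate stays far from 100% even when one bot has the better score, which is the same qualitative picture as the percentages above.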
 
Janzert
deep_blue
Forum Guru
*****



Arimaa player #9854

   


Posts: 212
Re: Challenge Screening Rules
« Reply #14 on: Mar 19th, 2015, 9:56am »

Here are my thoughts:
1. I would agree to a screening where one could play as many games as one wants. Yes, one could just play long enough to find a single weakness, BUT a good artificial intelligence should play strongly enough not to have such easy-to-exploit weaknesses.
Allowing this would solve a bunch of problems. First, no one (except maybe bot programmers) would want to create a duplicate account. Also, there would be many more games, which would clearly give a more precise measure of the playing strength of the bots, etc.
2. I would agree with shorter time controls being possible.
3. I disagree with odin that only one bot should be allowed in the Screening. Firstly, the WCC is a short event with much luck involved when it's close. Generally I see nothing wrong with playing two bots, and if one proves to do better against humans than against bots, why not?
4. I consider odin's idea of not delaying and still proposing moves in the chat to be interesting. Humanity wants to find a bot's weaknesses, so I could imagine working together on this.
5. It would definitely be wrong to penalise not finishing pairs. Giving a free WC entry for finishing all playable pairs sounds interesting though.
 
6. New idea: In case of unlimited games, one could give players who have played fewer screening games so far a higher priority for starting games in case many players want to play at once.
« Last Edit: Mar 19th, 2015, 9:57am by deep_blue »