Arimaa Forum « 2014 Arimaa Challenge »


(Moderator: supersamu)
   Author  Topic: 2014 Arimaa Challenge  (Read 9041 times)
Fritzlein
2014 Arimaa Challenge
« on: Mar 11th, 2014, 11:30am »

After a brief discussion of how much we fear the bots and/or expect to dominate them, the 2014 Arimaa Challenge screening games have begun.  The bots won four of the first five, a relatively strong start, especially since ziltoid was noticeably ahead at one point even in the one game it lost, to browni3141.  I recently told a casual observer that the strongest bot is about 2200 on the gameroom scale, and they're jointly a bit ahead of that out of the gate.  Does anyone care to revise their assessment of the 2014 bot strength based on the early evidence?  As the screening continues, I will update the table below with the results from this year.
 
Year  Pairs  Decisive  Winner / Score / Perf  Loser / Score / Perf
----  -----  --------  ---------------------  --------------------
2007     12         2        bomb / 2 / 2087    Zombie / 0 / 1876
2008     16         7        bomb / 6 / 1918     sharp / 1 / 1576
2009     23         7    clueless / 5 / 1910    GnoBot / 2 / 1792
2010     25        11      marwin / 6 / 2065  clueless / 5 / 1960
2011     40        11      marwin / 6 / 2110     sharp / 5 / 2109
2012     33         7    briareus / 5 / 2232    marwin / 2 / 2128
2013     25         6      marwin / 4 / 2121   ziltoid / 2 / 2055
2014     33        11     ziltoid / 6 / 2259     sharp / 5 / 2244
« Last Edit: Mar 31st, 2014, 8:08pm by Fritzlein » IP Logged

browni3141
Re: 2014 Arimaa Challenge
« Reply #1 on: Mar 11th, 2014, 12:12pm »

It's pretty much what I would expect so far. The fact that ziltoid couldn't convert a strong position against me indicates weakness more than getting to such a position indicates strength (if you look at the game, I made a pretty major blunder at 8s).
I guess I would have expected at least one of the other four games to have been a human win, but it is a pretty small sample.
IP Logged

Fritzlein
Re: 2014 Arimaa Challenge
« Reply #2 on: Mar 11th, 2014, 3:27pm »

on Mar 11th, 2014, 12:12pm, browni3141 wrote:
The fact that ziltoid couldn't convert a strong position against me indicates weakness more than getting to such a position indicates strength (if you look at the game, I made a pretty major blunder at 8s)

Well, gee, if the human players never made mistakes, then I would bet against the bots just as heavily as you do. ;)  In the history of chess man vs. machine matches there was a glorious tradition of discounting machine victories as meaningless because of how badly their human opponents played.  But I guess it isn't limited to man versus machine; there is also the ancient chess quote, "I've hardly ever defeated a healthy opponent". :)
 
Quote:
[...] it is a pretty small sample.

Amen to that!  Even with each bot playing dozens of games by the end of the screening, the sample remains small, and it is hard to draw conclusions.  For example, I highly doubt that ziltoid2013 was weaker than briareus2012, as if rbarreria had introduced bugs in the meantime, but the performance rating in the screening dropped off by 177 points, as you can see in the table in my first post.  There's a lot of random variation both on the human side and on the bot side.
« Last Edit: Mar 11th, 2014, 3:28pm by Fritzlein » IP Logged

browni3141
Re: 2014 Arimaa Challenge
« Reply #3 on: Mar 11th, 2014, 3:44pm »

on Mar 11th, 2014, 3:27pm, Fritzlein wrote:

Well, gee, if the human players never made mistakes, then I would bet against the bots just as heavily as you do. ;)  In the history of chess man vs. machine matches there was a glorious tradition of discounting machine victories as meaningless because of how badly their human opponents played.  But I guess it isn't limited to man versus machine; there is also the ancient chess quote, "I've hardly ever defeated a healthy opponent". :)

But I made a serious blunder and still won. It doesn't take skill from the bot to recognize that I allowed a double hostage and take it. It does take some skill to convert it into a win (which it lacked).
I will admit that if ziltoid had actually won I would probably discount it due to my error. It was a rather large error after all, to go from significantly ahead to significantly behind. 8s was probably a net loss of a dog's worth to a horse's worth of material. I really want the bots to start taking advantage of mistakes we don't even know we're making, but I suppose that's a lot to ask :P
Quote:

Amen to that!  Even with each bot playing dozens of games by the end of the screening, the sample remains small, and it is hard to draw conclusions.  For example, I highly doubt that ziltoid2013 was weaker than briareus2012, as if rbarreria had introduced bugs in the meantime, but the performance rating in the screening dropped off by 177 points, as you can see in the table in my first post.  There's a lot of random variation both on the human side and on the bot side.

I have been thinking that the screening is not a very fair way to determine which of the bots becomes the challenger, if the bots are close. I guess that can be argued for any format though. Hopefully the first bot with a legitimate chance at winning the Challenge is clearly ahead of the others.
IP Logged

Fritzlein
Re: 2014 Arimaa Challenge
« Reply #4 on: Mar 11th, 2014, 7:51pm »

on Mar 11th, 2014, 3:44pm, browni3141 wrote:
It doesn't take skill from the bot to recognize that I allowed a double hostage and take it. It does take some skill to convert it into a win (which it lacked)

In other words, what a bot executes well doesn't count as skill, and what a bot does badly proves it has no skill?  If you applied your argument in reverse to humans, it would go something like this: "Sure humans make great long-term plans and have good fuzzy evaluation apart from lookahead, but they still completely overlook moves, therefore they stink at Arimaa."
 
To my mind, the threat that is posed by the bots isn't made less by pointing out that it isn't "skill".  I certainly can't play blunder-free myself, so it doesn't comfort me much to say, "Apart from my blunders, I can crush bots."  I'm not going to disparage a bot based on what kind of thing it does well.    Whatever a bot does well gives it winning chances.  Furthermore, if you want to reserve "skill" for what we do better, you need to provide us with another word for being good at finding good moves in the way that bots find good moves.
 
No matter how either player does what it takes to win, it comes down to wins and losses.  So far, at 6-1 over humans, the wins are going relatively badly for humanity.  6-1 against the human opposition would be the expected score of a player with a gameroom rating of 2382.
 
Small sample, small sample, small sample.  Since it ultimately does come down to wins and losses, a small sample isn't likely to change anybody's mind.  When the evidence is too small and nobody is going to give ground, the traditional male maneuver is to place a wager.  I will bet you one hundred Arimaa points that the top bot in the screening will have the highest-ever performance rating in a screening, i.e. over 2232.  I define performance rating as the rating the bot would have needed in order to have an expected number of wins equal to the actual number of wins it got, using gameroom ratings.  I won't even complain that your deflated gameroom rating is holding down the performance ratings of the bots.  If you think this is a bad bet for you, then we have been arguing over nothing, because we actually agree about how tough the bots are to beat.
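 
The performance-rating definition above can be sketched in a few lines of Python. This is only an illustration: it assumes the standard Elo expected-score curve on a 400-point scale (the gameroom's actual rating formula may differ), the function names are made up, and the opponent ratings are hypothetical rather than the real screening field.

```python
def expected_score(rating, opponent):
    """Elo expected score for one game (assumed 400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(opponent_ratings, wins, lo=0.0, hi=4000.0, tol=0.01):
    """Rating whose expected number of wins equals the actual number of wins.

    Found by bisection, because expected wins rise steadily with rating.
    A perfect or winless record has no finite answer, so it raises ValueError.
    """
    if not 0 < wins < len(opponent_ratings):
        raise ValueError("performance rating is unbounded for all wins or all losses")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(expected_score(mid, opp) for opp in opponent_ratings)
        if expected < wins:
            lo = mid  # candidate rating is too low to explain the wins
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical opponents, NOT the real 2014 screening participants.
opponents = [2100, 2000, 1900, 2200, 1800, 2050, 1950]
print(round(performance_rating(opponents, wins=6)))
```

With so few games, one result flipping from a win to a loss moves the answer a long way, which is the small-sample problem in a nutshell.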
IP Logged

Fritzlein
Re: 2014 Arimaa Challenge
« Reply #5 on: Mar 11th, 2014, 8:10pm »

on Mar 11th, 2014, 3:44pm, browni3141 wrote:
Hopefully the first bot with a legitimate chance at winning the Challenge is clearly ahead of the others.

This is the first year since 2009 that the winner of the Computer Championship wasn't down to its last life, so it is already less of a coin flip than we are used to.  I would bet another 100 Arimaa points that sharp advances to the Challenge with a point to spare, i.e. you would win if ziltoid advances or if it is tied and sharp advances by virtue of having won the CC.  Of course, you could win this bet by throwing your games to sharp, so it's only on offer if you take the first bet too. :)  The point of the bet is that if it is unclear which bot is better, it would be slightly in your favor.
IP Logged

Fritzlein
Re: 2014 Arimaa Challenge
« Reply #6 on: Mar 12th, 2014, 12:41am »

OK, browni accepted both bets in chat.  As of the first update, he's losing the first and winning the second: ziltoid leads with a performance of 2354 to sharp's 2057.
IP Logged

mistre
Re: 2014 Arimaa Challenge
« Reply #7 on: Mar 12th, 2014, 10:02am »

Bot_Sharp timing out against SilverMitt obviously hurt its performance rating.  Any idea why the time out occurred?
 
IP Logged

Fritzlein
Re: 2014 Arimaa Challenge
« Reply #8 on: Mar 12th, 2014, 12:04pm »

Janzert has to sort out sharp's timeout against SilverMitt and also kzb's timeout against sharp, which has disappeared from the standings due to kzb's unrating it.  Certainly the game met the criteria for why we allowed unrating of games in the first place, i.e. kzb was way ahead and we trust him if he says he lost connection rather than thinking for too long.  But that doesn't mean it shouldn't count in the standings.  It's a delicate issue, because counting it adds noise to the standings and not counting it opens the screening to some kinds of abuse.
IP Logged

browni3141
Re: 2014 Arimaa Challenge
« Reply #9 on: Mar 12th, 2014, 12:21pm »

Doesn't counting it also add potential for abuse? If we think sharp is the easier bot to beat, then we have a "network problem" or two to make sure sharp is the one that makes it, not ziltoid. It is harder to detect abuse there than someone intentionally throwing games by other means.
IP Logged

kzb52
Re: 2014 Arimaa Challenge
« Reply #10 on: Mar 12th, 2014, 1:37pm »

To clarify my situation, my timeout was a problem on my end, and was unfortunate but not unusual in that regard.  If I had known the game would disappear from the standings, I would not have unrated it.  I will not play any more games until I get some sort of ruling from above.  If the game needs to be un-unrated, go for it.
IP Logged
Fritzlein
Re: 2014 Arimaa Challenge
« Reply #11 on: Mar 12th, 2014, 2:04pm »

on Mar 12th, 2014, 12:21pm, browni3141 wrote:
Doesn't counting it also add potential for abuse? If we think sharp is the easier bot to beat, then we have a "network problem" or two to make sure sharp is the one that makes it, not ziltoid. It is harder to detect abuse there than someone intentionally throwing games by other means.

Good point, browni.  This is unlike the World Championship tournament, where the players would have a strong incentive to abuse the system to increase their prize payout, although I guess people have been known to lie in order to win with no other motivation than winning.  Here abusing the system would just mean favoring one bot over the other, which can be done by other means.  That argument has me leaning towards trusting the human players to be honest about whether they got disconnected for a minute, as opposed to sending the move with just a few seconds left.  We could resume the game and give extra thinking time to the bot that won on time, similar to how it is done in the World Championship.  Given the lack of financial incentive, it might not be abused at all, so allowing resumption would add less noise to the result than letting the result stand.
 
Perhaps the major downside would be a headache for the TD in trying to get games restarted, though.  What if someone with a super-awful connection participates in the screening, and gets disconnected eleven times over the course of the four games?  Maybe timeouts, even though unfair, should stand out of sympathy for the poor TD.  Getting the most accurate result should be balanced against how much work that is, which has me leaning the other way now.
« Last Edit: Mar 12th, 2014, 2:06pm by Fritzlein » IP Logged

browni3141
Re: 2014 Arimaa Challenge
« Reply #12 on: Mar 12th, 2014, 2:30pm »

on Mar 11th, 2014, 11:30am, Fritzlein wrote:

Year  Pairs  Decisive  Winner / Score / Perf  Loser / Score / Perf
----  -----  --------  ---------------------  --------------------
2007     12         2        bomb / 2 / 2087    Zombie / 0 / 1876
2008     16         7        bomb / 6 / 1918     sharp / 1 / 1576
2009     23         7    clueless / 5 / 1910    GnoBot / 2 / 1792
2010     25        11      marwin / 6 / 2065  clueless / 5 / 1960
2011     40        11      marwin / 6 / 2110     sharp / 5 / 2109
2012     33         7    briareus / 5 / 2232    marwin / 2 / 2128
2013     25         6      marwin / 4 / 2121   ziltoid / 2 / 2055
2014      2         1     ziltoid / 1 / 2354     sharp / 0 / 2057

Does anybody have any theories on why screening participation has gone down in the last two years?
IP Logged

Ail
Re: 2014 Arimaa Challenge
« Reply #13 on: Mar 12th, 2014, 4:41pm »

on Mar 12th, 2014, 2:30pm, browni3141 wrote:

Does anybody have any theories on why screening participation has gone down in the last two years?

Theory #1:
People like winning. Winning was easier when the bots were easier to beat.
Thus fewer people felt like challenging the bots when they expected to be beaten.
IP Logged
Janzert
Re: 2014 Arimaa Challenge
« Reply #14 on: Mar 12th, 2014, 7:09pm »

Regarding game #295864, after examining the bot logs it's apparent that the bot timed out as the result of a network problem between the bot server and the arimaa.com server. The relevant section of the log is:
 
Code:
2014-03-12 05:09:20 ERROR:gameroom:Caught unkown exception #1, restarting.
Traceback (most recent call last):
  File "gameroom.py", line 875, in main
    table.playgame(engine_ctl, bot_greeting, options['onemove'])
  File "gameroom.py", line 448, in playgame
    self.move(response.move)
  File "gameroom.py", line 244, in move
    response = post(self.url, values, "Table.move")
  File "gameroom.py", line 83, in post
    response = urllib2.urlopen(req)
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 113] No route to host>
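
For illustration only, here is one way a bot client could ride out a short network outage like the one in the traceback: retry the failed request with a growing delay. This is a sketch, not the actual gameroom.py logic. The helper name post_with_retry and its parameters are hypothetical, and it uses Python 3's urllib.error.URLError where the log shows Python 2.6's urllib2.

```python
import time
import urllib.error


def post_with_retry(send, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, retrying on URLError with backoff.

    send is any zero-argument callable that performs the HTTP POST,
    for example a small wrapper around urllib.request.urlopen.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except urllib.error.URLError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller decide what to do
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

Retrying only helps if the total wait fits inside the time left on the move clock, so an outage longer than that would still end in a timeout.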

 
By the rules and previous precedent, the game should be restarted from the point of the timeout. If SilverMitt is unavailable to resume the game before the end of screening, the result should be invalidated and removed from the screening results.
 
Janzert
IP Logged