Author |
Topic: 2014 Arimaa Challenge (Read 8814 times) |
|
Janzert
Forum Guru
Arimaa player #247
Gender:
Posts: 1016
|
|
Re: 2014 Arimaa Challenge
« Reply #15 on: Mar 13th, 2014, 8:14pm » |
|
Sorry for the delay. I'm going to make the same ruling for human timeouts attributable to connection issues as for bot timeouts. Specifically, the game should be resumed if possible and disregarded if it can't be completed by the time the screening ends. If you need a game resumed, you can contact either me or Omar. Janzert
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #16 on: Mar 15th, 2014, 1:53pm » |
|
I have updated the results including kzb's resumed win over sharp but excluding SilverMitt's timeout win over sharp. This leaves me even on my bets with browni because the top bot is setting a record, but the top bot isn't sharp. Ziltoid leads sharp by 2-0 in completed pairs, and by 2382-2067 in performance rating.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #17 on: Mar 19th, 2014, 5:55pm » |
|
Arimaa_master's win over sharp drops sharp's performance rating to a disappointing 2036, while ziltoid has kept on trucking to a stratospheric performance rating of 2455. The small sample is obviously at work on both sides. There have been no more decisive pairs completed, so ziltoid continues to lead 2-0.
|
« Last Edit: Mar 19th, 2014, 6:04pm by Fritzlein » |
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #18 on: Mar 20th, 2014, 10:42pm » |
|
Since the last update, browni beat ziltoid, but ziltoid beat arimaa_master and sharp beat both aaaa and harvestsnow, so the bots collectively gained a bit of ground. Ziltoid's lead stretches to 3-0 on the completed arimaa_master pair, but its lead in performance rating shrinks to 2403 vs. 2139.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #19 on: Mar 23rd, 2014, 10:40am » |
|
on Mar 12th, 2014, 4:41pm, Ail wrote: Theory #1: People like winning. Winning was easier when the bots were easier to beat. Thus less people felt like challenging the bots when they expected to be beaten. |
| Good theory, Ail. It would be nice if we could at least match the 25 completed pairs that we had last year (currently we have 11 with eight days to go), but I'm afraid there will be a lot of "one and done" screening participants. I'll bet people who lose their first screening game are much less likely to play a second than people who win their first. What seems like a fun challenge can quickly turn into a chore without positive feedback. Hat tip to arimaa_master for becoming the first player to complete all four screening games. His final game, a victory over ziltoid, gives sharp its first point of the screening, narrowing ziltoid's lead to 3-1. Ziltoid also leads in performance rating by 2347 to 2172, but there is plenty of time for that to change in the final week of screening!
|
|
|
|
|
browni3141
Forum Guru
Arimaa player #7014
Gender:
Posts: 384
|
|
Re: 2014 Arimaa Challenge
« Reply #20 on: Mar 23rd, 2014, 11:03am » |
|
Has omar considered a shorter time control, like 1m/move? I think a lot of people either can't, or don't want to set aside such a large block of time.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #21 on: Mar 23rd, 2014, 3:04pm » |
|
on Mar 23rd, 2014, 11:03am, browni3141 wrote:Has omar considered a shorter time control, like 1m/move? I think a lot of people either can't, or don't want to set aside such a large block of time. |
| It was discussed in the past, but the prevailing argument was that the Arimaa Challenge time controls should govern the screening. One concern is that bots may be better at different speeds, and we want the bot that is best at the Challenge speed. But these days there is starting to be another issue: if you speed up the time control, then even fewer humans will be able to win. Halving the time control probably adds 50 Elo or more to bot strength relative to humans, further demotivating people who get whacked and further shrinking the pool of folks who are likely to provide discrimination by beating one bot and losing to the other.
I do see the case for shorter time controls: more games equals more information. In fact, I once proposed that we speed up the time controls temporarily, as long as humans are comfortably ahead, and only slow them down again when we are nearer to defeat. That idea didn't fly because it creates the impression that we are willing to "move the goalposts", i.e. keep changing the rules of the Challenge so that we can be sure to keep winning. For that reason alone, I expect any rule change will be a tough sell to Omar. He would be happiest if we could get away with not making any more changes until the Challenge expires in 2020.
In the meantime, I hope we can inspire a few more people to take their best shot at winning a long, slow game. Scoring even one win is an achievement to be proud of. Do it now before our silicon overlords enslave us all!
|
|
|
|
|
rbarreira
Forum Guru
Arimaa player #1621
Gender:
Posts: 605
|
|
Re: 2014 Arimaa Challenge
« Reply #22 on: Mar 23rd, 2014, 5:46pm » |
|
I noticed that the precise moment the screening ends is not defined in the rules:
http://arimaa.com/arimaa/wc/2014/sch.html
http://arimaa.com/arimaa/challenge/2014/
It just says "March 31" without specifying a time or time zone for the games to start/end. It might be worth clarifying that before it becomes an issue.
|
« Last Edit: Mar 23rd, 2014, 5:46pm by rbarreira » |
|
|
|
99of9
Forum Guru
Gnobby's creator (player #314)
Gender:
Posts: 1413
|
|
Re: 2014 Arimaa Challenge
« Reply #23 on: Mar 23rd, 2014, 7:21pm » |
|
on Mar 23rd, 2014, 3:04pm, Fritzlein wrote:He would be happiest if we could get away with not making any more changes until the Challenge expires in 2020. |
| Me too, for the same reasons.
|
|
|
|
|
Ail
Forum Guru
Rabbits can't push Rabbits!
Gender:
Posts: 52
|
|
Re: 2014 Arimaa Challenge
« Reply #24 on: Mar 24th, 2014, 11:00am » |
|
on Mar 23rd, 2014, 10:40am, Fritzlein wrote: I'm afraid there will be a lot of "one and done" screening participants. I'll bet people who lose their first screening game are much less likely to play a second than people who win their first. |
| I feel pretty much looked through now. I am like 1:10 against not even the highest level of the 2012-Sharp on my phone having it use like 5 seconds while I use as long as I feel like. Thus I got smashed like expected. And I really don't feel like getting smashed 3 more times. I think that if I can't even put up a good fight against my phone, it's too unlikely I can do against better bots on better hardware.
|
|
|
|
|
browni3141
Forum Guru
Arimaa player #7014
Gender:
Posts: 384
|
|
Re: 2014 Arimaa Challenge
« Reply #25 on: Mar 24th, 2014, 12:13pm » |
|
on Mar 23rd, 2014, 3:04pm, Fritzlein wrote: Halving the time control probably adds 50 Elo or more to bot strength relative to humans, further demotivating people who get whacked and further shrinking the pool of folks who are likely to provide discrimination by beating one bot and losing to the other. |
| Wow, my own estimate was that a single doubling was worth about 150 points of strength relative to a bot getting the same time increase, at least for myself. I agree with all the reasons why we shouldn't change the time control, but at the same time I think increasing participation is extremely important, especially as we are losing participation and accuracy in the screening in consecutive years.
How about having a reward for each pair completed? Then the problem is where the reward will come from... Perhaps the reward can just be someone's time. Maybe some strong players can annotate all the games of completed pairs, and getting some free game help will be enough for more players to complete at least one pair.
Another suggestion that can be implemented independently of previous suggestions is to allow players to complete more than two pairs. This would be a very minor rule change. I understand that omar wouldn't want one player's performance being weighted too heavily, but I don't see how more games can hurt at this point. A cap of three or four pairs seems reasonable. I'm not sure how many people would want to do more anyway. Two is probably already plenty for most.
Also, I just remembered that I have a half-typed response for this thread...
on Mar 24th, 2014, 11:00am, Ail wrote: I feel pretty much looked through now. I am like 1:10 against not even the highest level of the 2012-Sharp on my phone having it use like 5 seconds while I use as long as I feel like. Thus I got smashed like expected. And I really don't feel like getting smashed 3 more times. I think that if I can't even put up a good fight against my phone, it's too unlikely I can do against better bots on better hardware. |
| Although games between very close opponents should yield the most information, every pair completed is meaningful, Ail, so it would be really nice if you could play just one more game. If you play a second screening game, then I offer to annotate both of your games for you, and answer any questions you have about either game. If you play another pair after that, I'll do the same for that pair also.
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #27 on: Mar 25th, 2014, 5:35pm » |
|
on Mar 24th, 2014, 12:13pm, browni3141 wrote:Wow, my own estimate was that a single doubling was worth about 150 points of strength relative to a bot getting the same time increase, at least for myself. |
| Hmmm... With three doublings between CC and blitz, that would be a 450 point difference? I admit that the CC bots are probably a bit overrated because the humans don't use their full time allotment, but the actual rating difference between a blitz and a CC bot of the same vintage on the server seems to be in the 150-200 point range on average, just from eyeballing it. So my 50 points per doubling is probably a lower bound rather than an accurate guess, but not a ridiculously conservative lower bound.
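For reference, the arithmetic behind the two per-doubling estimates can be sketched as below. This is only an illustration: it assumes the rating gain per doubling is constant and adds up linearly across the three doublings mentioned above, and the function names are made up for the example.

```python
def total_gap(points_per_doubling, doublings=3):
    """Rating gap between blitz and CC if every doubling is worth the same amount.

    Assumption: gains add linearly across doublings.
    """
    return points_per_doubling * doublings


def per_doubling(observed_gap, doublings=3):
    """Per-doubling value implied by an observed blitz-to-CC rating gap."""
    return observed_gap / doublings


print(total_gap(150))       # gap implied by 150 points per doubling
print(total_gap(50))        # gap implied by 50 points per doubling
print(per_doubling(150))    # low end of the eyeballed 150-200 range
print(per_doubling(200))    # high end of the eyeballed 150-200 range
```

Under that linear assumption, the eyeballed 150-200 point server gap works out to roughly 50 to 67 points per doubling, which is why 50 sits at the low end of the range.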
|
|
|
|
|
browni3141
Forum Guru
Arimaa player #7014
Gender:
Posts: 384
|
|
Re: 2014 Arimaa Challenge
« Reply #28 on: Mar 25th, 2014, 8:46pm » |
|
on Mar 25th, 2014, 5:35pm, Fritzlein wrote: Hmmm... With three doublings between CC and blitz, that would be a 450 point difference? I admit that the CC bots are probably a bit overrated because the humans don't use their full time allotment, but the actual rating difference between a blitz and a CC bot of the same vintage on the server seems to be in the 150-200 point range on average, just from eyeballing it. So my 50 points per doubling is probably a lower bound rather than an accurate guess, but not a ridiculously conservative lower bound. |
| These win-rates seem reasonable:
Blitz: 50%
Fast: 70%
60s: 85%
CC: 93%
I would be interested in seeing more data, but for something like this, there are tons of variables which could affect the results.
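As a rough check, those win rates can be converted to rating differences with the standard logistic Elo formula, D = 400 * log10(p / (1 - p)). The sketch below assumes the percentages are one player's expected score against the same bot at each time control, and that each step from Blitz to Fast to 60s to CC is one doubling of thinking time, as the "three doublings" remark in this thread implies; the helper name is made up for the example.

```python
import math


def elo_gap(win_rate):
    """Elo difference implied by an expected score under the logistic Elo model."""
    return 400 * math.log10(win_rate / (1 - win_rate))


# Win rates as posted; the interpretation (player vs. same bot) is an assumption.
win_rates = {"Blitz": 0.50, "Fast": 0.70, "60s": 0.85, "CC": 0.93}
gaps = {name: elo_gap(p) for name, p in win_rates.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.0f} Elo")
```

Read this way, the four win rates correspond to gaps of about 0, 147, 301, and 449 points, i.e. close to 150 points per doubling.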
|
|
|
|
|
Fritzlein
Forum Guru
Arimaa player #706
Gender:
Posts: 5928
|
|
Re: 2014 Arimaa Challenge
« Reply #29 on: Mar 26th, 2014, 8:50pm » |
|
Hat tip to Heyckie for becoming the second player to complete all four screening games, and to BlakeD, Braveheart, and BrendanM for individual wins. The bots have slipped a bit to performance ratings of 2286 vs. 2112, and ziltoid's lead has opened back up to 4-1, so it is looking more likely that browni will win both of his bets with me.
|
|
|
|
|
|