The last time anyone did a project like this (ChessBrain), they were very careful not to give any meaningful performance results. I can tell you why: latency is critical in these applications, and APHID scalability is terrible once you go over a few nodes. Scaling YBWC is even harder, more latency-sensitive, and responds terribly if you can't effectively share memory over multiple nodes.
Using a lot of CPUs to generate heat is easy. Using them to generate good chess moves is another matter.
How about having your "2000-node chess program" play against the strongest GPL-ed chess program (Stockfish 2.1), running on a fast 4-core box?
If you can win a match of meaningful length against that, I might be interested.
Until you get there, this is a nice exercise in combining technology to achieve meaningless results.
Indeed. APHID doesn't scale well, and it will be hard to distribute YBWC efficiently, but we think we have a shot at it with very short compute windows for each worker. We may also look at other algorithms, but currently a variation of YBWC seems the most promising.
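To give an idea of what we mean, here is a very rough sketch of YBWC with short compute windows (generateMoves, applyMove, evaluate and dispatchToWorker are hypothetical placeholders, not the actual chess@home code):

    // Young Brothers Wait, sketched: search the first move locally to get a
    // bound, only then send the remaining siblings to workers in parallel.
    // All helpers here are hypothetical placeholders.
    async function ybwcSearch(position, depth, alpha, beta) {
      if (depth === 0) return evaluate(position);
      const moves = generateMoves(position);

      // Eldest brother: searched sequentially to establish a useful alpha bound.
      let best = -await ybwcSearch(applyMove(position, moves[0]), depth - 1, -beta, -alpha);
      if (best >= beta) return best;               // cutoff, no need to split
      alpha = Math.max(alpha, best);

      // Younger brothers: distributed, each with a short compute window so a
      // slow or vanished browser doesn't stall the whole search.
      const jobs = moves.slice(1).map(function (m) {
        return dispatchToWorker({
          position: applyMove(position, m),
          depth: depth - 1, alpha: -beta, beta: -alpha,
          timeoutMs: 2000
        });
      });
      for (const job of jobs) {
        const score = -(await job);
        if (score > best) best = score;
        if (best >= beta) break;                   // (cancelling pending jobs omitted)
      }
      return best;
    }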
In terms of strength, we'll add the UCI protocol to be able to run games against other engines. We'll see how it performs but we don't expect to beat Stockfish easily, even with a very large number of nodes. That will be part of the experiment and we'll be happy to publish the results.
Now, another path to real performance would be to find a way of using WebGL on the workers. Folding@home has shown that GPUs can sometimes be 10x more efficient than CPUs. If we can manage to do that, then we might have a shot at being both the largest and the strongest AI...
Fair point. Though I'll have to get burned trying before being convinced it's that complicated :) However WebGL is not in the critical path of the Guinness Record thing, so I'll happily trust you for now!
You might have a look at Monte Carlo Tree Search and the improvements that have been made to it for game playing. It's easy to parallelize and has had amazing results in Go.
You might also combine it with a moves/games database and play variations on these games according to a softmax distribution (instead of playing random games as in classic MCTS).
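To make the softmax idea concrete, a rough sketch (the candidates array, with per-move play counts from the database, is hypothetical):

    // Pick a playout move with probability given by a softmax over how often
    // each move appears in the games database (instead of uniformly at random).
    // `candidates` is assumed to look like [{move: 'e2e4', count: 123456}, ...].
    function softmaxPick(candidates, temperature) {
      var total = candidates.reduce(function (s, c) { return s + c.count; }, 0);
      var weights = candidates.map(function (c) {
        return Math.exp((c.count / total) / temperature);   // softmax over frequencies
      });
      var sum = weights.reduce(function (a, b) { return a + b; }, 0);
      var r = Math.random() * sum;
      for (var i = 0; i < candidates.length; i++) {
        r -= weights[i];
        if (r <= 0) return candidates[i].move;
      }
      return candidates[candidates.length - 1].move;         // numerical safety net
    }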
I think this will work much better than plain tree search, but it will still be really hard to beat e.g. Stockfish. Lots of processor power gets you nothing if you're doing the wrong calculations :)
It is estimated that doubling the speed of the computer adds 50-70 Elo points (http://en.wikipedia.org/wiki/Computer_chess cites David Levy's book How Computers Play Chess).
This means they will have to achieve anywhere from a 2^7 to a 2^10 speedup over a normal single PC. Once you consider the latency problem described above, it is very unlikely that they can achieve this speedup. To put the problem into context, even on a quad-core, a three-times speedup is considered pretty good in computer chess.
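To show where that range comes from (a back-of-the-envelope reading; the 500 Elo gap is my own assumption, not a measured figure):

    // Rough reading of the 2^7-2^10 figure. The 500 Elo gap is an assumption.
    var gapElo = 500;                 // hypothetical gap to the top engines on one PC
    var eloPerDoubling = [50, 70];
    var doublings = eloPerDoubling.map(function (e) { return gapElo / e; });
    console.log(doublings);           // [10, ~7.1] -> roughly 2^7 to 2^10 overall speedup needed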
It is indeed unlikely. But if we can actually get to the point where 2^10 web clients (1024) are as strong as Houdini or Rybka on a single high-end machine, it won't be that hard to double the size of the web grid a couple more times.
As you pointed out, the main challenge is scaling efficiently and not ending up with a poor speedup even with that many workers (I think we can aim for around 60% parallel efficiency with YBWC). We hope to have an edge here with Node.js's I/O performance, but it's still a big challenge (and that's what keeps it interesting ;-)
Note that most top engines barely reach that kind of efficiency with 4 cores on an SMP system. I doubt you can get the same efficiency in a distributed system with 2000 cores.
As noted on the Garbochess JS page, I didn't estimate the strength of the JS version. It would be much weaker than Garbochess 2.20 on the same hardware. A Node.js UCI adapter would be a fun way to test this.
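For what it's worth, a bare-bones Node.js UCI adapter is only a handful of lines; a sketch, where "engine" (with setPosition() and search()) stands in hypothetically for the JS engine:

    // Minimal UCI adapter sketch for Node.js. `engine` is a hypothetical
    // wrapper: setPosition() takes the UCI position arguments, search()
    // returns a best move string like "e2e4".
    const readline = require('readline');
    const rl = readline.createInterface({ input: process.stdin });

    rl.on('line', line => {
      const [cmd, ...args] = line.trim().split(/\s+/);
      if (cmd === 'uci') {
        console.log('id name chessathome');
        console.log('uciok');
      } else if (cmd === 'isready') {
        console.log('readyok');
      } else if (cmd === 'position') {
        engine.setPosition(args);              // e.g. ["startpos", "moves", "e2e4", ...]
      } else if (cmd === 'go') {
        console.log('bestmove ' + engine.search());
      } else if (cmd === 'quit') {
        process.exit(0);
      }
    });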
It will be very difficult to make JavaScript competitive with Stockfish, even across 1000 computers. Still, a very cool project!
If they can get it scaling decently, it's a good achievement in its own right. Say that JS with a modern JIT is 4 times slower than C/C++. This means the latency of their system will look relatively better by a factor of 4.
If they now find an algorithm that scales to 2000 nodes with an efficiency as low as 20%, this is a breakthrough achievement. It would be portable to a C/C++ based program with "only" a 4 times faster interconnect.
That said, an engine with a primitive search scales better because the tree is more regular and it's easier to predict the real size of workloads. It could end up that the algorithm doesn't actually work for a strong engine.
But anyway, the "mere" 4x factor due to JavaScript is peanuts compared to the rest of the problems.
I think you're being too pessimistic. Even if high playing strength isn't achieved, using JS to do @home projects is huge, and putting it together in 48 hours is pretty sweet.
> Widget ethics. Using third-party processing power has usually been opt-in: people needed to install SETI@home or the BOINC software to lend their CPUs. JavaScript allows us to use people’s CPUs without their explicit consent just when they visit a page with the widget, which makes this a grey area.
This could be a new way of generating revenue from a website: selling time on visitors' machines. I'm not saying it's a good or ethical idea, but it's certainly an interesting idea.
Plura Processing tried it a while back with a Java applet. There was a big uproar when one of their affiliates sold visitors' compute time without disclosure.
I think selling time can definitely be considered unethical. However, what scares me most is people using this for DDoS or other network-based attacks. Given JavaScript's current performance, it is still much more economical to spin up a couple of EC2 instances, but that may indeed change in the future.
I tried using this, but it left me with the impression that it was unprofessionally done.
1. The board orientation is not correct: there must be a black square in the left corner. While this may seem like a small quibble, it changes the pattern so that squares like e4, which should be white, are now black. This makes it difficult to play.
2. There is apparently no opening book programmed into the computer. Since openings are so complex, simply using an algorithm to find good opening moves does not work well, so this program will almost always start out with bad first moves.
Please keep in mind it was hacked together in 48h, so yes it's supposed to be unprofessionally done ;-) However we'll now calmly fix all the issues and continue improving it:
https://github.com/joshfire/chessathome/issues
Feel free to report more bugs, we love feedback :)
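On the missing openings specifically, the likely fix is a small book lookup before falling back to the search; roughly something like this sketch (the position.fen() call and the book format are hypothetical):

    // Hypothetical opening-book lookup: try the book first, fall back to search.
    // `book` maps a FEN string to candidate replies with weights.
    function pickMove(position, book, searchFn) {
      var entries = book[position.fen()];
      if (entries && entries.length > 0) {
        // Weighted random choice among book moves, so we don't repeat the same line every game.
        var total = entries.reduce(function (sum, e) { return sum + e.weight; }, 0);
        var r = Math.random() * total;
        for (var i = 0; i < entries.length; i++) {
          r -= entries[i].weight;
          if (r <= 0) return entries[i].move;
        }
      }
      return searchFn(position);   // out of book: fall back to the distributed search
    }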
Yes, it's astounding (if true) that Pocket Fritz only searches 20k nodes per second, compared to Deep Blue's 200M. There must be some serious positional intelligence involved there!
For a graphical grasp of this concept, take a bell curve, and then plot only the top 0.1% of the bell curve. That's the distribution of abilities among people who are any good at all at chess, and it is no longer anything close to a bell curve.
Well, enthusiasm was clearly speaking here, but you can consider it epic in that one player will be thrown against potentially "all" the computers in the world. Kind of a "Man vs. Grid" thing.
But yes, in terms of pure strength, nothing epic for now :)
Variation on a theme: implement Stockfish in NaCl or, if Stockfish is too hard to port, a simpler though still strong program like Fruit.
Then you can essentially isolate one of the variables, the strength of the "node" (client) engine, and play your distributed Stockfish against a Stockfish on a local box.
Note that the distributed version will still have 2 advantages:
- more CPUs (obvious to everyone)
- more RAM (well, obvious once you mention it)
If the distributed version is stronger then one question will be, was it more CPUs, or more RAM, that tipped the balance?
For any reasonable amount of RAM (say >64M) and a strong engine with a good hash-replacement algorithm: always the CPUs. They are simply not RAM-limited.
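For anyone wondering what a hash-replacement algorithm is here: the transposition table is a fixed-size cache of searched positions, and on a collision you must decide which entry to keep. A depth-preferred scheme looks roughly like this sketch (Zobrist hashing of positions is assumed but not shown):

    // Sketch of a depth-preferred transposition table with a fixed number of slots.
    function TranspositionTable(size) {
      this.slots = new Array(size);
      this.size = size;
    }
    TranspositionTable.prototype.store = function (hashKey, depth, score, bestMove) {
      var i = hashKey % this.size;
      var old = this.slots[i];
      // Keep the entry searched to the greater depth; deeper results are more valuable.
      if (!old || depth >= old.depth) {
        this.slots[i] = { hashKey: hashKey, depth: depth, score: score, bestMove: bestMove };
      }
    };
    TranspositionTable.prototype.probe = function (hashKey) {
      var e = this.slots[hashKey % this.size];
      return (e && e.hashKey === hashKey) ? e : null;
    };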
It occurs to me that you can be thinking while it is the opponent's turn, i.e. you can give some nodes the job of looking at likely next moves and others the job of finding counters to those, and so on.
PS: don't forget to ask your contributors to open a tab for each core they have, if their browser uses a separate OS thread per tab (or see the sketch below for using one Web Worker per core from a single tab).
PPS: Google Chrome's NaCl is perhaps an interesting direction longer term?
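To illustrate the per-core point: instead of multiple tabs, the page itself could spawn one Web Worker per core, roughly like this sketch (navigator.hardwareConcurrency support, chess-worker.js and reportResult are all assumptions on my part):

    // Sketch: keep every core busy from a single tab by spawning one Web Worker
    // per core instead of asking visitors to open multiple tabs.
    var numCores = (typeof navigator !== 'undefined' && navigator.hardwareConcurrency) || 2;
    var workers = [];
    for (var i = 0; i < numCores; i++) {
      var w = new Worker('chess-worker.js');   // hypothetical worker script
      w.onmessage = function (event) {
        // Each worker posts back a searched subtree result, e.g. {jobId, score, bestMove}.
        reportResult(event.data);              // hypothetical upstream reporting
      };
      workers.push(w);
    }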
Yes, most chess programs do this, although it is forbidden in some chess computer tournaments with stricter rules (but we'd never qualify in those anyway :)
As for asking people to open multiple tabs, that may be an additional instruction for D-Day if they want to max out their CPUs. But we figured 1 CPU maxed out while browsing a website would already be enough :)
Chrome's NaCl is definitely an interesting direction, but we'll investigate WebGL first, which seems more promising (see Folding@home).
The worker implementation could be made generic with a little work. These people made a map reduce implementation with a similar design: http://maprejuice.com/
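The generic shape could be as simple as this sketch (the /job and /result endpoints and the handler functions are hypothetical):

    // Sketch of a generic in-page worker loop: it knows nothing about chess,
    // it just pulls JSON jobs from the server, runs the named handler, and
    // posts the result back.
    var handlers = {
      chessSearch: function (payload) { return searchPosition(payload); },  // hypothetical engine call
      wordCount:   function (payload) { return payload.text.split(/\s+/).length; }
    };

    function workLoop() {
      fetch('/job')
        .then(function (res) { return res.json(); })
        .then(function (job) {
          var result = handlers[job.type](job.payload);
          return fetch('/result', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ id: job.id, result: result })
          });
        })
        .then(workLoop);   // then go back for the next job
    }
    workLoop();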
Heh, this reminds me of my long-forgotten project: I collected 15 GB of data covering 200 million games played on FICS plus 15,800 grandmaster games. It took too long for my AI to look through all of them...