Debunking a Computer Chess Scandal

worldvoyageur · on Jan 7, 2012

Simply on the basis of telling a fascinating story, the series is well worth reading. Beyond that, some of what particularly struck me was that:

- since about 2005, chess programs that run on a regular PC have consistently outperformed the best human grandmasters, and the skill of the computer chess programs continues to improve. 2005 was a quantum shift in the chess programming world, when chess programs first played "brilliant games with deep, beautiful combinations...[and] routinely produce[d] highly artistic masterpieces of chess while avoiding a great many pointless “computer” moves that for many years had been a source of ridicule among strong human players."

- improvement in algorithms is driven by an active community ecosystem, open source contributions and flagrant copying of the brilliant insights of a few. No matter how brilliant the insight, the community is able to improve upon it. Intellectual property protection would have stopped much of this improvement in its tracks.

It's four separate articles, so I posted the first. All four are:

http://www.chessbase.com/newsdetail.asp?newsid=7791 (part 1)

http://www.chessbase.com/newsdetail.asp?newsid=7807 (part 2)

http://www.chessbase.com/newsdetail.asp?newsid=7811 (part 3)

http://www.chessbase.com/newsdetail.asp?newsid=7813 (part 4)

wisty · on Jan 7, 2012

A quote from Rajlich:

> Yes, the publication of Fruit 2.1 was huge. Look at how many engines took a massive jump in its wake: Rybka, HIARCS, Fritz, Zappa, Spike, List, and so on. I went through the Fruit 2.1 source code forwards and backwards and took many things.

It's not as if he never admitted to studying Fruit.

Also, most of the confusion is there because Rajlich didn't have a repo until Rybka 4, so when they reverse engineered his code they formatted it the same way as Fruit, making it look very similar. If a judge missed the bit saying that the Rybka code is "functionally equivalent code", not the actual code, they might be fooled into thinking the original codebases looked very similar. The guy doing this wrote something like 30 posts a day on the discussion board about Rybka, and was clearly not a fan.

They should have made a 3-way comparison - Rybka's PST, Fruit's PST, and some independent program's PST (in the same format as the other two).
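A rough sketch of what such a three-way comparison could look like. Everything here is hypothetical: the function names, the choice of "mean absolute difference after rescaling each table by its own largest absolute entry" as the distance, and the assumption that all three 64-entry tables have been extracted in the same square order. None of it comes from the ICGA reports.

```c
#include <assert.h>
#include <stdlib.h>

#define SQUARES 64

/* Largest absolute entry of a table; used to remove a global scale factor. */
static int pst_max_abs(const int pst[SQUARES]) {
    int best = 0;
    for (int sq = 0; sq < SQUARES; sq++) {
        int v = abs(pst[sq]);
        if (v > best) best = v;
    }
    return best;
}

/* Mean absolute difference between two tables after each one is rescaled
   to the range [-1, 1]. A result of 0.0 means the tables are identical up
   to a positive scale factor. Hypothetical metric, for illustration only. */
double pst_distance(const int a[SQUARES], const int b[SQUARES]) {
    int ma = pst_max_abs(a);
    int mb = pst_max_abs(b);
    assert(ma > 0 && mb > 0);   /* an all-zero table cannot be rescaled */

    double sum = 0.0;
    for (int sq = 0; sq < SQUARES; sq++) {
        double d = (double)a[sq] / ma - (double)b[sq] / mb;
        sum += (d < 0.0) ? -d : d;
    }
    return sum / SQUARES;
}
```

With three tables `rybka`, `fruit` and `other`, the interesting question would be whether `pst_distance(rybka, fruit)` is clearly smaller than both `pst_distance(rybka, other)` and `pst_distance(fruit, other)`.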

Bootvis · on Jan 8, 2012

A response has come from the author and panel members on Chessvibes, see the pdf links in here:

http://chessvibes.com/reports/controversy-over-rybkas-disqua...

jules · on Jan 7, 2012

What is a good resource to learn about the important advances in computer chess?

ww520 · on Jan 7, 2012

OT: why do websites find it useful to disable the up/down arrow keys? It is not. It's very annoying. It makes navigation a pain in the butt and makes people never want to come back to your site.

jmmcd · on Jan 7, 2012

Maybe it's the fault of the game-history widgets on the page. They use left and right arrow keys for game navigation, so it's possible they steal up and down arrows also.

bjornsing · on Jan 8, 2012

Isn't there something slightly ironic about accusing the best pupil in the class of peeking over the shoulders of the other students during tests? Not that it can't be a correct accusation, but if this guy "cruised to victory in four consecutive WCCC tournaments in 2007, 2008, 2009 and 2010" then he must have done something right (other than copy-paste).

From a copyright perspective I can certainly see an issue, but that would be between Rajlich and the author of this Fruit I would say.

dalke · on Jan 9, 2012

Actually, the argument isn't about cut&paste programming. That would be somewhat easy to determine by simply comparing the binary output (after some searching for the right compiler options from the compiler for that era).

Indeed, the accusations clearly say that there were significant changes to the code, which makes a reverse engineered comparison harder to do.

Instead, the complaint is that the underlying algorithms were re-implemented, instead of being original algorithms.

copper · on Jan 7, 2012

For what it's worth, the WCCC's Rule 2 seems to be a lot like a university honor code (if you do borrow code or ideas, make sure you cite it.) I don't believe that's too hard to follow. Maybe someone should fund a kaggle competition to design the best chess-playing program, with the final requirement of making the code open-source after :)

Most of what you'd need to implement a computer chess engine is pretty much available online (at most, behind a paywall), and digging through the source code of GNUChess and Crafty would give a lot of insight into the scoring function. So, yes, the scoring function would be pretty much similar in most parts unless someone comes up with a radically new way of doing things.

wbhart · on Jan 7, 2012

From the article, "By definition, plagiarism only happens when credit to sources is not given, which was never the case with Rybka."

I simply do not agree with this. I cannot copy slabs of open source code, incorporate them into my product and pass it off as my own just because I somewhere credited the great performance of my program to stuff I learned from reading open source software.

To me, plagiarism is copying some portion of someone else's work and including it in a work which I (explicitly or implicitly) claim to be my own.

For example, a student handing in a wikipedia article for a homework assignment still commits plagiarism even if they credit wikipedia as a source but don't explicitly say the thing is a wikipedia article. They hand it in as their work when they did not write it. So at this point I do not feel I agree with the article. It's as though the argument is, yes he copied, but it's ok because he gave "credit", there were plenty of new ideas and everyone else was doing it too. That's not the same thing as, "he didn't commit plagiarism". The latter means he did not copy the code.

Having said that, I am absolutely gobsmacked that the committee did not simply ask for the source code to both programs, check that it in fact compiles to the binaries in question, then do a comparison for legally significant quantities of identical code.

It's even more remarkable when you realise that the version of Fruit involved was open source, so they didn't even need to ask for the source code to that!

Note: I edited the above in response to the comments below.

SeanLuke · on Jan 7, 2012

I don't have an opinion on what this guy did and whether he should have been stripped, but you're redefining plagiarism pure and simple.

Plagiarism is passing off someone else's ideas or expression of those ideas as if they were your own. And that's it. Plagiarism has nothing to do with whether you benefit from someone else's ideas illegally or unethically.

Let's say that there's a competition to write an original novel. I take Moby Dick and change one sentence. Then I submit it with a big warning on the front cover that says: "This novel is Moby Dick with one sentence changed by me." I have violated the rules of the contest perhaps. If I win I maybe benefitted from it. But I have not committed plagiarism -- I did not pass off these ideas as if they were my own. It was made clear whose ideas they were.

sp332 · on Jan 7, 2012

I disagree. I think if a student turns in an assignment that's clearly and admittedly just a WP article, the student should not get credit for doing the work, but he didn't commit plagiarism.

bambax · on Jan 8, 2012

> Having said that, I am absolutely gobsmacked that the committee did not simply ask for the source code to both programs

From the article (part 4):

> Critics of the ICGA soon realized that no one has actual Rybka source code from before 2010, not even Rajlich himself, who sheepishly admitted to Nelson Hernandez off-camera in the course of their July 2011 video interview that he had never maintained any form of version control for Rybka source code until Rybka 4.

jsnell · on Jan 7, 2012

What makes you think Rajlich would have given them source access, or that it wasn't asked for? By all accounts he was not willing to assist with the investigation in any way.

crististm · on Jan 7, 2012

I've read the report files. They based the findings on the behaviour of the Rybka binary. As the article says, Rajlich refused to show the source code.

mark_l_watson · on Jan 7, 2012

Good article. I agree that to slam someone's reputation in public, the ICGA should have had much better evidence. My understanding is that the release of Fruit source code inspired many chess programmers.

rwmj · on Jan 7, 2012

For UK/Hungary computer nerds, Dr David Levy was also responsible for the Enterprise 64 computer: https://en.wikipedia.org/wiki/Enterprise_%28computer%29

janzer · on Jan 8, 2012

Although the articles seem to not mention it, it should probably be noted the author is a moderator on the Rybka forums so is most likely not coming from an exactly unbiased viewpoint.

worldvoyageur · on Jan 8, 2012

Though buried as an aside in the fourth part of the piece, the author does mention that he is the moderator on the Rybka forums.

"In my capacity as Rybka forum moderator I have access to posting statistics."

Indeed, his moderator status leads to one of his core debunking arguments. Some Dr. Hyatt guy, apparently a key individual behind the accusations against Rybka, appeared to be making the attack a full-time job, shifting the arguments as needed to defend a guilty conclusion.

Though he doesn't say it, I suspect that noticing this was the impetus for the author to take a serious look at the charges from the defense perspective.

Not that there is anything wrong with either side picking a conclusion (guilty, not guilty) and then working hard to support it. However, part of the credence given the initial charge is that it resulted from an independent search for truth. This does not appear to have been the case. Instead, the charges were the prosecution half of an adversarial process. In these articles, now we see the defense.

Caveat: all I know about this matter is what I read in the four part article posted here.

mml · on Jan 7, 2012

A fine example of terrible writing.

medusa666 · on Jan 7, 2012

It has been discussed (in chess circles) in excruciating and unambiguous detail how Rajlich blatantly ripped off (Google it) open source code, thereby violating tournament rules, open source licenses, etc. This "debunking" series of articles by Chessbase is a pathetic attempt to reinvent the past on behalf of one of their most lucrative products, and perhaps to fool a wider audience who didn't follow the original scandal when it happened.

I used to be a big fan of Rybka, Rajlich and Chessbase, but these continued denials just make a bad thing worse. It's like Floyd Landis, Tyler Hamilton, and the third guy.

Almaviva · on Jan 7, 2012

As far as I can tell, the basic facts are:

1. There is damning evidence (probably beyond reasonable doubt) that Rybka contains some copy-pasted code from Fruit (the open source program).

2. The interesting thing about Rybka was that it was dramatically better than any chess engine of its time, due to original improvements.

3. Many people are bitter that Rybka became an extremely successful commercial product, due to being clearly the best chess engine of its time. It also made several formerly viable commercial products non-viable due to being much better.

4. A mass of clones obtained by either disassembly or obtaining source code appeared, which are now equal or slightly superior to Rybka, and this has essentially destroyed its commercial viability.

For people who value the letter of the law, #1 is damning, and Rybka cheated, period. This isn't beyond theoretical doubt, but it's beyond a reasonable one.

But, given that Rajlich could have easily and legally re-implemented the same algorithms in the mundane parts of Fruit that are in question without exact code reuse, and that what is significant about Rybka is actually the original and innovative part, there's a pretty good argument that he's well within his rights to profit from his creation.

harryh · on Jan 7, 2012

Are there some good links I can read about claim #1? I have only just been introduced to this story via the articles on ChessBase and I would like to read "the other side of the story" so to speak.

jsnell · on Jan 7, 2012

The original investigation reports are a pretty good read. (links at the bottom of this article: http://www.chessvibes.com/reports/rybka-disqualified-and-ban... )

dalke · on Jan 7, 2012

I looked at http://www.chessvibes.com/plaatjes/rybkaevidence/ZW_Rybka_Fr... . It happens to be the one mentioned as biased evidence in this chessbase article, though I picked it because it was a PDF and not an RTF file.

This is the one which has a side-by-side comparison of what appears to be two pieces of code. One of which is "static const int KnightBackRankOpening = 0;"

    for (sq = 0; sq < 64; sq++) {
        P(piece,sq,Opening) += KnightRank[square_rank(sq)] * KnightBackRankOpening;
    }

That appears to be damning, but as the chessbase article points out, there's no source for the Rybka case, only reverse engineering. Since KnightBackRankOpening is "static const int 0" (can you really reverse engineer a "const" from machine code?), you would expect any half-decent optimizing compiler to remove that whole chunk of code.
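The point about the zero weight can be checked with a small sketch. The names mirror the snippet above, but the values in `KnightRank`, the body of `square_rank`, and the flat `table` array are placeholders invented here, not code from Fruit or from the reverse-engineered Rybka listing. The sketch builds a table once with the zero-weighted term and once without it and confirms the results are identical, which is why a finished table cannot show whether such a loop ever existed in the source, and why an optimizing compiler is free to drop it.

```c
#include <assert.h>
#include <string.h>

#define SQUARES 64

/* Placeholder values; only the zero weight below matters for the argument. */
static const int KnightRank[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
static const int KnightBackRankOpening = 0;

/* Rank index 0..7 of a square numbered 0..63 (assumed layout). */
static int square_rank(int sq) {
    assert(sq >= 0 && sq < SQUARES);
    return sq / 8;
}

/* Copies base into table, then applies the zero-weighted term. */
void build_with_term(int table[SQUARES], const int base[SQUARES]) {
    memcpy(table, base, SQUARES * sizeof(int));
    for (int sq = 0; sq < SQUARES; sq++) {
        table[sq] += KnightRank[square_rank(sq)] * KnightBackRankOpening;
    }
}

/* Copies base into table and leaves the term out entirely. */
void build_without_term(int table[SQUARES], const int base[SQUARES]) {
    memcpy(table, base, SQUARES * sizeof(int));
}

/* Returns 1 when both builds produce byte-identical tables. */
int builds_agree(const int base[SQUARES]) {
    int a[SQUARES], b[SQUARES];
    build_with_term(a, base);
    build_without_term(b, base);
    return memcmp(a, b, sizeof a) == 0;
}
```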

In other words, this seemingly indicting code is a façade: a functionally equivalent implementation constructed to maximize similarity. Granted, the PDF does say "The code shown here is simply the functional equivalent" and "Fruit and Rybka have functionally identical code here too." Still, I looked at the code and the biggest similarity is that both code snippets are rendered in the same style. I don't see any copying. I see different tunings (e.g., different weight parameters), and even though both tools are working in the same data representation and algorithm space, I see different implementations of those algorithms.

I find the chessbase article to be much more convincing than the results of the original investigation.

What would convince me otherwise is the same analysis of other modern chess programs, to show that they don't use the same approach.

jsnell · on Jan 7, 2012

In this case there is not even any reverse engineering as such. These are values that IIUC Fruit computes at startup while Rybka has them precomputed. What that section was showing was merely that these tables were most likely computed using the code from Fruit, but with scaled and (automatically?) adjusted weights.

So you're totally correct in that displaying the hypothetical source code in such a situation was questionable. It's not a horribly strong bit of evidence, and as an isolated incident, this would hardly be interesting. But as one more piece in a pattern of persistent copying, more so. In particular, it's worth noting that e.g. the code from earlier versions of Fruit would require larger changes than just tweaking these weights to generate the same output tables, making it pretty unlikely that this is just a case of convergent evolution.

So if the PST tables are weak, what's the stronger evidence?

First of all, there's the circumstances of the rebirth of the project:

- Older versions of Rybka were indisputably based on Crafty. (Including replicating some harmless bugs in exactly the same places as a certain version of Crafty, making it possible to pinpoint the point of plagiarism fairly closely to a specific Crafty version).

- After Fruit was released, suddenly Rybka loses all the similarity to Crafty, and acquires similarities to Fruit. This coincides with:

++ Rybka becoming a dramatically stronger player.

++ Rybka losing features that it previously had in support code.

++ Rybka gaining new idiosyncrasies in support code, matching those of Fruity.

++ Rybka losing some game playing / evaluation features that it (and Crafty) previously had but Fruit didn't.

++ Rybka suddenly acquiring an evaluation feature set that is a very close match to those of Fruity. Very close, as in much closer than any unrelated engines are to each other, than previous versions of Rybka to the then current versions, or that version of Rybka to much later versions.

++ Rybka suddenly acquiring new fairly arbitrary data structures that match those of the same specific version of Fruit, while not matching those of other Fruit versions (e.g. the hash structure).

And probably other things, it's been a while since I read the papers.

It's very surprising to me that somebody could read through all of the evidence and believe there's no foul play going on. It seems totally unreasonable to assume that he first wrote from scratch a chess engine that managed to very exactly replicate the foibles of a top open source engine, then threw all of that code out and rewrote from scratch a new engine that happened to replicate the idiosyncrasies of a completely unrelated and stronger open source engine.

It seems much more believable that substantial parts of the code of Rybka were copied verbatim from Fruit, and that other substantial parts were ported over to use a different structure in a way that's not really showing a lot of creativity.

Did Rajlich also substantially improve on the code? Clearly at some point he must have, it did eventually become a much stronger program. It's hard to say how much of this original innovation there was in Rybka 1/2, which as I understand are the particularly contentious versions.

Did Rybka later evolve beyond the Fruity origins? Almost certainly so.

Does any of that excuse either violating the GPL or claiming that it was all his own work, at most "inspired" by others? Clearly not.

dalke · on Jan 8, 2012

It's very strange then that both bits of code have a loop which add a lot of 0s together.

When you make inferences at this level, you must validate your methods of comparison. That PDF document does not show how other chess programs do the same feature, so it's very hard to tell if that's a natural way to implement the same approach or if it's a copyright violation.

My knowledge of this is based solely on the chessbase article. It shows one method of fingerprinting chess programs which suggests that Rybka 1.0b and Strelka 2.0 are the closest matches to each other, and somewhat close to Naum 3.1/4.2, while Fruit, Toga, Onno, and Loop are close to each other. Was the analysis incorrect? Rybka 3 and 4.1 are much less similar.

What analysis shows that Rybka was "a very close match to those of Fruity. Very close, as in much closer than any unrelated engines are to each other"? The dendrogram clearly shows Rybka 1.0 closer to Naum than to Fruit. Clearly it would be appropriate to compare to the similar methods of Naum.
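The thread does not say how the fingerprint behind that dendrogram is computed. One plausible form, assumed here purely for illustration, is to run every engine on the same set of positions, record the move each one picks, and score a pair of engines by the fraction of positions on which they agree. The function names, the integer move encoding and the row-major layout are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of positions on which two engines picked the same move.
   Moves are encoded as arbitrary integers; only equality matters. */
double move_agreement(const int *a, const int *b, size_t n) {
    assert(n > 0);
    size_t same = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] == b[i]) same++;
    }
    return (double)same / (double)n;
}

/* Index of the engine (other than self) that agrees most often with
   engine self. moves is a row-major table: engines rows, positions columns.
   Ties go to the lower index. */
size_t closest_engine(const int *moves, size_t engines,
                      size_t positions, size_t self) {
    assert(engines >= 2 && self < engines);
    size_t best = (self == 0) ? 1 : 0;
    double best_score = -1.0;
    for (size_t e = 0; e < engines; e++) {
        if (e == self) continue;
        double s = move_agreement(moves + self * positions,
                                  moves + e * positions, positions);
        if (s > best_score) {
            best_score = s;
            best = e;
        }
    }
    return best;
}
```

A dendrogram would then come from clustering on these pairwise scores; a nearest-neighbour lookup like `closest_engine` is only the simplest way to ask "which engine does this one most resemble?".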

I don't understand your statement "It seems much more believable that substantial parts of the code of Rybka were copied verbatim from Fruit". Quite clearly OpenOffice looks and feels like Microsoft Office, but just as clearly, the code was not copied verbatim from MS Office. Quite clearly Linux 1.0 was written with knowledge of how MINIX works, along with information from Sun and POSIX; I myself learned about OS design from Tanenbaum's MINIX textbook. But there's no credible claim that Linux was copied verbatim from MINIX. Do you think Torvalds could have developed an OS from scratch, without any knowledge of how OSes work and without any standards?

People do write systems from scratch which are bug compatible. DR-DOS 5 was bug-compatible to MS-DOS. That was for commercial reasons. Researchers will implement someone else's algorithm so they have a baseline comparison for alternate algorithm development. Free software people will rewrite software which isn't free. People who follow the kata development style will reimplement code just to get the feel for how that version is done.

So no, I have no problems in believing that someone would start with one method, do a deep analysis of another program, implement the ideas (but not copy the code!) and end up with similar results. While it may not be perfect clean-room style, it's still not in violation of the GPL.

If there's a GPL compliance issue, then where is the evidence of copyright infringement? All I've seen is examples of algorithm reimplementation.

Not only that, but I gather that there's a lot of reverse-engineering going on; how much of the other proprietary chess programs are similarly "inspired" by deep inspection of other implementations?

jsnell · on Jan 8, 2012

I was writing a point by point reply to this, but halfway through it started looking like a waste of time. Your reply doesn't seem to be addressing the points I was making. Also you're several times asking for evidence that already has been made available in the documents you chose to not read.

I have no horse in this race; I just found the original documents interesting the first time they were linked to on HN, and very convincing, since they showed consistent patterns of unforced similarities (e.g. in the ordering of operations, in the selection of which operations to support, or even in the presence of dead code) between specific versions of Rybka and Crafty/Fruit, which were not present between different versions of Rybka. They also specifically addressed the question of whether these were the only/most common/best way of doing things (answer: no).

Please read the documents. If you don't find them convincing, I'm not sure there's anything I could do to change your mind. (And I certainly wouldn't be the right person to do so). But there is definitely no point in having a discussion before that.

dalke · on Jan 8, 2012

On your suggestion, I read through http://www.chessvibes.com/plaatjes/rybkaevidence/RYBKA_FRUIT... .

It makes the clear and cogent statement:

> While a large (indeed, almost complete) match is found, it is presumably feasible to opine that the Fruit source code can be taken as a “manual” for chess programming (perhaps in the sense of a modern version of How Computers Play Chess), and if this paradigmatic view is accepted, then the re-use of the same evaluation components might arguably be less derelict.

That theme occurs elsewhere:

> This Fruit/Rybka overlap would already likely meet a “plagiarism” standard, for instance as used in the detection of non-original work in academia and/or book publishing (note that plagiarism is generally an ethical standard and not a legal one). There is also the question of how important this item is from a chess-playing standpoint, perhaps again viewing Fruit as a “manual” in some sense.

The issue, which is also that mentioned in the chessbase article, is that the standard for plagiarism "in the context of computer chess (or more generally, computer boardgames)" is extremely sensitive. It is not the same standard practiced in research, programming, arts, or any other field I can think of. It's so high that it's not reasonable.

As Wikipedia writes, "the notion [of plagiarism] remains problematic with nebulous boundaries." In this PDF I read of multiple cases where the copying is not copyright infringement but one of reimplementing an algorithm. Relevant quotes are "Rybka 1.0 Beta uses bitboards, making direct code comparison ineffective", "Rybka uses a look-up table of patterns, while Fruit does bit-scanning", and "the relative scaling for each rank-based bonus in Rybka is essentially 10-30-60-100, though in units of 256 as in Fruit."

This PDF does stress that the surface differences are not the issue:

> I might stress that the fact that Fruit 2.1 visibly computes these while Rybka 1.0 Beta just has an array is not really relevant for the discussion here. The content is of more import.

where I presume the context includes "everything must have independent origin."

The PST structures are similar, although Rybka uses different weights in parts. Note also the scaling differences between the two code bases - the similarities are in normalized space. Hence, this is again not evidence of copyright infringement. It would not be plagiarism in the scientific research fields I work in, since "influenced by the work of XYZ" would suffice.

This PDF points out that the quad() function in Rybka uses a different scaling, rank-dependent values, and more cases than Fruit.
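One way to picture a rank-based bonus scaled "10-30-60-100" in "units of 256", as quoted above, is an interpolation between a minimum and a maximum value. The sketch below is an assumption-laden reading of that description, not code from either engine: the weights 26, 77, 154 and 256 are simply 10%, 30%, 60% and 100% of 256 rounded to integers, and the ranks they are attached to, the signature of `quad`, and the round-to-nearest step are all choices made here.

```c
#include <assert.h>

/* Rank weights in units of 256: roughly 10%, 30%, 60% and 100%.
   Which ranks they attach to is an assumption made for this sketch. */
static const int Bonus[8] = { 0, 0, 0, 26, 77, 154, 256, 0 };

/* Interpolates between y_min and y_max according to rank (0..7),
   rounding to the nearest integer. */
int quad(int y_min, int y_max, int rank) {
    assert(rank >= 0 && rank < 8);
    assert(y_max >= y_min);  /* the +128 rounding assumes a non-negative range */
    return y_min + ((y_max - y_min) * Bonus[rank] + 128) / 256;
}
```

With `y_min = 10` and `y_max = 70`, the four weighted ranks give 16, 28, 46 and 70, that is, 10%, 30%, 60% and 100% of the 60-point range. Two engines could share this shape while differing in the endpoints, the per-rank weights, or the number of cases, which is the kind of difference the PDF describes.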

This is again not a case where "parts of the code of Rybka were copied verbatim from Fruit" but where the approach from Fruit was modified. The PDF author then says:

> Not all of these terms have exactly the same meaning in Rybka 1.0 Beta, and discussing any differences would diverge from my focus on the re-use of the quad() function. Perhaps the main difference is with FreePasser, as to whether the pawn’s path is met by a friendly or enemy piece, which uses SEE in Fruit and “attacks” bitboards in Rybka, and further is split into 3 parts in Rybka.

> As with the PST comparison, it seems that there is a structural similarity between Fruit and Rybka, and the question of “originality” therein allows multiple approaches.

I think these statements are enough to establish that there was no copyright violation for this section. Your question "Does any of that excuse either violating the GPL" is therefore not relevant - there does not appear to be a copyright violation.

Could you clarify what you mean by "violating the GPL"? Do you refer to things like the file parsing code, which shows idiomatic similarities between Rybka and Fruit?

Over and over again I see that the issue is not outright copyright infringement of the chess engine nor lack of attribution, but that the definition of "plagiarism" as used in chess competition is extremely sensitive; sensitive enough that "structural similarity" even with attribution is considered excessive. It's much more stringent than any other field I can think of. As used here, it has lost its moral meaning and become more of a technical term.

Speaking as a complete outsider, it appears that the Rybka code base went through several iterations where it was based on ideas in different, existing programs. This is not uncommon, and is both legal and moral. My Minix example is quite relevant; it's meant to be used as a reference for understanding operating systems, which means people who use it as a reference will tend to create similar OSes. A question (rightly pointed out earlier) is, do open source chess programs serve as a similar manual?

The Rybka author was sloppy though. He didn't use version control until very late, and he followed too closely some of the more boring parts, like file parsing. There may be copyright infringement, and the remedy under the GPL is to request that the author either apply the GPL to the entire program, or remove the infringing parts. That this hasn't happened (I'm only guessing that it hasn't) tells me that the copyright holders aren't concerned enough to ask for help from the SFLC or other organizations which help enforce the GPL.

However, the core part shows signs of creative thought and improvement, which means it was not copied verbatim from Fruit. ("Creative" here in the legal sense related to copyright law.)

That's why Rybka still exists as a commercial program. But the chess competition arena has different criteria for originality. While they use the term "plagiarism", they do so with a different meaning than used by nearly the rest of the world.

The entire point of the chessbase essay was to stress that this "originality criterion" is increasingly at odds with how software, including chess programs, is developed. Not only does it need to be changed, but it should have been changed years ago ("updating WCCC Rule 2 to reflect contemporary reality would be a years-overdue positive step"). At the end of that essay the author quotes:

> A fair group of participating programmers present have expressed they want the rules to be updated. One line of thinking is that attribution plus added value should be sufficient to compete, instead of 100% originality.

You say the documents against Rybka are convincing. I have read a couple of them now, and I am convinced that Rybka is in technical violation of WCCC Rule 2. I am not convinced that it's plagiarism. For that I would want to see lack of evidence of attribution, which is hard given that there is attribution. Nor am I convinced that there's wholesale copyright violation. For that I would want to see large spans of code which are not just functionally identical but which use the same values, same implementation, and same function call order. Here too the strongest evidence shows "structural similarity" but definitely not copyright infringement.

What would it take for you to be convinced otherwise? What was not persuasive in the chessbase article?

jsnell · on Jan 8, 2012

Thanks for taking the time to put together such a good argument. I'm afraid that I won't have time to write another reply like this, so if it's not at all convincing, we'll just need to agree to disagree.

The first point I'd make is that from the point of view of the investigators, what mattered were not copyright issues but a possible tournament rules violation. So they'd certainly not want to muddle the issue with issues of whether something was copied over verbatim or transcribed. On the other hand my interest is more in the GPL exploitation, since that actually matters outside the insignificant scope of computer chess politics.

It should be absolutely clear e.g. from the comparisons between pre-1.0 Rybka and Crafty that there was verbatim code copying going on. And not only in things like parsing code, but in actual game playing code. There is no other reasonable explanation for having exactly the same dead code around in exactly the same places (for example the double-zeroing bugs, comparisons to funny magic numbers that could never be true).

Also in places where arbitrary decisions needed to be made in the code, they were done exactly the same way as in Crafty (e.g. the numbering of pieces, the ordering of operations during evaluation). Now, this might not be proof of those parts of the code having been copied. It would be totally reasonable to argue that the author, having read the original source code, would naturally make the same arbitrary decisions.

But of course nobody cares about the code of that version of Rybka. It's mainly useful as context for what happened after Fruit was released, followed by a new version of Rybka.

First, there is again evidence of object code that exactly matches that of certain parts of Fruit. For example the command parsing, the decision of when to stop searching, or what to do when a result has been found. These are not as strong evidence of verbatim copying though as for the earlier copying from Crafty, since this code is at best idiosyncratic rather than clearly buggy or useless.

Likewise all of the earlier arbitrary decisions start to be made differently. Piece numbering changes from what was previously used in Crafty to what Fruit uses. The main evaluation routine stops doing things in exactly the same order as Crafty did them, and starts doing them in exactly the same order as Fruit did them.

It's not really any longer a reasonable defense that this is just how he'd naturally do things after having read the source code and seen an example. Clearly he already had intimate knowledge of another source code base with different conventions. And even if this was just a matter of being inspired by the Fruit code, why would he even be rewriting all of this non-essential code rather than adding these concepts to his existing codebase?

From a copyright / GPL point of view, I think the argument for copying is fairly strong already at this point, and the question of how large the rewrites to the other code were is irrelevant. From a tournament rules viewpoint, it's the opposite.

So why isn't there equally strong evidence for verbatim copying in the game playing code as in the earlier Crafty case? Because the underlying board representation also changed from the representation used by Fruity to that used by Crafty, making establishing 1:1 correspondence between source and object code harder. So of course we can't reliably tell what kind of process produced this new code, e.g:

(a) modifying the Fruit code in-place

(b) using the Fruit code as a constant guide when writing a new version using a different data structure

(c) reading the Fruit code and then at some later point writing entirely new code

(d) at some point modifying his old code to use the concepts learned from Fruit

From a copyright / GPL point of view I don't think there's much difference between (a) and (b), but I could be wrong. In either case it doesn't seem like a very creative endeavor. Case (d) should be acceptable to anyone, but seems like a remote possibility at best, since the engine lost a number of game playing features of Crafty at exactly the same time as it gained a set matching those of Fruit very well. From a copyright point of view case (c) seems totally fine to me, but from a chess playing creativity point of view less so. Further, given the established flagrant pattern of copying, it does seem a lot more likely that he would have taken the expedient road in this particular instance as well.

I did not find the chessbase article very persuasive due to a few things.

First of all, it was framing the situation as a witch-hunt where a group of jealous sub-peers saw that their only chance of ever being successful again would be to destroy the superior competition by legal means. To an outsider this kind of naked emotional appeal seems very suspicious. Further, trying to suggest that e.g. Ken Thompson would have been motivated by these ulterior motives is just ridiculous.

Second, it's making the argument that really plagiarism is just the accepted custom of computer chess, or at least should be, and that this is therefore selective application of the rules. Knowing whether this is true would require much more intimate knowledge of the computer chess politics than I have any interest in acquiring. However, some of the evidence such as the evaluation feature comparison is certainly suggestive that the level of copying was much more significant in this case than was the norm. Certainly for this verdict to be fair, the reverse engineered and slightly tweaked versions of Rybka should not be allowed to compete in the tournament either.

Third, it tries to establish non-relatedness by the decision dendrogram. That's clearly a fallacious argument. A graph like this could show evidence of copying, assuming they were using a sufficiently large number of positions as input (I presume they did). But it can't possibly be used as evidence of non-copying. That's because a few small changes could easily have an effect on the evaluation results of a large number of positions. Sure, making those small changes is a creative act. But that doesn't mean that the combined work is any less derivative of the original.

Fourth, when discussing the actual evidence, it considers every bit of evidence in total isolation. "Oh, but it could have happened like this in a totally innocent way". But that's clearly nonsense: you have to consider the totality of the evidence. At some point there are too many coincidences to explain away.

What would convince me to change my mind? It depends on what exactly. I don't think that anything could convince me otherwise of the copyright violations, beyond somebody showing that the investigators lied, and that reported similarities don't exist at all. Of the copyright violations extending to the core evaluation routines? The missing source code would be the best proof, since it could show whether the similarities extended to things like ordering and naming of functions. Of Rybka being unfairly singled out, since everybody was doing exactly the same copying from Fruit? Somebody would need to show that this really happened, and that the other rules violators are getting a free pass.

dalke · on Jan 9, 2012

Your selection of options (a), (b), (c), and (d) excludes the possibility of a mixture of (b), (c), and (d)?

The Minix/Linux example seems like a good parallel. Linus ran (and didn't like) Minix on his hardware. I know Minix had a textbook which described the design and implementation details, which is how many people in the early 1990s learned kernel development. I wouldn't be surprised if some of that design thinking affected the Linux development, and if you wanted to look for apparent violations, I'm sure you could find some equally suspicious parts, like where the ordering of arbitrary operations was very similar to the Minix version.

However, as the SCO-Linux controversies page on Wikipedia states, "In order for copyright to be violated, several conditions must be met. .. Second, all or a significant part of the source must be present in the infringing material. There must be enough similarity to show direct copying of material." I mention this because SCO repeatedly insisted that there was copyright violation, with examples, but which upon a more complete analysis proved groundless.

So far, in the limited study of the evidence I've done, it does not at all appear that there is "a significant part of the source" in Rybka to justify a claim of copyright violation.

In the SCO dealings you do see that there is a "group of jealous sub-peers" working together against Linux. I refer here to Microsoft's funding of SCO, which was "widely seen in the press as a boost to SCO's finances which would help SCO with its lawsuit against IBM." Why would the chess programmer community be any different?

Similarly, both of the two analyses I've reviewed are flawed. While one came directly from the chessbase article and hence may be biased, the parsing example clearly shows that 1) there is no copyright violation for that code, and 2) the analysis report emphasized one minor point of the code without justifying its presumed importance, and omitted obvious mitigating factors.

That is, why would someone change the strstr function arguments, the number of parameters to the fen parse function, and the ordering of the fen parse calls, but leave the s[-1]=0 in the code? (For purposes of copyright law, these are all creative endeavors.) Does the s[-1]=0 technique occur elsewhere in Rybka? It occurs twice in the Fruit code; does Rybka also use it in that spot, and only that spot? Without that test, which double checks that the analysis method is itself valid, the conclusion must be suspect.
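For readers unfamiliar with the idiom, here is a minimal sketch of what writing a NUL at `moves[-1]` does. It assumes a mutable command buffer in which the "moves" keyword follows the FEN text. The helper name `split_fen` is hypothetical, and this is not code from either engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: cuts `line` in place so that the FEN text ends where
 * the "moves" part begins. Returns a pointer to the FEN text, or NULL if the
 * line has no "fen " keyword.
 */
char *split_fen(char *line) {
    assert(line != NULL);
    char *fen = strstr(line, "fen ");
    char *moves = strstr(line, "moves ");
    if (fen == NULL) {
        return NULL;
    }
    if (moves != NULL) {
        assert(moves > fen);
        moves[-1] = '\0';   /* overwrite the space just before "moves" */
    }
    return fen + 4;         /* skip past "fen " */
}
```

After the call, a FEN parser reading from the returned pointer stops at the inserted NUL and never reads into the move list.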

The point of the analysis is that there is deep algorithm similarity between the two codes. I'm willing to say that there is. But for reasons I listed before, it is not and cannot be plagiarism any more than how West Side Story plagiarizes Romeo and Juliet.

From what I've seen so far, there is absolutely no basis for a GPL copyright violation claim.

> I don't think that anything could convince me otherwise of the copyright violations, beyond somebody showing that the investigators lied, and that reported similarities don't exist at all

Are you convinced that the parser code which I highlighted, and which you said is "verbatim code copying", is not a copyright violation? If so, did the investigators lie by omission?

dalke · on Jan 9, 2012

tl;dr - the disassembly for the UCI position/fen parser in the Watkins PDF is clearly NOT a verbatim copy from Fruit. The Watkins disassembly from Rybka does not and cannot match the Fruit source code. I know you don't have much time, but please tell me where I made a mistake in this conclusion!

(BTW, I'm taking it as granted that we are talking about copyright issues. I accept that it violates the uniqueness requirement.)

I looked again at the parsing code analysis described in http://www.chessvibes.com/plaatjes/rybkaevidence/RYBKA_FRUIT... . Did you realize that the two codes are different? Fruit does a strstr for "fen " and "moves " while Rybka does a search for "fen" and "moves". Note the difference in the trailing space. Is that a typo in the analysis? Yet there is such furor over the use of a "0." when it should be a "0". I'll assume that the analysis report (and associated disassembly) contain typos.
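The trailing space is not purely cosmetic, since it changes which inputs match. A minimal sketch, assuming a UCI-style command line; the helper `keyword_offset` is hypothetical and appears in neither program:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of `keyword` within `line`, or -1 if absent. */
long keyword_offset(const char *line, const char *keyword) {
    assert(line != NULL && keyword != NULL);
    const char *hit = strstr(line, keyword);
    return hit ? (long)(hit - line) : -1;
}
```

For the input "position startpos moves" (an empty move list, nothing after the keyword), searching for "moves" finds a match while searching for "moves " does not, so the two variants can take different branches on the same command.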

Note then the different call order between those two code snippets. Fruit calls "board_from_fen" only if fen != NULL, while in the disassembled Rybka code, it always calls the labeled "board_from_fen(), for startpos", and if v1 and v3 are not NULL, it calls the same function again.

Here's the quoted assembly code, as provided by Watkins:

    0x406958: callq  0x40cc40             # strstr call for "fen"
    0x40695d: lea    0x25d798(%rip),%rdx  # 0x6640fc "moves"
    0x406964: mov    %rbx,%rcx
    0x406967: mov    %rax,%rdi            # rdi has fen strstr ptr
    0x40696a: callq  0x40cc40             # strstr call for "moves"
    0x40696f: lea    0x25d74a(%rip),%rcx  # 0x6640c0 "rnbqkbnr..."
    0x406976: mov    %rax,%rbx            # rbx has moves strstr ptr
    0x406979: callq  0x402980             # board_from_fen("rnbqkbnr...")
    0x40697e: test   %rdi,%rdi            # if fen != NULL
    0x406981: je     0x406998* [0x406995] 
    0x406983: test   %rbx,%rbx              # if moves != NULL
    0x406986: je     0x40698c
    0x406988: movb   $0x0,-0x1(%rbx)          # moves[-1] = 0

The "0x406979: callq 0x402980" is always called, and it doesn't correspond to any code from Fruit.

This is important because in the Fruit code, board_from_fen is passed "fen+4", so fen cannot be NULL, which means "callq 0x402980" cannot be the same as "board_from_fen". This means the execution order difference in the above assembly code cannot be the result of a compiler optimization.

Which means this assembly code cannot be generated from a verbatim copy of the Fruit source code!

Not only that, but while my assembly isn't so good, it looks like only _one_ argument is passed to the Rybka equivalent of board_from_fen, which takes _two_ parameters in Fruit. I'm guessing that the input board data structure in Rybka is a global.

Seriously? You count this as evidence for a copyright violation?

The author of the PDF calls that NUL insertion "odd". I couldn't understand why. It's something I've done in my own code. It's a very handy sanity check because it ensures there can be no buffer overrun. The PDF author goes on "In any case, the fact that “something is done” here that (in the end) serves no purpose makes this a mentionable commonality."

However, there's no indication of why there's no purpose to it.

So I downloaded the 2.1 code. I can tell that there's no purpose for it in Fruit now. There's no evidence, however, that it wasn't useful while bootstrapping the parser. There's no evidence for why it's not needed in Rybka.

Looking at the code, I realize that the PDF author omitted important code in the analysis. Here is the actual code from Fruit 2.1:

   if (fen != NULL) { // "fen" present

      if (moves != NULL) { // "moves" present
         ASSERT(moves>fen);
         moves[-1] = '\0'; // dirty, but so is UCI
      }

      board_from_fen(SearchInput->board,fen+4); // CHANGE ME

   } else {

      // HACK: assumes startpos

      board_from_fen(SearchInput->board,StartFen);
   }

Notice the "HACK" section? It was left out of the analysis. This defines the "startpos" case if the fen string isn't present.

The equivalent code in the Rybka code could be easily written as:

    parse_fen(startpos_board);  // Make sure we always have a valid board
    if (fen && moves) {
        moves[-1] = 0; // Ensure no overrun 
        parse_fen(fen);
    }

Notice how it always parses the startpos, and then only if the fen is available does it parse the fen? This is different code. The order of operations is different, the function call parameters are different.

Here too I used a different coding style, so the code doesn't even look the same, even though the final assembly code will look very similar.
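The difference in call sequence can be made concrete with a small trace. This is a sketch under assumptions: `parse_stub`, `fruit_style`, and `rybka_style` are hypothetical stand-ins that mirror the two control flows shown above, not code from either program.

```c
#include <assert.h>
#include <stddef.h>

static int parse_calls;   /* how many times the stub parser has run */

/* Hypothetical stand-in for the FEN parser; it only counts calls. */
static void parse_stub(const char *fen) {
    assert(fen != NULL);
    parse_calls++;
}

/* Control flow of the Fruit 2.1 excerpt: exactly one parse per command. */
int fruit_style(char *fen, char *moves) {
    parse_calls = 0;
    if (fen != NULL) {
        if (moves != NULL) {
            moves[-1] = '\0';
        }
        parse_stub(fen + 4);
    } else {
        parse_stub("startpos");
    }
    return parse_calls;
}

/* Control flow of the rewritten snippet: start position first, then the FEN. */
int rybka_style(char *fen, char *moves) {
    parse_calls = 0;
    parse_stub("startpos");
    if (fen && moves) {
        moves[-1] = 0;
        parse_stub(fen);
    }
    return parse_calls;
}
```

With a FEN and a move list present, the first flow parses once and the second parses twice, which is the kind of difference that shows up as an unconditional call in the disassembly.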

In other words, the command parsing does NOT "exactly [match] that of certain parts of Fruit", even though you say it does, because in Rybka:

  - the strstr search strings don't end with a space
  - the "startpos" board is always parsed, instead of only when there is no fen string
  - the parse function appears to take one parameter instead of two, and use a global data structure

Please explain this discrepancy. How could that assembly come from the Fruit source code?

anatoly · on Jan 8, 2012

You were able to change my mind on the issue with your well-written and detailed comment (I also had no knowledge to go on besides the chessbase.com article). So please don't think it was a waste.

waqf · on Jan 7, 2012

Which did you see as the conclusive evidence that code was copy-pasted? Section 4 of the article made the point that the Rybka source was missing and the accusers' evidence was decompiled code which they had arranged to look as similar as possible.

waqf · on Jan 7, 2012

Ed Schröder, the initial accuser who recanted, states in summary:

"4. It is not possible to state categorically from the available data whether:

" 4a. Vas took Fruit source via cut and paste and converted to Rybka;

" 4b. Vas kept a copy of Fruit source open on another screen while he wrote Rybka himself;

" 4c. Vas absorbed ideas from Fruit (and other open sources) and then coded up Rybka hmself."

Source: http://www.top-5000.nl/evidence.htm — worth reading, but just as open to accusations of bias as the other links we've seen.

Someone · on Jan 7, 2012

Yes, section 4 says that, but I do not find that section convincing. IMO, some of its arguments are flawed. For example:

"integers are quicker than floating-point numbers"

Maybe in this case, but not in general, on all current hardware. Chances are that chess programs use little floating point code. If so, using a few floats here and there can even be faster because it allows the CPU to issue more instructions in parallel.

"According to Rajlich, he wrote a utility program (separate from Rybka and not available to users) in the C# language to generate his PSTs. As Fruit is written in C (not C#) this means there is a 100% certainty that Rajlich did not copy the Fruit PST generation code."

For low-level chess code (a bit of bit-twiddling, array lookups, arithmetic, but no library calls), I would think that porting C to C# can be an almost 100% copy-paste job.

Remarks like these make it hard for me to accept the conclusions as undeniably true. The "but everybody is doing it" argument, on the other hand, I found more convincing.

wisty · on Jan 7, 2012

The thing I'd like to see - a three way comparison of the alleged "cheating". Rybka and Fruit have some similar parts, but how about Fruit and any other similar program? It might just be that any chess program will have parts that look similar.

jsnell · on Jan 7, 2012

A document like that was included in the original investigation, covering 9 different chess engines. It didn't compare source code, though, but rather which features of the board each engine would consider when evaluating a position. For example, one engine might give an extra boost to a position where your knight is in a central location and capable of affecting a larger part of the board, another might give a boost to situations with interlocking pawn defenses, etc.

According to the evidence, two engines with different ancestry would have a feature overlap of about 30-35%, up to 45% in the most extreme cases. With the exception of Fruit and Rybka, with 75% overlap.
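How such an overlap figure might be computed: a minimal sketch, assuming each engine's evaluation features are recorded as bits in a mask, and that overlap means the share of features common to both engines out of all features either one uses. The investigation's actual metric may differ, and the feature names below are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Invented feature flags, for illustration only. */
enum {
    F_KNIGHT_CENTRAL = 1 << 0,
    F_PAWN_CHAIN     = 1 << 1,
    F_ROOK_OPEN_FILE = 1 << 2,
    F_KING_SHELTER   = 1 << 3
};

/* Number of set bits in a 32-bit mask. */
static int count_bits(uint32_t mask) {
    int n = 0;
    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Percentage of shared features out of all features used by either engine. */
int overlap_percent(uint32_t a, uint32_t b) {
    int total = count_bits(a | b);
    assert(total > 0);
    return 100 * count_bits(a & b) / total;
}
```

Two engines that share one feature out of three distinct ones between them would score 33 under this definition; identical feature sets score 100.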