> Also, instead of forking it on GitHub, he just reuploaded the whole thing again
He responded in his comments:
> If not, the fork will remain buried as a Pull Request with no visibility at all.
Which is loads of bullshit. Git is decentralized by design, so "upstream" is a mere convention: usually it is either the original author's repository or the most active repository.
I'm actually writing a blog post on this issue as we speak. GitHub's approach to handling forks is broken. It's a long post, but I updated it to mention your comment here.
A fork is a mere 'git clone'. The brilliance and simplicity of SHA1s make linking repos as trivial as a git remote add. I venture it would be quite costly to actively look at every commit of a new repo and match it against every other commit in every other GitHub repo (unless they start maintaining a global index of all SHA1s->repos). So for now GitHub merely records that one cloned a repo, and I seriously won't blame them for that.
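To make the "global index of all SHA1s->repos" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the CommitIndex class, the repo names, and the shortened commit IDs are made up for illustration, and I have no idea how GitHub actually stores any of this.

```python
from collections import defaultdict


class CommitIndex:
    """Toy global index mapping commit SHA1 -> repos containing that commit."""

    def __init__(self):
        self._repos_by_sha = defaultdict(set)

    def add_repo(self, repo, shas):
        # Record every commit of the repo in the index.
        for sha in shas:
            self._repos_by_sha[sha].add(repo)

    def related_repos(self, shas):
        # Count, per already-indexed repo, how many of the given commits it shares.
        counts = defaultdict(int)
        for sha in shas:
            for repo in self._repos_by_sha.get(sha, ()):
                counts[repo] += 1
        return dict(counts)


index = CommitIndex()
index.add_repo("alice/project", ["a1", "b2", "c3"])  # hypothetical original repo
index.add_repo("carol/other", ["x9"])                # unrelated repo

# A plain re-upload keeps the original SHA1s, so the link is easy to find.
shared = index.related_repos(["a1", "b2", "c3", "d4"])

# A re-upload with rewritten history shares no SHA1s, so nothing matches.
none_shared = index.related_repos(["e5", "f6"])
```

With the index in place each lookup is cheap, but the index has to cover every commit of every hosted repo, which is the cost I was venturing a guess about.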
If the guy had done things properly, we could just do that to merge back, inconvenient as it is. But apparently he rewrote or recreated history in some way, so the SHA1s changed and adding a remote[0] is not even enough to link the two histories. I guess that even with no common ancestor SHA1s, one could still use rebase --onto to replay some part of his history.
Either way I can't tell because the repo is 404 now and I did not have a chance to browse it, let alone clone it.
PS: I like some of your ideas, like hiding forks that can be fast-forwarded (i.e. they have no changes, i.e. their HEADs are members of the parent repo), and detecting that someone has pushed a repo that is obviously a clone of another, although I'd rather have that be manual (like a message on GitHub's console saying "we detected that this repo could be a clone of that repo, do you want to mark it as a fork?").
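The "fork with no changes" check is simple enough to sketch. This is only an illustration under my own assumptions: fork_has_no_changes is a hypothetical helper, branch heads are given as a name->SHA1 mapping, and the parent's history is a plain set of SHA1s.

```python
def fork_has_no_changes(fork_heads, parent_commits):
    """Return True if every branch head of the fork already exists in the parent.

    fork_heads: dict mapping branch name -> head commit SHA1 (hypothetical input)
    parent_commits: set of all commit SHA1s reachable in the parent repo
    A fork with no branches at all is trivially reported as having no changes.
    """
    return all(sha in parent_commits for sha in fork_heads.values())


parent = {"a1", "b2", "c3"}  # made-up shortened SHA1s

# Untouched fork: its only head is a commit the parent already has.
untouched = fork_has_no_changes({"master": "c3"}, parent)

# Fork with real work: one head points at a commit the parent lacks.
diverged = fork_has_no_changes({"master": "c3", "feature": "d4"}, parent)
```

A fork where this returns True could be hidden from the network view without losing anything, since the parent can fast-forward to it trivially.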