GitHub forking has one big flaw (zbowling.github.com)
207 points by bpierre on Nov 26, 2011 | 47 comments



  Forks are almost too easy to create. Forks get created 
  constantly and go nowhere.

  ...

  I would love if GitHub supported a model where if I 
  forked a repo at a version and made no changes, it 
  treated it like a private repo. It shouldn’t be visible 
  to anyone except me (unless someone hits the url 
  directly) until I push my first commit that is different 
  than the upstream. At that point it should flip to 
  public. This would clean up some of the fork soup we see 
  on pages.

My misgivings with the "root repo" aspect of this post aside, this suggestion is great; I'd really love to see this picked up by the GitHub guys.

(As mentioned in one of the post's comments, some popular projects have a lot of empty forks, which makes viewing one of their Network screens an absolute nightmare to browse when looking for active forks.)


Exactly. Branches do this already (they hide if they can be fast-forwarded). If you can fast-forward merge a fork into the local one, it should be hidden when I'm viewing the network page, just like local branches.

The network graph they render already does this to some extent.
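One way to read the "hide it if it can be fast-forwarded" rule: a fork has nothing of its own when its tip commit is already reachable from the tip of the repository being viewed. The sketch below is only an illustration under that reading. The commit graph is a made-up dictionary of parent links, each repository is reduced to a single tip, and `is_empty_fork` is a hypothetical helper, not anything GitHub exposes.

```python
from typing import Dict, List, Set

# Hypothetical commit graph: each commit id maps to the ids of its parents.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, tip: str) -> Set[str]:
    """Return the tip plus every commit reachable from it via parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return seen


def is_empty_fork(graph: Graph, fork_tip: str, upstream_tip: str) -> bool:
    """A fork adds nothing if its tip is already part of upstream's history."""
    return fork_tip in ancestors(graph, upstream_tip)


# a <- b <- c is upstream; x is a commit a fork made on top of b.
graph = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"]}
print(is_empty_fork(graph, "b", "c"))  # True: fork stopped at b, could be hidden
print(is_empty_fork(graph, "x", "c"))  # False: fork has its own commit x
```

A real repository has many branches per fork, so an actual implementation would have to repeat the check for every branch tip before hiding anything.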


What's with people declaring things they don't like, or that could be improved, as "broken"? GitHub's forking is not "horribly broken". I like the ideas in this article, but they are just that: ideas.


Maybe github is not "horribly broken", but there are a lot of broken projects on github with the problem he describes.


Hyperbole pulls clicks.


"Not all forks are equal and no one repo is necessarily any more important than any other (including the original repo)."

I think the argument can be made that this is exactly why git is cool. Which repository is considered the "master" repository becomes a "social decision" instead of a technical one (by means of the admin rights). However, GitHub emphasizes the role of the "original" repository by mentioning it everywhere (as pointed out by the author).

In my opinion, it is a fair point to argue that adding a description to forks would be useful. But to say that the GitHub model is fundamentally flawed is a step too far for me.

The article also mentions that it would be great to have more options regarding pull requests. This is indeed something that I would also find useful. Maybe there could be the "standard" pull request, but optionally the user could propose a follow-up action on the pull request!?


> I think the argument can be made that this is exactly why git is cool. Which repository is considered the "master" repository becomes a "social decision" instead of a technical one (by means of the admin rights). However, GitHub emphasizes the role of the "original" repository by mentioning it everywhere (as pointed out by the author).

That's TFA's whole point...


I most definitely want to see the parent repository. It is extremely valuable. Especially when I come across a repository that hasn't been updated in a couple years and by clicking on the parent I can see that the fork was just there for some short-lived reason.

I agree that sometimes I find it frustrating why a fork exists, but usually just looking at it in the network view is more than enough and is exactly what makes git/github so amazing.


But what if it's the parent repository that hasn't been updated for a couple of years? I've run across this scenario many times.


Consider the time before GitHub's neato fork network view: OSS projects still had branches, forks, and experiments all over the world, privately and publicly. GitHub just makes it easier, and keeps improving.


You can use repository labels on Github. Here is a "root" repository with a note saying to use another one:

https://github.com/rails/dynamic_form

https://img.skitch.com/20111126-daej32y71sjau9sm3a5t356gac.p...


That's part of the solution, but labels aren't ubiquitous: https://github.com/rails/dynamic_form/network/members


I'm impressed that this blogpost wasn't titled "GitHub has one big forking flaw".

That aside, I think github's model is largely correct. In git itself, a commit is a child of a parent (or parents).


Rarely has one blog post complained for so long about something that matters so little.

He isn't arguing there is an actual flaw, he is arguing that the feature isn't identical to what he hypothetically wants. He doesn't give practical problems with it, just hypothetical problems.

For example, he says it's hard to tell 'the importance' of a fork. The importance of a fork is whether or not it is the original project, or a fork. If it's the original project, it's what you were looking for. If not, it's not, unless you were looking for a modification or feature missing in the original. In the latter case, you don't need to see the 'importance', whatever that means, you need to see the diff. Or have someone recommending it.

Anyway, the post just rambles for pages.

Speaking of flaws, HN has one big flaw, which is not showing sub-domains (and thus tricking me into clicking this link, thinking it was an official github thing).


I'm sorry, but I don't believe I was rambling. There is a flaw in the approach that they are taking. I made this clear several times by pointing out Linus's talk and the background of forking on Git.

When I make an argument, I try to be thorough and show evidence, then propose a solution rather than just complaining.

I agree with you that the project is important if it's important. That is exactly what I want. Let the community decide. I hate GitHub implying that my fork is any more or less important unilaterally because I was just the original or not. I document evidence that people follow that link even when the parent is a dead project I put in place.

This one issue has really been festering at me for the last couple of years on GitHub. As the number of projects grows, more projects are starting to bitrot and the problem is only getting bigger.


I could not agree more.

When I find a new cool library or program on github I often have trouble finding the "official" or "best" version. Many times it is of course the root repository of all forks, but often the original developer moved on and someone else picked up his work. If there are only 2 or 3 forks I can check them all, but more is impractical.

When I put stuff on github myself I often lose interest in the software, but someone was nice enough to fork the project and keep working on it. Unfortunately my project will still always be displayed as the most important version.

Some software developers have decided to create a shared account (or, since Github now supports this too, an "Organization") to prevent this problem (see for example github.com/ruby/). When one developer decides to leave the Ruby development team, he is simply removed from the shared account/organization, but the project URL stays the same.


Look at the Network activity. It shows you the level and dates of commits of all the forked repositories.


This doesn't always help as much as you would think it should. Often times I've seen a fork with a lot of recent commits, but it's actually a bunch of pseudo-private changes against an old version.


I find GitPop to be really useful for finding the most popular fork. It shows you the number of watchers, forks and issues for each fork, which makes spotting the "best" one a lot easier. It even has a handy bookmarklet!

http://gitpop.heroku.com/


In cases where you've moved on and someone else is taking over as a fork, can't you just delete your repo and have the entire hierarchy just shift? I'm honestly asking because I've deleted a ton of repos but none that have been forked so I'm curious.


I didn't consider it rambling. Honestly, I feel it's a matter of taste rather than a flaw and I happen to disagree with you. I can't tell you you're right but I also can't say you're wrong. I think Bitbucket and GitHub just have different perspectives on it. GitHub and I seem to agree on the forking issue but that's a coincidence.

You did try to talk it out and give some concrete examples. I've seen big whine-athons before and your post does not qualify as that as far as I'm concerned.

I think there's the possibility that project owners may start to delete forks and create entirely new repos when the bit rot issue starts to affect the repo they originally forked from. This has its own set of problems and I might be a little too optimistic but I happen to think there's a chance we'll take matters into our own hands on that issue. Also, I would venture to guess that abandoned projects probably weren't so great to begin with and maybe the bitrot issue won't affect them as much? Like node.js isn't likely to go away and if it or a popular project like it is abandoned the authors would probably announce it and add it to the readme and so everyone will know to start looking at forks over the original.

Anyway, the HN community obviously thought this was important or good enough to be on the front page so you must have done something right. I'm sure if you were really rambling you'd just be lost in the "new" section. I don't always agree with what gets on the front page and I didn't agree with this being there at first either but now I'm starting to come around.


Thanks! Very kind words.

The one case I didn't link but talked about was my co-worker Michael's snipmate.vim project.

https://github.com/msanders/snipmate.vim

He stopped working on it because he felt it was a dead end to keep supporting it since vim plugins are so hacky, but the community loves his plugin. It has a massive number of forks and patches but he hasn't touched it since 2009.

This fork has taken up most of the new development on the project and really has pushed it hard to almost a 1.0 now: https://github.com/garbas/vim-snipmate


Git Pop looks like it can help you see garbas' repo. It trimmed the list of repos from 335 to 7.

http://gitpop.heroku.com/?url=https://github.com/msanders/sn...

Although I agree that something like this should be built into Github.


Couldn't he have given someone permission to take over the original repo via pull requests? I was working on a ruby web app with a guy from Italy and the project is dead but I gave him access to the repo and now he can commit and push all he likes until I get the time to work on it again. Would that work for your friend? Maybe he could appoint someone to take charge of that particular repo and keep the original project alive without giving him access to the whole account.

Edit: after reading some other comments I'd also like to suggest that creating an organization would help. Once the project is abandoned by the creator someone else can take over and the creator just drops off. Of course this only works if people put it into practice but it can possibly mitigate some of the problems you have.

If people would be thinking of these things and put these solutions into practice then your criticisms would be a little less necessary. But alas, you can't always rely on people. It really comes down to a choice the way I see it. The way things are and the way you wish they were both have merit. What if GitHub just supported viewing forks differently? It would be cool to see a list of forks without a master for some and a hierarchy for others depending on how you want to filter the page. That would be cool and do a bit of good.


Well, in my opinion it is rambling. You say there is one big problem, but then your 'problem' isn't clearly a problem.

You list all the reasons you fork code, and I'm not sure why you did that. If it's to support your assertion that "all forks being equal" is a problem, that is even more rambling, because you give 4 examples of things which are of no importance to anyone but you. And this is already partly below the fold! So, if by rambling I mean "Lengthy and confused or inconsequential", I'm justified.

To correct this, you must establish immediately after you claim that 'all forks being equal' is a problem that there exist at least 2 classes of forks that should not be treated equally. You should give an actual example, so that it is clear that this is not just a hypothetical problem that doesn't happen in practice.


Not all forks are equal, but GitHub shouldn't treat them differently based on whether they are the original repository or not.

The only exception is if the fork contains no changes of its own and could be fast-forward merged into the fork you are viewing. (They do this already in the graph they render under the network view.)


I think GitHub should treat them differently. Maybe you and I are browsing projects that see completely opposite patterns of creation, forking, and upkeep. I said somewhere here before that the original is usually the best, most reliable, and useful to more people than most others. When I see a list of forks I do browse a few but always end up at the master and the master is usually the best one for me.

If I had to pick one way to do things I'd keep them as is. But having some sort of filtering mechanism where we can sort by hierarchy, activity, most recent commit, and others would be a solution that makes everyone happy. I wonder how feasible that is.
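As a rough illustration of how small such a filter could be, here is a sketch that sorts one fork list three ways. Everything in it is assumed for the example: the `Fork` record, its fields, the three sort names, and the sample owners are invented, and none of it reflects how GitHub actually stores or ranks forks.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Fork:
    owner: str
    is_root: bool        # True for the repository everything was forked from
    last_commit: date    # date of the most recent commit
    commits_ahead: int   # commits that are not in the root repository


# Each view is just a different sort key over the same list.
SORT_KEYS: Dict[str, Callable[[Fork], object]] = {
    "hierarchy": lambda f: (not f.is_root, f.owner),   # root first, then by name
    "recent": lambda f: -f.last_commit.toordinal(),    # newest commit first
    "activity": lambda f: -f.commits_ahead,            # most own commits first
}


def list_forks(forks: List[Fork], order: str = "hierarchy") -> List[Fork]:
    """Return the forks sorted according to the chosen view."""
    return sorted(forks, key=SORT_KEYS[order])


# Made-up sample data: an abandoned root and two forks.
forks = [
    Fork("carol", False, date(2011, 3, 5), 3),
    Fork("alice", True, date(2009, 7, 1), 0),
    Fork("bob", False, date(2011, 11, 20), 42),
]

for order in SORT_KEYS:
    print(order, [f.owner for f in list_forks(forks, order)])
```

With this sample data the "hierarchy" view puts the root first, while the "recent" and "activity" views both surface the fork that took over development.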

And again, I have to defend you against this rambling attack... This was a blog post, not a book submitted for publishing, not a thesis paper, not scientific research - it was a personal blog and you've got your own writing style like all of us do. You stated a position, gave examples, supported your arguments and I'd say that maybe you could have gotten to certain points sooner or maybe not. It doesn't matter. If some people can't be bothered to read a little bit then they can just move on to another post and not bother to go around insulting people who are just sharing a thought. Don't take shit from people, man.


HN does have sub-domain support if the link is posterous or a few other sites. I don't know why it is not enabled for all links though.


PG: Make the decision on what sites to show subdomains on at page render time, based on a flat file list of domains. Have a job run every now and then that checks submitted urls, and creates the flat file list of domains, including any domain with more than N different subdomains submitted, where N is high (maybe 400 different sub domains or something). If this file would be large, just build it based on the last M submitted urls (nobody will be browsing the older ones anyway, so if a domain comes and goes oh well).
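The periodic job described above could look roughly like the following. This is a sketch, not HN's code: the function names are invented, the "last two labels" rule for finding the registered domain is a deliberate simplification (it gets suffixes like co.uk wrong), skipping "www" is an added assumption, and the result is returned as a set rather than written to a flat file.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple
from urllib.parse import urlparse


def split_host(host: str) -> Tuple[str, str]:
    """Split a host into (subdomain, domain), naively using the last two labels."""
    labels = host.lower().split(".")
    if len(labels) <= 2:
        return "", host.lower()
    return ".".join(labels[:-2]), ".".join(labels[-2:])


def domains_showing_subdomains(
    urls: Iterable[str], min_subdomains: int, last_m: int
) -> Set[str]:
    """Scan the last M submitted URLs and keep domains with more than N subdomains."""
    recent = deque(urls, maxlen=last_m)  # keeps only the newest last_m URLs
    seen: Dict[str, Set[str]] = defaultdict(set)
    for url in recent:
        host = urlparse(url).hostname
        if not host:
            continue
        sub, domain = split_host(host)
        if sub and sub != "www":  # assumption: "www" is not a meaningful subdomain
            seen[domain].add(sub)
    return {d for d, subs in seen.items() if len(subs) > min_subdomains}


def display_host(url: str, allow_list: Set[str]) -> str:
    """At render time, show the full host only for domains on the list."""
    host = urlparse(url).hostname or ""
    _, domain = split_host(host)
    return host if domain in allow_list else domain


submitted = [
    "http://a.github.com/x",
    "http://b.github.com/",
    "https://c.github.com/y",
    "http://blog.example.com/",
]
allow = domains_showing_subdomains(submitted, min_subdomains=2, last_m=1000)
print(display_host("http://zbowling.github.com/post", allow))
print(display_host("http://blog.example.com/", allow))
```

The threshold here is tiny so the example stays short; the comment suggests something closer to 400 distinct subdomains.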


wrt subdomains: they're actually in the page source, but not shown. This makes it trivial to enable them via JS. Here's an example using dotjs, but it'd be just as easy via Greasemonkey or otherwise:

https://github.com/sant0sk1/dotfiles/blob/master/js/news.yco...


Related to the article, although not directly to the main point ("Should GH bless the root of project forks"): So - I looked at the pull request that was linked to as 'couldn't get our fixes upstream'.

I don't know the project, but I was totally lost when I read the commit messages in the pull request. Samples:

  I fixed something 	
  small change'
  better idea
  class custer method
  much much better
  no such thing as an error here.

In the end upstream was lost as well, which might or might not be related to patch size and - erm - a little mess?

I know Zac is doing some pretty cool stuff, but this seems like dumping code, i.e. looks careless to me.


Yes, this is my biggest issue. I don't squash my commits like I should but this comes from our company repo and just gets imported. I tend to look at the last commit and just compare diffs for feature branch.


I agree that more effort should be put into making forks' purpose and activity more visible and comparable.

For me the GitHub network graph is invaluable when it comes to assessing forks' activity and their potential usefulness. Very often my first action after coming across a fork is to navigate to the parent repository, but the very second one is clicking on "show me the fork graph".

The network graph, while still having its flaws, is a tremendous advantage over Bitbucket and provides some overview of where the development is currently happening and what people are working on.


For what it's worth, the "Before git and mercurial, merging was a nightmare in most monolithic source control systems" claim seems like a heavy misconception to me. How many people actually tried? The pypy project was using heavy branching and merging on svn and it worked just fine. The only "drawback" was that everyone had to have commit rights to the main repo. I say "drawback" because liberal policies for giving commit rights never ended badly. Actually newcomers with commit rights are always more careful than old contributors.


I disagree with this post entirely. When looking at forks on Github it's very easy to tell if someone has done anything meaningful by looking at the network graph and viewing commit messages (of course this is assuming they used sane messages in the first place).


entirely?


With regard to people forking a private repo, I'm still not sure how to work with "collaborators".

I want them to fork and "pull request", but it seems they can just push back into my master at will.

Maybe I'm looking at it the wrong way.


That's what "collaborator" means in Github parlance. If you don't want people to have this permission, don't make them collaborators. If you want people to be collaborators but work mostly on their own fork and send pull requests, ask them.


From your description it sounds like you're talking about self-hosted repos, not Github. For that you have a couple of options: you can use a gatekeeper such as Gitosis and assign permissions, or simply have people pull through a read-only mechanism (like git daemon) rather than giving them ssh or other writeable access.

Edit: My bad.. n/m


just add a note to the Readme pointing to the new fork that took over?


I thought the same thing. You can switch the "root" if you coordinate with the root repo owner and GitHub support. Often those projects are dead or the maintainer doesn't respond.


I disagree but think this is all a matter of personal preference. There's a choice to be made here. Usually the original project is updated more often and is more useful to the most people. That's generally speaking, of course. A lot of forks are for niche situations or for experimenting like the author says. I much prefer to know the hierarchy of a project and its forks rather than just seeing a list of forks and not knowing which is the original. Plus, you can use repo labels on GitHub.

I forked HTML5 Boilerplate and created my own custom boilerplate plus CSS framework from it. While I want people to use it and think it's valuable I must admit that my humble little project is not half as good as the original it was forked from. I'd want people to know that my fork is a descendant of the original and be informed before deciding to use mine. That's why I deleted my fork and just created a new repo (though I do make sure I let people know what the basis for it was).

Does that make sense? Would a lot of people agree this isn't a flaw but just a matter of personal preference? I've only been using version control (Git is the only system I've tried) for about 4 months or so, so maybe my view is flawed due to lack of experience or proficiency.

On a side note, I noticed a lot of grammatical errors that really really bugged me. I hate to nitpick but I can't help myself. I hope the author corrects his use of "then" instead of "than" and the incorrect usage of "very" where it should read "vary" especially. Not trying to be a dick, just wanna be helpful.


Sorry. I didn't expect this to be trending on HackerNews or I would of done another pass over it for grammar.

Edit: fixed https://github.com/zbowling/zbowling.github.com/commit/84ece...


"Would have". :)


I'm just going to say that I was trolling instead of admit that I made another mistake.


"admitting" :).


This is git. The point is that it's decentralised.



