Ask HN: Which revision control system?
42 points by cconstantine on Nov 24, 2008 | 61 comments
For the past few years we've been using Perforce at the company I work for. For a number of reasons, we've been given the OK from management to switch to a free SCM system. This is supposed to be a big opportunity to improve our process and save us money, but we're having problems finding a good replacement.

Our must-have features include: the ability to import our Perforce history, easy/trivial branching and merging (preferably with the ability to use an external merge tool), large-file support, the ability to pass changes from one workstation to another without affecting everyone, and the ability to work with multiple gigabytes of binary files (most are roughly 500 KB; some are 100+ MB). The binary files are mostly build artifacts, but for a large number of very good reasons we can't expect developers to generate them. Each developer having their own branch would be nice, but isn't critical.

Our products only run and compile on Windows, so decent Windows support would be nice. We aren't exactly an MS house; our product works with the artifacts of programs that only run on Windows, so it runs on Windows. Our products are very much closed source, so things like GitHub are of no use to us.

Distributed systems like Git are compelling for the workstation-to-workstation support. Unfortunately Git chokes on our binary files (I get an out-of-memory error), and I'm afraid that the other DSCM projects will stagnate in favor of Git. Centralized systems like Subversion are in line with the way we've always done business, but they lack the ability to view/review/pull changes on another workstation, and branching/merging isn't as easy.

I'm a big fan of Git, but if it can't work with large files it won't work for us. Subversion seems like a step backwards, but I could be wrong. Whatever we choose, we'll be with it for a long time and use it every day.

So, HN: what revision control system would you suggest?




Out of the three leading free DSCMs -- Git, Mercurial, Bazaar -- none seem in danger of going extinct any time soon. Bazaar is the foundation for Launchpad and Ubuntu, almost irrevocably. Mercurial has less buzz than Git for open-source projects, but I suspect it's even more popular in the business world, and its Windows support is solid. Mozilla uses it, anyway. So you might want to give the other distributed systems, particularly Mercurial, another look.


Git isn't very good for large files, unfortunately. In my experience, though, "large" means well over 100 MB.

I'd love to convince everyone to switch to git, but I want to note that SVN's branching/merging capabilities are on par with p4's, since at least 1.4 using svnmerge, and probably in 1.5 standalone. (My current employer uses p4, and I switched my last workplace over to SVN from CVS.)
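
For concreteness, the svnmerge workflow looks roughly like this (the script ships in Subversion's contrib/ directory; paths here are made up):

  # one-time setup, in a working copy of the branch:
  svnmerge.py init
  svn commit -F svnmerge-commit-message.txt

  # later, to pull eligible trunk changes into the branch:
  svnmerge.py avail      # list revisions not yet merged
  svnmerge.py merge      # merge them into the working copy
  svn commit -F svnmerge-commit-message.txt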

It does suck to go back to centralized SCM after using a DVCS though (I clone my work p4 checkout into git for minute-to-minute use). A possible solution: if your build artifacts are most of the large-file problem, consider storing them out of band and having workstations fetch them using Ant/Ivy or something similar?


I would love to have the build artifacts stored in a system for managing build artifacts, but I don't really know of any.

We could (theoretically) store the build artifacts in Subversion and the source in Git, but that would be very unpopular. No one in the company is interested in having multiple revision control systems to maintain.

Our main product is built in Visual Studio, but we use Ant for the surrounding build tasks. I think of Ant as being a "better make"; what does it have to do with grabbing remotely stored build artifacts?


Ivy is an add-on of sorts for Ant that does dependency management (libraries, etc.). It sort of jacks the Maven dependency resolution bits and leaves the rest. It's definitely not a "system for managing build artifacts" (and it's pretty Java-focused), but you can set up your own repository and have a build target fetch them, put them in the right place, etc.

I wouldn't necessarily advocate this, since you'd inevitably end up having to build some system around it, but conceptually it seems like what you'd want. YMMV; I've only used it for pretty straightforward Java stuff.
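
For what it's worth, the fetch side is just a couple of tasks in build.xml. A rough sketch (the repository settings live in ivysettings.xml/ivy.xml, and the names and patterns here are invented):

  <project xmlns:ivy="antlib:org.apache.ivy.ant" name="artifacts" default="fetch">
    <target name="fetch">
      <!-- resolve the deps declared in ivy.xml against your in-house repo -->
      <ivy:resolve/>
      <!-- copy the resolved artifacts into the build tree -->
      <ivy:retrieve pattern="artifacts/[artifact]-[revision].[ext]"/>
    </target>
  </project>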


None of the distributed VCSes handle large files well, because they load the whole file into memory when they diff. Only Subversion seems to do well currently.


By "handle large files" all I really want is for it to think of it as a blob that can't be diffed and store it.


Git handles symlinks, so if your devs were all on UNIX-like systems, you could just store symlinks to your un-diff-able blobs under your Git repository.

Sorry to offer a non-solution, but maybe someone else out there does have a UNIX dev platform, and could get some benefit from storing big binary files on a network share with versioned symlinks.
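
In concrete terms (paths made up), you keep the blob on the share and commit only the pointer:

  ln -s /mnt/builds/2008-11-24/app.db artifacts/app.db
  git add artifacts/app.db     # git stores just the link target, a few bytes
  git commit -m "point at today's database build"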


I hadn't thought of that. You'd then be versioning the link, and not the file itself.

Unfortunately we are tied pretty tightly to Windows dev. environments.


Branching/merging on Subversion has been absolutely trivial since the version 1.5 release. Prior to that it was a nightmare, but now it's ridiculously easy.
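
With 1.5's merge tracking it really is just this (URLs hypothetical):

  # in a working copy of the branch: pull in everything new from trunk
  svn merge http://svn.example.com/repo/trunk
  # when the branch is finished, fold it back into trunk:
  svn merge --reintegrate http://svn.example.com/repo/branches/feature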

Can we finally put this to rest now?


Agreed. Though, one caveat: SVN will slow down as it diffs large binary files. I remember reading a developerWorks article about this. (http://www.ibm.com/developerworks/java/library/j-svnbins.htm... thanks Google!)

(PS: if I'm misinformed, or out of date, I apologize. I use SVN but never upgrade; I have PSDs in the ~100-200 MB range and it's not terribly fast.)


I really love Bazaar.

I love that it's distributed and I can version a local folder with a single command. I love the simple integration with Launchpad.net (which is annoyingly mediocre). I love that it's a single command which has all the inline help built in. I love the conflict management.

However, people with a much better understanding of the details have written comparisons with each of the popular DVCSes currently in use. I suggest that if you use another VCS, you at least read the Bzr side (http://bazaar-vcs.org/BzrWhy).

Specific comparisons:

Subversion: http://bazaar-vcs.org/BzrVsSvn

Git: http://bazaar-vcs.org/BzrVsGit

Mercurial: http://bazaar-vcs.org/BzrVsHg


Their bzr v. Mercurial comparison is a bit disingenuous at points, if not outright misleading.

Firstly, having used bzr 1.8 and 1.9, and Mercurial 1.0 and 1.1, the claim that bzr's speed is close to Mercurial's is downright hilarious. There simply is no comparison. bzr's speed is so atrocious that, when Python was looking at using bzr for its version control system, the only technique that bzr's fanboys could come up with to ensure fast checkout was to have you download a tarball of the pre-checked-out sources. You can see more at http://www.python.org/dev/bazaar/ . Conversely, I routinely work with Python-sized repositories in Mercurial without incident. If you have one take-away, this should be it.

They also claim that bzr can swap out its backend and hg can't (patently false--the backend has in fact changed for 1.1, due out very soon; see http://www.selenic.com/mercurial/wiki/index.cgi/fncacheRepoF...); that Mercurial cannot be served over vanilla HTTP (it can); that Mercurial does not let you change your merge algorithm easily (it does, see http://www.selenic.com/mercurial/wiki/index.cgi/MergeToolCon...); that Launchpad, which even you admit is mediocre, is an advantage unique to bzr, without noting that Mercurial has http://bitbucket.org and similar sites; and so on.
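
On the merge-tool point in particular, wiring an external tool into Mercurial is a few lines of hgrc; the tool and path here are just an example:

  [merge-tools]
  kdiff3.executable = C:\Program Files\KDiff3\kdiff3.exe
  kdiff3.args = $base $local $other -o $output
  kdiff3.priority = 1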

There actually are a few advantages of bzr over hg. That site just happens to make most of them up.


That's pretty bad. Is it out of date or just overly fanboy-esque? It would be cool to put some feature list comparisons somewhere.

Right now most of this stuff seems really anecdotal because people that understand one system mostly use that system exclusively.


I honestly think most of it's just ignorance. I don't honestly think that anyone who has used Mercurial in anger can seriously be unaware of merge customization, or of the fact that Mercurial has simple revision numbers in addition to hex codes, or that Mercurial can clone from HTTP, but I can definitely see missing those features if you just messed around with it for an hour when you were trying to evaluate the two systems. Likewise, even someone who had used Mercurial fairly heavily could easily be unaware of Launchpad-like sites, or new back-ends in upcoming Mercurial versions and the like. At any rate, I don't think that the bzr team is being malicious; just uninformed.

The one point where I do think they're guilty of intentionally bending the truth is their claims of speed. bzr goes to great lengths to have more thorough history and merge tracking than either Mercurial or git. It pays for that by being slower. That's a perfectly viable trade in some cases, but bzr needs to own up to it. Instead, rather than simply admit that bzr is slower, its supporters generally try to explain ways to get around the fact that bzr is slow, and then claim that the slowness doesn't matter. For example, initial clones ("bzr branch") with bzr are simply atrocious. A bzr supporter will tell you that you don't do clones that often, and besides, there are workarounds--e.g., since bzr works with whole-file hashes like git, it can share file objects and revisions across all of the repositories on your system, making branching repositories you've already branched at least once go more quickly, and for ones you haven't, you can just download a tarball of an existing clone and use that as the base. These arguments are weak, sidestepping the issue. I'm reminded of git in its early days, when you had to manually run very, very slow gcs every once in a while, and its supporters shot back with, "Well, yeah, but you can totally just stick that in a cron job." Mercurial's not innocent, either; I remember its dev team trying to argue why it didn't need named branches, when everyone using git (including me!) thought that git's branching was one of its killer features. bzr needs to do what those projects did: quit explaining away the bug, and just fix it.

bzr's merge algorithm is better than either Mercurial's or git's, its GUI (QBzr) is best-of-breed, and its online operational mode makes it a drop-in replacement for Subversion for sites that are trying to migrate away from old habits slowly. It also happens to be significantly slower than the competition, to the point that, much as I personally believe that Mercurial has a better power-to-usability ratio than git, I think bzr has a bad power-to-speed ratio compared to either git or Mercurial. Whether you agree with my assessment is up to you. I just wish that their comparison pages were more honest about what the trade-offs are.


Yeah, I saw those pages and started reading through them. Unfortunately it's not exactly an unbiased source, and the bias leaks through pretty heavily.

I'm not discounting Bazaar, I just don't entirely trust their comparisons with other systems.


I use Mercurial, but have been exploring Git a little because of what you noted -- the community momentum seems to be headed that way.


Another vote for Mercurial. Whenever a project gets that "KoolAid" feel (Git, Rails, even Haskell at times) I tend to gravitate towards its competitors, so take that with a grain of salt, but I can say that Mercurial's been great for the small teams I've used it with. On larger projects, I've unfortunately only used Subversion, but I can say that it works, at least.


I'll third the recommendation for Mercurial. There is no question that git has more community momentum--something which I hope will begin to change--but Mercurial is nevertheless an outstanding distributed version control system.

Mercurial's default mode of operation has the benefit of being extremely Subversion-like--in a good way. Indeed, if you never interact with anyone else, most of the normal Subversion commands--ci, mv, rm, add, log--"just work." There is no git-style index to worry about, no rebasing, and the command set is small and regular. That means that the learning curve is far, far shallower than git. Yet it still provides the same distributed branching and merging as git, and, though it is slightly slower, the difference is truly negligible in my experience--maybe a couple percent difference at most.
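
A quick sketch of what that looks like in practice (file name invented):

  hg init project && cd project
  echo "int main() { return 0; }" > main.c
  hg add main.c                 # no git-style index; just add and commit
  hg ci -m "first commit"
  hg log                        # the svn-style commands behave the way you'd expect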

What about the incredible power of git, though? What if you actually want to be rebasing and rewriting your history all the time? When you want unfettered power in Mercurial, you've still got it. The `mq` extension, which can be enabled by adding a single line to your configuration file, allows you to do all the crazy patch rewriting, merging, splitting, and rebasing that git does.[1] But you can ignore that functionality if you want, and still have a very powerful and fast distributed change system.
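
Concretely, enabling mq and the basic patch juggling look like this (patch name made up):

  # in ~/.hgrc:
  [extensions]
  mq =

  # then, in any repository:
  hg qnew fix-parser       # start a new patch
  hg qrefresh              # fold current working-directory changes into it
  hg qpop && hg qpush      # move down and back up the patch stack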

When Fog Creek looked at going to a distributed source control system last year, I advocated Mercurial over the competitors. Though the transition wasn't seamless, I've been extremely happy with the result. The Unix experience is extremely pleasant, and TortoiseHg (http://tortoisehg.sourceforge.net/) provides a surprisingly solid Windows experience out-of-the-box.

If you can look past the fanboyism, I'd strongly encourage you to give Mercurial a try. I think it strikes a much better power-vs.-usability balance than git does.

[1] Except microbranching. Mercurial doesn't currently support local named branches. You can achieve similar things by using mq with qguards, if you really need them, but in practice I find it's usually easier to just clone a second repository. Although I do miss microbranches sometimes, I've found I greatly prefer Mercurial's streamlined workflow.


Oh, right, I forgot. TortoiseHg is a clincher if you want distributed VCS for Windows users. It makes things very easy.


The momentum here seems to be pretty heavily for Mercurial... maybe I should consider git to support the underdog ;)

All kidding aside, Hg might be what we need. I'm looking into it now. Is it easy to import p4 history into Hg?


Don't take our advice like that! Try it first: You may hate the way that one VCS works compared to another. Technical merits have less to do with this choice than your own preferences.


Hehe, yeah I'm trying to try it.

Unfortunately it doesn't like any of the large files (anything over 10 MB), and the Windows shell plugin makes Explorer run like a dog.

I've done some research on it and I want to like Hg. Any advice on having it play nice with large files?


Maybe this is too far outside of the question, but particularly if you aren't going to be merging the large files, have you considered mirroring them with rsync (http://www.samba.org/rsync/) instead? I'm not sure tracking really large binary files is best handled by a VCS. I'm trying to read between the lines in your question, but would e.g. periodically making dated snapshots of the binaries and otherwise automatically mirroring around the newest version suffice?
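
For example (host and paths invented), a scheduled job on each workstation could do:

  # pull the newest binaries onto the workstation
  rsync -av --delete buildserver:/builds/current/ ./artifacts/
  # and on the server, keep cheap dated snapshots (hard links, not copies)
  ssh buildserver cp -al /builds/current /builds/snapshot-2008-11-24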

The Mercurial page on binary files (http://www.selenic.com/mercurial/wiki/index.cgi/BinaryFiles) doesn't say anything about especially large ones.

Also, has anybody had good experience importing from p4 to hg on Windows? I've tried using tailor and some scripts from the mercurial wiki, but no success yet. One of these days I might write my own importer script (mostly because I need to import from five or six major branches, about 50k commits), but haven't had the time yet. (I'm working on Windows for similar reasons. Mercurial has been great for typical VC usage.)


Our import from p4 doesn't have to work in windows. We're comfortable in Linux, we just can't develop in it.

You are correct, the binary files will not be merged. Some of them are large encrypted databases and can't be merged. The database is built based on source that may be merged, but the merge will happen in the source, and when the source merge is complete we'd rebuild the databases and check in the result. The largest number of binary files are compiled programs and very rarely change.

You're suggesting something a lot of other people in this topic have suggested. I would love to implement some kind of binary file management system; it just isn't going to happen. These binary files don't need to be merged, but they will be changing semi-frequently and are likely to be different between branches. We don't have the time and manpower to implement a system that would work for us. Either the VCS needs to handle these files or we can't use it.

I think I found a config setting deep in the dark heart of git that can make a repository friendly to large binary files. I'll check it out, and if it works we could create a set of repositories for these binary files. I'll try this and see how it goes. ( http://www.gelato.unsw.edu.au/archives/git/0607/24058.html )
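
For the archives, the settings discussed there are along these lines; the knob names vary by git version and the file patterns here are made up, so treat this as a sketch:

  # .gitattributes: tell git not to delta-compress the big blobs
  *.db  -delta
  *.bin -delta

  # and cap the memory that repacking can use:
  git config pack.windowMemory 64m
  git config pack.depth 10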

I'm really trying to find a way to use Hg, but if it can't handle our use-case we can't use it. That doesn't mean it's a bad VCS. I really like some of its features and the fact that it is far simpler than git. I also really like that it has file explorer integration. It just needs to handle projects with an obnoxiously large code base and large binary files.


Did you look at rsync? You don't really need to implement anything.

My point, though, was that in some sense you're trying to find a way to bend a VCS into doing something well outside the strengths of VCSs, so it is worth looking into categories of tools better suited to the problem.


Yes, I'm familiar with rsync. We've talked with Perforce Support a fair amount regarding large files, and they remind us that we are abusing their system. It's like using Harley Davidsons on a worksite to move around loads of dirt. It may work, but it's not the intended use.

We might be able to get that to work for some of the binary files relatively easily. The problem is that the majority of these files are build artifacts for our test programs, and the builds happen in the same directory as the source (yes, this is a stupid way to do things, but we need to do what customers do, and our customers do this). It would be very hard to distinguish between build sources and build artifacts. This leaves the issue of branches. Each branch would have to 'know' which of the binaries to grab in the shared space, and update other branches at integration time.

This could be a valid way of doing things, but it would take a fair amount of effort because our environment is like the real world; dirty and complicated. For the past couple of years it's been on the backlog to clean and simplify our environment, but something more important always comes up.


Good analogy.

How do you distinguish between the build source and artifacts now? It's not difficult to specify (via filename regexes) what should and shouldn't be examined by Mercurial as potential VC files, beyond whether or not you explicitly add them: look at the .hgignore file. (Not sure that's directly helpful, but for the archives.)
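
For example, an .hgignore along these lines (patterns invented):

  syntax: regexp
  \.obj$
  ^tests/.*/output/
  syntax: glob
  *.pdb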


We don't distinguish between them. The test programs were built and all the resulting files were checked in. Unfortunately some of the build artifacts share extensions and directories with build source, and some of the build artifacts have no extension. The only way we could add the build artifacts to .hgignore (or .gitignore) would be to manually add them one at a time, and that would be a huge task.


You can also add * to the ignore file and explicitly specify what to track, either by hand or by "hg add [fname]" in some sort of script.
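
Concretely, that looks like this (file names invented; explicitly added files are tracked even when they match an ignore pattern):

  # .hgignore: ignore everything by default
  syntax: glob
  *

  # then whitelist by hand, or from a script:
  hg add src/main.c tests/run_all.cmd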

I track/sync my home directory with mercurial, and did that to keep it from scanning most of my drive for updates. (You can probably do the same with git.)

(For the archives as much as you, though I hope it's useful.)


Unfortunately, that's probably relatively complicated. The internal Mercurial convert utility doesn't speak Perforce, and while tailor (http://progetti.arstecnica.it/tailor) does, I've had mixed results with it. You can always give it a shot. The worst case is simply that it doesn't work.


I tend to gravitate towards its competitor

Me too - which is one of the paths that led me to Bazaar. Like your experience with Mercurial, Bazaar has done the trick on the smallish (<10 person) teams I've used it with, and the bzr-svn plug-in means you can still go out and play with your SVN-using friends...


It's not like Mercurial is all that unpopular. Sun has switched its open source projects to hg. Mozilla is using it as well. All the subversion committers I know prefer hg over git as well.


True popularity is different from hype. After all, Perl has always been more popular than Ruby.


I'm a big mercurial fan. After using it, I hate using SVN.


Our products are very-much closed source, so things like Github are of no use to us.

For what it's worth, GitHub offers paid repositories for just this case; open-source repositories are free, but you can pay to make your repositories private. Even the biggest plan is only $200/month; see http://github.com/plans for more information.


What makes you think we trust GitHub? ;) The only way GitHub could help us is if they were to open-source their site and let us run an instance of GitHub in our datacenter.

We have some very strict rules about the code never leaving company computers, and those rules will not be changing.


Gitorious is like a subset of GitHub, and it's open source.


Thanks! That looks like it might be something we can use :)


Personally, I use Git and hg at home and for small work projects.

But, for the main work projects (console games: lots of code, but also many large assets), none of the alternatives to p4 are reasonable. (Yes, I'm well aware Perforce has vast and sundry problems too).

Might be worth doing some testing on hg; if it can handle your data sizes reasonably, I think it meets all your other requirements.


A valuable resource comparing major SCMs that I point people to when they ask is the FreeBSD project's wiki page: http://wiki.freebsd.org/VersionControl. Hope it helps. Regarding SVN versus Git specifically, be sure to look at the VCSWhy link at the bottom.


Thanks, that was helpful :)


Do not go from Perforce to Subversion. Whatever you decide. That is a massive step backwards.


I wouldn't call it a step backwards. Perhaps a step to the side.


I'm actually a big fan of Perforce. I don't feel that any system I've tried (~dozen) really beats it when it comes to internal software development inside of an organization. For open source, distributed version control makes a lot more sense, but internally, Perforce is pretty hard to beat and modified versions of it are used at places like Google and Microsoft.

What are your reasons for disliking Perforce? Maybe we can help you pick by comparing against what you're trying to move away from.

If by "given the OK" you mean you need to swap because your company doesn't want to admin/pay for Perforce licenses, then maybe you can clarify that a bit, too.


Perforce is not terrible. I'd even go so far as to say that, of the centralized systems, it's better than any other I've tried (including Subversion and CVS). It could be better, but it could be much worse.

  Things we like about perforce:
   - Client side changelists for modified client side code.  This helps organize our local 
     changes to make sure when we're working on multiple things they stay separate.
   - The server is solid.
   - Can handle large files, including the ability to 'forget' previous revisions to save space
     on the server.  It's abusing the system, and we've been told as much by Perforce.  It just
     happens to be the way we do business, and changing that is another project.
   - It can do branching/merging
   - The visual diff and merge tools are pretty good (mostly, more below).
   - Everything that's under revision control is in one place.  If you don't want to have the 
     files on your workstation you can simply remove that tree from your view.

  Things we dislike about perforce:
   - It costs money.  At the size of our organization it costs about as much as another employee.  
     This is the reason we were 'given the OK'; developers don't really care about cost as long as 
     we're employed, and management doesn't really care about features as long as we're productive.
   - We've had significant issues when merging.  Conflicts are not properly flagged and "ghost" code 
     (code that didn't exist in either branch) sometimes appears in the merge result.
   - The clients are very iffy.  Crashes are frequent, and the merge problems are related to 
     client-side bugs.
   - No one in the company is a fan of being required to 'check out' files to get them to be writable. 
     This is how perforce knows when files are modified and because there is no equivalent for new 
     files people frequently forget to add files and break the build.
   - It can only show you < 1000 files in a given changelist.  Big changes like that are when it's 
     most important to see what you're doing.  This is pretty common when doing branch integrations.
     When this happens you have to hit the "auto-merge" and hope for the best.
   - Branching/integrating isn't streamlined enough to really support every developer having their 
     own branch.  If it was dead-simple we could support a distributed development model with a 
     centralized server.
   - No real way to share code (for reviews, or collaboration) without going through the shared 
     depot.  We use p4tar, but it doesn't play nice with Cygwin and has its own problems.
   - No direct way to revert code.  Reverting is a 5 step process that isn't entirely obvious.
I think that's it.


Excellent list there. Your issues are all things I also see as drawbacks to Perforce. The main breaking point on your list would be the merging instability you mention, which we haven't seen at either of my last two companies.

We do have merging issues in a different way, but it's always because someone used the wrong flag when merging the two or didn't properly create the initial branch. Still, this is significant anyway because merging/branching really needs to be easy to manage (i.e. hard to screw up) for a tool like this.

In the mega integrations, we came up with a few tricks to work around the file-count limit, often involving merging subtrees one by one from the target branch. This is often needed organizationally for us anyway, because the teams/experts are often different when it comes to resolving those merges. It results in more changelists, but it tends to work out okay, and the integration history actually ends up in a better state, with a proper contact for discussing each merge later.

We wrote our own Perforce tools at our company for sharing code for code reviews, or some teams use user/feature/"pre" branches for that. I'd like to see that improve in Perforce, but it's been pretty bearable.

What I really want is something with the maturity of Perforce in terms of tools/API/integration/monitoring/history/etc but is built on the premise of needing to do lots of merges and branches easily. I know some major organizations have gone with things like Mercurial because it felt to them like it would mature the fastest in terms of corporate needs, but I've yet to see any of the distributed version control systems that has gotten over the curve: http://en.wikipedia.org/wiki/Image:Gartner_Hype_Cycle.svg

I can't wait until they do, really, because the perspective that merging is central to version control is something I agree with.


I evaluated Mercurial some months ago and found it very easy to migrate to from Subversion.

Here are some benchmarks on the Linux source tree that might be useful:

http://laserjock.wordpress.com/2008/05/09/bzr-git-and-hg-per...

This blog post is pretty good at summarizing git and mercurial: http://importantshock.wordpress.com/2008/08/07/git-vs-mercur...


Why does Git choke on large files?


Linus Torvalds: "The git architecture simply sucks for big objects": http://kerneltrap.org/mailarchive/git/2006/2/8/200591/thread


I use darcs for personal projects, and it is very easy to use.


I'm sorry to say, but it starts to get messy when it's used by a team. Darcs suited me fine while I worked alone on projects, but as others joined we faced serious problems, including loss of data. I gave it up, switched to Mercurial, and never missed darcs.


Even when used by yourself. Branch a project, build a complicated new feature (e.g. 50 patches), then try to merge.

The exponential merge problem really sucks.


I gave up on darcs a while ago, but darcs 2 honestly does greatly reduce the merge issues that used to plague it. If you're still on darcs, get the upgrade. You may find you no longer need to move to a new product.


darcs has a very nice interface, but it does not handle many or large binary files well.


I'd recommend either Mercurial or git, but the learning curve of git is steep.


If you set up git's bash completions, and run through a git tutorial, you'll know enough git to work on your projects. The more esoteric subcommands are generally useful in specific situations and simply don't have equivalents in most other VCSes.
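
Setup is one line in your shell profile (the script lives in contrib/completion/ of the git source tree; many distros install it somewhere under /etc or /usr/share, so the path below is a placeholder):

  source /path/to/git-completion.bash
  # after that, subcommands, branches, and remotes tab-complete:
  #   git che<TAB>  ->  git checkout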


Would it perhaps be practical to pull the binary files out of SCM and build a small tool that figures out dependencies and downloads the relevant files from the build server?

It sounds like that might ease your constraints.
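
A rough sketch of the shape such a tool could take, assuming a plain HTTP build server and a manifest file (all names here are invented):

  #!/bin/sh
  # fetch-artifacts.sh: pull each artifact named in the manifest from the build server
  BUILD_SERVER=http://buildserver.example.com/artifacts
  mkdir -p bin
  while read name; do
    wget -q -O "bin/$name" "$BUILD_SERVER/$name" || echo "missing: $name" >&2
  done < artifact-manifest.txt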


Something like this might be exactly what we need, but we don't make money building distributed build artifact caching systems. If we don't make money doing it, we aren't doing it :(

"Is this good for the company?"


It's often said that developers need to understand the business, and bring forward a business-case, not a technical case.

So: how much money will you lose by choosing the wrong SCM today because you're unwilling to change a process you even agree is sub-optimal?

Try to count the number of check-ins, branches, merges, etc. the entire team does in a year, multiply by five, and then multiply by even a few seconds of lost time, annoyance, and agony per event.

I would guess that in that perspective you can afford spending a few days seeing if you can change the build-artifact process to allow you a wider selection of SCMs.


I fully understand this argument, but during the last quarterly all-hands meeting our CEO made it abundantly clear that we aren't doing anything unless it directly brings in more revenue or directly reduces overhead. One of the major reasons for switching from p4 is to reduce overhead; if we have to build a new system for its replacement to work for us, we aren't doing a very good job of reducing overhead.


Darcs!

I mean mercurial!

No, git!

(Aww, these computer science problems are so hard!)



