
The whole "downloading history of the repository onto your machine" thing about git is what makes it unworkable where I work. A normal checkout from SVN is over 3GB in size just for our team's tree. There are a number of binaries that get pulled in and updated for various reasons (SDKs, platform-specific binaries) and they are versioned for repeatable and automatable builds across branches, all self-contained with minimal dependencies. I dread to think what the entire history would take - it must be many 100s of GBs at least. It would certainly rule out the whole "working disconnected" idea on laptops, for one.



If you wanted to use git in this situation, rather than svn, I'd recommend using git-annex (http://git-annex.branchable.com/). It avoids those binaries bloating the history while still letting branches "contain" specific versions of them. You can set up a centralized annex that is, e.g., a bup repository (or an rsync server, or S3), and git-annex pulls down the binaries from there on request.
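In practice the setup is only a handful of commands; something like the sketch below (the remote name, host and paths are purely illustrative, and I'm assuming an rsync special remote):

  git init && git annex init "work laptop"
  git annex add sdk/platform-sdk.zip     # git tracks a pointer; the content lives in the annex
  git commit -m "Track SDK via git-annex"
  git annex initremote central type=rsync rsyncurl=annex.example.com:/srv/annex encryption=none
  git annex copy --to central sdk/platform-sdk.zip
  # on another clone, pull down only the binaries you actually need:
  git annex get sdk/platform-sdk.zip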


What is the disk footprint of the existing SVN repository? "100s of GBs"? git's worst-case repository size is equal to, or slightly larger than (a few percent, probably), an SVN repository's.

Git will deduplicate all of your binaries across branches (and if you are clever, across repositories, but that's another story) so worst case you will only have one copy of any binary file no matter how many times it appears in your history.
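You can see the deduplication for yourself: git names blobs by a hash of their content, so identical binaries in different branches (or directories) collapse to a single object. A quick illustration (paths and output are made up):

  $ git hash-object release/tools/compiler.exe
  9d2f1a0c...
  $ git hash-object feature-x/tools/compiler.exe
  9d2f1a0c...   # same content, same object id, stored once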

That is not to say that git couldn't be a poor fit for your current project+organization for other reasons, but the blanket assumption that "the repository would be too huge" is not generally accurate.


He didn't say the repo would be too huge for the server where the svn repo lives today. He said it would be too huge for the laptops.

The binaries are already deduped; that's why they live in svn.


The OP made no mention of how large the SVN repo is, but rather speculated that the git equivalent would be "100s of GBs".

Had the statement been "Our SVN repo is already 100s of GBs in size," then yes, you are not likely to want to stuff that onto a laptop, but that was not the claim. The claim (without rationale) was that the git equivalent would be 100s of GBs, which is similar but not at all the same.

One assumes a roughly 1:1 correspondence; the other applies an unstated multiplier, SVN * n = git, where n > 1 (and significantly so).

In reality the multiplier appears to be well below 1 in many cases:

> Git's repositories are much smaller than Subversions(sic) (for the Mozilla project, 30x smaller) [1]

[1] https://git.wiki.kernel.org/index.php/GitSvnComparison


He speculated that the entire history (which I would say is equivalent to the SVN repo) is 100s of GBs. We are assuming a 1:1 svn repo to git repo ratio.

Mozilla's ratio would be relevant if they were storing something like the visual studio installer in their repo. They aren't, so it's not.


The binaries (including debug symbols etc.) are the bulk of that size, and storing all revisions locally will almost certainly add up to a non-trivial size for a laptop. Unless git has some kind of magic differencing algorithm specifically for executable code and debug symbols, I don't really see a way it could work - that's my rationale.

Of course, such algorithms do exist - Google's Courgette - but I don't think git is using them (I have looked) and doubt they are tuned to e.g. Borland TDS/RSM/etc. symbols.

I have no idea how large the svn repository is - it's stored on a SAN and run on a dedicated server I only interact with via svn. It could be many terabytes for all I know; and of course, my team's project tree isn't the only thing in the full repository.


Can you maybe separate the binaries and use git for only sources?


You're being downvoted for not using an HN approved workflow. Your workflow is inconceivable and therefore wrong.


Maybe I'm just grouchy today, but what's inconceivable to me is why you'd write an Oh No The Hivemind comment on this topic of all things. Some workflows actually are suboptimal.


The people complaining about this particular workflow being suboptimal do not understand it, nor why it exists in the places it does. Sometimes when something looks stupid, it's because it is stupid. But sometimes it's because you don't understand what you're looking at.


But have you actually tried it? You might be surprised.


I might give it a bit of a go when I'm on-site in CA in a couple of weeks, rather than trying it over a transatlantic VPN - though my MBA only has a 256GB SSD, and it's split 50/50 between Windows 7 and OS X.

Still optimistic? :)


Two words: git submodules.
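If the binaries lived in a repository of their own, the main tree could pin them without carrying their history; a rough sketch (the URL and path are hypothetical):

  git submodule add https://example.com/sdk-binaries.git third_party/sdk
  git commit -m "Pin SDK binaries as a submodule"
  # consumers can then initialise it shallowly:
  git submodule update --init --depth 1 third_party/sdk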


Git actually starts compacting your object database and creating super-efficient delta-compressed packfiles after a bit. You can still throw object files in there afterwards though, and it doesn't change the basic principles of operation.
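If you're curious when that kicks in, it's easy to poke at (the output will obviously vary by repo):

  git gc                  # repacks loose objects into delta-compressed packfiles
  git count-objects -v    # compare the loose 'size' with 'size-pack' afterwards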


Super-efficient delta-compressed packfiles of zip files are not, in fact, super-efficient.


Ah ha, yes, you are correct - but packfiles combine multiple blobs together to benefit from additional compression that you couldn't achieve by compressing each individually.
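And for files that are already compressed archives, you can tell git not to bother attempting deltas on them at all; a .gitattributes sketch (the patterns are just examples):

  # .gitattributes
  *.zip  -delta
  *.msi  binary -delta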


Git repositories are quite a bit smaller than svn's[1], but hundreds of GB is pretty huge.

[1] http://www.contextualdevelopment.com/logbook/git/large-proje...


git-clone --depth 1


Shallow clones are essentially read-only: they can neither push nor be fetched from. Unless you're prepared to regress to emailing patches around, a shallow git clone is actually less useful than an svn checkout.
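For what it's worth, the patch-emailing fallback is at least mechanical; roughly (the branch name is hypothetical):

  git format-patch origin/master --stdout > changes.patch
  # mail changes.patch to someone with a full clone, who applies it with:
  git am changes.patch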


Shallow clones don't seem to save much space: http://blogs.gnome.org/simos/2009/04/18/git-clones-vs-shallo...
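It's easy enough to measure for any given repo, e.g. (the URL is a placeholder):

  git clone --depth 1 https://example.com/project.git shallow
  git clone https://example.com/project.git full
  du -sh shallow/.git full/.git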





