
The whole "downloading history of the repository onto your machine" thing about git is what makes it unworkable where I work. A normal checkout from SVN is over 3GB in size just for our team's tree. There are a number of binaries that get pulled in and updated for various reasons (SDKs, platform-specific binaries) and they are versioned for repeatable and automatable builds across branches, all self-contained with minimal dependencies. I dread to think what the entire history would take - it must be many 100s of GBs at least. It would certainly rule out the whole "working disconnected" idea on laptops, for one.



If you wanted to use git in this situation, rather than svn, I'd recommend using git-annex (http://git-annex.branchable.com/). It avoids those binaries bloating the history while still letting branches "contain" specific versions of them. You can set up a centralized annex that is, e.g., a bup repository (or an rsync server, or S3), and git-annex pulls down the binaries from there on request.
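In practice the setup is only a handful of commands; something like the sketch below (the remote name, host and paths are purely illustrative, and I'm assuming an rsync special remote):

  git init && git annex init "work laptop"
  git annex add sdk/platform-sdk.zip     # git tracks a pointer; the content lives in the annex
  git commit -m "Track SDK via git-annex"
  git annex initremote central type=rsync rsyncurl=annex.example.com:/srv/annex encryption=none
  git annex copy --to central sdk/platform-sdk.zip
  # on another clone, pull down only the binaries you actually need:
  git annex get sdk/platform-sdk.zip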


What is the disk footprint of the existing SVN repository? "100s of GBs"? git's worst-case repository size is equal to, or slightly larger than (a few percent, probably), an SVN repository's.

Git will deduplicate all of your binaries across branches (and if you are clever, across repositories, but that's another story) so worst case you will only have one copy of any binary file no matter how many times it appears in your history.
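You can see the deduplication for yourself: git names blobs by a hash of their content, so identical binaries in different branches (or directories) collapse to a single object. A quick illustration (paths and output are made up):

  $ git hash-object release/tools/compiler.exe
  9d2f1a0c...
  $ git hash-object feature-x/tools/compiler.exe
  9d2f1a0c...   # same content, same object id, stored once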

That is not to say that git couldn't be a poor fit for your current project+organization for other reasons, but the blanket assumption that "the repository would be too huge" is not generally accurate.


He didn't say the repo would be too huge for the server where the svn repo lives today. He said it would be too huge for the laptops.

The binaries are already deduped; that's why they live in svn.


The OP made no mention of how large the SVN repo is, but rather speculated that the git equivalent would be "100s of GBs".

Had the statement been "Our SVN repo is already 100s of GBs in size," then yes, you are not likely to want to stuff that onto a laptop, but that was not the claim. The claim (without rationale) was that the git equivalent would be 100s of GBs, which is similar but not at all the same.

One assumes a roughly 1:1 correspondence; the other applies an unstated multiplier, SVN * n = git, where n > 1 (and significantly so).

In reality the multiplier appears to be well below 1 in many cases:

> Git's repositories are much smaller than Subversions(sic) (for the Mozilla project, 30x smaller) [1]

[1] https://git.wiki.kernel.org/index.php/GitSvnComparison


He speculated that the entire history (which I would say is equivalent to the SVN repo) is 100s of GBs. We are assuming a 1:1 svn repo to git repo ratio.

Mozilla's ratio would be relevant if they were storing something like the visual studio installer in their repo. They aren't, so it's not.


The binaries (including debug symbols etc.) are the bulk of that size, and storing all revisions locally will almost certainly add up to a non-trivial size for a laptop. Unless git has some kind of magic differencing algorithm specifically for executable code and debug symbols, I don't really see a way it could work - that's my rationale.

Of course, such algorithms do exist - Google's Courgette - but I don't think git is using them (I have looked) and doubt they are tuned to e.g. Borland TDS/RSM/etc. symbols.

I have no idea how large the svn repository is - it's stored on a SAN and run on a dedicated server I only interact with via svn. It could be many terabytes for all I know; and of course, my team's project tree isn't the only thing in the full repository.


Can you maybe separate the binaries and use git for only sources?


You're being downvoted for not using an HN approved workflow. Your workflow is inconceivable and therefore wrong.


Maybe I'm just grouchy today, but what's inconceivable to me is why you'd write an Oh No The Hivemind comment on this topic of all things. Some workflows actually are suboptimal.


The people complaining about this particular workflow being suboptimal do not understand it, nor why it exists in the places it does. Sometimes when something looks stupid, it's because it is stupid. But sometimes it's because you don't understand what you're looking at.


But have you actually tried it? You might be surprised.


I might give it a bit of a go when I'm on-site in CA in a couple of weeks, rather than trying it over a transatlantic VPN - though my MBA only has a 256GB SSD, and it's split 50/50 between Windows 7 and OS X.

Still optimistic? :)


Two words: git submodules.
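If the binaries lived in a repository of their own, the main tree could pin them without carrying their history; a rough sketch (the URL and path are hypothetical):

  git submodule add https://example.com/sdk-binaries.git third_party/sdk
  git commit -m "Pin SDK binaries as a submodule"
  # consumers can then initialise it shallowly:
  git submodule update --init --depth 1 third_party/sdk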


Git actually starts compacting your object database and creating super-efficient delta-compressed packfiles after a bit. You can still throw object files in there afterwards though, and it doesn't change the basic principles of operation.
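If you're curious when that kicks in, it's easy to poke at (the output will obviously vary by repo):

  git gc                  # repacks loose objects into delta-compressed packfiles
  git count-objects -v    # compare the loose 'size' with 'size-pack' afterwards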


Super-efficient delta-compressed packfiles of zip files are not, in fact, super-efficient.


Ah ha, yes, you are correct - but packfiles combine multiple blobs together to benefit from additional compression that you couldn't achieve by compressing each individually.
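And for files that are already compressed archives, you can tell git not to bother attempting deltas on them at all; a .gitattributes sketch (the patterns are just examples):

  # .gitattributes
  *.zip  -delta
  *.msi  binary -delta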


Git repositories are quite a bit smaller than svn's[1], but hundreds of GB is pretty huge.

[1] http://www.contextualdevelopment.com/logbook/git/large-proje...


git-clone --depth 1


Shallow clones are essentially read-only: they can neither push nor be fetched from. Unless you're prepared to regress to emailing patches around, a shallow git clone is actually less useful than an svn checkout.
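For what it's worth, the patch-emailing fallback is at least mechanical; roughly (the branch name is hypothetical):

  git format-patch origin/master --stdout > changes.patch
  # mail changes.patch to someone with a full clone, who applies it with:
  git am changes.patch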


Shallow clones don't seem to save much space: http://blogs.gnome.org/simos/2009/04/18/git-clones-vs-shallo...
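It's easy enough to measure for any given repo, e.g. (the URL is a placeholder):

  git clone --depth 1 https://example.com/project.git shallow
  git clone https://example.com/project.git full
  du -sh shallow/.git full/.git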





