We haven't found a solution: our requirements include storing revisions of large binary files (PSDs), but we don't have the ability to manage 8GB repos effectively (and I don't know whether anyone does).
It seems like, ultimately, versioning software will need to grow to understand file types beyond text in order to track changes to many large files. I imagine Git plugins available by filetype, perhaps automatically sourced by GitLab when needed, but I know that would be an enormous task.
Unrelated: can you possibly work with Atlassian to get GitLab integration for SourceTree? It would be very convenient for us. :)
Versioning large files is hard. The way to do it is not to put them in the git repository itself, but to store in git a symlink keyed to the SHA-1 of the large file's content. This way your git repo stays under 1GB, but you can still request the file from a server. GitLab EE uses the open source git-annex for this, and adds code to check your project permissions. All you need to do to get the files is a git clone followed by 'git annex sync --content'. Read more about this at https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-... Please let me know if you have any other questions.
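For reference, a minimal sketch of the client-side workflow (the host and file names here are hypothetical):

    git clone git@gitlab.example.com:design/assets.git
    cd assets
    git annex sync --content        # fetch the annexed file contents
    git annex get mockups/home.psd  # or fetch just one file

Until you sync the content, the repo holds only the symlinks, so the clone itself stays small.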
We would love to see better support for GitLab in SourceTree, and so would many others: https://answers.atlassian.com/questions/47020/comments/26160... The response from Atlassian in that thread indicates they are not inclined to do it. Since GitLab is an open source alternative to their Stash+JIRA+Confluence+Bamboo products, I understand their stance. Of course, it never hurts to ask in that thread.
... actually, versioning large files is easy, but not directly in git.
https://github.com/bup/bup strategically breaks files into chunks of ~8KB, using a rolling checksum so that chunk boundaries are determined by the content itself. That way a small change to a big file becomes a small change to a small number of chunks (ideally, changing one byte in an 80GB file would change exactly one such 8KB chunk -- and that is often the case, though it's common that 3 or 4 would change).
bup then puts[0] the chunks into git, together with a "reconstruction map". As a result, you can efficiently put huge files in git and only pay for the real deltas in storage.
[0] It doesn't actually use git or libgit - it writes git packs directly.
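In practice the workflow looks something like this (the directory names are hypothetical):

    bup init                             # create the bup repository (defaults to ~/.bup)
    bup index ~/design/assets            # scan for new and changed files
    bup save -n assets ~/design/assets   # chunk the files and store them as git packs

    # ... edit one large PSD, then:
    bup index ~/design/assets
    bup save -n assets ~/design/assets   # only the handful of changed chunks get written

The second save grows the repository by roughly the size of the changed chunks, not the size of the changed files.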
I agree git-annex is the best solution for now - I just wish some effort would go into versioning binary files by creating projects that mimic, in some way, the behavior of the programs that generate them.
Probably the closest existing analogue to what I'm optimistically hoping to see someday is how developers version DB migration files instead of copies of the database tables themselves, then run the migrations as part of the merge process to bring the databases into sync.
Thinking about it more, a general project to create 'migration' specifications for generating binary files seems more appropriate than a project to modify version control.
I agree that it would be nice if more files were created algorithmically instead of as opaque binaries. You can diff an SVG image but you can't diff a JPEG, although the SVG diff could be presented better than we currently do.
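To make that concrete, here is a contrived diff against a hypothetical logo.svg -- text markup yields a meaningful line diff, while a JPEG's compressed bytes would not:

    $ git diff logo.svg
    -  <circle cx="50" cy="50" r="20" fill="red"/>
    +  <circle cx="50" cy="50" r="35" fill="red"/>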
And you're welcome; more than 700 contributors made GitLab what it is today.
Binary files (large or not) generally can't be merged, so working with them in a version control system effectively requires support for exclusive locking. Is this supported (I don't think it is) and/or on the radar?