I think this kind of file system would do well to power platforms that AREN'T for programmers. While it may not seem useful for programmers (because they are so involved in git usage in the first place), I think others might benefit.
I've had the idea to make an iterative design/architecture plugin for programs like photoshop/3dmax/etc rolling around in my head for a while now. Git is a perfect way to store/save the progress people make in those programs (as files change over time, or as they save), and gitfs seems like it would be the fs to run on the backing server.
Instant branching (like when an artist decides to riff on a new idea), undo/rollback, progress tweens/reports.
It's my understanding that Git is pretty bad for storing binary files that change a lot since it has to keep a copy of every version of that file (whereas with plaintext it can just store the diffs). Now, maybe that's what you would want, but that repo is going to get very big very fast.
Pulls won't be so bad if your machine already has all the commits, but cloning would be a nightmare. And it'll be taking up a lot of space on your git server.
It's not exactly plaintext versus binary. Git "stores" a copy of every file, no matter the content. But it also does "delta compression" between objects when you run "git gc"; it tries to find binary diffs between objects that are similar. So the poor performance comes from files that don't "delta" well.
These tend to be things which are compressed or encrypted, where a small semantic change can cascade into a lot of bytes changing. Of course, binary formats often do both of those things. And the larger the files, the more painful it is when it happens.
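If you want to see roughly how much delta compression is buying you on a given repo, these are standard git commands (nothing exotic; the numbers are just examples of aggressive settings):

    # repack everything, letting git search harder for good delta bases
    git repack -a -d --depth=250 --window=250
    # then look at the packed size; repos full of poorly-deltifying blobs stay large
    git count-objects -vH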
Incidentally, the new office formats (both the "open" and the open one) are zip-files with (among other things) xml-documents. I saw someone a while back recommending storing them uncompressed when working with VCS -- gives pretty useful diffs with no extra work (I forget if they recommended extracting, or just storing without compression).
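A quick sketch of the "store without compression" variant, assuming a .docx (which is just a zip archive); the file names here are made up, and I haven't checked that every office tool is happy with a store-only archive:

    # repack the document with -0 (store, no deflate) so git can delta the XML inside
    mkdir tmp && cd tmp
    unzip ../report.docx
    zip -0 -r ../report-stored.docx .
    cd .. && rm -rf tmp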
I remember naively trying git and svn (around 2008, so...) on a webdev project involving many images, including some multi-MB PSD files. Surprisingly, the git repo grew sublinearly, while the svn one grew superlinearly.
With git packing, the full repo with a bit of history (mostly adding some php and a few images) ended up smaller than the original windows native folder. Version control and compression; nice :)
Very good point - however I think this is just an implementation detail of Git as it stands today.
Also, if the person is working with SVG (for example), then it's less of a problem.
Also, given the cheapness of disk, I don't think it would be a limiting cost. And since git is open source, if I were to actually make this thing, it would definitely incentivize me (or others) to make git less bad for storing binary files :)
I guess it depends how deep you dive... It is true that on the surface git only gives you, the user, access to full blobs, and calculates the difference every time you access them. But once you go into packfiles, the content actually is diff'ed, which is why it compresses so well.
In the context of the discussion, since we're interested in the on-disk format, it's more accurate to say that git will try to diff binary blobs, fail at that, and so store the full content of the blobs.
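If you want to check that on a real repo, the pack index can be inspected with plumbing; deltified entries show a chain depth and a base object, while blobs that didn't delta just list their full size:

    # dump the contents of a packfile index (type, size, size-in-pack, delta depth/base)
    git verify-pack -v .git/objects/pack/pack-*.idx | head -20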
I don't know; judging by all the gnashing of teeth I get when I answer questions with something like "you should actually learn how to use git instead of banging your head against the wall with this wrapper for it", developers could use it too.
> I've had the idea to make an iterative design/architecture plugin for programs like photoshop/3dmax/etc
Hah, same idea here. I used to live with 3d artists, and when I asked "How do you live without Git?!", they said they version their files into milestones (e.g. environment-milestone1.3ds) and upload them all to Dropbox.
I think Git wouldn't gain as much traction as you'd expect; there are just easier (for non-programmers) alternatives with a large userbase.
Yeah, I was watching some designer friends of mine; they have all this back and forth with clients and keep immaculate notes at every stage of the transformation (from concept to finished design). I was wondering why in the world it wasn't automated.
Yeah, I'm not really planning on introducing them to git, more using git behind the scenes to get all those good qualities (undos, infinite history, instant backup) that someone who didn't know git might think were hard to get.
There exist various solutions -- the technology is called Digital Asset Management (DAM). There are some open source/Free ones as well, but none that are really good AFAIK.
You generally have the option of either using lots of data and bandwidth (think: storing rendered 4/8K jpeg2000 frames of animations, for each new version/cut/render you want to keep) -- or tight integration with the various software involved (i.e. only keep the latest "render", plus a track of changes / a "script/log" that can be used to regenerate all the missing versions).
I'd think that in general 3d meshes shouldn't be too hard -- but good luck finding someone doing media who doesn't use some big bitmaps somewhere (textures, backgrounds, etc).
A little more detail on the first statement would be great (outside of what people have posted already).
And yeah, I know it's been done before -- but that only proves that the market is there, and more importantly investors (notably YC) think it's worth it to pursue.
I doubt I could do as great a job as some of these products and their engineering teams, but then again, I don't know that I have to. I think it'd be a fun open source project.
Does it batch changes, i.e. if 5 files are saved within a few seconds of each other?
It would also be interesting if that was combined with naming commits based on language processing (i.e. splitting on camel case and snake case, finding the word whose frequency in the current diff is most different from its frequency in the codebase overall). Then you could have human-readable history without any conscious need to maintain it - and this would be developer-friendly, just Save All in Sublime Text and fuhgettaboudit.
Yes, it batches changes. Improving the commit messages is in the pipeline and what you have described sounds interesting. Right now the focus is to get the right semantics for git operations in the context of a filesystem.
This is reminiscent of Apple's Time Machine, except each save is the same as manually clicking "Back Up Now", and I assume that commits are more granular so directories can be committed individually rather than the whole tree.
I've often wondered if version control systems could be abstracted into a filesystem where each write is a commit, handling merges by choosing the local copy.
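A crude userland approximation of "every write is a commit", using standard tools rather than anything gitfs-specific (assumes inotify-tools is installed; the project path is made up):

    # turn every completed write in a working tree into a commit;
    # .git/ is excluded so our own commits don't re-trigger the watch
    cd ~/project
    inotifywait -m -r -e close_write --exclude '\.git/' --format '%w%f' . |
    while read path; do
        git add -A
        git commit -q -m "auto: $path changed"
    done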
My primary dissatisfaction with git is that it lacks layers of abstraction. For example it should have had at least the first two distinct components listed below, something like:
1) git-fs (versioned filesystem, only supporting commit, clone and permissions)
2) git-merge (diff utility to handle merge conflicts)
3) git-local (two repositories wrapped in an abstraction to provide local and remote - the special sauce of git)
4) git-util (everything else like repair, reports, statistics, etc)
I’m not super familiar with git console use so if it already is organized this way, great. But since it is not presented this way in its documentation, I feel that a great opportunity has been missed. We could have used git-fs the same way that people use Dropbox. Instead we have something with a lot of warts (things like .gitignore files interspersed with other files that continue the same mistakes that cvs and svn made, and the inability to save empty directories). I think git’s pattern of pull/commit/push is fantastic, but its shortcomings are so numerous that I’m going to stop knocking it right here before I get myself in trouble.
If gitfs ran on the Mac, I'd probably be using it right now to avoid frequent headaches where git interferes with the simplest pattern of pulling, merging by hand, and pushing all files back if nobody else has committed in the meantime. I think that's the motivation behind a library like this, because so many version control systems get the filesystem metaphor wrong and create too much friction by touting their various levers.
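For what it's worth, git's plumbing commands already expose something pretty close to layer 1; it's just not packaged or documented as a separate tool. A minimal sketch of making a commit with no porcelain at all, in a fresh repo (file name and message are made up):

    # blob -> index -> tree -> commit -> ref, using only plumbing
    git init demo && cd demo
    echo 'hello' > notes.txt
    git hash-object -w notes.txt                 # write the blob into the object store
    git update-index --add notes.txt             # stage it in the index
    TREE=$(git write-tree)                       # snapshot the index as a tree object
    COMMIT=$(echo 'first commit via plumbing' | git commit-tree "$TREE")
    git update-ref refs/heads/master "$COMMIT"   # point the branch at the new commit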
Awesome, I have been looking for something like this for ages. I already have two use cases:
1. At work we store a bunch of IPython notebooks in a git repository (with a hook to strip them of any output and other non-essential varying data). Up until now I had to run "git add -A; git commit -m 'Current State.'" at regular intervals, either manually or via a scheduled task.
2. I'm so gonna use this for my portage-tree, which is taking up a lot of disk-space on my SSD :).
How do you get commits with more than one file changed? I assume a commit happens every time there is a write to a file on disk, and writes can't occur simultaneously. Do commits just happen on some time interval?
Seems like an interesting idea; my concern would be that the abstraction of a filesystem on top of git would break down really quickly once multiple users started editing things.
Merging is pluggable, and the currently implemented strategy is to merge always accepting the local changes. Actually, we are currently using it in an environment in which multiple edits can be made to the same file, and as with a "regular" filesystem, the last one to close the file wins. The advantage is that you will have all the revisions.
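Roughly speaking, in plain command-line git terms the idea is close to resolving conflicting hunks in favour of the local side (this is just an illustration, not the exact code path gitfs takes):

    # merge the upstream branch, preferring our version wherever the two sides conflict
    git fetch origin
    git merge -X ours origin/master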
Why would somebody use such a file system? To make life easier when working with configuration files?
I wish they also described their motive for creating it.
A developer or sysadmin would use it to keep track of changes a site owner makes via ftp/sftp. We made it for this specific reason. Track everything! :)
So is this intended as a single-user sort of tool? I'm curious how merges, conflicts & branching would interact w/ a tool like this, or whether it's just intended as a way to implement essentially a versioned file system where files can be rolled back to previous states.
Currently it follows a single branch. The merge strategy is to merge with local changes taking priority over remote ones, but this is pluggable.
Rollbacks can be done by copying from history to current version.
The only shaky thing is when there is a force push, because local commits will be pushed back.
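And the rollback-by-copying step is roughly the same idea as restoring a path from an older commit in plain git (the commit id and path below are made up):

    # restore one file as it was at an earlier commit, then record that as a new commit
    git checkout abc1234 -- site/index.html
    git commit -m "roll back index.html to abc1234"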
Makes sense as a simple deployment tool too, I guess? Simply mount the stable branch in your 'www' directory and you can push to deploy (for a static-files-based site, at least).
No, I think the question is, what do you as an author of the software have in mind as a particular use case? Some examples of why someone would use this would be great.
We are using it to expose website code through SFTP to non-developers.
Another possible use case is publishing static sites using jekyll or another tool.
We have given some answers (together with vtemian) about some of the use cases we see as possible, and about the need that led to its creation in the first place. But since it's open-source, we are also expecting fresh scenarios in which people would like to use it. Also, contributions are more than welcome! :)
As for performance, it really depends on your workload. For mostly reads, the performance should be quite good, since access is mostly passthrough. The same is largely true for writes.
Reading from "history" has a small performance penalty, since files that are packed need to be unpacked on the fly.
Listing the "history" is quite fast. We tested on the WordPress repository, which has around 17k commits, and it takes ~4s the first time and less than 1s afterwards.
It is intended for non-developers, so that they can integrate their workflow with developers working on the same content. It can of course be used by multiple people, but it's not generally targeted at people developing apps or at replacing git.
Here's the use case I have in mind: let's say I'm a programmer extending a game [1] built on the Undum [2] engine, together with a writer who is working on the story at the same time [3].
If you work on the same file, the content will get versioned just like it would when you use command-line git. You will also get some sort of accountability for changes.
It doesn't build on Fedora. I fixed a few of the obvious problems, but it looks like there is more to it than that... I assume that's why they say Ubuntu only on the Download page.