My number one favourite thing about Git is that I have a full copy of the repository history and I can work on it without talking to a central server, or any server. I can do a `git blame` or a `git bisect` without receiving a byte from the network, which is great because my internet connection is horrible. I can work on a train. I can work on an experimental feature in a branch with full history, decide it was a bad idea, delete the branch, and no one has to know about my failure. I can even share that branch between multiple machines in my house without setting up any server infrastructure. I can write shitty commit messages like 'fix the thing for real this time' and clean them up later with a rebase before I push. I can make a series of commits, convert them to a series of patches, and apply them during a build script with `git am`. I can version any folder on my machine just by typing `git init`, which is especially useful for folders of config files.
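For instance, the whole patch round trip is just a couple of commands (the output directory here is made up):

# turn the last three commits into mailbox-format patch files
git format-patch -3 -o patches/
# later, e.g. from a build script, apply them in order
git am patches/*.patch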
Also, I think Git's model of working locally encourages people to think of their commits like their code. Good Git projects have small commits, a clean history and commit messages that are like documentation (which is especially useful, because you can use blame to locally find the change relating to a line.)
The author conveniently ignores one important downside to centralized source control: speed.
Switching branches in git is incredibly fast, specifically because you have all the history locally. Branching was so uncommon in SVN largely because of how horribly slow it was to switch branches.
> Of all the time I have ever used DVCSes, over the last twenty years if we count Smalltalk changesets and twelve or so if you don’t, I have wanted to have the full history while offline a grand total of maybe about six times.
The problem with that statement is that it's wrong. Every time he switched branches or made a new branch, he was relying on his local copy of the repo.
Sure, he makes some good points about binary blobs and submodules, but he also avoids talking about the downsides of centralized version control.
If you work as a field tech, or on a space station, or in a submarine or something, then okay, sure, this is a huge feature for you.
I'm not an astronaut or a submariner, but I do fly in the cheap seats on commercial aircraft - sometimes to places where roaming data is eye-wateringly expensive.
Just this afternoon, I went over to the cafe across the road from the office for a couple of hours to get some code review done. This cafe doesn't have WiFi - which is part of the attraction for a less-distracting environment, really. Having history available locally is fantastic!
And even if you're at a desk with a permanent internet connection, a server round trip is never as fast as local disk access. Fast access to history is something I don't want to give up. On top of that, having it all locally enables things like searching through history and other more advanced features.
I think they mean without setting up a VCS server. You still need the two machines to be able to reach each other somehow, usually over ssh (which is technically a server) or a file server (NFS, Samba - also technically a server), but many people have one or both of those running already. If so, it's just something like this (hostname and path below are made up):
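# assuming the machines can reach each other over ssh; hostname and path are made up
git clone otherbox:path/to/repo
# or, in an existing clone, add the other machine as a remote and pull from it
git remote add otherbox otherbox:path/to/repo
git pull otherbox master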
Yeah, like greggman said, I was mainly referring to SSH, though I've also transferred repositories with a flash drive before. I once worked on a project with a coworker where we used git daemon to pull from each other, which is technically a server, but easy to set up and we didn't have to bug the IT department about getting a real server to use. It worked surprisingly well.
As a developer on the Linux kernel - the project for which Git was initially designed - I recommend people take a good look at the kernel workflow to appreciate where Git came from, and it's a good example of a project where the D in DVCS is so very, very critical.
We have a massive codebase (about 20 million LOC), massive repositories (my local copy is 2.8GB), we're massively distributed (my local repository currently has 17 remotes and 40 local branches)... and we still use mailing list patches and pull requests, we don't use submodules, and we don't rely on GitHub as anything more than a plain git server. And things mostly work...
It's an interesting contrast to the way the average project on GitHub works (of course, the "average GitHub project" is now the primary use case for git...)
* Patches are emailed to subsystem mailing lists (using git send-email, usually), reviewed by the community, and potentially resubmitted a few times to respond to community feedback. (There is also the infamous Linux Kernel Mailing List which is generally used for patches that have wider effects, new drivers, stuff that doesn't fall neatly into a subsystem...) Often tracked using tools like Patchwork (e.g. http://patchwork.ozlabs.org/)
* As a result of this, each individual developer generally maintains a tonne of branches for themselves - they work independently of anyone else.
* Kernel has quite strict requirements for how patches/commits are formatted - as per SubmittingPatches
* Maintainers apply patches to their own tree (potentially with their own modifications if needed). They often use a structure like a "next" branch for patches intended for the next release cycle, and a "fixes" branch for bugfixes which can be included in the next RC release.
* Maintainers generally email a pull request (git request-pull) to Linus when they want changes merged. Linus reviews and ACKs/NAKs them.
* Linus will only accept new features during the 2 week "merge window" that immediately follows each new kernel version. Following the merge window, rc1 is released, and from then a new rc is released every week until rc7 or rc8 (at Linus' discretion) - only bugfixes or very minor things accepted during this time.
* In the meantime, each maintainer has their own git tree, and someone has to try to integrate them together for testing purposes. That someone is Stephen Rothwell, who merges everyone's -next branches 5 days a week and releases linux-next (http://git.kernel.org/cgit/linux/kernel/git/next/linux-next....) as a snapshot of what the next kernel release will look like.
So there's very little centralised infrastructure involved in this process apart from a few things on kernel.org. There are ~1500 developers involved in each kernel release, across dozens of companies. As you can probably guess, there's a lot of cherry-picking, rebasing, and so on involved in all this.
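For the curious, the contributor-side mechanics are only a handful of commands - roughly like this (the list address, URL, and refs below are placeholders, not the real ones):

# export the last two commits as emailable patches, with a cover letter
git format-patch -2 --cover-letter -o outgoing/
# mail them to the subsystem list for review (SMTP details live in your .gitconfig)
git send-email --to=some-subsystem-list@example.org outgoing/*.patch
# and when a maintainer wants Linus to pull, the request text comes from:
git request-pull v4.1 git://example.org/my-tree.git my-next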
Just on the learning part: I moved from CVS to Perforce 10 years ago in a new job and it took me perhaps all of 4 hours to completely understand the workflow and the GUI. Contrast that with Git: after a couple of days, lots of different clients, and plenty of articles, I am still not sure about my Git skills. It certainly isn't as straightforward to learn and use.
I guess this is because there are so many more exposed levels of git to learn. The introductory chapters in git books about "this is how a commit hash is computed" and "it's all about the content, not the filename" are good if you want to know what's going on at the lower levels, but for git newbies they're more confusing than helpful.
The git concepts you need to know that are non-obvious are:
- there is no special branch, "master" is just a default name
- there is no special "central" repository on a technical level
- git commits can have zero, one, or more parents, but they don't know what branch they were made on (i.e. reverting a merge can be a pain)
- a git commit always represents an entire project tree, you can't version individual files like subversion does
Unless you're planning on spelunking into the plumbing commands (i.e. the low-level stuff exposed as shell commands), a small subset of the porcelain commands (i.e. the ones the end user should use) is more than enough to work with git and understand the "how do I need to work with git" flow:
The commands I use 95% of the time are:
- git init
- git add
- git commit (and git commit --amend when I didn't pay attention)
- git rm
- git checkout
- git branch
- git pull
- git push
and most without any commandline options. These command invocations you should be able to learn within 4 hours, especially if you've used an SCM before. (and the git <command> --help pages are actually well written, once you know what you want to do).
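To illustrate, a whole working session with just those commands can look like this (file and branch names are made up):

git init                        # start versioning the current folder
git add hello.c                 # stage a new file
git commit -m "First commit"    # record it locally
git branch topic                # create a branch...
git checkout topic              # ...and switch to it
git commit -a -m "More work"    # commit further changes to tracked files
git push origin topic           # publish when ready (assumes an "origin" remote exists)
git pull                        # and pick up everyone else's work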
Sure it's nice to know about git add --patch, or git rebase --interactive, but do you really need them to work well with git? I don't think so. If you're inclined to learn more about your tools, sure, go ahead. But that's something that comes with years of use and doesn't have to happen up front.
I find that knowing how it works is the only way the commands and flags make any sense. I am constantly seeing people around me fumble through git trying to get by on a handful of commands they've memorized and just live with the fact that they need to re-clone their repo every now and then.
These people scare me. How can they be comfortable not even knowing what they are telling git to do?
I taught myself how the DAG worked, then the low level commands to manipulate it. Now when I read the docs I get nice surprises like pre-built commands for doing the series of operations I plotted out in my head (most recently 'git merge --no-commit' instead of read-tree, update-index, write-tree)...
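For example, roughly (this is a hedged sketch with a made-up branch name; the real merge machinery also handles conflicts and MERGE_HEAD bookkeeping):

# porcelain: merge a branch but stop before creating the merge commit
git merge --no-commit topic
# the plumbing sequence it roughly replaces:
git read-tree -m -u $(git merge-base HEAD topic) HEAD topic
git write-tree    # emits the tree you would then hand to git commit-tree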
Oh god, I've become that guy in college who refuses to use a CRC library until he understands the proof.
But seriously, I'm genuinely amazed people can use git at all without understanding it completely. The mnemonics make zero sense without background, and the operations are completely arbitrary looking.
> I'm genuinely amazed people can use git at all without understanding it completely. The mnemonics make zero sense without background, and the operations are completely arbitrary looking.
Remember, people use most things every day without understanding them completely. This is a huge barrier to entry and effective use of git; you can't require people to spend weeks learning it before they commit a single line of code. So yes, people learn sets of runes that work, and know that if you step off the path there's no easy way of working out what happened, let alone undoing it, without blowing away the local copy and going back to the master.
> [...] you can't require people to spend weeks learning it before they commit a single line of code.
But you can expect them to spend two hours to understand the small number of fundamental objects that git works with, and maybe another quarter of an hour for the basic operations (fetch, push, merge).
The thing is, a lot of git commands are really, really subtle. And almost all of them don't do what you usually want without a couple of switches.
Personally, I'm more of a fan of darcs. It has a really good UI - defaulting to interactive use where it shows you what you're actually doing - and instead of branches you just clone into another folder, reducing a lot of cognitive load around them. It integrates with email as a primary use case - if you have a mailing list and an email client, you have a pretty good replacement for github pull requests.
I only delete everything and start again if I have made the mistake of making any changes to some source code. After a while you learn not to change things.
Lots of projects using git means that I get to spend more time doing things other than coding.
My top git commands from the commandline history are git status, git commit -a, git diff, git gui, gitk --all. There are a few git pull/pushes in there, but these 5 dominate. I hardly ever use git add, and over the years I've pretty much stopped using the index altogether. The GUI tools are my way of doing the fancy stuff. They work great and expose a lot of the more complex features without having to remember all the command line options.
That is the chicken-and-egg problem with git: to understand it well, you need to understand the lower-level algorithms, because that is how git was developed in the first place. But most people will complain that it is hard to learn this low-level interface, so they get stuck trying to understand the high-level operations.
It is in a sense the same with C programming. C is extremely easy to understand if you have a good grasp of machine architecture and assembly language. Everything makes sense. But if you look at it from the point of view of a high level language, you will always be amazed at why it does things in that particular way.
I highly recommend using SourceTree. It's quite slow, but at least it is understandable.
Command-line git just sucks. Partly because the interface is just badly written and inconsistent (Mercurial is much better for example), and partly because version control just fundamentally benefits from a graphical interface, in the same way that a web browser would for example.
Only a madman would use wget to browse the web (yes I know about him). You'd have to be nearly as mad to want to use the git command line.
There is incredible power in the git command line, and to simply write it off is silly. If you take the time to learn a bit about how it works under the hood it makes a lot more sense: it is just a directed graph of commits, along with blobs and trees, all of which are identified by hashes. Dig in, because you're seriously missing out if you only ever use a GUI.
The thing about git is that it is a very open ended tool that doesn't enforce any one work-flow. Find one that suits you and your workplace, and then slowly accumulate tricks that make things simpler to reason about.
When I was new to git I had to choose a workflow while not having sufficient knowledge or experience with git to do so. It was frustrating. I wish there had been some sort of conventional workflow I could have adopted for my situation.
It took a year of using git daily until I felt able to choose a workflow that was appropriate for my little company.
Blobs are different. You can tell that because even SVN has a flag that marks a file as a blob or not, and treats it differently in history and when merging if it is. We all know that when you have a function that takes a flag to do one of two different things, it's probably better to make it two different functions.
Missing functionality can be a good thing when that functionality lets you do something stupid. Yes, SVN + discipline about the size of your repos can be as good as git (on that particular aspect). But discipline isn't free.
DVCS true believers don't believe in submodules / subrepositories / etc. Those things are sops to the big-repository diehards. I've worked on small repositories and I've worked on one of those organization-wide repositories the author talks about; the big repositories aren't like that for good reasons, they're like that because of dysfunctional organizational politics. Sure, Facebook uses one, but Facebook also uses PHP.
You always needed to base your patch on the correct thing when contributing. You don't need to make a new local branch - it's useful to do so, but not required. Sending a patch email takes longer than pushing a commit to github and making a PR, it just doesn't sound like it because the author hasn't written "3. Open an email client 4. Click "attach" 5. Browse to the directory where you stored your patch ...". And you do gain a lot in terms of ability to rebase, which is vital when patches go ignored for as long as they do. A pull request from two years ago is often salvageable; a patchbomb simply isn't.
Sorry, my poor English got in the way of comprehension; I thought 'sop' meant something else.
As for why would I use submodules, we have a modular system from which different client packages are produced, and we need to manage the lifecycle of each module independently for each client.
> we have a modular system from which different client packages are produced, and we need to manage the lifecycle of each module independently for each client.
That makes sense, but sounds like you're not particularly making use of the submodule-to-emulate-big-repository functionality? (that is, each module has its own lifecycle, so there's no need to coordinate between updates to multiple modules for a different client). Indeed it seems like you couldn't accomplish this with a single big repository at all.
(Personally I find submodules cumbersome as a way of managing a project in that setup (I'd rather just have releases of the modules, and then the per-client modules depend on releases), but if it's working for you then fair enough)
Is there a reason not to make use of your language's package management tools instead? In my experience, most such "submodules" are better off handled as packages (npm, NuGet, jspm, gem, et al) than submodules. Certainly there are complications for private/enterprise feeds, but most of the package managers have options these days (including pointing at just about any git repository directly).
There are two constraints: we use Python and we run multiple copies of the software (with potentially version-incompatible modules) on the same host.
That means we'd have to use virtualenv or similar, plus pip or easy_install to actually fetch and install the modules. On the developer side, it means having to manually update the dependencies file to upgrade a module, while with submodules they can just do "git pull" inside it and then "git add" in the parent repo, which auto-detects the new version.
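Concretely, a module bump for a developer is just something like this (module and branch names invented):

cd modules/billing              # the submodule checkout
git pull origin master          # move it to the latest upstream commit
cd ../..
git add modules/billing         # the parent repo records the new submodule commit
git commit -m "Bump billing module"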
I'd love to see a simpler approach, but I couldn't find it.
Ah, Python. Yeah, virtualenv + pip or docker + pip would probably be a better way to manage that sort of thing and increase separation of your git repositories (and better yet, better separation of your Python VMs), but I can see why submodules feel easier in this case if you are dealing with system global installs of Python packages right now.
When I send a patch email, it's just a matter of running "git send-email -1 --to=linuxppc". :) (I'm not saying that we can't do better than patch emails, but after configuring my git client with an SMTP server, it's as easy as opening a GitHub PR.)
>Facebook and Google know that keeping all your source code in one single repository is good
I'm not sure about Facebook, but Google continually wastes engineers on policing test failures because anyone at any time could have committed a change that breaks your tests for hours (along with all their other dependencies). The cadence of integration should match that of deployment; I don't want to test against code that won't be in production and I certainly don't want to be slowed down over shitty code that obviously can't go out.
There's one further point the article missed, which is: locking of binary assets.
For any file which cannot be merged, you need to have a "lock/checkout, edit, commit" workflow.
There's no point allowing two people to have a PNG file in their checkout, edit it, and commit it, only for one to get the message "sorry, someone else edited it, your edits are now based on an out-of-date version, deal with it". You need to enforce the "lock, edit, commit" workflow on all files which cannot be merged.
Essentially, binary files cannot be merged, and therefore all binary files require this workflow. I'm sure there are some exceptions - I think Microsoft offers a merge tool for Word files, though I've never seen it hooked up to a VCS; in principle it, or a similar merge algorithm for a particular file format, could be. But even then, there'll still be all the other binary files for which no such merge algorithm is configured.
All projects have some binary files, e.g. PNGs. So all projects are affected by this and need this workflow for those files.
Yet a DVCS cannot offer it as there is no central place to maintain the lock.
Subversion offers this workflow, but it isn't enabled by default.
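For anyone who hasn't seen it, it's a per-file property plus an explicit lock - something like this (the path is made up):

svn propset svn:needs-lock '*' images/logo.png    # working copies become read-only until locked
svn lock images/logo.png -m "reworking the logo"
# ...edit the file...
svn commit -m "New logo"                          # committing releases the lock by default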
A VCS needs to offer this functionality as every project needs it, and it should be enabled by default for all files the VCS doesn't know how to merge.
It's not really so different for mergeable files - just because two independent changes to a file can be automatically merged doesn't mean that the merge result is sensible. You still need a human in the loop to make sure two simultaneous changes didn't conflict semantically, even if they were able to be munged to be compatible syntactically.
Does this mean we need locking for everything? No, of course not: people get by, by using out-of-band methods for co-ordinating potentially conflicting work. The same tends to work for binary assets, too.
It's true that a "successful" merge of text source can still result in semantic conflicts - I definitely agree and have experienced it. But the fact that text merging works most of the time is justification for using it, whereas merging binary assets works 0% of the time, so it's a much more severe problem.
On top of happening more often, the consequences when it does happen are worse. You can't even diff binary files. If text source changes lead to problems, you can look at the diff and gain an understanding of what was changed (e.g. A renamed a method, but B checked in new code using the old method name). If you spend time altering a Visio file, only to be told it can't be merged, you literally have no option but to open the original, theirs, and yours, look at them side by side, and hope you spot all the changes.
We should aspire to more than just "get by". The above problems can be solved by VCSs, just not DVCSs.
Git in its current form is usually not a pure DVCS though; everyone has a "canonical" repo somewhere, be it a buildbot or github/gitlab. If Git supported synchronised "lock flags", people would just need to make sure they pull from their canonical repo before trying to edit binary files. You should be able to mark specific files or directories as "lockable".
As it is, you could implement this sort of workflow manually or via an extension, pushing a special file before one starts working on binary assets ("git binedit" ?). It would pollute the commit log a bit, which is why it would be better if it were a native feature, but imho it could be done.
Rather than a special file you could use special branches and a new primitive git object for those branches. Git garbage collection would eventually deal with the pollution of that.
Using commits (with no file contents) as your locks (rather than creating a new git primitive object), you'd have an easily automatable workflow along the lines of:
# Clear current locks
git push origin --delete binedit/WorldMaker
git branch -D binedit/WorldMaker
git gc # Optionally force a garbage collection
# Create new lock
git checkout -b binedit/WorldMaker master
git commit --allow-empty -m "binedit: file/to/edit.psd"
git push --set-upstream origin binedit/WorldMaker
git checkout master # Avoid adding actual commits to the binedit branch
Searching for binedit locks would mean fetching all the remote binedit/* branches and reviewing their commit messages. Not the fastest system to query in the world, but you could cache it well, presumably. You can also fetch such branches from multiple remotes, leaving it a somewhat distributed tool, too.
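Checking for existing locks could then be a couple of commands (still assuming the hypothetical binedit/<user> naming from above):

# see which lock branches exist on the shared remote
git ls-remote origin 'refs/heads/binedit/*'
# fetch them and read which files their commit messages claim
git fetch origin 'refs/heads/binedit/*:refs/remotes/origin/binedit/*'
git log -1 --format='%s' origin/binedit/WorldMaker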
I'm just not convinced that they should be solved by VCSs - I tend to think that the general issue of conflicting changes calls for a social/process solution, not a technical one. Does the set of "lock/unlock/revoke" operations really add much over sending an email to a list or designated object owner saying "I need to update the Visio diagram"?
Plenty of people get by without it. It's not all that popular because it requires discipline to get people to lock the minimum they need for the minimum time, and of course occasional administrator intervention when someone locks a file and goes on holiday / leaves the company.
"You needn’t look further than how many books and websites exist on Git to realize that you are looking at something deeply mucked up. Back in ye days O Subversionne, we just had the Red Book, and it was good, and the people were happy.11 Sure, there were other tutorials that covered some esoteric things that you never needed to do,12 but those were few and far between. The story with CVS was largely the same."
Things like branching and merging. Which you never needed to do because you couldn't, unless you have a giant, throbbing Thalosian head. (It's not like I was the branch-n-merge guy at the last two jobs...)
Let's see if I can follow the logic: Git is popular. Therefore, lots of books. Therefore difficult to use. Therefore bad.
Then there's the part about centralized systems being ok because bandwidth is cheap and easily available, but distributed systems are bad because they use too much space.
One leg is now noticeably longer; please pull on the other for a while, dude.
Speaking of schlepping patches around, I don't suppose the author has ever been told, "Your stuff is good; can you update it for the current head?" and been unable to find a version where the patch applies cleanly? Fun, fun, funfunfun.
The author complains about DVCSes in general, but in fact mostly highlights issues with the git+GitHub combination.
For example, if git blame is slow as hell, I'm not sure it would be much different if it were running on a central server.
The fact that Git has an atrocious UI isn't related to it being a DVCS either. For example I find that Mercurial has a nice UI that makes it much easier to use and less error prone than both Git and Subversion.
The issues with GitHub, again, are not related to Git being a DVCS. In fact, in that case Git is being used more like a centralized system (push to one central repo, make a pull request from one repo on the server to another). What I miss, in fact, is an easy way to make a push or pull request directly to someone else. It would be nice to be able to do that from the command line through an IM handle, for example.
I think I'm one of the few developers who use the "Distributed" bit of DVCS. I do in fact often find myself on a slow boat down the Amazon with laptop in tow for weeks on end, so being able to make disconnected, versioned changes and have them sync back to the mothership when I do get a whiff of wifi is a big step forward.
It's the branching and merging bit that I could do without.
Say what you will about Subversion, but using it is like having a big red "Save" button on your codebase. Click commit and off your changes go to the repository, and you can once again safely have your laptop stolen without losing work. I have a few projects still on SVN and it's really a breath of fresh air going back to them after a few months of feature branches, pull requests, and merges.
> With a DVCS, though, we have a problem. You do have the whole history at any given point, which in turn means you need to have every version of every blob. But because blobs are usually compressed already, they usually don’t compress or diff well at all, and because they tend to be large, this means that your repository can bloat to really huge sizes really fast.
You can shallow clone a repository in git. This looks more like 'What should be the default mode for source control: centralized or decentralized?'. I think it should be decentralized, since it requires more discipline.
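A shallow clone skips the history entirely, and you can deepen it later if you change your mind (the URL is a placeholder):

git clone --depth 1 https://example.com/huge-repo.git    # only the latest commit, no history
cd huge-repo
git fetch --unshallow                                     # grab the full history later if needed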
> Git is so amazingly simple to use that APress, a single publisher, needs to have three different books on how to use it. It’s so simple that Atlassian and GitHub both felt a need to write their own online tutorials to try to clarify the main Git tutorial on the actual Git website. It’s so transparent that developers routinely tell me that the easiest way to learn Git is to start with its file formats and work up to the commands.
This could simply mean that it is an amazing piece of technology that everyone loves to talk about. It could also mean that it is a bit more complex, but that doesn't mean the complexity isn't worth the benefits.
> There is one last major argument that I hear all the time about why DVCSes are superior, which is that they somehow enable more democratic code development
To be perfectly honest, most companies would do fine with subversion and a very good web based user interface. The simplicity and the advantage of not having to teach people set in their ways may work in its favor.
ad "git is slow": when comparing SVN diff vs git diff on a SSD MacBookPro git diff is in most cases waaaaay faster than svn diff.
Also, the author ignores the biggest advantage of git/hg over any centralized system: you can commit stuff without needing to be online! Commits are basically free until you eventually do git push.
The author actually tries to argue that network access is ubiquitous. But no matter how much money you blow on tethering, you're going to have a bad time in a train tunnel. Or an airplane. Or a road trip.
I don't think it's an overwhelming majority if everyone on your team is likely to be cut off at least once a year. At that point it doesn't take much lost productivity to justify the storage for local copies unless you literally have so many huge blobs they can't all fit on a laptop SSD.
Well sure; but DVCSes don't solve that problem (or at least, enabling offline work isn't specific to them). All VCSes allow you to continue to work offline in some form.
DVCSes allow you to maintain a workflow while offline, which is a subtly different thing (and if your use case is short-term, once a year for each person, then is it really a critical requirement?).
I'm not arguing against DVCS; far from it. Just trying to point out a bit of a "requirements fallacy" in some of the comments.
You'll have a set of needs (requirements) in your business for an SCM tool. Based on the priority of those requirements, you can make a good choice about which tool to use.
But the existence of a requirement like "a stranded developer couldn't code on the train" doesn't create a global condition where using a DVCS is the only choice :)
Lack of network access isn't the only advantage of the distributed model. Computing resources are another major factor. A site like GitHub would not work at all if all those branches, diffs, merges, blames, etc. had to be handled server-side. The author of the original post mentions OpenBSD as an example of how great CVS is but how many brand new users manage to get a patch into the project? Git may be harder to use than CVS but the ability to play around with a local copy of the repo gives new users far more room for experimentation.
Or some IT-sec guy decided to ban outgoing ssh (so, no svn+ssh for you), or there's a forced HTTP(S) proxy which unfortunately breaks the WebDAV commands from svn...
Oh and never mind the case you have a multi-GB SVN repo with >100k files and need to relocate the server URL. I'm still having nightmares from that.
While I love being able to fix my commits up locally before pushing, it's insane that I have to understand how git is actually implemented to use it effectively. Literally no other tool I have ever used has this characteristic. Imagine having to understand the Microsoft word file format in order to edit your documents!
I've seen multiple times people throwing away a bunch of work because their Word file got screwed up and they couldn't fix it. In git you can do the same - just learn the basic commands, and if it gets screwed up delete the local repo and clone again.
Oh, I know the feeling :) You explore new techs, everybody tells you you're insane trying to do things that complicated, then a few years later they all do it and tell you you're a bad programmer if you want to explore any other idea and don't think of the current way as absolute. The ruby community is hit hard by this problem currently.
That being said, there's something I love in the way a DVCS allows me to work. I'll usually make my own things locally, with tons of branches and commits, to have "documented" checkpoints I can quickly throw in and revert to, then squash them into a single entity for publishing that keeps the project history easy to read. I probably wouldn't do that if it meant publishing all that info to everybody.
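Concretely, a private spike of mine looks roughly like this (branch name made up, and assuming it was branched off master):

git checkout -b spike/idea             # private playground branch
git commit -am "wip: first attempt"
git commit -am "wip: checkpoint before refactor"
git commit -am "wip: works now"
# collapse the checkpoints into one readable commit before publishing
git rebase -i master                   # mark the later commits as "squash"
git push origin spike/idea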
But I won't say you're a bad developer if you don't do that :D
This is why I'm keen on "fossil scm", especially for small projects. It uses similar syntax to cvs and subversion, but you have a full local copy of the entire repository. The local repository acts as a buffer between the centralized repository and the code living on disk. It also helps that fossil has very few dependencies and is a single self-contained executable for everything. And it uses sqlite as its data store.
The thing with this DVCS debate is that for years people have been going (and going, and going) "Oh, you simply _must_ use this marvelous new shiny swiss-army knife, it is _so_ much better than your beat-up boring old-ass screwdriver!"
"No it isn't, not if my job is tightening screws ..."
You give that talk to a tradesman, you get your block knocked off for your pains, but programmers apparently are more easily intimidated into being fashion victims.
The article claims that DVCS aren't simpler to understand, but I think that's very subjective. I really couldn't wrap my head around cvs, svn, or hg until I learned git, which, ironically, I avoided until last because it was the "hardest".
I think CVS and SVN were easier to understand than git. But then again, understanding that miserable stuff often led to anger, then hate and suffering.
I know, but it turned out that it was easiest for me to understand DVCS through visualizing operations on git's data structures. Obviously this isn't the case for everyone, but my point is that it's subjective to say that one VCS is simple or easier than another.