
Branches as sets of commits are useful for understanding history: they show which commits conceptually belong together, with a quick descriptor of their purpose and/or their primary author (or whatever else your chosen naming policy encodes).

That branches are not the only refs that keep commits alive is irrelevant, because the other refs by themselves are not sufficient; not every commit is reachable from a tag and reflog entries expire. Branches remain a necessary ingredient.

Git needs this because it requires a named head for each commit because (1) it has a garbage collector and (2) it doesn't allow for multiple checkouts. In version control systems with multiple checkouts, a new branch will typically end up in a checkout of its own. Because there's no GC, it will not spontaneously disappear, and because each checkout keeps track of its own head, there's no need to name it. Git branches are an artifact of Git's odd architectural choices.

But my bigger point is not about wanting to keep something alive that isn't part of the history, but that branches may not always accurately reflect what's part of the history (because Git allows you to alter them, and that's often even part of the normal workflow, e.g. deleting a temporary branch once you don't need it anymore). This usually does reflect human error, but, well, a major point of using a VCS is to protect against the results of human error. If you messed up your branches, then you can lose data either because (1) it eventually gets garbage-collected [1] or (2) it doesn't get pushed anymore and some day your computer or hard drive breaks. This is what I meant by Git introducing fragility.

[1] Yes, once the grace period expires and it disappears from the reflog, but that only makes it less likely, not impossible, especially for stuff that sees only intermittent work.
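To make the reachability argument concrete, here is a minimal sketch of a reachability-based collector. It is a toy model, not Git's actual implementation: the commit names, the `parents` mapping, and both helper functions are made up for illustration, and the grace period and reflog are deliberately left out.

```python
# Toy model of reachability-based garbage collection; not Git's real code.

def reachable(parents, roots):
    """Return every commit reachable from the given root commits."""
    seen = set()
    stack = list(roots)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def collect_garbage(parents, refs):
    """Drop all commits that no ref can reach and return the dropped set."""
    live = reachable(parents, refs.values())
    dead = set(parents) - live
    for commit in dead:
        del parents[commit]
    return dead


# History: a <- b <- c on master, and b <- x <- y on a temporary branch.
parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"], "y": ["x"]}
refs = {"master": "c", "tmp": "y"}

# While both refs exist, nothing is collected.
assert collect_garbage(parents, refs) == set()

# Deleting (or misplacing) the branch is all it takes to lose x and y.
del refs["tmp"]
lost = collect_garbage(parents, refs)
print(sorted(lost))  # ['x', 'y']
```

The point of the sketch is only that the set of surviving commits is a function of the refs, so a mistake in the refs becomes a mistake in the data.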




Well, you can set the expiration period for the reflog. If I remember correctly you can set it to "never". You can also disable alteration and removal of branches on a remote git repo. It's really all about knowing your tools.

Then again, in my experience, if you manage to remove data or alter data which should not be removed or altered, and don't pick up on that until after the reflog expiration period, then you have bigger problems.


> Well, you can set the expiration period for the reflog

Much easier, you can turn off garbage collection (if you set the expiration time on the reflog, you have to make sure to set gc.reflogexpireunreachable, not just gc.reflogexpire). This is indeed what I do whenever I work with Git as the frontend (though not when I use Git programmatically as basically a versioned DB backend). But that's not about me. I'm fully conversant in Git, up to and including having hacked Git repositories using dulwich. It's about users in general, all of whom have to deal with this.
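For reference, the settings discussed here could be applied roughly as follows. This is a sketch under assumptions: the configuration keys are real Git settings, but the helper names and the repository path are hypothetical, and `gc.auto = 0` only disables automatic collection, so a manually invoked `git gc` would still run.

```python
import subprocess

# Real Git configuration keys; the helper functions below are hypothetical.
GC_SETTINGS = {
    "gc.auto": "0",                         # disable automatic `git gc --auto`
    "gc.reflogExpire": "never",             # keep reachable reflog entries
    "gc.reflogExpireUnreachable": "never",  # keep unreachable entries as well
}


def gc_config_commands(repo_path):
    """Build the `git config` invocations without running them."""
    return [
        ["git", "-C", repo_path, "config", key, value]
        for key, value in GC_SETTINGS.items()
    ]


def apply_gc_config(repo_path, run=subprocess.run):
    """Run each command; `run` is injectable so the sketch can be dry-run."""
    for command in gc_config_commands(repo_path):
        run(command, check=True)


# Dry run: print the commands instead of executing them.
for command in gc_config_commands("."):
    print(" ".join(command))
```

Calling `apply_gc_config("/path/to/repo")` would execute the commands for real; the path is a placeholder.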

Of course, this has its own problems, because lots of Git usage patterns have evolved around creating garbage and then throwing it away rather than having an explicit and safe delete operation that is rarely used (or at least to hide revisions rather than to delete them), and because it may not be possible on hosted repositories, such as on GitHub (Bitbucket allows you to prevent force pushes to master branches, GitHub doesn't [1]).

And even when you've made it safe, you still cannot have unnamed branches or multiple checkouts [2].

The more general problem is that Git's GC is not a user interface design decision. Git having a GC is ultimately a result of it being easier to hack together your own transactions without using an actual DB backend such as SQLite when you use (mostly) functional data structures with a GC. While this makes Git a fairly robust versioned database, these and other implementation decisions spilling over into user space is something that has plagued Git for a long time: for full proficiency with Git you need a fairly deep understanding of implementation details.

> Then again, in my experience, if you manage to remove data or alter data which should not be removed or altered, and don't pick up on that until after the reflog expiration period, then you have bigger problems.

This can easily happen on branches that see only intermittent changes or on personal repositories. Git was originally designed for heavily distributed work, and it shows; it's relatively safe when commits are regularly mirrored to a network of contributors, not so much when they aren't.

And, again, this is a problem that should not even exist. One of the primary tenets of source control management -- a major reason why we even have it -- is to protect developers against their mistakes. In particular, there are many cases where VCSs are being used by non-technical people (such as technical writers working on documentation) and they cannot be expected to have a deep technical understanding of the tools they are using. VCS tools must therefore be error-resistant.

[1] GitHub in many ways is still shockingly primitive, such as how it still doesn't support attaching files to issues. Referencing gists via URL is a sort-of workaround, but even that becomes painful with binary files. But that's another story.

[2] The reason for the latter is subtle, but it basically boils down to two problems: either the GC needs global knowledge of all checkouts in order to trace checkout-local refs such as HEAD properly (knowledge that may be lost through a simple mv), or multiple checkouts that use the same branch can get into a battle over who gets to point it to which commit (e.g., commit to master in two checkouts, and only one commit operation can "win", because master cannot point to two commits at once).
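The second problem in footnote [2] can be illustrated with a small toy model. The `Checkout` class and the commit names are hypothetical and only mimic the situation described: two working trees share one branch name, and each commit silently moves that name out from under the other.

```python
class Checkout:
    """A working tree that remembers which commit its files came from."""

    def __init__(self, branches, branch):
        self.branches = branches      # shared by all checkouts of the repository
        self.branch = branch
        self.base = branches[branch]  # commit the working tree reflects

    def commit(self, new_commit):
        """Move the shared branch to the new commit."""
        self.branches[self.branch] = new_commit
        self.base = new_commit

    def is_stale(self):
        """True if the branch moved since this checkout last updated."""
        return self.branches[self.branch] != self.base


branches = {"master": "c1"}
first = Checkout(branches, "master")
second = Checkout(branches, "master")

first.commit("c2")
print(second.is_stale())  # True: master moved underneath the second checkout

second.commit("c3")
print(branches["master"], first.is_stale())  # c3 True
```

After the second commit, `master` points at `c3` only; in this model nothing names `c2` anymore, which is the "only one can win" situation.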


> ... lots of Git usage patterns have evolved around creating garbage and then throwing it away rather than having an explicit and safe delete operation that is rarely used

Any delete operation in Git is safe. At least compared to delete operations in other vcs's (hg strip comes to mind).

> may not be possible on hosted repositories

That isn't Git's fault though, but the hosted repository. You could say the same about Mercurial bookmarks.

> And even when you've made it safe, you still cannot have unnamed branches or multiple checkouts.

I don't understand why this would be nice to have. Mercurial had the former, which to me was completely useless. Should I ever want to go back to a previous unnamed branch, I would have to actively search for it, which on a large project would take forever. I always keep my branches named, after the issue number, for this particular reason.

I don't really understand what multiple checkouts are for (a quick google gave me nothing). I would assume it would be something like having multiple local repositories, but I assume that is not what you're getting at?

> a fairly deep understanding of implementation details

Which takes literally fifteen minutes to teach (at least the way I did it), and after that it becomes a lot easier to explain what the different commands actually do.

> it's relatively safe when commits are regularly mirrored to a network of contributors, not so much when they aren't.

A network doesn't provide any more safety (except from a computer crash), as you don't push garbage.

> is to protect developers against their mistakes

To a point. To be completely safe we would have to auto-commit on fixed intervals, like a traditional backup system. Instead, the user is empowered to decide what gets stored. If the user wants to remove something entirely, or rewrite history to make it more readable or whatever, he/she should have that power too. A default 30 days of full history is, IMHO, quite reasonable given what the user can do. If that doesn't meet your requirements, then there are other, better options out there.

> they cannot be expected to have a deep technical understanding of the tools they are using

They don't need one. Have someone teach them git add, commit, push, pull and merge. Then teach them to NEVER rebase. Done.

We could probably go back and forth on this a lot, as we're unlikely to change each other's position, I'll call it quits. Thanks for a fruitful discussion :)


> Any delete operation in Git is safe. At least compared to delete operations in other vcs's (hg strip comes to mind).

While I'm not a big fan of Mercurial's strip implementation (because the management of strip backups is cumbersome), in order to permanently lose data there you'd have to do hg strip and then also expressly delete the backup in a separate step. Git does the deletion on its own.

More importantly, you seem to be mistaking this for a Git vs. Mercurial argument. I am not making a Git vs. Mercurial argument; if anything, my position is that "Mercurial is less bad than Git, except in some areas where Git is less bad." And even that's not what I'm getting at, because I'm not interested in comparing them against one another, but in pointing out that both are lacking.

In general, SCM development has stalled since the late aughts, and people seem content with a relatively subpar state of affairs. That too many people seem to think that Git and Mercurial are the only DVCSs ever invented and that centralized version control means either CVS or SVN doesn't help, either. My interest is in getting less primitive version control than the existing systems, not defending the antiquated status quo (antiquated to the point that there are even older, now forgotten systems that did some things better than the current crop).

> That isn't Git's fault though, but the hosted repository. You could say the same about Mercurial bookmarks.

Mercurial bookmarks, if misplaced, do not result in revisions on a server becoming inaccessible.

About unnamed branches:

> I don't understand why this would be nice to have. Mercurial had the former, which to me was completely useless.

They go hand-in-hand with having multiple checkouts. Create a temporary branch in a second checkout, and you don't need any label for it (because it's the tip of that checkout).

> I don't really understand what multiple checkouts are for (a quick google gave me nothing). I would assume it would be something like having multiple local repositories, but I assume that is not what you're getting at?

For example, what git-new-workdir (as I noted) attempts to do (again, that's unsafe, use at your own risk). If you aren't familiar with this concept, you can't know very many VCSs, though. It's supported by all DVCSes other than Git in some form or another (though arguably a bit indirectly in Darcs, and Mercurial's version doesn't have good commandline support). It is also obviously supported by all centralized VCSs.

It simply means that you can have two or more checkouts (working trees, in Git parlance) of the same or different revisions, but only one shared repository. Multiple clones can only crudely approximate the behavior, since you can't directly diff or merge between branches in different clones, say. A simple example of the usefulness is to be able to compare the behavior of different versions side-by-side (particularly when you're dealing with slow builds). Another is to be able to create a separate checkout for an urgent bugfix without interrupting your current work, then easily integrating that bugfix into your work.
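As a rough illustration of the model described above (one shared repository, several checkouts, each tracking its own head), here is a toy sketch. The `Repository` and `Checkout` classes and the `r1`..`r4` revision ids are invented for this example and do not correspond to any particular VCS.

```python
class Repository:
    """Shared store of revisions; nothing is ever pruned in this model."""

    def __init__(self):
        self.parents = {}  # revision id -> list of parent ids
        self.counter = 0

    def add_commit(self, parent_ids):
        self.counter += 1
        commit_id = f"r{self.counter}"
        self.parents[commit_id] = list(parent_ids)
        return commit_id


class Checkout:
    """A working tree with its own head; no branch name is required."""

    def __init__(self, repo, head=None):
        self.repo = repo
        self.head = head

    def commit(self):
        parent_ids = [self.head] if self.head else []
        self.head = self.repo.add_commit(parent_ids)
        return self.head

    def merge(self, other):
        """Merge another checkout's head directly; both share one store."""
        self.head = self.repo.add_commit([self.head, other.head])
        return self.head


repo = Repository()
main = Checkout(repo)
main.commit()                       # r1
main.commit()                       # r2
bugfix = Checkout(repo, head="r1")  # second working tree for an urgent fix
bugfix.commit()                     # r3, an unnamed branch off r1
main.merge(bugfix)                  # r4, integrating the fix into current work
print(repo.parents["r4"])  # ['r2', 'r3']
```

The temporary branch never gets a label: it is simply whatever the second checkout's head points at, and because the store is shared, the merge needs no push or pull between clones.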

> A network doesn't provide any more safety (except from a computer crash), as you don't push garbage.

My point is that the likelihood is higher that accidentally deleted material has already been shared with other developers, so it's more likely to be recoverable if you misplace a branch.

> To a point. To be completely safe we would have to auto-commit on fixed intervals, like a traditional backup system. Instead, the user is empowered to decide what gets stored.

Agreed. But my point is that Git unnecessarily trades away security, based on an implementation decision (to achieve transactional properties by using functional data structures plus garbage collection). Nothing is ever going to be perfectly safe, but that was entirely avoidable.

> We could probably go back and forth on this a lot, as we're unlikely to change each other's position, I'll call it quits.

I don't ever expect to change anybody's opinion; I am merely explaining.


The reason I use Mercurial as an example is that it's the other DVCS I'm most familiar with.

> A simple example of the usefulness is to be able to compare the behavior of different versions side-by-side (particularly when you're dealing with slow builds).

That use case I can understand. The other two you mention, though, could easily be solved by branches.



