Subversion Vision and Roadmap (spoiler: no DVCS)

dasil003 · on April 2, 2010

Subversion has no future as a DVCS tool. Let's just get that out there.

It's fortunate that the developers are smart enough not to go down that road.

The big problem with svn is that the simplicity of the underlying model does not translate into robust features on the front-end. Using plain directories for subdirectories, branches, and tags alike is the result of muddled thinking that will make it impossible for svn to ever have reliable merge tracking. Partial checkouts are a useful feature, but they are not worth botching the architecture so as to make it unusable for basic version control needs like history tracking, diffing and merging.

The idea of an atomic repository with an identifying hash is so powerful, as exemplified by git and other DVCSes, even for mundane development tasks and workflows, that svn is basically a dead-end. They can't fix the situation without breaking backward compatibility, and they can't break backward compatibility without a huge uproar from existing repo administrators.

arebop · on April 2, 2010

Could you please elaborate on how the use of hashcodes rather than counting numbers for identification on an atomic repository botches history tracking, diffing, or merging?

dasil003 · on April 2, 2010

Well, the hashcode effectively signs the full repository state, so you know that if a repo has the same hashcode it has the same contents. However, that's not the main problem with svn.
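To illustrate the point about a hash identifying the full repository state, here is a minimal Python sketch. It is not git's actual object format; `tree_hash` and the sample repos are hypothetical names made up for this example, and the only assumption is that a tree can be modeled as a nested dict of file contents.

```python
import hashlib

def tree_hash(tree):
    """Hash a nested dict of {name: bytes | dict} bottom-up.

    Each directory's hash depends on the names and hashes of everything
    below it, so the root hash covers the entire tree.
    """
    h = hashlib.sha1()
    for name in sorted(tree):  # sort so insertion order does not matter
        entry = tree[name]
        if isinstance(entry, dict):
            kind, digest = "tree", tree_hash(entry)
        else:
            kind, digest = "blob", hashlib.sha1(entry).hexdigest()
        h.update(f"{kind} {name} {digest}\n".encode())
    return h.hexdigest()

repo_a = {"README": b"hello\n", "src": {"main.c": b"int main(void) { return 0; }\n"}}
repo_b = {"src": {"main.c": b"int main(void) { return 0; }\n"}, "README": b"hello\n"}
repo_c = {"README": b"hello\n", "src": {"main.c": b"int main(void) { return 1; }\n"}}

print(tree_hash(repo_a) == tree_hash(repo_b))  # True: same contents, same hash
print(tree_hash(repo_a) == tree_hash(repo_c))  # False: one byte differs
```

A sequential revision number carries no such guarantee on its own; it only means something relative to the one repository that issued it.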

The main problem is that svn has no notion of a project to attach its merges or comparisons against. A directory in a repo may be a subdir of the project or it may be a branch. This leads to an explosion of edge cases and legal commands that make no logical sense. As long as everyone follows certain practices (such as not committing changes to two branches at the same time) then a heuristic approach sort of works, but that's hardly the way to design a robust (and ostensibly simple!) system.

If you can merge subdirectories in a project, and if you can merge branches which are just an ancestor directory of that, and if a given merge only affects certain subdirectories in a repository, how can anyone expect subversion merge tracking to be viable? Even if they somehow munge it to work in 99.9% of real world cases, think of the complexity and mental overhead of maintaining this solution compared to what a DVCS with clear-thinking primitives can achieve. It's time for the successor to svn in centralized version control systems to be built from scratch. svn itself is hopelessly hamstrung.

durin42 · on April 3, 2010

I've discussed this at some length with a few of the original Subversion implementors. The short version is that they didn't go back and do a slight rethink when they came up with changeset objects, and that causes part of the problem. The rest of the problem is that branches aren't a first class thing in the system, leaving no good way to mark mergeinfo. Hindsight being what it is, you could probably build a Subversion-like system with almost the same filesystem by making the notion of branches and tags first class, and their just-a-copy nature being somewhat hidden from the user.

Sadly (on some level, I'm a huge DVCS advocate, but see the need for CVCS in corporate environments), that's largely water under the bridge - many of the original team have moved on to using DVCS tools. Subversion 2.0 as a completely clean break with the past to fix some flawed design decisions feels very unlikely due to the political forces at work.

arebop · on April 3, 2010

When there's only one repo, there's no ambiguity about the content represented by a revision number. Revnums are perfectly adequate for the many situations in which frequent coordination with a central repository is completely feasible.

Git also does not require every file and subdirectory to be modified by every merge, so based on that criterion it's no more astonishing that SVN merge tracking can work than it is that Git merge tracking can work.

SVN's merge tracking was an afterthought, and it was implemented using the general, user-visible metadata facility (properties). So historically svn was prone to problems such as repeated merging, and today it's possible to make things complicated, and then manually edit and botch the complex merge history, and suffer.

Fundamentally, svn:mergeinfo summarizes history rather than pointing only to merge-parents. I suppose the latter is what you mean when you speak of clear-thinking primitives. I wasn't involved in the mergeinfo design, but it seems less likely the result of muddled thinking than of a design decision recognizing that the entire revision graph isn't locally available to svn users and that the summary is sufficient under reasonable restrictions on usage.
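As a rough illustration of what "summarizes history" means here, the following Python sketch parses an svn:mergeinfo-style value (source path, colon, comma-separated revision ranges) and works out which source revisions are still unmerged. It is a simplified model, not Subversion's implementation; `parse_mergeinfo`, `eligible`, and the sample property value are hypothetical.

```python
def parse_mergeinfo(value):
    """Parse lines like '/branches/feature:3-5,9' into {path: set(revs)}."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, ranges = line.rpartition(":")
        revs = merged.setdefault(path, set())
        for part in ranges.split(","):
            part = part.rstrip("*")  # a trailing '*' marks a non-inheritable range
            lo, _, hi = part.partition("-")
            revs.update(range(int(lo), int(hi or lo) + 1))
    return merged

def eligible(source_path, source_revs, mergeinfo):
    """Revisions on the source that the summary does not list as merged."""
    return sorted(set(source_revs) - mergeinfo.get(source_path, set()))

# Hypothetical property value as it might appear on a merge target.
prop = "/branches/feature:3-5,9\n/branches/hotfix:7"
info = parse_mergeinfo(prop)
print(eligible("/branches/feature", [3, 4, 5, 6, 9, 11], info))  # [6, 11]
```

Because the summary is ordinary property data keyed by path, it can be hand-edited or end up on a subdirectory, which is where the complications mentioned above come from.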

Your condemnation of svn as a "hopelessly hamstrung" "dead-end" seems derived more from your dismissal of centralized version control generally than from the particular details of svn's design or implementation. In that dismissal I think you're taking too narrow a view of the ways in which people work together.

dasil003 · on April 3, 2010

Wow, what I said just sailed right over your head.

> Git also does not require every file and subdirectory to be modified by every merge...

Uh, nooooooo... git always merges the whole repository. This makes merge tracking easy to implement in a complete way, and easy to reason about as a user.

> Fundamentally, svn:mergeinfo summarizes history rather than pointing only to merge-parents. I suppose the latter is what you mean when you speak of clear-thinking primitives.

No, I'm referring to the fact that git has a strict definition of both branches and the project tree. The notion of a merge isn't a primitive; it just sort of falls out of the primitive definitions naturally. Each revision in a git repo contains one and exactly one copy of the whole working tree. In Subversion the repository just has directories, some of which are project tree directories, and some of which are branches, and some of which are tags. That lack of clarity leads to all sorts of problems with basic functionality that VCSes should have.
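A minimal Python sketch of that idea, assuming only two primitives: a commit that names one whole-tree snapshot, and pointers to its parents. The `Commit` class, `ancestors`, and `merge_bases` are hypothetical names for illustration, not git's code, but they show how "what has already been merged" can be derived from the graph instead of being stored as separate metadata.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Commit:
    name: str
    tree: str            # stand-in for the hash of the whole project tree
    parents: tuple = ()  # zero, one, or (for a merge) several parent commits

def ancestors(commit):
    """Map name -> Commit for the commit itself and everything reachable from it."""
    seen, stack = {}, [commit]
    while stack:
        c = stack.pop()
        if c.name not in seen:
            seen[c.name] = c
            stack.extend(c.parents)
    return seen

def merge_bases(a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    common = [anc_a[n] for n in anc_a.keys() & anc_b.keys()]
    dominated = set()
    for c in common:
        dominated |= ancestors(c).keys() - {c.name}
    return sorted(c.name for c in common if c.name not in dominated)

root = Commit("root", "t0")
a1 = Commit("a1", "t1", (root,))
b1 = Commit("b1", "t2", (root,))
m = Commit("m", "t3", (a1, b1))   # merge of branch b into branch a
a2 = Commit("a2", "t4", (m,))
b2 = Commit("b2", "t5", (b1,))

print(merge_bases(a1, b1))  # ['root']: nothing merged yet
print(merge_bases(a2, b2))  # ['b1']: the earlier merge moved the base forward
```

The second merge only has to consider what changed after `b1`, and nothing had to record that fact explicitly.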

> it seems less likely the result of muddled thinking than of a design decision recognizing that the entire revision graph isn't locally available to svn users and that the summary is sufficient under reasonable restrictions on usage.

What it seems like is that the designers had only ever used CVS, so what they were working on seemed so advanced to them at the time, that they thought that "cheap copies" and partial checkouts were just icing on the cake and they had no idea of what they were trading away.

However, in hindsight it's clearly not worth munging everything together. You are far better off forcing users to actually define branches and subprojects. The amount of work that svn saves you is insignificant next to the degradation of the information stored in the repository data structure.

> Your condemnation of svn as a "hopelessly hamstrung" "dead-end" seems derived more from your dismissal of centralized version control generally than from the particular details of svn's design or implementation.

It might seem that way because you don't appear to have understood a word I said. Nothing I've said had anything to do with distributed development. Everything I've said is specifically about svn's primitive concepts, and dead-end is a perfect way to describe it.

This ignorant defense of subversion needs to be stopped. I understand there are reasons people need to use subversion. I understand there are use cases that DVCSes don't fit. I certainly don't think git is the end-all-be-all of VCS. But to defend subversion without an adequate understanding of its failings just makes you look bad. It's no different than a Java developer talking about how they don't see what's so great about Lisp macros. If you don't grok the concepts then any arguments you make are just noise.

dustingetz · on April 2, 2010

more detailed spoiler:

Subversion exists to be universally recognized and adopted as an open-source, centralized version control system characterized by its reliability as a safe haven for valuable data; the simplicity of its model and usage; and its ability to support the needs of a wide variety of users and projects, from individuals to large-scale enterprise operations.

A shorter, business-card-sized motto might be: "Enterprise-class centralized version control for the masses".

umjames · on April 2, 2010

All I can say is thank goodness I can use Git with Subversion without anyone being the wiser. Local feature branches are one of the godsends of DVCS.

vlisivka · on April 2, 2010

Yeah. But it is hard to send a local branch to other team members for review when Subversion is used on the project.

We use Savana to add proper branches to Subversion. The tool lacks polish, but it is helpful. It is simple to use: you can create a private branch, sync your private branch with its parent branch, and promote your private branch to the parent branch ( http://savana.codehaus.org/gettingstarted.html ). It is not a git replacement, of course.

njharman · on April 2, 2010

Yeah, and I'm fine with that.

I'm so against the notion that every tool has to have all the features/act like every other tool. DVCS vs CVCS, vim vs IDE, etc. A variety of tools focused on different strengths is a great thing. A bunch of me-also's is not.

plq · on April 2, 2010

in my opinion, subversion fails to properly support many features it claims to support. when people find that some things do not work as advertised by design, people naturally get pissed.

for example, subversion claims to do path-based authorization. we found that it only works when not using repository-wide branches/tags. when one wants to assign different rights to different users on different parts of a project, and also wants to tag the whole project, the policy file needs to be adjusted accordingly for the new branch.

so the best way to do it is to hold these parts on separate repositories, which breaks subversion's main selling point over cvs. so one defines svn:externals, which in turn needs to be updated for every branch/tag with revision numbers.

and this is just a minor annoyance in the whole stinking pile of mess that is branching/merging in subversion.
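to make the path-based authorization complaint concrete, here is a small Python sketch. it is a deliberately simplified model of path-keyed rules (nearest enclosing rule wins), not subversion's actual authz evaluation; the `access` function, the rule layout and the user names are all hypothetical.

```python
def access(rules, user, path):
    """Return the access string from the nearest enclosing rule, or '' if none."""
    parts = path.strip("/").split("/")
    while True:
        key = "/" + "/".join(parts)
        section = rules.get(key, {})
        if user in section:
            return section[user]
        if "*" in section:
            return section["*"]
        if not parts:
            return ""
        parts.pop()  # fall back to the parent directory's rule

rules = {
    "/": {"*": "r"},                               # everyone may read by default
    "/trunk/payroll": {"alice": "rw", "*": ""},    # only alice may touch payroll
}

print(access(rules, "bob", "/trunk/payroll/salaries.txt"))     # '' (denied)
print(access(rules, "bob", "/tags/1.0/payroll/salaries.txt"))  # 'r' (the tag copy is exposed)

# The policy has to be extended by hand for every new tag or branch.
rules["/tags/1.0/payroll"] = dict(rules["/trunk/payroll"])
print(access(rules, "bob", "/tags/1.0/payroll/salaries.txt"))  # '' (denied again)
```

since the rules are keyed by path and a tag is just a copy to a new path, nothing carries the restriction over automatically.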

prog · on April 2, 2010

"Enterprise-class centralized version control for the masses" is the right way to go for svn IMO. With fast reliable networks, if svn implements the said roadmap features then existing organization probably won't need to move to DVCSes.

The distributed model works really well with open source. So I suppose svn has chosen its niche. I know I would have been happier using svn over ClearCase so many times.

Kadin · on April 2, 2010

So when is it going to start preserving filesystem modification times? That's only been on the TODO list for what, close to ten years?

I'd really like to see that (which has existed in branches) before some sort of massive effort to keep up with the DVCS Joneses.

Brendon · on April 2, 2010

They don't need to go the dvcs route, just get merging working.

wanderr · on April 3, 2010

I agree. They can keep the centralization; distributedness isn't all that useful for in-house development and leads to problems when people forget to push after committing. The real power of git is super cheap, super easy branching and merging. If that existed in svn, we would never have switched.