It’s time to stop using Subversion

Since learning about the existence of source control systems about 8 years ago, I’ve always chosen Subversion for projects: it’s free, better than CVS, widely used and documented, and doesn’t completely suck. I’ve used other systems at times, as required by particular projects – notably SourceSafe (which is worse than Subversion) and Perforce (which is better, but costly) – but none of them were quite sufficient to pull me away from Subversion as a simple, obvious choice for all of my own projects.

Until now, that is. Subversion is no longer my first choice, and I reckon that six months from now, I’ll never need to use it on a project again. If you’re using Subversion, I think you should probably stop using it too.

So, what’s wrong with Subversion?

Committing = publishing

The first big problem with Subversion is that there’s no separation between creating and publishing a commit. If your changes aren’t present in the centrally accessible repository, then they’ve not been committed. Every commit therefore involves a round trip to the server, which makes commits pretty slow.

It also makes them susceptible to network errors. If I’m committing some large assets to a repository hosted in the cloud, it’s quite often been the case that the connection has just died somewhere along the line. If, while the upload’s been happening, I’ve made more changes to my working copy, then it’s difficult to repeat the commit: I have to carefully pick out my new changes and keep them separate from my old changes.

Furthermore, because my commits are published as soon as they’re made, I have to be very careful about what I commit. As soon as the commit is published, other people may check it out and try building it. So, I mustn’t make any commits that would break the build and stop other people from working. (Private branches remedy this somewhat, but this opens up a whole separate set of problems around merging that I’ll come to later.)

Lastly, if I can’t access the repository server – if it’s gone down, or if I’m working remotely – then I can’t commit at all.

Merges

While Subversion evangelists always emphasized the ease with which you could create branches, they were a bit more quiet when it came to merging those branches. If you’re going to make serious use of branches on a project, there are certain things that the SCM needs to be able to support:

Files post-merge must correctly track history back along all merged branches.
It must track which commits have been merged from one codeline to another, so that you don’t risk double-merging.
It should be easy to pull commits from one codeline to another.

Since version 1.5, Subversion technically supports all of these… but it’s all a bit fragile. For any folder that you’ve merged things into, an svn:mergeinfo property is set, recording the revision numbers of all the things that have been merged in (so that they’re not accidentally remerged, and so that future merges know where to pick up from). This more or less works – provided you don’t want to do things like merge code from other repositories – but there are many reports of it being somewhat prone to error. If your mergeinfo property has been corrupted somehow, it might be extremely hard to spot – and it’s not possible to regenerate the property by simply examining the codelines and commits.
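To make the bookkeeping concrete, here is a minimal Python sketch of how a tool might read an svn:mergeinfo value and work out which revisions are still eligible for merging. It assumes the usual one-source-per-line layout of the property (a path, a colon, then comma-separated revisions or ranges); the branch names and revision numbers are made up for illustration, and this is not Subversion's own implementation.

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source_path: set_of_revisions}."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last ':' in case the path itself contains one.
        path, _, ranges = line.rpartition(":")
        revs = merged.setdefault(path, set())
        for item in ranges.split(","):
            item = item.strip().rstrip("*")  # '*' marks a non-inheritable range
            if "-" in item:
                start, end = item.split("-")
                revs.update(range(int(start), int(end) + 1))
            else:
                revs.add(int(item))
    return merged


def eligible_revisions(mergeinfo, source_path, candidate_revs):
    """Return the candidates not yet recorded as merged from source_path."""
    already = mergeinfo.get(source_path, set())
    return [r for r in candidate_revs if r not in already]


# Hypothetical property value, as it might appear on a trunk folder.
prop = "/branches/feature:120-135,140\n/branches/hotfix:98"
info = parse_mergeinfo(prop)
print(eligible_revisions(info, "/branches/feature", [130, 136, 140, 141]))
# prints [136, 141]
```

The point of the sketch is that the property is the only record of what has been merged: if a range in it is wrong, nothing else in this logic can tell.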

File system operations

If you want to reorganize files in your working copy, you have to make sure that you involve Subversion in the appropriate manner, otherwise you risk doing things like interrupting your file history or committing changes to the wrong places. Filesystem operations like move and copy need to be ‘svn move’ and ‘svn copy.’

It’s not too hard to remember to do this if you’re working by hand, but increasingly we expect our tools to manage our files for us, and they don’t invite Subversion into the conversation.

What I’m using instead

I’ve switched to Git. It solves all of the above problems, as well as just generally being much, much faster.

When I first heard about Git, I didn’t really get it. Distributed version control seemed like a really bad idea – whose copy of the code is authoritative? If bugs are logged against a particular build, how do you track down the code that was used to create it? The loss of things like sequential revision numbers seemed like a big drawback. And branching wasn’t something I ever used in Subversion, so I didn’t see why I should care about it in Git.

Why it’s better

It took me a while to realize the best way to think about this, which is: Screw the distributed bit. If you want to have a central, authoritative repository, you can still do that. It’s just that it becomes a matter of project policy, rather than a technical requirement; a particular repository is central and authoritative because you put it in the center and treat it as authoritative (or ‘bless’ it, in the terminology). You can mimic your present Subversion setup completely. What’s disconcerting is that you have to choose to do so, rather than simply having it foisted upon you.

Furthermore, Git explicitly models something that Subversion seems to miss: Every developer’s working copy is a private branch. The branch is created when they check out the code, it diverges from the central repository as others publish their changes, and it’s merged when the developer updates or commits their working copy. By giving the developer an entire repository instance on their own machine to store that branch in, it can actually be treated as a real branch, with multiple commits, a proper name, and so on. Some people ask: what if developers don’t push their changes to anyone else? To which I respond: What if they don’t check in their changes at all?
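The separation between committing and publishing can be shown with a toy model in Python. This is only an illustration of the idea, not how Git is implemented: the class names are invented, commits are plain strings, and it ignores the case where somebody else has pushed in the meantime (which would need a merge first).

```python
class Remote:
    """Stand-in for the central ('blessed') repository."""

    def __init__(self):
        self.commits = []
        self.reachable = True


class LocalRepo:
    """Stand-in for a developer's clone: a full repository of its own."""

    def __init__(self, remote):
        self.remote = remote
        self.commits = list(remote.commits)  # cloning copies the history
        self.published = len(self.commits)   # how much the remote already has

    def commit(self, message):
        # Committing only touches local state, so it works offline.
        self.commits.append(message)

    def unpublished(self):
        return self.commits[self.published:]

    def push(self):
        # Publishing is a separate step, and the only one needing the network.
        if not self.remote.reachable:
            raise ConnectionError("remote unreachable; commits kept locally")
        self.remote.commits.extend(self.unpublished())
        self.published = len(self.commits)


central = Remote()
mine = LocalRepo(central)

central.reachable = False      # e.g. the server is down
mine.commit("Add texture loader")
mine.commit("Fix loader crash")
try:
    mine.push()
except ConnectionError:
    pass                       # nothing is lost; retry later

central.reachable = True
mine.push()
print(central.commits)
# prints ['Add texture loader', 'Fix loader crash']
```

A failed publish leaves both commits intact and separate, so there is nothing to pick apart by hand before retrying.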

A common criticism leveled against Git was its usability – Git originated as a set of command-line tools for the Linux kernel hackers, and that heritage persisted. This argument was reasonable a year ago, I think, but the tools have matured now. I’m using Tower, which is an excellent choice (Edit: And Xcode 4, out today, includes Git integration out of the box). It’s true that I have dropped down to the CLI for some more involved tinkering – mostly relating to importing my Subversion repositories – but I don’t use it at all on a day-to-day basis.

There’s a little bit more terminology to learn – staging, pushing and pulling, rebasing, and so on – but it’s not that big a deal, and most of it’s intuitive. This isn’t a severe disadvantage compared with Subversion, anyway; I’ve seen claims that non-technical people can’t handle the jargon, but if you think that non-technical people aren’t already learning what terms like ‘checkout’ and ‘commit’ mean then you’re kidding yourself. Git’s UNIX-philosophy layers-of-hundreds-of-tiny-programs approach makes it very easy to build your own scripts and automation over the top to make things as easy as possible for your team, without forcing them to adopt a particular workflow.
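As an example of the kind of script this makes possible, here is a small Python sketch that wraps `git status --porcelain` and separates staged changes from unstaged ones. The function names are my own, and the parsing is deliberately simplified: it assumes the short two-letter status format and does not handle renames or quoted paths specially.

```python
import subprocess


def parse_porcelain(output):
    """Split `git status --porcelain` output into staged and unstaged paths."""
    staged, unstaged = [], []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        # Columns: index state, working-copy state, a space, then the path.
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if index_state == "?" and worktree_state == "?":
            unstaged.append(path)   # untracked file
            continue
        if index_state != " ":
            staged.append(path)     # change recorded in the index
        if worktree_state != " ":
            unstaged.append(path)   # change present only in the working copy
    return staged, unstaged


def status(repo_dir="."):
    """Run git in repo_dir and return (staged, unstaged) path lists."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)


# Hypothetical output, so the parser can be tried without a repository.
sample = "M  src/main.c\n M README\n?? notes.txt\nMM lib/util.c\n"
print(parse_porcelain(sample))
# prints (['src/main.c', 'lib/util.c'], ['README', 'notes.txt', 'lib/util.c'])
```

A team could build on something like this to warn people about forgotten files before they commit, without anyone having to touch the command line directly.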

Why it’s not

There are a couple of things I can see might be problems for people. I think they’re problems that can be circumvented or solved, either through project policy, or by waiting for Git to mature a little more, and they’re not deal breakers for me – but they’re worth considering.

Firstly, every developer will have a complete copy of the entire repository on their machine. Given how cheap storage is these days, I don’t think this is a very common issue, but I can imagine that some extremely large projects might find it prohibitive, especially if they’re storing lots of large assets. To give you some idea, one of my projects takes up about 900MB on disk (code + assets), and the Git database adds another 430MB on top of that, for a total checkout size of 1.3GB; before I moved it to Git, its Subversion checkout was 1.85GB. As far as speed is concerned, though, Git barely slows down for very large projects.
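Laid out as arithmetic, the figures quoted above look like this. The numbers are the ones from the paragraph; the comment about why the Subversion overhead is so large (its working copies keep an unmodified ‘pristine’ copy of every file in their .svn directories) is an assumption about the cause, not something measured here.

```python
working_files_mb = 900    # code + assets on disk
git_database_mb = 430     # full history, stored compressed
svn_checkout_mb = 1850    # the same project as a Subversion checkout

git_total_mb = working_files_mb + git_database_mb
svn_overhead_mb = svn_checkout_mb - working_files_mb

print(git_total_mb)                                  # prints 1330
print(svn_overhead_mb)                               # prints 950
print(round(git_database_mb / working_files_mb, 2))  # prints 0.48
# Assumed cause: a pristine copy of each file roughly doubles the checkout.
print(round(svn_overhead_mb / working_files_mb, 2))  # prints 1.06
```

So in this one project, the entire Git history costs about half the size of the working files, while the Subversion bookkeeping costs slightly more than the working files themselves.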

Secondly, because Git tracks files by their content rather than by their path information, it’s sometimes difficult to keep tracking history for binary files. Many changes that seem small – even changing a single pixel in a texture – can result in the content of the file changing dramatically when compressed, and if too much of the file has changed, Git might not recognize it as being the same file any more. It tries to be smart about this, and if you don’t change the filename then you’re usually OK, but occasionally it can break.
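The idea of recognizing a file by how similar its content is can be sketched in Python. This is a stand-in, not Git's actual algorithm: it uses difflib from the standard library to score similarity, and the 50% threshold is chosen to mirror what I understand to be the default for Git's rename detection. The byte strings are generated test data, not real textures.

```python
import difflib
import random


def similarity(old, new):
    """Score how alike two byte strings are, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, old, new, autojunk=False).ratio()


def looks_like_same_file(old, new, threshold=0.5):
    """Guess whether `new` is a modified version of `old`."""
    return similarity(old, new) >= threshold


rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(1024))

# One byte changed in otherwise identical content.
tweaked = original[:500] + bytes([original[500] ^ 0xFF]) + original[501:]

# Completely different bytes, as if a small edit had been recompressed
# and every byte of the output had shifted.
recompressed = bytes(rng.getrandbits(8) for _ in range(1024))

print(looks_like_same_file(original, tweaked))       # prints True
print(looks_like_same_file(original, recompressed))  # prints False
```

The second case is the one that bites with compressed assets: the artist changed one pixel, but what the version control system sees is a file with almost nothing in common with the old one.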

Thirdly, on the topic of binary files: diffing and merging binary files is just as impossible under Git as under any other SCM system, so people tend to go for the next best thing, which is locking files so that at least nobody wastes time making changes that can’t be merged. That’s not something you can do out of the box with Git, as there’s no authority that can tell you which files are and are not locked. Most of the time this won’t be a big deal – most of the teams I know have fairly strong ownership of art assets, so you wouldn’t really expect two different people to be changing the same model – but it’s likely to bite you at the worst possible times, such as when everybody is rushing to get something done for a milestone.

Per-file locking is part of a more general issue, which is that Git doesn’t have much in the way of an access control mechanism: you can grant read access without granting write access, but either the entire repository is accessible, or none of it is. Sometimes it’s convenient to be able to set up a project in which certain directories (for example, ‘shared’ or ‘engine’) can only be committed to by particular people.

Personally, I’ve found it’s not too hard to work around this by simply using multiple repositories: I’ve got multiple ‘section’ repositories that have particular permissions set up, and then a central ‘master’ repository that pulls branches from each section. Git’s got no problem with the branches I’m pulling in having come from entirely separate repositories – another result of the distributed approach – and they merge fine, provided I make sure that the directory structures in each section are set up appropriately. It’s a little less neat than approaches I’ve previously used that employed svn:externals, but it’s not been a big problem. Git does also have support for ‘submodules,’ but after reading the documentation several times I’ve still not been able to figure out whether they’re quite suitable for my setup – if anyone has insight into how they can be used for this stuff, I’d be interested to hear it.

These features are presently lacking, but Git does have support for hooks at various points in the SCM workflow; the ‘update’ and ‘pre-commit’ hooks look quite promising to me as a way of rejecting commits that break either locking policies or ACL requirements. Hopefully we’ll soon see somebody provide readily usable scripts that use these hooks to add in the missing features.
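As a rough idea of what such a script might look like, here is a sketch of a server-side ‘update’ hook in Python that rejects pushes touching directories the pusher isn't allowed to change. The ACL table and user names are hypothetical, identifying the user via the login name is an assumption that depends on how your server is set up, and the sketch doesn't handle newly created branches (where the old revision is all zeros).

```python
#!/usr/bin/env python3
import getpass
import subprocess
import sys

# Hypothetical policy: path prefix -> users allowed to change files under it.
ACL = {
    "engine/": {"alice"},
    "shared/": {"alice", "bob"},
}


def violations(user, paths, acl=ACL):
    """Return the paths that `user` is not allowed to modify."""
    denied = []
    for path in paths:
        for prefix, allowed in acl.items():
            if path.startswith(prefix) and user not in allowed:
                denied.append(path)
                break
    return denied


def changed_paths(old_rev, new_rev):
    """List the files that differ between two commits."""
    out = subprocess.run(
        ["git", "diff", "--name-only", old_rev, new_rev],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def main(argv, user):
    # Git runs the update hook as: update <refname> <old-sha> <new-sha>
    _, refname, old_rev, new_rev = argv
    denied = violations(user, changed_paths(old_rev, new_rev))
    for path in denied:
        print(f"{user} may not modify {path} on {refname}", file=sys.stderr)
    return 1 if denied else 0  # a non-zero exit status rejects the update


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv, getpass.getuser()))
```

A locking policy could be checked the same way, by swapping the ACL table for a list of currently locked paths and their owners, although something would still have to maintain that list.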

(Given these problems, there’s a fair case to be made that while Git is fine for source code, it’s not the right place to keep assets – you might be better off using a system more deliberately designed for assets, like Alienbrain. Personally it’s not affecting me – our art is contractor-developed, so it tends to change infrequently. Using text-based asset formats where possible does help, though.)

Conclusion

In the month or so that I’ve been using Git, I’ve generally been impressed. My workflow has improved; I’m now making lots of tiny, logically self-contained commits, instead of the huge blob changes I was committing before, and I’m staging, committing, and then continuing to work without waiting for any tedious uploading. It’s encouraged me to experiment with trying out new features and new ways of doing things, using local branches to quickly spin off a safe environment in which to play – as well as ensuring that I’ve got somewhere I can keep the results of the failed attempts if I want to refer back to them later.

I can imagine that I might find myself using Subversion occasionally in the next few months, as third-party application vendors work to bring their Git integration support up to the same standard as their Subversion integration. I don’t think it’ll be long now, though. Subversion as a concept, as an architecture for an approach to source control, has been refuted.

Subversion is dead. It’s just taking a bit of time for the rigor mortis to kick in.

(Update: Very interesting feedback from lots of people on this. Read the comments below; you may also want to check out the discussions over at reddit for more).

#AltDevBlogADay

Richard Fine