Converting a package to Git

Previously, I demonstrated a Debian packaging workflow using Git and I mentioned the possibility of a follow-up post; well, here it is: you want to use my workflow (or one that's related) for a package that is currently maintained with Subversion on svn.debian.org and you'd like to keep the history during the conversion.

Make sure to read the previous post before this one.

I am again using the example of mdadm since its Git packaging repository is in a state of shambles and I want to restart to get it right and import the history from the previous Subversion repository. What better way than to write a blog post as I do so? Well, plenty actually. This kind of post isn't really made for a blog, and I have started work on setting up ikiwiki on madduck.net, but it's not yet ready, so I'll stick with the blog for now. I will make sure that links don't break as I move content over, so feel free to bookmark this…

Importing the package into Git

Thanks to git-svn, the initial step of getting your package imported into Git is a breeze:

$ git svn clone --stdlayout --no-metadata \
    svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm mdadm

Sit back and enjoy. If that command exits prematurely with an error such as the following:

Malformed network data: Malformed network data at /usr/local/bin/git-svn line 1029

then you should upgrade to a newer Git version, or have a look here. If your Git does not know --stdlayout then upgrade as well (or use -T trunk -t tags -b branches instead).

Sam Vilain notes that it is important to "get the attribution right with the final SVN import - getting the authors map right". I didn't do that. If you look at the repository resulting from the above command, you'll notice strange commit authors, such as madduck@some-unique-uuid-from-svn. git-svn allows you to map these to real names with real email addresses, which ensures that the attributions are good for the whole world to see.
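Here is a minimal sketch of how such an authors map could be prepared. git-svn reads lines of the form "login = Full Name <email>" through its --authors-file option. The helper name authors_skeleton, the file name authors.txt and the placeholder identities are all made up for illustration. The helper only extracts the distinct SVN logins; the real names and addresses still have to be filled in by hand.

```shell
# Read `svn log -q` output on stdin and print one skeleton line per
# distinct SVN login, in the "login = Name <email>" format that
# git-svn's --authors-file option expects.
# "FULL NAME" and "EMAIL@example.org" are placeholders to edit by hand.
authors_skeleton() {
    awk -F'|' '/^r[0-9]+ / {
        sub(/^ +/, "", $2)   # strip padding around the login column
        sub(/ +$/, "", $2)
        print $2
    }' |
        sort -u |
        sed 's/$/ = FULL NAME <EMAIL@example.org>/'
}

# Intended use (needs network access, so shown as comments only):
#   svn log -q svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm \
#       | authors_skeleton > authors.txt
#   $EDITOR authors.txt
#   git svn clone --stdlayout --no-metadata --authors-file=authors.txt \
#       svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm mdadm
```

With the map in place before the clone, the imported commits should carry proper names from the start, which is much easier than rewriting history afterwards.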

When done, switch to the repository and run git-branch -r. As you'll see, git-svn imported all SVN branches and tags as remote branches. You need those if you want to bidirectionally track the Subversion repository, but we are converting, as you may have guessed by the --no-metadata switch above.

Therefore, we resort to the Dinosaur method of converting branches to tags, which I'll simplify for mdadm. We also just delete all remote branches after tagging, since mdadm never used branches in the SVN repository. Your mileage may vary.

git branch -r | sed -rne 's, *tags/([^@]+)$,\1,p' | while read tag; do
  echo "git tag debian/$tag tags/${tag}^; git branch -r -d tags/$tag"
done

git branch -r | while read tag; do
  echo "git branch -r -d $tag"
done

If that seems to work alright, then you can execute the commands.
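To get a feel for what the first loop selects before pointing it at a real repository, here is a small dry run on canned input. The branch names in the sample and the function name tag_commands are invented for illustration; like the loop above, the function only prints commands. The trailing ^ picks the parent of the tag branch's tip, on the assumption that the tip itself is just the commit git-svn created for the SVN copy operation.

```shell
# Read `git branch -r` style output on stdin and print (not run) the
# commands that would turn each tags/* remote branch into a debian/* tag.
# Names with an "@revision" suffix, which git-svn can create, are
# skipped by the [^@]+ pattern.
tag_commands() {
    sed -nE 's,^ *tags/([^@]+)$,\1,p' | while read -r tag; do
        printf 'git tag debian/%s tags/%s^; git branch -r -d tags/%s\n' \
            "$tag" "$tag" "$tag"
    done
}

# Canned sample input; these branch names are hypothetical.
printf '%s\n' '  tags/2.6.2-1' '  tags/2.6.2-1@1234' '  trunk' | tag_commands
```

Only the plain tags/2.6.2-1 line produces a command; the @-suffixed duplicate and trunk are left for the second loop to delete.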

Sam Vilain (again) pointed me to git-pack-refs, after which you can edit .git/packed-refs by hand. This certainly leaves more room for errors but might be significantly faster.

Cleaning up the SVN references

Even though we passed --no-metadata to git-svn, it did leave some traces in .git/, which we can now safely remove:

$ git config --remove-section svn-remote.svn
$ rm -r .git/svn

Setting things straight

You can skip this section unless you want to know a bit about how to fix up stuff with Git.

There were actually some nasty tagging errors leading up to the 2.5.6-9 release for etch and I could never be bothered to fix those in SVN, but now I can (I love Git!):

$ git tag -d debian/2.5.6-10            # never existed
$ git tag -f debian/2.5.6-8 debian/2.5.6-8~2   # mistagged
$ git checkout -b maint/etch debian/2.5.6-8    # this is when we diverged
$ git apply < /tmp/mdadm-2.5.6-8..2.5.6-9.diff
$ git add debian/po/gl.po debian/po/pt.po debian/changelog
$ git commit -s
$ git tag debian/2.5.6-9

Now that that's fixed, there is one other thing to worry about, namely the very last commit to SVN, which obsoletes the repository and points to the Git repository. But that's not all of it. I was also silly enough to include a fix in the same commit. Let's see what Git can do. Since the process of obsoletion involved nothing but adding a file, we can simply undo that, --amend the last commit, and provide a new log message:

$ git checkout master
$ git rm OBSOLETE debian/OBSOLETE
$ git commit --amend

Now the repository is in an acceptable state.

Making ends meet

The pkg-mdadm effort on svn.debian.org only maintained the ./debian/ directory, separate from the upstream code, and boy was that a bad idea. Just to give one example: think about what's involved in preparing a Debian-specific patch against the upstream code… this has to end, and we can make it end right here; let's import upstream's code (again not using his ADSL line, but the upstream branch of the pkg-mdadm Git repository; see the previous post for details):

$ git remote add upstream-repo git://git.debian.org/git/pkg-mdadm/mdadm
$ git config remote.upstream-repo.fetch \
    +refs/heads/upstream:refs/remotes/upstream-repo/upstream
$ git fetch upstream-repo
$ git checkout -b upstream upstream-repo/upstream

Now we have two unconnected ancestries in our repository, and it's time to join them together. The most logical way seems to be to use the last upstream tag for which we have a Debian tag: 2.6.2.

For this, we branch off the corresponding Debian tag (2.6.2-1) and merge upstream's 2.6.2 tag into the new branch. This will be a temporary branch. Then, we rebase (remember, nothing has been published yet) the master branch on top of this temporary branch, before we end that branch's short life. The Debian tag stays where it is since it describes the state of the repository at the time of the release of 2.6.2-1.

$ git checkout -b tmp/join debian/2.6.2-1
$ git merge mdadm-2.6.2    # Git 2.9 and later: add --allow-unrelated-histories

$ git rebase tmp/join master
$ git branch -d tmp/join
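If you would like to try the join before doing it for real, the following sketch rebuilds the situation in a throwaway repository. Everything in it (file names, tag names, commit messages, the function name join_demo) is invented for the demonstration. It assumes Git 2.9 or later, which refuses to merge unconnected ancestries unless --allow-unrelated-histories is given.

```shell
# Rebuild the join in a throwaway repository. All file names, tags and
# commit messages here are invented for the demonstration.
join_demo() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git symbolic-ref HEAD refs/heads/master
    git config user.name "Demo User"
    git config user.email "demo@example.org"

    # Packaging-only history, standing in for the SVN import.
    mkdir debian
    echo "1.0-1" > debian/changelog
    git add debian
    git commit -q -m "packaging for 1.0-1"
    git tag debian/1.0-1
    echo "1.0-2" > debian/changelog
    git commit -q -a -m "packaging for 1.0-2"

    # Unconnected upstream history with its own root commit.
    git checkout -q --orphan upstream
    git rm -rf -q .
    echo "int main(void) { return 0; }" > main.c
    git add main.c
    git commit -q -m "upstream 1.0"
    git tag upstream-1.0

    # The join itself, mirroring the commands above.
    git checkout -q -b tmp/join debian/1.0-1
    git merge -q --allow-unrelated-histories -m "join upstream 1.0" upstream-1.0
    git rebase -q tmp/join master
    git branch -q -d tmp/join

    # List the files now reachable from master.
    git ls-tree -r --name-only master
    cd /
    rm -rf "$repo"
)

join_demo
```

After the rebase, master contains both the packaging file and the upstream source, while the debian/1.0-1 tag still points at the old, packaging-only commit.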

It just so happens that the head of the SVN repository, which is identical to the tip of our master branch, corresponds to Debian release 2.6.2-2, so we tag it:

$ git tag debian/2.6.2-2

We are now also "born" in the sense that maintenance in Git has started. Let's mark that point in history. There is no real reason I can foresee for this yet, but nonetheless:

$ git tag -s git-birth

Turning dpatch files into feature branches

We want to turn dpatch files into feature branches, and we want to do it "properly". We could branch, apply the patch, delete the patch file, checkout master and delete the patch file there as well, but that appears "improper" to me at least; so instead, we'll cherry-pick:

$ git checkout -b deb/conffile-location
$ debian/patches/01-mdadm.conf-location.dpatch -patch
$ git rm debian/patches/01-mdadm.conf-location.dpatch
$ git commit -s
$ git add $(git ls-files --others --modified --exclude-standard)
$ git commit -s

I should quickly intervene to make sure you are following. I am making use of Git's index here. Applying the patch makes the changes in the working tree, but we did not tell Git that we want those to be part of the commit just yet. Instead, we delete the dpatch with git-rm, which automatically registers the deletion with the index. Thus, the first git-commit creates a commit which deletes the dpatch, while the second git-commit creates a commit with all the changes from the dpatch, using git-ls-files to identify new and modified files.
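The two-commit trick is easier to believe once you have seen it in isolation. The sketch below replays it in a throwaway repository; the file names, commit messages and the function name index_demo are invented, and a plain file stands in for the dpatch. Files that Git does not yet track are staged with git add before the second commit.

```shell
# Demonstrate committing a staged deletion separately from unstaged
# working-tree changes. All names here are hypothetical.
index_demo() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.name "Demo User"
    git config user.email "demo@example.org"

    # Starting point: one source file and one patch file.
    echo "old" > source.c
    echo "patch" > fix.patch
    git add .
    git commit -q -m "initial"

    # "Apply" the patch: this touches the working tree only, not the index.
    echo "new" > source.c
    echo "doc" > NEWFILE

    # Stage just the deletion and commit it.
    git rm -q fix.patch
    git commit -q -m "delete patch file"

    # Now stage and commit what the patch changed.
    git add $(git ls-files --others --modified --exclude-standard)
    git commit -q -m "apply patch"

    # Show which paths each of the two commits touched.
    echo "first commit:"
    git diff-tree --no-commit-id --name-status -r HEAD~1
    echo "second commit:"
    git diff-tree --no-commit-id --name-status -r HEAD
    cd /
    rm -rf "$repo"
)

index_demo
```

The first commit should list only the deleted patch file, the second only the modified and newly added files, even though all of those changes were sitting in the working tree at the same time.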

But for now, let's move on. We have two commits in the deb/conffile-location branch, and one of those is relevant to the master branch, so we switch to master and cherry-pick it:

$ git checkout master
$ git cherry-pick deb/conffile-location^

If you're confused, let me explain: our goal is to have a number of feature branches, of which master is the one in which most of ./debian/ is maintained. All the branches later come together in the long-living build branch, so deb/conffile-location will never be merged back into master. However, once we have applied the dpatch to the feature branch, we can delete it both from there and from the master branch. By cherry-picking, we "import" the deletion to the master branch.

I repeat the same procedure for deb/docs, merging all the documentation-related dpatches, but I'll spare you the details.

… and then Git let me down

In the next step, I found I had misunderstood Git merging: I thought Git was smart, but Linus had his reasons for calling Git the "stupid content tracker" (more on that later). Read on as I am obsoleting dpatch files that upstream had merged: 99-*-FIX.dpatch.

For consistency, I wanted to cherry-pick each of the appropriate upstream commits into the master branch along with deleting the corresponding dpatch file. Here is one example: 99-monitor-6+10-FIX.dpatch was obsoleted by upstream's commit 66f8bbb; the -x records the original commit ID in the log:

$ git cherry-pick -x 66f8bbb
$ git rm debian/patches/99-monitor-6+10-FIX.dpatch
$ git commit -s -m"remove dpatch obsoleted by $(git rev-parse --short HEAD)"

I repeated the procedure for the other dpatch files, removed the dpatch infrastructure, and then went on to merge it all into build to build the package.

The build branch is a long-living branch off upstream, but which upstream? I'll fast-forward you past a segfault problem with mdadm, which upstream (thought to have) resolved with commit 23dc1ae after 2.6.3, but he had not yet released 2.6.4. Looking at the commits between 23dc1ae and upstream's HEAD at the time, I decided to include them all and snapshot 4450e59:

$ git fetch upstream-repo
$ git checkout upstream
$ git merge upstream-repo/upstream
$ git tag mdadm-2.6.3+200709292116+4450e59 4450e59

$ git checkout master
$ git merge --no-commit mdadm-2.6.3+200709292116+4450e59
$ dch -v 2.6.3+200709292116+4450e59-1
$ git add debian/changelog
$ git commit -s
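The snapshot name used above appears to follow a simple scheme: the last upstream release, a timestamp that looks like YYYYMMDDhhmm, and the abbreviated commit ID, joined with plus signs. A tiny helper makes the scheme explicit; the function name snapshot_version is made up, and reading the middle part as a timestamp is my interpretation of the tag. Under Debian's version comparison rules, such a string sorts after 2.6.3 and before 2.6.4, which is what you want for a snapshot between the two.

```shell
# Compose a snapshot identifier from the last upstream release, a
# timestamp (YYYYMMDDhhmm) and an abbreviated commit ID.
snapshot_version() {
    printf '%s+%s+%s\n' "$1" "$2" "$3"
}

# In a real checkout the last two arguments could come from, for example,
#   date +%Y%m%d%H%M   and   git rev-parse --short 4450e59
snapshot_version 2.6.3 200709292116 4450e59
```

Prefixing the result with "mdadm-" gives the tag name, and appending "-1" gives the Debian version for the changelog.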

And then I called poor-mans-gitbuild, which merges master and then deb/* into build. Here is when stuff blew up.
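The poor-mans-gitbuild script itself is not reproduced here, so the following is only a guess at its core, based on the description "merges master and then deb/* into build". The function name build_merge_commands is invented, and in keeping with the dry-run habit from earlier, it prints the commands instead of running them.

```shell
# Read local branch names on stdin and print (not run) the merge
# sequence: master first, then every deb/* branch in sorted order.
# This is an assumption about what such a script does, not its source.
build_merge_commands() {
    echo "git checkout build"
    echo "git merge master"
    grep '^deb/' | sort | while read -r branch; do
        echo "git merge $branch"
    done
}

# Sample branch list; in a real repository it could come from
#   git for-each-ref --format='%(refname:short)' refs/heads
printf '%s\n' master deb/docs upstream deb/conffile-location build \
    | build_merge_commands
```

Printing first and executing later makes it easy to spot a branch that should not end up in build.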

I'll make a long story short (read my description of the problem and Linus' answer if you want to know more): I thought Git was smart enough to identify commits common to both branches and do the right thing, but it turns out that Git does not care about commits at all; it only considers content and the end result. In our case, unfortunately (or fortunately), the outcome was a conflict, because the upstream branch introduced a simple change (last hunk) in the lines surrounding the patch we cherry-picked, and Git cannot resolve that on its own.
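The effect can be reproduced in miniature without any repository at all. In the sketch below, diff3 from GNU diffutils stands in for Git's merge machinery, so the output differs in detail from what Git prints, and the three file contents are invented. The idea is the same, though: one side carries only the cherry-picked fix, the other carries the same fix plus a change to the neighbouring line, and a content-based three-way merge sees two different rewrites of the same region.

```shell
# Three-way merge of invented file contents, using diff3 as a stand-in
# for Git's merge. Exits with diff3's status (1 means conflicts).
conflict_demo() (
    dir=$(mktemp -d)
    cd "$dir" || exit 2
    printf '%s\n' 'line one' 'buggy line' 'line three' > base
    # "ours": only the fix was cherry-picked.
    printf '%s\n' 'line one' 'fixed line' 'line three' > ours
    # "theirs": upstream has the same fix plus a change right next to it.
    printf '%s\n' 'line one' 'fixed line' 'line three, reworded' > theirs
    diff3 -m ours base theirs
    status=$?
    cd /
    rm -rf "$dir"
    exit "$status"
)

conflict_demo || echo "diff3 reported a conflict (exit status $?)"
```

Nothing in the inputs records that the fix is "the same commit" on both sides; only the resulting text is compared, which is why the adjacent upstream change is enough to trigger the conflict.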

The solution is either not to cherry-pick at all, to cherry-pick all commits touching the context of the dpatch, or simply to merge upstream into all our feature branches. In our case, the first is the easiest solution and since importing dpatch files is a one-time thing (thank $DEITY), I'll leave it at that.

Almost.

I have spent two days thinking about this more than I should have. And it was this point Linus made which made me appreciate Git even more:

Conflicts aren't bad - they're good. Trying to aggressively resolve them automatically when two branches have done slightly different things in the same area is stupid and just results in more problems. Instead, git tries to do what I don't think anybody else has done: make the conflicts easy to resolve, by allowing you to work with them in your normal working tree, and still giving you a lot of tools to help you see what's going on.

The end

This concludes today's report. Importing the changes from the old Git repo, tagging and merging the branches is all covered in my previous post, or at least you'll find enough information there to complete the exercise.

I would like to specifically thank Sam Vilain and Linus Torvalds for their help in preparing this post, as well as the #git/freenode inhabitants, as always.

If you are interested in the topic of using version control for distro packaging, I invite you to join the vcs-pkg mailing list and/or the #vcs-pkg/irc.oftc.net IRC channel.

Also, if you are interested in Git in general, you can find a list of blog posts on the Git wiki.

NP: The Police: Zenyatta Mondatta