Previously, I demonstrated a Debian packaging workflow using Git and mentioned the possibility of a follow-up post. Well, here it is: this post covers the case where you want to use my workflow (or a related one) for a package that is currently maintained with Subversion on svn.debian.org, and you’d like to keep the history during the conversion.
Make sure to read the previous post before this one.
I am again using the example of mdadm, since its Git
packaging repository is in a state of shambles and I want to
start over, get it right, and import the history from the
previous Subversion repository. What better way than to write a
blog post as I do so? Well, plenty actually. This kind of post isn’t
really made for a blog, and I have started work on setting up
ikiwiki on madduck.net, but it’s not yet ready, so
I’ll stick with the blog for now. I will make sure that links don’t
break as I move content over, so feel free to bookmark this…
Importing the package into Git
Thanks to git-svn, the initial step of getting your package imported into Git is a breeze:
$ git-svn clone --stdlayout --no-metadata \
svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm mdadm
Sit back and enjoy. If that command exits prematurely with an error such as the following:
Malformed network data: Malformed network data at /usr/local/bin/git-svn line 1029
then you should upgrade to a newer Git version, or have a look
here. If
your Git does not know --stdlayout then upgrade as
well (or use -T trunk -t tags -b branches
instead).
Sam Vilain notes that it is important to “get the
attribution right with the final SVN import - getting the authors
map right.” I didn’t do that. If you look at the repository
resulting from the above command, you’ll notice strange commit
authors, such as madduck@some-unique-uuid-from-svn.
git-svn allows you to map these to real names with
real email addresses, which ensures that the attributions are good
for the whole world to see.
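For the authors map, git-svn reads a file passed with --authors-file (or -A), with one line per SVN login in the form “login = Real Name <email>”. Here is a minimal sketch, assuming you first want a skeleton that lists every login found in the SVN log; the function name svn_authors_skeleton and the default example.org domain are my own placeholders, and the generated names and addresses must be corrected by hand before use:

```shell
# Hypothetical helper: build a skeleton authors file for
# `git svn clone --authors-file=...` from `svn log --quiet` on stdin.
# Every line it prints is a placeholder to be edited by hand.
svn_authors_skeleton() {
    domain=${1:-example.org}   # placeholder mail domain, not a real mapping
    # `svn log --quiet` lines look like "r123 | login | date ...";
    # keep only the login field, skip the dashed separator lines.
    awk -F' [|] ' '/^r[0-9]+ [|] / { print $2 }' |
        sort -u |
        while read -r login; do
            printf '%s = %s <%s@%s>\n' "$login" "$login" "$login" "$domain"
        done
}
```

You might run svn log --quiet svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm | svn_authors_skeleton > authors, fix up the entries, and then add --authors-file=authors to the git-svn clone call above. As far as I know, git-svn stops when it meets a login that is missing from the file, which at least makes omissions hard to overlook.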
When done, switch to the repository and run git-branch
-r. As you’ll see, git-svn imported all SVN
branches and tags as remote branches. You need those if you want to
track the Subversion repository bidirectionally, but we are
converting, as you may have guessed from the
--no-metadata switch above.
Therefore, we resort to the Dinosaur method
of converting branches to tags, which I’ll simplify for
mdadm. We also just delete all remote branches after
tagging, since mdadm never used branches in the
SVN repository. Your mileage may vary.
git branch -r | sed -rne 's, *tags/([^@]+)$,\1,p' | while read tag; do
echo "git tag debian/$tag tags/${tag}^; git branch -r -d tags/$tag"
done
git branch -r | while read tag; do
echo "git branch -r -d $tag"
done
If the output looks right, run each loop again with its output piped to sh (first the tag loop, then the cleanup loop) to execute the commands.
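The two loops can also be folded into a single pass, which avoids having to run them in the right order. This is a minimal sketch under the same assumptions as above (SVN tags show up as tags/NAME, and names containing @ are git-svn leftovers that are deleted but not tagged); the function name svn_refs_to_commands is hypothetical, and like the loops it only prints commands:

```shell
# Hypothetical helper: read `git branch -r` output on stdin and print
# (not run) the commands that turn git-svn tag branches into debian/*
# tags and then delete every remote-tracking branch.
svn_refs_to_commands() {
    while read -r ref; do          # read strips the leading indentation
        case $ref in
            tags/*@*)
                # "name@rev" leftovers: delete only, like the loops above
                ;;
            tags/*)
                # Same choice as the loop above: tag the parent (^) of the
                # commit git-svn created for the SVN copy into tags/.
                printf 'git tag debian/%s %s^\n' "${ref#tags/}" "$ref"
                ;;
        esac
        printf 'git branch -r -d %s\n' "$ref"
    done
}
```

Review the output of git branch -r | svn_refs_to_commands first, then pipe it to sh -e, which stops at the first command that fails.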
Sam Vilain (again) points me to
git-pack-refs, after which you can edit
.git/packed-refs with an editor. This certainly leaves
more room for error but might be significantly
faster.
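The editing step can be scripted too. A minimal sketch, assuming the debian/* tags have already been created with the tag loop, and assuming the traditional “files” ref backend, where .git/packed-refs is a plain text file of “<sha1> <refname>” lines plus a header and “^<sha1>” peel lines that only follow annotated tags. The filter name drop_remote_refs is hypothetical; back up .git before trying anything like this:

```shell
# Hypothetical filter: copy packed-refs from stdin to stdout, dropping
# every refs/remotes/* entry. Remote-tracking refs point at commits, so
# no "^<sha1>" peel line belongs to them and a line-based filter is enough.
drop_remote_refs() {
    grep -v '^[0-9a-f][0-9a-f]* refs/remotes/'
}
```

Used like this, after git pack-refs --all has moved all refs (and pruned their loose copies) into packed-refs:
$ git pack-refs --all
$ drop_remote_refs < .git/packed-refs > .git/packed-refs.new
$ mv .git/packed-refs.new .git/packed-refs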
Cleaning up the SVN references
Even though we passed --no-metadata to
git-svn, it did leave some traces in
.git/, which we can now safely remove:
$ git config --remove-section svn-remote.svn
$ rm -r .git/svn
Setting things straight
You can skip this section unless you want to know a bit about how to fix up stuff with Git.
There were actually some nasty tagging errors leading up to the
2.5.6-9 release for etch, and I could
never be bothered to fix those in SVN, but now I can
(I love Git!):
$ git tag -d debian/2.5.6-10 # never existed
$ git tag -f debian/2.5.6-8 debian/2.5.6-8~2 # mistagged
$ git checkout -b maint/etch debian/2.5.6-8 # this is when we diverged
$ git apply < /tmp/mdadm-2.5.6-8..2.5.6-9.diff
$ git add debian/po/gl.po debian/po/pt.po debian/changelog
$ git commit -s
$ git tag debian/2.5.6-9
Now that that’s fixed, there is one other thing to worry about,
namely the very last commit to SVN, which obsoletes
the repository and points to the Git repository. But that’s not all
of it. I was also silly enough to include a fix in the
same commit. Let’s see what Git can do. Since the obsoletion
consists of nothing more than adding the OBSOLETE files, we can simply
remove them, --amend the last commit (keeping the fix)
and provide a new log message:
$ git checkout master
$ git rm OBSOLETE debian/OBSOLETE
$ git commit --amend
Now the repository is in an acceptable state.
Making ends meet
The pkg-mdadm
effort on svn.debian.org only maintained the
./debian/ directory, separate from the upstream code,
and boy was that a bad idea. Just to give one example: think about
what’s involved in preparing a Debian-specific patch against the
upstream code… this has to end, and we can make it end right here;
let’s import upstream’s code (again not using his ADSL line, but
the upstream branch of the pkg-mdadm Git
repository; see the previous
post for details):
$ git remote add upstream-repo git://git.debian.org/git/pkg-mdadm/mdadm
$ git config remote.upstream-repo.fetch \
+refs/heads/upstream:refs/remotes/upstream-repo/upstream
$ git fetch upstream-repo
$ git checkout -b upstream upstream-repo/upstream
Now we have two unconnected ancestries in our repository, and
it’s time to join them together. The most logical way seems to be
to use the last upstream tag for which we have a Debian tag:
2.6.2.
For this, we branch off the corresponding Debian tag
(2.6.2-1) and merge upstream’s 2.6.2 tag
into the new branch. This will be a temporary branch. Then, we
rebase (remember, nothing has been published yet) the master branch
on top of this temporary branch, before we end that branch’s short
life. The Debian tag stays where it is since it describes the state
of the repository at time of the release of
2.6.2-1.
$ git checkout -b tmp/join debian/2.6.2-1
$ git merge --allow-unrelated-histories mdadm-2.6.2 # Git >= 2.9 refuses unrelated histories without this flag
$ git rebase tmp/join master
$ git branch -d tmp/join
It just so happens that the head of the SVN
repository, which is identical to the tip of our
master branch, corresponds to Debian release
2.6.2-2, so we tag it:
$ git tag debian/2.6.2-2
We are now also “born” in the sense that maintenance in Git has started. Let’s mark that point in history. There is no real reason I can foresee for this yet, but nonetheless:
$ git tag -s git-birth
Turning dpatch files into feature branches
We want to turn dpatch files into feature branches,
and we want to do it “properly”. We could branch, apply the patch,
delete the patch file, check out master and delete the
patch file there as well, but that seems “improper”, to me at
least; so instead, we’ll cherry-pick:
$ git checkout -b deb/conffile-location
$ debian/patches/01-mdadm.conf-location.dpatch -patch
$ git rm debian/patches/01-mdadm.conf-location.dpatch
$ git commit -s
$ git add $(git ls-files --others --modified)
$ git commit -s
I should quickly intervene to make sure you are following. I am
making use of Git’s index here. Applying the patch makes the
changes in the working tree, but we did not tell Git that we want
those to be part of the commit just yet. Instead, we delete the
dpatch with git-rm, which automatically
registers the deletion with the index. Thus, the first
git-commit creates a commit which deletes the
dpatch, while the second git-commit
creates a commit with all the changes from the dpatch,
using git-ls-files to identify new and modified
files.
But for now, let’s move on. We have two commits in the
deb/conffile-location branch, and one of them, the deletion
of the dpatch, is relevant to the master branch as well, so we
cherry-pick it there:
$ git checkout master
$ git cherry-pick deb/conffile-location^
If you’re confused, let me explain: our goal is to have a number
of feature branches, of which master is the one in
which most of ./debian/ is maintained. All the
branches later come together in the long-living build
branch, so deb/conffile-location will never be merged
back into master. However, once we have applied the
dpatch to the feature branch, we can delete it both
there and in the master branch. By cherry-picking, we
“import” the deletion into the master branch.
I repeat the same procedure for deb/docs, merging
all the documentation-related dpatches, but I’ll spare
you the details.
… and then Git let me down
In the next step, I found I had misunderstood Git merging: I
thought Git was smart, but Linus had his reasons for calling Git
the “stupid content tracker” (more on that later). Read on as I am
obsoleting dpatch files that upstream had merged:
99-*-FIX.dpatch.
For consistency, I wanted to cherry-pick each of the appropriate
upstream commits into the master branch along with
deleting the corresponding dpatch file. Here is one
example: 99-monitor-6+10-FIX.dpatch was obsoleted by
upstream’s commit 66f8bbb; the -x records
the original commit ID in the log:
$ git cherry-pick -x 66f8bbb
$ git rm debian/patches/99-monitor-6+10-FIX.dpatch
$ git commit -s -m"remove dpatch obsoleted by $(git rev-parse --short HEAD)"
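Since the same three commands repeat for every obsoleted dpatch, they can be generated from a list of “dpatch upstream-commit” pairs. This is a minimal sketch with a hypothetical function name; only the 99-monitor-6+10-FIX.dpatch / 66f8bbb pair comes from this post, and the others have to be looked up in upstream’s history:

```shell
# Hypothetical helper: read "dpatch-file upstream-commit" pairs on stdin
# and print (not run) the cherry-pick / git rm / commit sequence for each.
obsolete_dpatch_commands() {
    while read -r dpatch commit; do
        printf 'git cherry-pick -x %s\n' "$commit"
        printf 'git rm debian/patches/%s\n' "$dpatch"
        # Single quotes keep $(...) literal, so it is only evaluated when
        # the printed commands run, i.e. right after the cherry-pick.
        printf '%s\n' 'git commit -s -m"remove dpatch obsoleted by $(git rev-parse --short HEAD)"'
    done
}
```

For example, echo '99-monitor-6+10-FIX.dpatch 66f8bbb' | obsolete_dpatch_commands reproduces the three commands above; piping the output to sh -e would run them and stop at the first failure.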
I repeated the procedure for the other dpatch
files, removed the dpatch infrastructure, and then
went on to merge it all into build to build the
package.
The build branch is a long-living branch off
upstream, but which upstream? I’ll
fast-forward you past a
segfault problem with mdadm, which upstream
thought he had resolved with commit 23dc1ae after
2.6.3, though he had not yet released 2.6.4.
Looking at the commits between 23dc1ae and upstream’s
HEAD at the time, I decided to include them all and
snapshot 4450e59:
$ git fetch upstream-repo
$ git checkout upstream
$ git merge upstream-repo/upstream
$ git tag mdadm-2.6.3+200709292116+4450e59 4450e59
$ git checkout master
$ git merge --no-commit mdadm-2.6.3+200709292116+4450e59
$ dch -v 2.6.3+200709292116+4450e59-1
$ git add debian/changelog
$ git commit -s
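The snapshot tag and the Debian version share the same “base+timestamp+commit” core: the tag gets the mdadm- prefix, while the version passed to dch -v must not carry the package name and gets a Debian revision instead. A minimal sketch of that naming scheme, with a hypothetical helper name:

```shell
# Hypothetical helper: derive the snapshot tag name and the Debian
# version from an upstream base version, a timestamp and a commit ID.
snapshot_names() {
    base=$1 stamp=$2 commit=$3 debrev=${4:-1}   # Debian revision defaults to 1
    snap="$base+$stamp+$commit"
    printf 'tag=mdadm-%s\n' "$snap"
    printf 'version=%s-%s\n' "$snap" "$debrev"
}
```

snapshot_names 2.6.3 200709292116 4450e59 yields the tag and version used above. The + keeps the snapshot sorting after 2.6.3 and before 2.6.4, which you can check with dpkg --compare-versions 2.6.3 lt 2.6.3+200709292116+4450e59-1. If you want the timestamp from the commit itself, git log -1 --date=format:%Y%m%d%H%M --format=%cd 4450e59 is one option with Git 2.6 or newer.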
And then I called poor-mans-gitbuild, which merges
master and then deb/* into
build. Here is when stuff blew up.
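To make the merge order concrete: this is not poor-mans-gitbuild itself (see the previous post for that), just a minimal sketch of the sequence described here, with a hypothetical function name, printing commands for master and then each deb/* branch read from stdin:

```shell
# Hypothetical sketch: print the merges into the integration branch
# "build": master first, then every deb/* feature branch from stdin.
build_merge_commands() {
    printf 'git checkout build\n'
    printf 'git merge master\n'
    while read -r branch; do
        case $branch in
            deb/*) printf 'git merge %s\n' "$branch" ;;
        esac
    done
}
```

Fed with git for-each-ref --format='%(refname:short)' refs/heads/deb/ and piped to sh -e, it stops at the first merge that does not complete cleanly, which is exactly where things blew up for me.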
I’ll make a long story short (read my description of the problem and Linus’ answer if you want to know more): I thought Git was smart enough to identify changes common to both branches and do the right thing, but it turns out that Git does not care about commits at all; it only cares about content and the end result. In our case, unfortunately (or fortunately), the outcome was a conflict, because the upstream branch introduced a simple change (last hunk) in the lines surrounding the patch we cherry-picked, and Git cannot resolve that automatically.
The solution is either not to cherry-pick at all, to cherry-pick
all commits touching the context of the
dpatch, or simply to merge upstream into
all our feature branches. In our case, the first is the easiest
solution, and since importing dpatch files is a
one-time thing (thank $DEITY), I’ll leave it at
that.
Almost.
I have spent two days thinking about this, more than I should have, and it was this point Linus made that made me appreciate Git even more:
Conflicts aren’t bad - they’re good. Trying to aggressively resolve them automatically when two branches have done slightly different things in the same area is stupid and just results in more problems. Instead, git tries to do what I don’t think anybody else has done: make the conflicts easy to resolve, by allowing you to work with them in your normal working tree, and still giving you a lot of tools to help you see what’s going on.
The end
This concludes today’s report. Importing the changes from the old Git repo, tagging, and merging the branches are all covered in my previous post, or at least you’ll find enough information there to complete the exercise.
I would like to specifically thank Sam Vilain and Linus Torvalds
for their help in preparing this post, as well as the
#git/freenode inhabitants, as always.
If you are interested in the topic of using version control for
distro packaging, I invite you to join the vcs-pkg mailing
list and/or the #vcs-pkg/irc.oftc.net IRC
channel.
Also, if you are interested in Git in general, you can find a list of blog posts on the Git wiki.
NP: The Police: Zenyatta Mondatta

