Migrating Cafu to distributed version control – Part 2


Post by Carsten » 2012-10-18, 22:56

In "Migrating Cafu to distributed version control – Part 1", I outlined the fundamental considerations for the migration. This post continues the subject with the more specific and technical details.


Goals and Requirements

Specifically for Cafu, what are the goals and requirements when converting to Git?

As a first step, our source code repository must technically be converted from Subversion to Git. I wanted the conversion to be done in a careful, accurate and complete manner: It should include all branches, vendor branches, tags and authors, and the proper and complete merge history. In fact, I wanted the resulting Git repository to look as if we had used Git right from the start.

Conversions like this are generally well described in the documentation and books about Git, but it turned out that our use of Vendor Branches, a normal and useful feature in Subversion, was very difficult to migrate to Git, and that little related documentation can be found on the internet. Vendor branches are important to us because we use them to manage our external libraries.

Alongside my normal work, I spent a lot of time on this, on and off for several months, slowly putting the related pieces together. I also described the problem on the git-users mailing list in the thread "Importing Subversion vendor-branches to Git". The thread provides both a good technical summary and the remaining bits that I was still missing at that time. This post is a synopsis of the gathered results.

Secondly, as the issue tracker is closely related to the repository, it needs to be updated accordingly. Options include:
  1. stay with Trac (but update it to work with Git),
  2. migrate it to the issue tracker provided with the Cafu repository at BitBucket,
  3. migrate it to Atlassian JIRA.
The first option would preserve the full existing flexibility of Trac, retain a certain degree of independence for us, and not require getting used to something else.
The second option would probably be the least complex and the most comfortable, but it remains to be determined if the BitBucket issue tracker is powerful enough for our needs. You can see a BitBucket issue tracker live at the BitBucket site itself.
The third option would be the most powerful, but it might as well overwhelm us.

I have not yet formed an opinion about this, much less made a decision. Fortunately, the migration of the issue tracker can be done largely independently of the migration of the repository, so that it does not stall the progress of the repository.


Migration Hotspots

While the bulk of the conversion is done flawlessly and quickly by the git svn ... commands, I found plenty of occasions where manual tweaking of the process, or post-processing and clean-up work, was necessary in order to achieve the desired result. This is especially true whenever the Subversion source repository deviates from the classic "trunk, branches, tags" layout, or when subtleties of Subversion merges prevent proper automatic conversion to Git.

In this section, I list the issues that I found the most prominent (from a Git learner's perspective), along with the solutions that I eventually applied.

Branches outside branches/

If some branches follow the standard Subversion repository layout and live in branches/, but others are elsewhere, or if the standard layout was never used and the branches are scattered arbitrarily across the Subversion repository, it is not immediately clear if and how these extra branches can be imported properly into Git. The solution is to split the call to git svn clone into this sequence:

Code: Select all

> git svn init https://srv7.svn-repos.de/dev123/projects/cafu -s Cafu
> cd Cafu
> git config svn.authorsfile ../authors.txt
> git config --add svn-remote.svn.fetch "vendor:refs/remotes/vendor"
> git svn fetch
The second-to-last line causes the subsequent fetch to load the directory vendor/ as a Git branch, as if it were another branch in branches/.
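
To see why --add matters in that sequence, here is a minimal sketch that runs in a throwaway repository and needs no Subversion server. It assumes that git svn init -s has already written a fetch mapping for trunk (simulated below with a plain git config call; the temporary directory is likewise only for illustration) and shows that --add appends a second mapping instead of replacing the first.

```shell
#!/bin/sh
set -eu

# Throwaway repository; nothing here touches a real Subversion server.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Assumption: stands in for the trunk mapping that "git svn init -s" writes.
git config svn-remote.svn.fetch "trunk:refs/remotes/trunk"

# Same call as in the sequence above: append a second mapping for vendor/.
git config --add svn-remote.svn.fetch "vendor:refs/remotes/vendor"

# List every value of the multi-valued key, one per line.
mappings=$(git config --get-all svn-remote.svn.fetch)
echo "$mappings"
```

Without --add, the second call would overwrite the trunk mapping, and the subsequent fetch would no longer import trunk.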

Missing Merges

The converted commit history sometimes misses merges where merges were performed in Subversion:

Code: Select all

    ------B-----D---- master
         /
    ----A-----C------ pristine
A was merged into master, yielding B, and the merge is properly reproduced in Git.
C was merged into master as well, yielding D, but only in Subversion. In Git, the merge is missing.

Among other reasons, this can happen if in Subversion the merge was performed not at the top directory level, but as a "partial" merge from subdirectory to subdirectory, e.g. from forum/themes/firenzie in pristine directly to the same directory in master.

The solution for such cases is to use the .git/info/grafts file, in which each line names a commit followed by the parents that it should have, and to make the grafted history permanent with

Code: Select all

> git filter-branch --tag-name-filter cat -- --all
The --tag-name-filter cat part makes sure that attached tags are rewritten as well.
If rewriting the commits succeeded and the result is as desired, the grafts file and the references to the original commits should be deleted:

Code: Select all

> rm .git/info/grafts
> rm -rf .git/refs/original/
The next call to git svn fetch will automatically rebuild the rev_map that is needed for continued bidirectional communication with the source Subversion repository.
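
As an illustration of the grafts file format, the following sketch builds a tiny throwaway repository and writes a single grafts line. The history (A and C on pristine, D on master) and all names are hypothetical and only loosely follow the diagram above; recent Git versions may also print a deprecation hint when a grafts file is present.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a hypothetical history: A and C on pristine, D on master.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/pristine
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

git commit -q --allow-empty -m A
git branch master                      # master forks off at A
git commit -q --allow-empty -m C       # C exists only on pristine
git checkout -q master
git commit -q --allow-empty -m D       # D should have been a merge of C

# Grafts line format: <commit> <first parent> <second parent> ...
d=$(git rev-parse master)
first_parent=$(git rev-parse 'master^')
c=$(git rev-parse pristine)
mkdir -p .git/info
printf '%s %s %s\n' "$d" "$first_parent" "$c" >> .git/info/grafts

# With the graft in place, the graph should show D as a merge of A and C.
git log --graph --oneline --all
```

The graft only changes how the history is displayed locally; the git filter-branch call shown above is what writes the additional parent into the commits themselves.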

Fixing Tags

As Subversion treats tags exactly like branches, after the conversion to Git the Git branches that should be tags must be fixed manually. A good solution is described by Haenel and Plentz in their book, but it unfortunately only works with lightweight tags and thus cannot account for the tag message, which in our case is a longer text. The best solution that I have found that works with annotated tags is from this Atlassian blog post, although I had to make small modifications for it to work as desired:

Code: Select all

> type convert_tags.sh
#!/bin/sh
# CF: from http://blogs.atlassian.com/2012/01/moving-confluence-from-subversion-to-git/ with small modifications.
# Based on https://github.com/haarg/convert-git-dbic
set -u
set -e

# The pattern is quoted so that the shell passes it to Git unexpanded.
git for-each-ref --format='%(refname)' 'refs/remotes/tags/*' | while read -r r; do
    tag=${r#refs/remotes/tags/}
    # CF: Note the ^ in the next line: We create the converted tag at the *parent* of the original tag.
    sha1=$(git rev-parse "$r^")

    # The author of the Subversion tag commit becomes the tagger of the annotated Git tag.
    commiterName="$(git show -s --pretty='format:%an' "$r")"
    commiterEmail="$(git show -s --pretty='format:%ae' "$r")"
    commitDate="$(git show -s --pretty='format:%ad' "$r")"
    # Print the raw commit body (commit message) and use it as the tag message.
    git show -s --pretty='format:%B' "$r" | \
    env GIT_COMMITTER_EMAIL="$commiterEmail" GIT_COMMITTER_DATE="$commitDate" GIT_COMMITTER_NAME="$commiterName" \
    git tag -a -F - "$tag" "$sha1"
    echo "Tag: ${tag} sha1: ${sha1} using '${commiterName}', '${commiterEmail}' on '${commitDate}'"

    # Remove the original refs/remotes/tags/* ref
    git update-ref -d "$r"
done
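
To check afterwards that the converted tags really are annotated rather than lightweight, the object type of each tag ref can be inspected. The sketch below demonstrates the difference in a throwaway repository; the tag names and messages are made up for the example.

```shell
#!/bin/sh
set -eu

# Throwaway repository with one commit and two hypothetical tags.
repo=$(mktemp -d)
cd "$repo"
git init -q .
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
git commit -q --allow-empty -m "initial commit"

# Lightweight tag: just a ref that points at the commit.
git tag light-demo
# Annotated tag: a separate tag object that carries its own message.
git tag -a -m "Release notes go here." annotated-demo

# objecttype is "tag" for annotated tags and "commit" for lightweight ones.
kinds=$(git for-each-ref --format='%(objecttype) %(refname:short)' refs/tags)
echo "$kinds"
```

In the converted repository, every line of the same for-each-ref call should start with "tag" if the script worked as intended.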

Move to Subdirectory

Before I could fix missing merges in our (partially) converted Cafu repository, I had to move the contents of all commits in the "vendor" branch into a subdirectory. The documentation for git filter-branch has a related example, which I modified according to this discussion, yielding:

Code: Select all

> git filter-branch --index-filter '
      rm -f "$GIT_INDEX_FILE"
      git read-tree --prefix=ExtLibs/ "$GIT_COMMIT"
  ' refs/heads/vendor
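
The effect of this filter can be tried out safely on a small scale first. The sketch below applies the same index filter to a throwaway repository with a single hypothetical file (lib.c) on a branch named vendor and then lists the resulting tree. The FILTER_BRANCH_SQUELCH_WARNING variable only silences the warning that newer Git versions print for filter-branch.

```shell
#!/bin/sh
set -eu

# Throwaway repository with one hypothetical file on a branch named "vendor".
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/vendor
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
echo "hypothetical library source" > lib.c
git add lib.c
git commit -q -m "import hypothetical library"

# Newer Git versions warn about filter-branch; this variable silences the warning.
FILTER_BRANCH_SQUELCH_WARNING=1
export FILTER_BRANCH_SQUELCH_WARNING

# Same filter as above: start from an empty index, then read each commit's
# tree back in below the ExtLibs/ prefix.
git filter-branch --index-filter '
      rm -f "$GIT_INDEX_FILE"
      git read-tree --prefix=ExtLibs/ "$GIT_COMMIT"
  ' refs/heads/vendor > /dev/null

# Top-level entries and full file list of the rewritten branch.
top=$(git ls-tree --name-only vendor)
files=$(git ls-tree -r --name-only vendor)
echo "$top"
echo "$files"
```

Running git ls-tree like this on the real vendor branch is a quick way to confirm that nothing was left outside ExtLibs/.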

Vendor Branches

In our Subversion repositories, we make use of Vendor Branches, a normal and very useful feature in Subversion that is used to manage "external" software. Vendor branches are however very difficult to migrate to Git, and very little related documentation can be found on the internet.
Possible solutions are:
  1. Git submodules,
  2. Git subtrees,
  3. normal Git branches.
Git submodules are mentioned relatively frequently, but they really do not seem to be a good fit for vendor branches. We don't consider them any further for the reasons detailed in the "Importing Subversion vendor-branches to Git" thread.

Git subtrees look very interesting and well suited to the problem, and I spent a lot of time digging into them. There is a subtree extension that is likely to be integrated into the Git core soon, and Jakub Suder describes a solution using it that we might have adopted (without the --squash).

Using normal Git branches as vendor branches is beautifully explained in this blog post by Dominic Mitchell. In fact, our website was structured in "live" and "pristine" branches right from the start, and mapping these 1:1 to normal Git branches was straightforward and the clear choice (but still required the "Branches outside branches/" and "Missing Merges" techniques above).

For the vendor branches in Cafu, the matter was less clear: candidates were Git subtrees or again normal Git branches. As mentioned before, I posted a detailed and complete description of the problem in the thread "Importing Subversion vendor-branches to Git".

Eventually, I opted for the "normal Git branches" approach (even though it required the "Move to Subdirectory" step from above), because it is the simplest and clearest approach and requires no "extras" at all, neither for the DVCS nor for its users. As a side effect, we keep the door open to a future migration to another VCS such as Mercurial.


Migration Details

I give the exact technical steps of converting the Cafu Subversion repository to Git in a comment to this post, in order to keep this prose text readable and clear.


The next steps

Our official Git repository of the Cafu Engine is now available at:
Naturally, we will not immediately abandon the Subversion repository, but enter a gradual transition period where everyone can make the switch at a comfortable pace, and where we can deal with details like the issue tracker.

Personally, for a short while I expect to continue working mainly with Subversion, updating the Git repository in a separate step.
Thereafter, I'll probably switch to work mainly with Git, but continue to update the Subversion repository in a separate step.
Only when all users and all technical indications clearly suggest that we can do entirely without Subversion will the Subversion repository finally be switched off.

Best regards,
Carsten

Re: Migrating Cafu to distributed version control – Part 2

Post by Carsten » 2012-10-18, 23:28

Migration Details

As promised above, here are the exact steps that I took to convert the Cafu repository from Subversion to Git.

The steps are scarcely documented, and must therefore be read in connection with:

Normally initialize the repository:

Code: Select all

> git svn init https://srv7.svn-repos.de/dev123/projects/cafu -s Cafu
> cd Cafu
> git config svn.authorsfile ../authors.txt
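
The authors.txt file configured above maps each Subversion user name to a Git identity, one "svnuser = Full Name <email>" entry per line. As a minimal sketch of how a first draft of it could be generated, the function below extracts the distinct user names from svn log -q output. The sample log, the user names and the e-mail domain are placeholders, and the real names and addresses still have to be filled in by hand.

```shell
#!/bin/sh
set -eu

# Print one "svnuser = Name <email>" line per distinct author found in
# "svn log -q" output read from stdin. The e-mail domain is a placeholder.
authors_from_log() {
    awk -F ' [|] ' '/^r[0-9]+ / { print $2 }' | sort -u |
    while read -r user; do
        printf '%s = %s <%s@example.invalid>\n' "$user" "$user" "$user"
    done
}

# Hypothetical sample of "svn log -q" output; in practice, pipe the real command in.
sample='------------------------------------------------------------------------
r3 | carsten | 2012-10-01 12:00:00 +0200 (Mon, 01 Oct 2012)
------------------------------------------------------------------------
r2 | alice | 2012-09-30 09:30:00 +0200 (Sun, 30 Sep 2012)
------------------------------------------------------------------------
r1 | carsten | 2012-09-29 18:15:00 +0200 (Sat, 29 Sep 2012)
------------------------------------------------------------------------'

authors=$(printf '%s\n' "$sample" | authors_from_log)
echo "$authors"
```

Against a real repository, the same function would be fed with the output of svn log -q run on the repository URL, and the result redirected to authors.txt for manual editing.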
Load directory vendor/ like any other branch, then start the import:

Code: Select all

> git config --add svn-remote.svn.fetch "vendor:refs/remotes/vendor"
> git svn fetch
r465 = dce24e34236c70b772d4d857ee20cf5f283ac9e6 (refs/remotes/vendor)
r647 = a8bab8439d18a27eae5ebece69775c38224dfd01 (refs/remotes/trunk)
So far, we only had local branch "master" (for "remotes/trunk"). Add local branch "vendor" (for "remotes/vendor"), then delete the now unused "remotes/vendor" and make sure that a subsequent git svn fetch won't fetch it again:

Code: Select all

> git branch vendor remotes/vendor
> git branch -d -r vendor
> git config --unset svn-remote.svn.fetch "vendor:refs/remotes/vendor"
Delete the remaining remote branch "remotes/cafu_to_wx", and make sure it won't get re-fetched:

Code: Select all

> git branch -d -r cafu_to_wx
> git config --unset svn-remote.svn.branches
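
The two --unset calls in the last two steps behave slightly differently, which the following sketch illustrates in a throwaway repository. The configuration values are set by hand here as an assumption about what git svn left behind. With a value pattern, only the matching entry of a multi-valued key is removed; without one, the whole key goes away.

```shell
#!/bin/sh
set -eu

# Throwaway repository; no Subversion server is involved.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Assumption: simulate the settings that git svn left behind in .git/config.
git config svn-remote.svn.fetch "trunk:refs/remotes/trunk"
git config --add svn-remote.svn.fetch "vendor:refs/remotes/vendor"
git config svn-remote.svn.branches "branches/*:refs/remotes/*"

# With a value pattern, only the matching value of the multi-valued key is removed.
git config --unset svn-remote.svn.fetch "vendor:refs/remotes/vendor"
# Without a pattern, the single-valued key is removed entirely.
git config --unset svn-remote.svn.branches

remaining=$(git config --get-all svn-remote.svn.fetch)
echo "$remaining"
# "git config --get" exits non-zero for a missing key, hence the fallback.
branches=$(git config --get svn-remote.svn.branches || echo "unset")
echo "$branches"
```

The trunk mapping survives, so later calls to git svn fetch keep following trunk while ignoring vendor/ and branches/.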
Move all trees in "vendor" into subdirectory ExtLibs/:

Code: Select all

> git filter-branch --index-filter '
      rm -f "$GIT_INDEX_FILE"
      git read-tree --prefix=ExtLibs/ "$GIT_COMMIT"
  ' refs/heads/vendor
Optionally, we can manually check this with:

Code: Select all

> git checkout vendor
> dir
> gitk [--date-order]
> git checkout master
Clean up the backup references of git filter-branch:

Code: Select all

> rm -rf .git/refs/original/
Determine where merges are missing, and create the .git/info/grafts file to supply them.
An auxiliary file that has the commit graph in easily printable form can be helpful.
Option --date-order is important in order to obtain a predictable, consistent output that is well suited to "see" where merges must be supplemented.

Code: Select all

> git log --graph --oneline --decorate --all --date-order | cut -c 1-72 > graph.txt
> edit .git/info/grafts
e70ee201153737a745b0428df0e5a4e9eb526cc3 c1ded9ed1d22eb88c5393b073e5f3d4eebbf8fed dce24e34236c70b772d4d857ee20cf5f283ac9e6
25c67e6b1868f57c80cf3e56cf165c308dd09ebe 19646f13d4f75a28f430b1616b679b3b5c2ece26 12f5ff2c21e460478ef80bfa060f192121144357
19646f13d4f75a28f430b1616b679b3b5c2ece26 715bb444d638eb7be0abd5e9cad3d21dd5ae23a8 0a7c586fddfb337d898a27a2c15bbdd2e12409a0
4eb4a93de6c9a11c0f363e9f8246f7c8df14d066 de5369f086baa4bbeb7c61116606ec85d549ace4 be783f413e4b2e22e67d450da0ba5c51f6157263
7edea9472a326f2931b933098f6b9655307bf839 c0507e7e2af9d99ec4e54d3bc7fd23f9d36f3949 092a05153c7ee89d06619ba2c9e5065a00ad3e2e
98ba03a8e6d915feb1ef56ac0c647c4a08dbc27b f197b0f3a63ba2f34799fb8f2cfeff1104b5f966 b00a0c8f1c260382f01d0d71ed993bf642bdea43
e5bae40e5909fad94c24c4e1a2063504758ab6dc c0f1172797c8a0146e6dd54fa3eec674beae4af0 8dd3949266c52eba532d0a0f9b68546b8d95124b
c0f1172797c8a0146e6dd54fa3eec674beae4af0 90a14b8ce6054e81236378795df69aa0f48f98f4 1910763c71cb0a19b0c4673ba4b09f7ef2a42d60
b9d8abe94fcdf7054f6ecb1ff96c3e471b17c75b 57bbb58016166a3b6d9c28576c86c97ce1cece2b 5d7f15107a348ca0cfede59209adc362cf1793d5

925448139e2010cb780b8a06613fe49e80bfa9b0 bbaddd111fda9102e9ebe14cb190faaeb012b1b8 f968ede6aa467a03c3b4ec8016e9a3144d22478f
00318436e239a2bfaf67c626f8699e26e5ce5830 fb636995b06866890bcfb041d01b342ca38be210 6cd1fccb252ab369f7d1dcd6aff3cd4255711abf

b9660a27f060924ba294246072cd9a3cbe66af08 d248308eb665bbd50ea26cd9e359475bebc10a4a 0c542f3de3649680f89ba2350f9e22239b73b99e
d81450be834901dd98a400f094f1dd17df84b0a0 ecad011089e569702f8859cb5bfc4df94517a1c9 804c0de04fed21c991ea52ad8e93396b6e0f4f4b

1746098d46b1dbee1cb608dbf4bfa5f5e8be2538 0aceb927f7e1a1437fdc87d6107d81762098e282 1d8e9d557f23788631bf6638852b345832f2fe51
dca073809a3039105d2f1ad4f4e371d15168400f 9c6d9e02ce19b5ee54a1f925ec5d5922c7225c8f
Make the grafts permanent, clean up, check:

Code: Select all

> git filter-branch --tag-name-filter cat -- --all
new r647 = c625b97e8735c375593458228ba1a400e7137c45
> rm .git/info/grafts
> rm -rf .git/refs/original/
> gitk --all --date-order
Convert the Subversion tags into Git tags:

Code: Select all

> ../convert_tags.sh
> git config --unset svn-remote.svn.tags
> git tag -a -m $'Up to this revision the project was solely managed in Subversion.\nIt was migrated to Git in October 2012.' svn-to-git
Re-build the rev_map:

Code: Select all

> git svn fetch
Initial upload to public server:

Code: Select all

> git remote add origin ssh://git@bitbucket.org/cafu/cafu.git
> git push -u origin --all
> git push origin --tags
> cd ..
Best regards,
Carsten

Re: Migrating Cafu to distributed version control – Part 2

Post by Carsten » 2012-10-18, 23:33

For completeness, the tools/ directory was converted into its own Git repository with these steps:

Code: Select all

> git svn init https://srv7.svn-repos.de/dev123/projects --trunk "tools" Tools
> cd Tools
> git config svn.authorsfile ../authors.txt
> git svn fetch
r627 = 52aa0bdb7f974faa06b18b872232539aeb649343 (refs/remotes/trunk)
> git tag -a -m $'Up to this revision the project was solely managed in Subversion.\nIt was migrated to Git in October 2012.' svn-to-git
> gitk --all --date-order
> git remote add origin ssh://git@bitbucket.org/cafu/tools.git
> git push -u origin --all
> git push origin --tags
> cd ..
Best regards,
Carsten