Collating repositories or grafting earlier history with Git

Let’s add another arrow to our already full quiver of version control tools and techniques. Did you know that the Linux kernel you normally clone contains only part of its entire history? If you need access to its uninterrupted evolution since the first commit, you have to “graft” a few separate repositories together chronologically. In this post I’d like to show you how it works and why you would want to do that with your projects.

Why?

There are several reasons why you might need to collate histories from different repositories. Let me name a few:

(In all of the above scenarios you can later make the changes permanent with git filter-branch.)
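For the filter-branch step mentioned above, here is a minimal sketch on a throwaway repository. It assumes Git 2.1 or later for git replace --graft (the replace mechanism itself is explained in the next section), the branch names are hypothetical, and FILTER_BRANCH_SQUELCH_WARNING only skips the warning pause that newer Git versions add to filter-branch. The git-filter-branch documentation states that the command honors grafts and replace refs, so running it with no filters rewrites history exactly as Git currently sees it.

```shell
#!/bin/sh
# Sketch: make a replacement permanent with git filter-branch.
# Throwaway repository; branch names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/legacy   # first history lives on "legacy"
git commit -q --allow-empty -m "Episode I"
git commit -q --allow-empty -m "Episode II"
git checkout -q --orphan restarted        # second, unrelated history
git commit -q --allow-empty -m "Episode III"
git commit -q --allow-empty -m "Episode IV"

# Temporary join: give the root of "restarted" the tip of "legacy" as parent.
root=$(git rev-list --max-parents=0 restarted)
git replace --graft "$root" legacy

# No filters: commits are rewritten as Git currently sees them, so the
# replaced parent becomes a real parent pointer.
git filter-branch -- restarted >/dev/null

# The replace ref is now redundant; the history stays joined without it.
git replace -d "$root" >/dev/null
git --no-replace-objects log --oneline restarted
```

After the rewrite, the joined history no longer depends on any replace ref, which is what makes it safe to share as ordinary commits.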

What are Git grafts?

Git has a local, per-repository mechanism to explicitly change the parents of existing commits: these are called grafts. In a repository they live in the file “.git/info/grafts” (check the Git repository layout manpage for details).

This feature has been available in Git for a long time, but it has a drawback: you always have to set up grafts locally in each repository. To overcome this problem, and more, a new command has been available since version 1.6.5: git replace, which, as the name implies, can replace any object with any other object. This command has the added benefit of tracking these swaps via refs, which you can push and pull between repositories.
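As a sketch of the ref-based approach, the snippet below records the same kind of statement a grafts line makes ("this commit has that parent") using git replace --graft, which, as far as I know, needs Git 2.1 or later. Since Git 2.18 the grafts file itself is deprecated in favour of replace refs. It runs on a throwaway repository; the branch names old and new are hypothetical.

```shell
#!/bin/sh
# Sketch: express "commit X now has parent Y" as a replace ref instead of
# a line in .git/info/grafts. Throwaway repository; names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/old
git commit -q --allow-empty -m "old history"
git checkout -q --orphan new
git commit -q --allow-empty -m "new history"

child=$(git rev-parse new)
parent=$(git rev-parse old)

# --graft writes a copy of $child whose parent is $parent and points
# refs/replace/$child at that copy.
git replace --graft "$child" "$parent"

git for-each-ref refs/replace     # the swap is stored as an ordinary ref
git log --format=%P -1 new        # prints the same hash as $parent
```

Because the swap is a ref, it can be pushed and fetched like a branch, which is exactly what the grafts file could not do.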

How is the Linux kernel split?

From the Git Wiki:

When Linus started using Git for maintaining his kernel tree there didn’t exist any tools to convert the old kernel history. Later, when the old kernel history was imported into Git from the bkcvs gateway, grafts was created as a method for making it possible to tie the two different repositories together.

To re-assemble the complete kernel history you need these three repositories:

The syntax of the grafts file in “.git/info/grafts” is simple: each line lists a commit followed by its fake parent(s), using their SHA-1 identifiers. So to reassemble the full history of the Linux kernel, add the following grafts to .git/info/grafts:


1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 e7e173af42dbf37b1d946f9ee00219cb3b2bea6a
7a2deb32924142696b8174cdf9b38cd72a11fc96 379a6be1eedb84ae0d476afbc4b4070383681178
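To try the file format without cloning the kernel repositories, here is a sketch on a throwaway repository. The branch names bkcvs and linus are hypothetical stand-ins for the older and newer histories, and the final step converts the grafts file into replace refs with git replace --convert-graft-file, which assumes Git 2.18 or later.

```shell
#!/bin/sh
# Sketch: a grafts-file entry on a throwaway repository, then migration to
# replace refs. Branch names are hypothetical stand-ins.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git config advice.graftFileDeprecated false   # silence the deprecation hint
git symbolic-ref HEAD refs/heads/bkcvs        # plays the older repository
git commit -q --allow-empty -m "old history"
git checkout -q --orphan linus                # plays the newer repository
git commit -q --allow-empty -m "first commit of new history"

# One graft per line: "<commit> <fake parent> [<more parents>...]".
mkdir -p .git/info
echo "$(git rev-parse linus) $(git rev-parse bkcvs)" >> .git/info/grafts
git rev-list --count linus                    # the old commit is now an ancestor

# Turn every grafts entry into a replace ref and delete the grafts file.
git replace --convert-graft-file
git replace -l
```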

With these grafts, you can get a complete and continuous history of the kernel since 0.01. More info on the process here and here.

How to do it yourself?

So now that you have a bit of background, let me guide you through the process using a sample repository, so that you can replicate it yourself:


git clone git@bitbucket.org:nicolapaolucci/starwars-summary.git

9c5045e (HEAD -> restarted, origin/restarted, master) Add tiny .gitignore [Nicola Paolucci]
08431c2 Add README.md [Nicola Paolucci]
e287529 Add chapter markdown files [Nicola Paolucci]
451b911 Word wrap summary at 80 characters [Nicola Paolucci]
6a4ecd3 Episode VI: Return of the Jedi [Nicola Paolucci]
a5aa482 Episode V: The Empire Strikes Back [Nicola Paolucci]
7795c6a Restarted repository [Nicola Paolucci]
56eacfe (tag: initial) [initial] empty commit [Nicola Paolucci]

git checkout -b legacy origin/legacy

git rev-parse --verify legacy

84abb39d9aab234dfba2e41f13f693fa5edbfe22

The resulting id is the last commit of the legacy branch. Now let’s retrieve the first commit of the restarted project in the restarted branch:


git checkout -b restarted origin/restarted

git rev-list restarted | tail -n 1

56eacfe37267edd674fba5ceb66395891a34f7cc
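The rev-list | tail pipeline works, but rev-list can also report parentless commits directly with --max-parents=0. Here is a sketch on a throwaway single-branch repository (the commit messages are hypothetical) checking that both approaches agree. Note that a branch combining unrelated histories can have more than one root, in which case --max-parents=0 prints each of them on its own line.

```shell
#!/bin/sh
# Sketch: two ways to find the root commit of a branch.
# Throwaway repository; commit messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/restarted
git commit -q --allow-empty -m "initial empty commit"
git commit -q --allow-empty -m "Restarted repository"
git commit -q --allow-empty -m "Episode V"

# As in the walkthrough: list commits newest first and keep the last line.
via_tail=$(git rev-list restarted | tail -n 1)

# Ask rev-list only for commits that have no parents at all.
via_roots=$(git rev-list --max-parents=0 restarted)

echo "$via_tail"
echo "$via_roots"
```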

This id is the first commit of the restarted branch. Now we want to “graft” the last commit of legacy onto the restarted branch by replacing that first commit:


git replace "56eacfe37267edd674fba5ceb66395891a34f7cc" "84abb39d9aab234dfba2e41f13f693fa5edbfe22"

To verify that it worked, you can check that the folder .git/refs/replace contains the correct graft:


cat .git/refs/replace/56eacfe37267edd674fba5ceb66395891a34f7cc

84abb39d9aab234dfba2e41f13f693fa5edbfe22
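Reading the file directly only works while the ref is stored loose. After git gc or git pack-refs, refs move into .git/packed-refs and the file under .git/refs/replace disappears, so asking Git itself is more robust. Here is a sketch on a throwaway repository with hypothetical branch names, using the same git replace call as above.

```shell
#!/bin/sh
# Sketch: inspect replace refs through Git instead of the file system.
# Throwaway repository; branch names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/legacy
git commit -q --allow-empty -m "Episode III"
git checkout -q --orphan restarted
git commit -q --allow-empty -m "initial empty commit"

first=$(git rev-parse restarted)
last=$(git rev-parse legacy)
git replace "$first" "$last"

git pack-refs --all                # what git gc also does to refs
test -e ".git/refs/replace/$first" || echo "no loose ref file any more"

git replace -l                     # replaced object ids, one per line
git for-each-ref refs/replace      # "<replacement> commit<TAB>refs/replace/<replaced>"
git rev-parse "refs/replace/$first"   # prints the legacy tip
```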

And in fact git log now shows the entire collated history:


9c5045e (HEAD -> restarted, origin/restarted, master) Add tiny .gitignore [Nicola Paolucci]
08431c2 Add README.md [Nicola Paolucci]
e287529 Add chapter markdown files [Nicola Paolucci]
451b911 Word wrap summary at 80 characters [Nicola Paolucci]
6a4ecd3 Episode VI: Return of the Jedi [Nicola Paolucci]
a5aa482 Episode V: The Empire Strikes Back [Nicola Paolucci]
7795c6a Restarted repository [Nicola Paolucci]
56eacfe (tag: initial, replaced) Episode III: Revenge of the Sith [Nicola Paolucci]
75b32cf Episode II: Attack of the Clones (2) [Nicola Paolucci]
5aa055b Episode II: Attack of the Clones [Nicola Paolucci]
d10384f Episode I: The Phantom Menace [Nicola Paolucci]
70df805 Outline of the story [Nicola Paolucci]
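To see exactly what the replacement changes, you can compare git log with and without replace refs: the global --no-replace-objects option (or the GIT_NO_REPLACE_OBJECTS environment variable) makes Git ignore them, and git replace -d removes one again. Here is a sketch on a throwaway repository shaped like the example above; the commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: the same branch with and without replacements, then undoing one.
# Throwaway repository; commit messages are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/legacy
for msg in "Episode I" "Episode II" "Episode III"; do
  git commit -q --allow-empty -m "$msg"
done
git checkout -q --orphan restarted
for msg in "initial empty commit" "Restarted repository" "Episode IV"; do
  git commit -q --allow-empty -m "$msg"
done

first=$(git rev-list --max-parents=0 restarted)
git replace "$first" legacy

git log --oneline restarted                        # 5 commits; Episode III shown under the old id
git --no-replace-objects log --oneline restarted   # 3 commits: the untouched original

# Undo the graft: deleting the replace ref restores the original view.
git replace -d "$first"
```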

Because this replacement is stored in a ref, we can push it and share it with the team!


git push origin 'refs/replace/*'

Simply fantastic. Credit for helping me put these instructions together goes to this great SO post.
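Pushing is only half of the story: Git's default fetch refspec covers branches (tags are followed automatically) but not refs/replace, so teammates have to fetch the replacements explicitly. Here is a minimal local sketch in which a bare repository stands in for origin; hub.git, me and teammate are hypothetical names.

```shell
#!/bin/sh
# Sketch: share a replace ref through a remote and fetch it elsewhere.
# All repositories are local and throwaway; names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare hub.git             # stands in for origin

git init -q me
cd me
git symbolic-ref HEAD refs/heads/legacy
git commit -q --allow-empty -m "Episode II"
git commit -q --allow-empty -m "Episode III"
git checkout -q --orphan restarted
git commit -q --allow-empty -m "initial empty commit"
git commit -q --allow-empty -m "Episode IV"
git replace "$(git rev-list --max-parents=0 restarted)" legacy
git remote add origin ../hub.git
git push -q origin restarted 'refs/replace/*:refs/replace/*'

# A plain clone only brings branches, so the graft is not visible yet.
cd "$work"
git clone -q -b restarted hub.git teammate
cd teammate
git rev-list --count restarted         # 2

# Fetch the replace refs once, and add a refspec so later fetches keep them.
git fetch -q origin 'refs/replace/*:refs/replace/*'
git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'
git rev-list --count restarted         # 3
```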


Conclusions

That’s it for now. I hope you found this technique interesting or useful for your projects! In any case, if you liked this, why not follow me at @durdn or my awesome team at @atlassiandev? (Clip icon credit: Thomas Helbig from the Noun Project.)
