Sunday, August 19, 2007

Expunging a problem file from a Mercurial repo

Mercurial is almost the perfect version control system: fast, lean, distributed, easily extensible, and reliable. Every clone is a complete copy of the repository, history included, transferred in compressed form. This is much more efficient than it may sound: transferring a Mercurial repo takes about as long as checking out a Subversion repository.

The system breaks down only if you lose your head and commit a large, compressed database to the repository. Since I'm still learning how Mercurial works, I did this accidentally, and made several more commits before I realized my repo was ballooning out of control because of this one file, which had changed in only 7 commits.

Here's how you recover.
  1. Make sure you fix this before anyone clones your repository from upstream. The following procedure rewrites history: every changeset from the first removed revision onward gets a new changeset ID, so existing clones no longer match the rebuilt repository. Anyone working on a clone will have to clone the result and start fresh.
  2. Get a list of all the commits that changed your file.
    $ hg log -M -r0:tip --template "{rev} {files}\n" goonmill/srd35.db.gz
    11 goonmill/srd35.db.gz
    52 goonmill/srd35.db.gz
    59 goonmill/srd35.db.gz
    99 goonmill/alter3.sql goonmill/srd35.db.gz
    115 goonmill/srd35.db.gz
    160 goonmill/srd35.db.gz srd35.odb
    189 goonmill/srd35.db.gz
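  3. Export the entire repository as patches, using hg export to write each changeset to its own file. The -g option produces git-style diffs, which are needed to carry binary changes through the export, and %r in the file name is the zero-padded revision number, so the files sort in revision order.
    $ mkdir ../Goonmill-revs; hg export -g -o ../Goonmill-revs/%r:%n-of-%N $(hg log -M -r0:tip --template "{rev} ")
    (Hundreds of patch files created)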
  4. Delete the patch files for the revisions you found in step 2. I had you print {files} in that step so you could check whether removing a revision would also drop changes to other files, leaving a hole in their history. For example, my commit 99 above also changes goonmill/alter3.sql. Instead of deleting that patch, I'll edit it to remove the database hunk, so that only the change to the SQL script gets committed.
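  5. Run "hg init" to create a new repository, then, from inside it, run hg import on the remaining patches to recreate the history (for example "hg import ../Goonmill-revs/*" if the new repository sits next to the old one). The shell expands the glob in sorted order, so the patches are replayed oldest first.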
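You now have a new repository with the same history as before (commit messages, authors, and dates are carried over in the patch headers), minus the problem file; only the revision numbers after the first removed revision shift. Next time, commit that database uncompressed. Mercurial stores file revisions as binary deltas (with occasional full snapshots), which works great on an uncompressed, structured binary file like a database, but poorly on a compressed one, where even a small edit changes most of the compressed bytes.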
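The point about compressed files can be checked roughly with standard tools. This sketch builds a made-up file of 20,000 fixed-format records, makes one small edit to the first record, and uses cmp -l to count differing byte positions before and after gzip. The file contents and sizes are arbitrary assumptions, and counting changed byte positions is only a crude stand-in for delta size; Mercurial's own delta code is not involved.

```shell
#!/bin/sh
# Rough illustration: how many bytes change after a small edit, with and
# without gzip. Byte-position differences are only a proxy for delta size.
set -e
dir=$(mktemp -d)

# Hypothetical structured "database": 20,000 fixed-format records.
awk 'BEGIN { for (i = 0; i < 20000; i++) printf "record %06d: some fixed-width payload\n", i }' > "$dir/v1.db"

# Second version: a 7-byte edit in the first record only.
sed '1s/payload/PAYLOAD/' "$dir/v1.db" > "$dir/v2.db"

# Compress both; -n omits the file name and timestamp so the headers match.
gzip -n -c "$dir/v1.db" > "$dir/v1.db.gz"
gzip -n -c "$dir/v2.db" > "$dir/v2.db.gz"

# cmp -l prints one line per differing byte (up to the shorter file's length);
# $(( )) strips the padding some wc implementations add.
raw_diff=$(( $(cmp -l "$dir/v1.db" "$dir/v2.db" | wc -l) ))
gz_diff=$(( $(cmp -l "$dir/v1.db.gz" "$dir/v2.db.gz" 2>/dev/null | wc -l) ))

echo "uncompressed: $raw_diff bytes differ"
echo "gzip: $gz_diff of $(( $(wc -c < "$dir/v2.db.gz") )) bytes differ"
rm -rf "$dir"
```

The uncompressed pair differs only in the 7 edited bytes, while the compressed pair typically differs in a large share of its bytes, which leaves a delta algorithm little to reuse.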
