Convert git repository file encoding

I have a large CVS repository containing files in ISO-8859-1 and want to convert this to git.

Sure, I could configure git to use ISO-8859-1 as the encoding, but I would prefer to have everything in UTF-8.

With tools such as iconv or recode I can convert the encoding of the files in my working tree and commit the result with a message like "converted encoding".

My question is: is it possible to convert the complete history, either while converting from CVS to git or afterwards? My idea would be to write a script that reads each commit in the git repository, converts it to UTF-8, and commits it to a new git repository.

Is this possible? (I am unsure about the hashes and how to walk through the commits, branches, and tags.) Or is there a tool that can handle something like this?

Hoffman asked 15/6, 2012 at 14:5
Comments:
Yes, you can rewrite the history, but you probably shouldn't: you should never rewrite history that you have already pushed somewhere. In my opinion, iconv and a normal commit is the way to go. - Alexei
Okay, thanks @KingCrunch. But since I am creating the git repository from scratch, it hasn't been pushed anywhere. I would also be fine with creating a second repository in UTF-8 based on the history of the first, which is basically the same thing except that I wouldn't modify the existing repository. - Hoffman

You can do this with git filter-branch. The idea is that you have to change the encoding of the files in every commit, rewriting each commit as you go.

First, write a script that changes the encoding of every file in the repository. It could look like this:

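#!/bin/sh

# Re-encode every file below the current directory from ISO-8859-1 to UTF-8.
# Caution: binary files are converted too; narrow the find expression if needed.
tmp=$(mktemp) || exit 1
find . -type f -print | while IFS= read -r f; do
        # Convert into a scratch file outside the tree, then copy it back over
        # the original so its permissions (e.g. the executable bit) are kept.
        iconv -f iso-8859-1 -t utf-8 < "$f" > "$tmp" && cat "$tmp" > "$f"
done
rm -f "$tmp"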
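To see concretely what the script does to each file, here is the same iconv conversion applied to a single word. This is only an illustration: the sample word is made up, and the snippet uses nothing beyond printf, iconv and od. ISO-8859-1 stores é as one byte, while UTF-8 needs two.

```shell
#!/bin/sh
# "café" in ISO-8859-1: é is the single byte 0xE9 (octal 351 in printf).
latin1=$(printf 'caf\351' | od -An -tx1)
# The same text converted to UTF-8: é becomes the two bytes 0xC3 0xA9.
utf8=$(printf 'caf\351' | iconv -f iso-8859-1 -t utf-8 | od -An -tx1)
# The variables are left unquoted on purpose so od's column padding collapses.
echo "ISO-8859-1:" $latin1   # ISO-8859-1: 63 61 66 e9
echo "UTF-8:" $utf8          # UTF-8: 63 61 66 c3 a9
```

This also shows why each file must be converted exactly once: running the UTF-8 output through the same iconv call again would turn c3 a9 into four bytes of garbage. Because --tree-filter starts every commit from that commit's original tree, the script never sees its own output.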

Then use git filter-branch to run this script over and over again, once per commit:

git filter-branch --tree-filter /tmp/recode-all-files HEAD

where /tmp/recode-all-files is the above script, saved to that path and made executable with chmod +x /tmp/recode-all-files.

Right after the repository has been converted from CVS, you probably have just one branch in git, with a linear history back to the beginning. If you have several branches, you need to tell git filter-branch to rewrite all of them, for example by replacing HEAD with -- --all.
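For a repository with several branches and tags, a sketch of a single run that rewrites all of them might look like this. recode_all_refs is a hypothetical helper name; it assumes the script above is saved as /tmp/recode-all-files and is executable, and that it is called from the top of the repository to rewrite. --tag-name-filter cat is what the git filter-branch documentation suggests for simply updating tags to the rewritten commits, and FILTER_BRANCH_SQUELCH_WARNING=1 only silences the warning about alternative tools that newer git versions print.

```shell
#!/bin/sh
# Hypothetical helper: run the recode script over every branch and tag.
# Assumes /tmp/recode-all-files exists and is executable, and that the
# current directory is the top of the repository to rewrite.
recode_all_refs() {
        # --tag-name-filter cat keeps each tag's name but moves it to the
        # rewritten commit; "-- --all" selects every ref for rewriting.
        FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
                --tree-filter /tmp/recode-all-files \
                --tag-name-filter cat \
                -- --all
}
```

After sourcing the snippet, call recode_all_refs inside the repository. filter-branch keeps the pre-rewrite refs under refs/original/ as a backup, so the old history is still reachable until you delete those refs.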

Flinch answered 15/6, 2012 at 15:32
Comments:
Great! The command is currently running on a test git repository. I do have a lot of branches; having checked the documentation, do I just have to append --all to filter all branches? - Hoffman
For everyone else: git filter-branch --tree-filter /tmp/recode-all-files -- --all filters all branches. - Hoffman
I am trying to use your answer, but I get "recode-all-files: command not found". I am using a Mac and it appears to have iconv installed; do I have to set anything else up for that? - Consign
If you also used ISO-8859-1 characters in commit messages, git filter-branch --msg-filter 'iconv -f iso-8859-1 -t utf-8' -- --all converts those as well. - Sappy
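To check afterwards that no ISO-8859-1 commit messages are left, one option is to pipe every message through a UTF-8 to UTF-8 iconv, which fails on invalid input. messages_are_utf8 is a hypothetical helper name, and the sketch assumes your iconv exits with a non-zero status on an illegal byte sequence (GNU iconv does). It walks --branches --tags rather than --all, because --all would also include the refs/original/ backups that still hold the old messages.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every commit message reachable from a
# branch or tag decodes as UTF-8. --branches --tags skips the refs/original/
# backups that git filter-branch leaves behind.
messages_are_utf8() {
        git log --branches --tags --format=%B | iconv -f utf-8 -t utf-8 > /dev/null 2>&1
}
```

Note that if the tree filter has already been run in the same repository, a second git filter-branch run refuses to start while the refs/original/ backup exists; passing -f overwrites that backup.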
