Switch gettext translated language with original language

I started my PHP application with all text in German, then used gettext to extract all strings and translate them to English.
So, now I have a .po file with all msgids in German and msgstrs in English. I want to switch them, so that my source code contains the English as msgids for two main reasons:

  1. More translators will know English, so it is only appropriate to serve them up a file with msgids in English. I could always switch the file before I give it out and after I receive it, but naaah.
  2. It would help me to write English object & function names and comments if the content text was also English. I'd like to do that, so the project is more open to other Open Source collaborators (more likely to know English than German).

I could do this manually and this is the sort of task where I anticipate it will take me more time to write an automated routine for it (because I'm very bad with shell scripts) than do it by hand. But I also anticipate despising every minute of manual computer labour (feels like an oxymoron, right?) like I always do.

Has someone done this before? I figured this would be a common problem, but couldn't find anything. Many thanks in advance.

Sample Problem:

<title><?=_('Routinen')?></title>

#: /users/ruben/sites/v/routinen.php:43
msgid "Routinen"
msgstr "Routines"

I thought I'd narrow the problem down. The switch in the .po file itself is no issue, of course; for single-line entries it is as simple as

preg_replace('/msgid "(.+)"\nmsgstr "(.+)"/', "msgid \"$2\"\nmsgstr \"$1\"", $str);

The problem for me is the routine that parses the .po file, searches my project folder files for _('$msgid') and substitutes _('$msgstr'). Searching every file is probably not even the most elegant way; after all, the .po file contains comments that list all file paths where each msgid occurs.
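
For what it's worth, here is a minimal sketch of reading those "#:" reference comments so that only the files that actually contain a msgid need to be touched. The function name po_references is made up for illustration, and the sketch assumes single-line msgids and file paths without spaces.

```php
<?php
// Hypothetical helper: map each msgid to the source files named in its "#:" comments.
// Assumes single-line msgids and paths that contain no spaces.
function po_references(string $po): array
{
    $refs = array();
    $pending = array(); // files collected since the previous msgid
    foreach (preg_split('/\R/', $po) as $line) {
        if (strncmp($line, '#:', 2) === 0) {
            // One reference line may hold several "path:line" pairs.
            $parts = preg_split('/\s+/', trim(substr($line, 2)), -1, PREG_SPLIT_NO_EMPTY);
            foreach ($parts as $ref) {
                $pending[] = preg_replace('/:\d+$/', '', $ref); // drop the line number
            }
        } elseif (preg_match('/^msgid "(.*)"$/', $line, $m)) {
            if ($m[1] !== '') { // an empty msgid is the header or a multi-line entry
                $refs[$m[1]] = array_values(array_unique($pending));
            }
            $pending = array();
        }
    }
    return $refs;
}

$sample = <<<'PO'
#: /users/ruben/sites/v/routinen.php:43
msgid "Routinen"
msgstr "Routines"
PO;

print_r(po_references($sample));
```

The result can then drive the search and replace per file instead of globbing the whole folder.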


After fooling around with akirk's answer a little, I ran into some more problems.

  1. Because I have a mixture of _('xxx') and _("xxx") calls, I have to be careful about (un)escaping.
    • Double quotes " in msgids and msgstrs have to be unescaped, but the slashes can't be stripped, because it may be that the double quote was also escaped in PHP
    • Single quotes have to be escaped when they're replaced into PHP, but then they also have to be changed in the .po-file. Luckily for me, single quotes only appear in English text.
  2. msgids and msgstrs can span multiple lines, in which case they look like this:
    msgid ""
    "line 1\n"
    "line 2\n"
    msgstr ""
    "line 1\n"
    "line 2\n"
  3. plural forms are of course skipped at the moment, but in my case that's not an issue
  4. poedit wants to remove strings as obsolete that seem successfully switched and I have no idea why this happens in (many) cases.
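
Points 1 and 2 could be handled by reading the .po file line by line instead of with a single regular expression. The following is only a sketch under a few assumptions: po_parse is a made-up name, plural entries and msgctxt are ignored, untranslated entries are dropped, and stripcslashes is used to undo the C-style escapes of the PO format.

```php
<?php
// Hypothetical minimal PO reader: returns msgid => msgstr with continuation
// lines joined and escapes decoded. Plurals and msgctxt are not supported.
function po_parse(string $po): array
{
    $entries = array();
    $id = null;    // msgid being collected, null before the first entry
    $str = '';
    $field = null; // field that continuation lines belong to: 'id', 'str' or null

    $flush = function () use (&$entries, &$id, &$str) {
        // Skip the header (empty msgid) and anything without a translation.
        if ($id !== null && $id !== '' && $str !== '') {
            $entries[$id] = $str;
        }
    };

    foreach (preg_split('/\R/', $po) as $line) {
        $line = trim($line);
        if (preg_match('/^msgid "(.*)"$/', $line, $m)) {
            $flush();
            $id = stripcslashes($m[1]);
            $str = '';
            $field = 'id';
        } elseif (preg_match('/^msgstr "(.*)"$/', $line, $m)) {
            $str = stripcslashes($m[1]);
            $field = 'str';
        } elseif ($field !== null && preg_match('/^"(.*)"$/', $line, $m)) {
            if ($field === 'id') {
                $id .= stripcslashes($m[1]);
            } else {
                $str .= stripcslashes($m[1]);
            }
        } else {
            $field = null; // comment, blank line or unsupported keyword
        }
    }
    $flush();
    return $entries;
}

$sample = <<<'PO'
msgid ""
"Zeile 1\n"
"Zeile 2\n"
msgstr ""
"line 1\n"
"line 2\n"
PO;

var_dump(po_parse($sample));
```

Because the values come back decoded, the escaping question is reduced to one place: the moment a string is written back into PHP source or into the new .po file.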

I'll have to stop working on this for tonight. Still it seems using the parser instead of RegExps wouldn't be overkill.
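
As a rough idea of what "using the parser" might look like, PHP's own tokenizer (token_get_all) can locate the string literals passed to _() and __() without any regular expression over the source. This is a sketch only: find_gettext_calls is a hypothetical name, and it reports just constant string literals as the first argument, so calls with variables or concatenation are skipped.

```php
<?php
// Hypothetical helper (not part of gettext): list the constant string literals
// passed as first argument to the given translation functions.
function find_gettext_calls(string $code, array $functions = array('_', '__')): array
{
    $tokens = token_get_all($code);
    $count = count($tokens);
    $calls = array();

    // Advance past whitespace tokens and return the next index.
    $skipSpace = function (int $i) use ($tokens, $count): int {
        while ($i < $count && is_array($tokens[$i]) && $tokens[$i][0] === T_WHITESPACE) {
            $i++;
        }
        return $i;
    };

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if (!is_array($token) || $token[0] !== T_STRING || !in_array($token[1], $functions, true)) {
            continue; // not one of the function names we are looking for
        }
        $j = $skipSpace($i + 1);
        if ($j >= $count || $tokens[$j] !== '(') {
            continue; // the name is not followed by a call
        }
        $j = $skipSpace($j + 1);
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_CONSTANT_ENCAPSED_STRING) {
            // 'literal' keeps the quotes exactly as written in the source file.
            $calls[] = array('literal' => $tokens[$j][1], 'line' => $tokens[$j][2]);
        }
    }
    return $calls;
}

$code = <<<'SRC'
<title><?=_('Routinen')?></title>
<p><?php echo _( "Hallo Welt" ); ?></p>
SRC;

foreach (find_gettext_calls($code) as $call) {
    echo $call['line'], ': ', $call['literal'], "\n";
}
```

Since the literal is returned with its original quotes, single- and double-quoted calls no longer need separate search strings.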

Undefined answered 15/1, 2011 at 14:51 Comment(4)
Tricky: not only would you need to switch the entries in the gettext files, you would also need to replace all strings in your code. - Byebye
@tharkun: Yes, of course, that's what I'd need to do, but that does not seem so tricky to me. I think I could do it with a PHP string, but not with the shell. In the simplest way one would just parse (or search) the .po-file for msgids and msgstrs and then search and replace that string in all the files in a folder. I included the switch in the post to narrow the problem down. - Undefined
I think you are on the right track here. The trick would be to make sure the regEx that you use doesn't inadvertently change actual source. Don't forget the regEx would also need to handle single- and double-quoted calls to _(...). Good luck. - Alinaaline
@Yzmir Ramirez: I want to change the actual source files. For me the trick is properly parsing the po-file to feed my search-and-replace script with files and search strings, and that's where I'm stuck. - Undefined
S
2

See http://code.activestate.com/recipes/475109-regular-expression-for-python-string-literals/ for a good Python-based regular expression for finding string literals, taking escapes into account. Although it's written for Python, it might work well for multi-line strings and other corner cases.

See http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/poswap.html for a ready, out-of-the-box base language swapper for .po files.

For instance, the following command line converts a German-based Spanish translation into an English-based Spanish translation. You just have to ensure that your new base language (English) is 100% translated before starting the conversion:

poswap -i de-en.po -t de-es.po -o en-es.po

And finally, to swap an English .po file to a German .po file, use swappo: http://manpages.ubuntu.com/manpages/hardy/man1/swappo.1.html

After swapping files, some manual polishing of the resulting files might be required. For instance, headers might be broken and some duplicate texts might occur.

Shimmery answered 8/4, 2011 at 16:41 Comment(1)
I've posted a Python script to swap the source/target languages in PO files. This might be of use for this case: mola.io/2013/09/17/swapping-languages-in-gettext-po-file - Missy
U
5

I built on akirk's answer and wanted to preserve what I came up with as an answer here, in case somebody has the same problem. This is not recursive, but that could easily change of course. Feel free to comment with improvements, I will be watching and editing this post.

$po = file_get_contents("locale/en_GB/LC_MESSAGES/messages.po");

$translations = array(); // german => english
$rawmsgids = array(); // find later
$msgidhits = array(); // record success
$msgstrs = array(); // find later

preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    $german = str_replace('\"','"',$match[1]); // unescape double quotes (could misfire if you escaped double quotes in PHP _("<a href=\"bla\">bla</a>") but in my case that was one case versus many)
    $english = str_replace('\"','"',$match[2]);


    $en_sq_e = str_replace("'","\'",$english); // escape single quotes

    $translations['_(\''. $german . '\''] = '_(\'' . $en_sq_e . '\'';
    $rawmsgids['_(\''. $german . '\''] = $match[1]; // find raw msgid with searchstr as key

    $translations['_("'. $match[1] . '"'] = '_("' . $match[2] . '"';
    $rawmsgids['_("'. $match[1] . '"'] = $match[1];

    $translations['__(\''. $german . '\''] = '__(\'' . $en_sq_e . '\'';
    $rawmsgids['__(\''. $german . '\''] = $match[1];

    $translations['__("'. $match[1] . '"'] = '__("' . $match[2] . '"';
    $rawmsgids['__("'. $match[1] . '"'] = $match[1];

    $msgstrs[$match[1]] = $match[2]; // msgid => msgstr
}


foreach (glob("*.php") as $file) {
    $code = file_get_contents($file);

    $filehits = 0; // how many replacements per file

    foreach($translations AS $msgid => $msgstr) {
        $hits = 0;
        $code = str_replace($msgid,$msgstr,$code,$hits);
        $filehits += $hits;

        if($hits!=0) $msgidhits[$rawmsgids[$msgid]] = 1; // this serves to record if the msgid was found in at least one incarnation
        elseif(!isset($msgidhits[$rawmsgids[$msgid]])) $msgidhits[$rawmsgids[$msgid]] = 0;
    }
    // file_put_contents($file, $code); // be careful to test this first before doing the actual replace (and do use a version control system!) 
    echo "$file : $filehits <br>"; 
    echo $code;
}
/* debug */ 
$found = array_keys($msgidhits, 1, true);
foreach($found AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";

echo "Not Found: <br>";
$notfound = array_keys($msgidhits, 0, true);
foreach($notfound AS $mid) echo $mid . " => " . $msgstrs[$mid] . "\n\n";

/*
following steps are still needed:
    * convert plurals (ngettext)
    * convert multi-line msgids and msgstrs (format mentioned in question)
    * resolve uniqueness conflict (msgids are unique, msgstrs are not), so you may have duplicate msgids (poedit finds these)
*/
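
For the last point in that list, a small check could flag the conflicts before any file is rewritten. This is just a sketch; find_swap_conflicts is a hypothetical name, and it expects an array shaped like $msgstrs in the script (msgid => msgstr).

```php
<?php
// Hypothetical helper: group msgids that share the same msgstr. After the swap
// each of these groups would collapse into one duplicated msgid.
function find_swap_conflicts(array $msgstrs): array
{
    $byTranslation = array();
    foreach ($msgstrs as $msgid => $msgstr) {
        $byTranslation[$msgstr][] = $msgid;
    }
    // Keep only translations that are claimed by more than one msgid.
    return array_filter($byTranslation, function (array $ids): bool {
        return count($ids) > 1;
    });
}

// Made-up sample data in the same shape as $msgstrs.
$sampleMsgstrs = array(
    'Speichern' => 'Save',
    'Sichern'   => 'Save',
    'Routinen'  => 'Routines',
);

print_r(find_swap_conflicts($sampleMsgstrs));
```

Each reported group has to be merged by hand (or given distinct English wording) so that the swapped .po file ends up with unique msgids.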
Undefined answered 18/1, 2011 at 11:20 Comment(0)
H
1

So if I understand you correctly you'd like to replace all German gettext calls with English ones. To replace the contents in the directory, something like this could work.

$po = file_get_contents("translation.pot");
$translations = array(); // german => english
preg_match_all('/msgid "(.+)"\nmsgstr "(.+)"/', $po, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    $translations['_("'. $match[1] . '")'] = '_("' . $match[2] . '")';
    $translations['_(\''. $match[1] . '\')'] = '_(\'' . $match[2] . '\')';
}
foreach (glob("*.php") as $file) {
    $code = file_get_contents($file);
    $code = str_replace(array_keys($translations), array_values($translations), $code);
    //file_put_contents($file, $code);
    echo $code; // be careful to test this first before doing the actual replace (and do use a version control system!)
}
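
One caveat with the script above: it pastes the raw msgstr between quotes, so an English string containing an apostrophe would produce broken PHP. A possible way around that, sketched here with a made-up helper name (gettext_call), is to let var_export build the single-quoted literal; the input is assumed to be the already unescaped message text.

```php
<?php
// Hypothetical helper: build a gettext call whose argument is a correctly
// escaped single-quoted PHP literal. $text must be the decoded message text.
function gettext_call(string $function, string $text): string
{
    // var_export() escapes single quotes and backslashes for us.
    return $function . '(' . var_export($text, true) . ')';
}

echo gettext_call('_', "Don't save"), "\n"; // prints _('Don\'t save')
echo gettext_call('__', 'Routines'), "\n";  // prints __('Routines')
```

Writing every replacement as a single-quoted literal also sidesteps the question of whether the original call used _('xxx') or _("xxx").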
Hybridism answered 17/1, 2011 at 14:23 Comment(3)
Yes, but even though I can of course give the po-file as a string, I need to search and replace a directory of PHP files, not a string. I'd also like to know in the end which msgids couldn't be found (that would be function calls for plural forms and placeholders: so few that I could do them by hand). I was hoping that the gettext parser itself could be used somehow; after all, it does something very similar already (parse PHP files and find msgids in specified function calls). - Undefined
I'm not aware of a tool in the gettext distribution, so you will have to do it by hand (which is not that tedious). I've changed my code to reflect this. - Hybridism
I've fooled around with your script a bit (pastebin.com/J7ipM1fy) to see more easily which strings were found. Dealing with quotation marks and multi-line strings isn't trivial though, and I'll update my question to reflect that. - Undefined
