Chinese localization does not work with the PHP gettext extension, although English does

I have already localized a website from Russian to English with PHP and gettext, simply by wrapping every string in the __($string) function.

It works.

Here are the gists: https://gist.github.com/Grawl/ba8f39b8398791c6a67e

But it doesn't work with the Chinese translation. I added the compiled .mo (and the source .po) to locale/zh_CN/LC_MESSAGES/, visited /index.php?locale=zh_CN, and the page is not translated at all.

What is wrong with Chinese?

Do I have to use a different language code or something?

I use zh_CN for Chinese, as WordPress does.

I cannot understand why.


Update:

The problem was the HTML <meta> tag and the charset sent by the server, which was Windows-1251. A cheap Russian PHP server.

After I set <meta charset="GBK"> and turned off AddDefaultCharset in .htaccess, the Chinese localization finally started to work.

In the end, I made these modifications:

.htaccess:

- AddDefaultCharset UTF-8
+ AddDefaultCharset off
+ RewriteRule ^cn index.php?locale=zh_CN&charset=GBK [L]

functions.php, included before <!DOCTYPE html>:

+ // Default to UTF-8 when no charset is passed in the query string.
+ $charset = isset($_GET["charset"]) ? $_GET["charset"] : "UTF-8";

head.php, the <head> tag content:

+ <meta charset="<?=$charset?>">

So if I don't pass charset in the GET request, it defaults to UTF-8; otherwise it is taken from the GET request. For Chinese I set it to GBK, as Taobao.com does, and the browser picks the right charset.

But after all that, I just get the Cyrillic characters rendered as Chinese glyphs, character by character.

Like this: Сервис и услуги

Becomes this: 褋械褉胁懈褋 懈 褍褋谢褍谐懈

If you paste these Chinese characters into a decoder app and choose GB2312 on the left (one of the Chinese charsets) and UTF-8 on the right, you get ?е?ви? и ??л?ги. Some Cyrillic characters are corrupted, but this is obviously the original string, because my translation uses the much shorter 服务 for this phrase.
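The garbling described here can be reproduced in a few lines: the page body is UTF-8, and a GBK label makes the browser read each two-byte Cyrillic sequence as one Chinese character. The sketch below assumes the mbstring extension is loaded and accepts the encoding name GBK; both helper names are made up for illustration. It uses the lowercase form of the phrase, which is what the glyphs above appear to correspond to. It also suggests why the decoder app lost some letters: с, р and у encode to the bytes 0xD1 0x81, 0xD1 0x80 and 0xD1 0x83, and those second bytes are valid in GBK but fall outside the GB2312 range (0xA1 to 0xFE), so choosing GBK instead of GB2312 should recover the whole string.

```php
<?php
// Hypothetical helper: reinterpret UTF-8 bytes as GBK, which is what a
// browser does when a UTF-8 page is labelled <meta charset="GBK">.
function misread_utf8_as_gbk($utf8)
{
    // mb_convert_encoding(string, to_encoding, from_encoding)
    return mb_convert_encoding($utf8, 'UTF-8', 'GBK');
}

// Hypothetical helper: undo the misreading by turning the Chinese glyphs
// back into their GBK bytes, which are the original UTF-8 bytes.
function recover_from_gbk_misread($garbled)
{
    return mb_convert_encoding($garbled, 'GBK', 'UTF-8');
}

$original = 'сервис и услуги';
$garbled  = misread_utf8_as_gbk($original);

echo $garbled, "\n";                           // Chinese glyphs, one per Cyrillic letter
echo recover_from_gbk_misread($garbled), "\n"; // the Cyrillic phrase again
```

If the second line prints the Cyrillic phrase intact, the text was never translated in the first place; only the charset label was wrong.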

Help me please.


Update 2

I had simply forgotten to call bind_textdomain_codeset() for my $domain, which was messages.

Everything works with a Unicode charset now. All is well.

Preconcert answered 3/7, 2015 at 14:27 Comment(7)
Your gist doesn't show the Chinese message catalogue, so it's impossible to guess. And for the system gettext library to pick it up, you'd often have to restart the mod_php/PHP-FPM process. – Crimpy
@Crimpy yes, those are old gists from when I had only done the English l10n. – Irrepressible
Do you have the Chinese locale installed...? – Propertius
@Propertius did you read my question at all? – Irrepressible
"Locale" as in the system libraries necessary for this locale. Not the .mo file. – Propertius
@Propertius should I really ask the server administrator to install something just to swap some strings? :( – Irrepressible
It's necessary for the setlocale call to work, which gettext depends on, yes. – Propertius

Summary

I was able to make this work without changing the <meta charset="..."> value away from utf-8. You should also be able to remove the AddDefaultCharset rule from your .htaccess and remove the &charset=GBK from your RewriteRule. You need to make sure that your .po file is formatted and compiled correctly, and that the server can find it.
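Whether the server can find the catalogue can be checked directly. gettext typically looks for the compiled file under <directory>/<locale>/LC_MESSAGES/<domain>.mo, so a small sketch like the following can report whether that file is readable. The function name expected_catalog_path is hypothetical, and the bridges domain and locale directory are only example values.

```php
<?php
// Hypothetical helper: build the path where gettext is expected to look
// for a compiled catalogue: <dir>/<locale>/LC_MESSAGES/<domain>.mo
function expected_catalog_path($locale_dir, $locale, $domain)
{
    return rtrim($locale_dir, '/') . '/' . $locale . '/LC_MESSAGES/' . $domain . '.mo';
}

// Example values; adjust the directory, locale and domain to your project.
$path = expected_catalog_path(__DIR__ . '/locale', 'zh_CN', 'bridges');

echo $path, is_readable($path) ? " is readable\n" : " is missing or unreadable\n";
```

A missing file here usually points to a wrong directory passed to bindtextdomain() or a domain name that does not match the .mo file name.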

Explanation/Example

Setting the <meta charset="..."> tag only tells the browser what character encoding is being used on the page. PHP still needs to know which file to select to replace strings. And in any case, although this documentation suggests otherwise, I think you can still use UTF-8 to do Chinese localization. Here is a simple working example I set up on my system:

<?php
    // initialize locale-related variables
    $locale     = !empty( $_GET['locale'] ) ? $_GET['locale'] : 'en_US'; // no notice when the parameter is absent
    $domain     = 'bridges';
    $locale_dir = dirname( __FILE__ ) . '/locale'; // using absolute path!

    // set up locale
    putenv( "LC_ALL=$locale" );
    setlocale( LC_ALL, $locale );
    bindtextdomain( $domain, $locale_dir );
    bind_textdomain_codeset( $domain, 'UTF-8' );
    textdomain($domain);
?><!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title><?= _( 'Localization Test' ) ?></title>
    </head>
    <body>
        <p><?= _( 'Hello' ) ?>!</p>
    </body>
</html>

My .po file, which is located at ./locale/zh_CN/LC_MESSAGES/bridges.po, looks like:

msgid ""
msgstr ""
"Project-Id-Version: 1.0\n"
"PO-Revision-Date: 2015-07-20\n"
"Last-Translator: Morgan Benton\n"
"Language-Team: Chinese\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: zh_CN\n"

msgid "Localization Test"
msgstr "本土化试"

msgid "Hello"
msgstr "您好"

According to a comment on the gettext() documentation, you should put the character encoding and other relevant headers inside your .po file, e.g.

"Content-Type: text/plain; charset=UTF-8\n"

You can check the syntax of your .po file by running the command msgfmt -c bridges.po -o bridges.mo from your terminal. It will warn you if it thinks anything is wrong with your .po file. As the commenter suggested, I think you do NOT need to have the Chinese system libraries installed.
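Whether the system locale matters on a particular server can be tested rather than guessed, because setlocale() returns false when none of the requested locales is available. The sketch below is a diagnostic under stated assumptions, not part of the example above: the helper locale_candidates and the suffixes it tries are made up for illustration, and which locale names exist depends on the operating system.

```php
<?php
// Hypothetical helper: common spellings of the same locale, since systems
// differ on whether they register "zh_CN", "zh_CN.UTF-8" or "zh_CN.utf8".
function locale_candidates($locale)
{
    return array($locale . '.UTF-8', $locale . '.utf8', $locale);
}

// setlocale() accepts an array and applies the first locale that is installed.
$result = setlocale(LC_ALL, locale_candidates('zh_CN'));

if ($result === false) {
    echo "No zh_CN locale is available to setlocale() on this system\n";
} else {
    echo "setlocale() selected: $result\n";
}
```

If this reports that no locale is available and the translations still do not appear, the missing system locale is a likely suspect; on many Linux systems, locale -a lists what is installed.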

P.S. I don't know if these Chinese translations are correct or not. This is just what Google Translate gave me! :)

Tanked answered 20/7, 2015 at 22:27 Comment(5)
You are showing me the way I started out, before I tried doing strange things with all these encodings and charsets. Actually everything is exactly the same, except the absolute path. It doesn't work if I try to use anything other than ./locale. – Irrepressible
I generate my .mo file with Poedit and cannot do anything with the .po if it is not valid. I tried changing the encoding to UTF-16 in my IDE and opening the file in Poedit afterwards, but no, Poedit don't want to work with non-UTF-8. – Irrepressible
Oh crap, I just forgot to call bind_textdomain_codeset() for $domain, which was messages. Thank you anyway. – Irrepressible
Sure. It always is something small. – Tanked
@Grawl "Poedit don't want to work with non-UTF-8": this is simply not true. Poedit works perfectly fine with any common charset. You must have messed it up (such as lying about the actual file's charset in the Content-Type header). – Warrigal
