How to internationalize a PHP third-party library

Consider a PHP library that will be published through Packagist or PEAR. It is aimed at fellow developers who will use it in arbitrary settings.

This library will contain some status messages intended for the client. How do I internationalize this code so that developers using the library have the greatest possible freedom to plug in their own localization method? I don't want to assume anything, and I especially don't want to force the developer to use gettext.

To work on an example, let's take this class:

class Example {

    protected $message = "I'd like to be translated in your client's language.";

    public function callMe() {
        return $this->message;
    }

    public function callMeToo($user) {
        return sprintf('Hi %s, nice to meet you!', $user);
    }

}

There are two problems here: how do I mark the protected $message for translation, and how do I allow the developer to localize the string inside callMeToo()?

One (highly inconvenient) option would be to ask for some i18n callable in the constructor, like so:

public function __construct($i18n) {
    // $i18n is any callable that maps a source string to its translation
    $this->i18n = $i18n;
    $this->message = call_user_func($this->i18n, $this->message);
}

public function callMeToo($user) {
    return sprintf(call_user_func($this->i18n, 'Hi %s, nice to meet you!'), $user);
}

but I dearly hope for a more elegant solution.

Edit 1: Apart from simple string substitution, the field of i18n is a wide one. The premise is that I don't want to bundle any i18n solution with my library or force the user to choose a specific one just to cater for my code.

Then, how can I structure my code to allow the best and most flexible localization of different aspects: string translation, number and currency formatting, dates and times, ...? Assume that any of these may appear as output from my library. At which point or interface can the consuming developer plug in her localization solution?

Viewpoint answered 10/12, 2013 at 19:32 Comment(0)

The most commonly used solution is a strings file, for example:

# library
class Foo {
  public function __construct($lang = 'en') {
    $this->strings = require('path/to/langfile.' . $lang . '.php');
    $this->message = $this->strings['message'];
  }

  public function callMeToo($user) {
    return sprintf($this->strings['callMeToo'], $user);
  }
}

# strings file
return array(
  'message'   => "I'd like to be translated in your client's language.",
  'callMeToo' => 'Hi %s, nice to meet you!'
);

To avoid the $this->message assignment, you can also use a magic getter:

# library again
class Foo {
  # … code from above

  function __get($name) {
    if(!empty($this->strings[$name])) {
      return $this->strings[$name];
    }

    return null;
  }
}

You can even add a loadStrings method that takes an array of strings from the user and merges it with your internal strings table.
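
A minimal sketch of such a loadStrings method follows. The class name StringTable and the German sample string are assumptions made for illustration; only the idea of merging a user array over the built-in table comes from the answer.

```php
<?php
// Hypothetical sketch: StringTable and its keys are illustrative names only.
class StringTable {
    // Built-in defaults shipped with the library
    protected $strings = array(
        'message'   => "I'd like to be translated in your client's language.",
        'callMeToo' => 'Hi %s, nice to meet you!',
    );

    // Merge user-supplied strings over the defaults: matching keys are
    // overridden, keys the user does not supply keep the default text.
    public function loadStrings(array $strings) {
        $this->strings = array_merge($this->strings, $strings);
    }

    public function callMe() {
        return $this->strings['message'];
    }

    public function callMeToo($user) {
        return sprintf($this->strings['callMeToo'], $user);
    }
}

$table = new StringTable();
$table->loadStrings(array('callMeToo' => 'Hallo %s, schön dich kennenzulernen!'));

echo $table->callMeToo('Anna'), "\n"; // uses the user-supplied string
echo $table->callMe(), "\n";          // still the built-in default
```

Because array_merge keeps every key the user leaves out, a partial translation degrades to the default language instead of failing.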

Edit 1: To achieve more flexibility I would change the above approach slightly. I would add a translation function as an object attribute and always call it when I want to localize a string. The default function just looks up the string in the strings table and, if it can't find a localized version, returns the string itself, just like gettext. The developer using your library could then replace the function with their own to implement a completely different approach to localization.
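
One way this could look is sketched below. The names Greeter and setTranslator are hypothetical, and the French sample string is made up. The default translator does a table lookup and falls back to the source string, as described above.

```php
<?php
// Hypothetical sketch: Greeter and setTranslator are assumed names.
class Greeter {
    protected $strings;     // optional built-in translations
    protected $translator;  // any PHP callable: string -> string

    public function __construct(array $strings = array()) {
        $this->strings = $strings;
        // Default: table lookup, falling back to the source string itself
        $this->translator = function ($text) {
            return isset($this->strings[$text]) ? $this->strings[$text] : $text;
        };
    }

    // The consuming developer swaps in their own localization mechanism here
    public function setTranslator(callable $translator) {
        $this->translator = $translator;
    }

    protected function translate($text) {
        return call_user_func($this->translator, $text);
    }

    public function callMeToo($user) {
        return sprintf($this->translate('Hi %s, nice to meet you!'), $user);
    }
}

$greeter = new Greeter();
echo $greeter->callMeToo('Anna'), "\n"; // no translation known: source string

// A consumer plugs in a custom lookup, here a plain array catalog
$catalog = array('Hi %s, nice to meet you!' => 'Salut %s, ravi de te rencontrer !');
$greeter->setTranslator(function ($text) use ($catalog) {
    return isset($catalog[$text]) ? $catalog[$text] : $text;
});
echo $greeter->callMeToo('Anna'), "\n";
```

Since the hook accepts any callable, a consumer who does use gettext could presumably pass the function name 'gettext' directly, provided the extension is loaded.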

Date localization is not a problem. Setting the locale is the responsibility of the software your library is used in. The format itself is a localized string, e.g. $this->translate('%Y-%m-%d') would return a localized version of the date format string.
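
The idea of treating the date format as a translatable string can be sketched as follows. Note the assumptions: this sketch uses date()-style format characters with DateTimeInterface::format() rather than the strftime() syntax shown above, and formatDate and the German mapping are made-up names.

```php
<?php
// Hypothetical sketch: formatDate and the $german catalog are illustrative.
function formatDate(DateTimeInterface $date, callable $translate) {
    // 'Y-m-d' is the source "string"; a translation may return e.g. 'd.m.Y'
    return $date->format($translate('Y-m-d'));
}

// Identity translator: no localization configured
$identity = function ($text) {
    return $text;
};

// A consumer-supplied translator that also knows the date format string
$german = function ($text) {
    $catalog = array('Y-m-d' => 'd.m.Y');
    return isset($catalog[$text]) ? $catalog[$text] : $text;
};

$date = new DateTimeImmutable('2013-12-13');
echo formatDate($date, $identity), "\n"; // 2013-12-13
echo formatDate($date, $german), "\n";   // 13.12.2013
```

This only covers numeric formats; format() always emits English month and day names, so textual dates would still need the locale or a separate lookup.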

Number localization is done by setting the right locale and using functions like sprintf().

Currency localization is a problem, though. I think the best approach would be to add a currency formatting function (and, maybe for better flexibility, a separate number formatting function, too) which a developer could override to change the currency format. Alternatively you could implement format strings for currencies, too, for example %CUR %.02f, where you would replace %CUR with the currency symbol. Currency symbols themselves are localized strings, too.

Edit 2: If you don't want to use setlocale, you have a lot of work ahead: basically you have to rewrite strftime() and sprintf() to get localized dates and numbers. It is possible, of course, but laborious.
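
Both variants can be combined in a small sketch: a default formatter driven by a '%CUR %.02f'-style format string, and a hook that lets the consumer replace it. formatCurrency and PriceReport are hypothetical names, and the expected outputs assume the default "C" numeric locale.

```php
<?php
// Hypothetical sketch: formatCurrency and PriceReport are assumed names.
function formatCurrency($amount, $symbol, $format = '%CUR %.02f') {
    // Substitute the symbol first, escaping '%' so sprintf treats it literally
    $pattern = str_replace('%CUR', str_replace('%', '%%', $symbol), $format);
    return sprintf($pattern, $amount);
}

class PriceReport {
    protected $currencyFormatter;

    // Accepts any callable taking (amount, symbol); defaults to formatCurrency
    public function __construct($currencyFormatter = null) {
        $this->currencyFormatter = $currencyFormatter ?: 'formatCurrency';
    }

    public function total($amount, $symbol) {
        return call_user_func($this->currencyFormatter, $amount, $symbol);
    }
}

$default = new PriceReport();
echo $default->total(12.5, '€'), "\n"; // € 12.50

// A consumer overrides the formatter with a German-style layout
$german = new PriceReport(function ($amount, $symbol) {
    return number_format($amount, 2, ',', '.') . ' ' . $symbol;
});
echo $german->total(1234.5, '€'), "\n"; // 1.234,50 €
```

The format string itself could be routed through the translation function, so a translator can move the symbol or change the precision without touching code.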

Warnerwarning answered 13/12, 2013 at 8:17 Comment(8)
Thanks, interesting. Basically, the suggested loadStrings seems to be a good match to my needs. Do you know of any projects in the wild, how they deal with it? (Zend is exempted though, they have their own i18n lib.) – Viewpoint
@Viewpoint I know a couple of closed source projects handling it this way, but I have to think hard about an open source project. Most are not localized at all and if they are, they use gettext… – Warnerwarning
@Viewpoint So, why do you hesitate? There is gettext() which sucks for web applications and easy deployment. And there is this home-grown simple i18n/l10n solution. Why don't you simply use one of them? – Warnerwarning
(I'm asking not because of the points, I really don't understand what your problem is) – Warnerwarning
The problem with homegrown is, that it might not be suited for the i18n needs of consuming users. That’s basically what I’d like to evaluate: loading strings from a .ini, PHP array or whatever is fine and straight-forward, but will I create l10n obstacles for my users this way? – Viewpoint
I updated the question to complicate matters :-( I18n is a wide field, and things like date formatting and the such are relevant, too. The question is really about getting i18n-ready for the lib user and not providing means of localization itself. – Viewpoint
Thanks again for the detailed answer! I've just placed the bounty on it. I'm not yet accepting it, because (sorry for the late answer to your edit) setting the locale is something inherently bad in PHP (in some environments), so I’m not fully satisfied with that solution. (Though I'd love to use it, as much as I actually love gettext, too.) – Viewpoint
@Viewpoint Thanks for the bounty, but that's not why I was asking :-) Hm… if you don't want to use the locale then you have to do everything by hand, using a hand-made version of sprintf and strftime. A lot of work… – Warnerwarning

The basic approach is to provide the consumer with some method to define a mapping. It can take any form, as long as the user can define a bijective mapping.

For example, Mantis Bug Tracker uses a simple globals file:

<?php
    require_once "strings_$language.txt";
    echo $s_actiongroup_menu_move;

Their method is basic but works just fine. Wrap it in a class if you prefer:

<?php
    $translator = new Translator(Translator::ENGLISH); // or make it a singleton
    echo $translator->translate('actiongroup_menu_move');

Use an XML file instead, or an INI file, or a CSV file... whatever format of your liking, in fact.
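
To illustrate that the storage format is interchangeable, here is a sketch in which the same catalog is read once from INI text and once from JSON. The Translator class below is an assumption for illustration (it is not Mantis code), and JSON is merely one more format chosen for the example.

```php
<?php
// Hypothetical sketch: this Translator is illustrative, not Mantis code.
class Translator {
    protected $strings;

    // The translator only sees a plain key => string array,
    // so it does not care which file format the array came from.
    public function __construct(array $strings) {
        $this->strings = $strings;
    }

    public function translate($key) {
        // Fall back to the key itself when no translation exists
        return isset($this->strings[$key]) ? $this->strings[$key] : $key;
    }
}

// The same catalog expressed in two formats
$ini  = 'actiongroup_menu_move = "Move"';
$json = '{"actiongroup_menu_move": "Move"}';

$fromIni  = new Translator(parse_ini_string($ini));
$fromJson = new Translator(json_decode($json, true));

echo $fromIni->translate('actiongroup_menu_move'), "\n";
echo $fromJson->translate('actiongroup_menu_move'), "\n";
echo $fromIni->translate('unknown_key'), "\n"; // falls back to the key
```

Keeping the parsing outside the class means a consumer can feed it from whatever source their application already uses.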


Answering your later edits/comments

Yes, the above does not differ much from other solutions. But I believe there is little else to be said:

  • translation can only be achieved through string substitution (the mapping may take an infinite number of forms)
  • formatting numbers and dates is none of your concern. It is the presentation layer's responsibility, and you should just return raw numbers (or DateTimes or timestamps), unless your library's very purpose is localisation ;)
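
The second point can be sketched like this: the library returns raw values and the consuming application decides how to present them. Invoice and its values are made up for illustration.

```php
<?php
// Hypothetical sketch: Invoice and its data are invented for this example.
class Invoice {
    // Library side: return raw, unformatted values only
    public function getDueDate() {
        return new DateTimeImmutable('2013-12-31');
    }

    public function getTotal() {
        return 1234.5;
    }
}

// Presentation layer (the consumer's code) chooses the formatting
$invoice = new Invoice();

$dueDate = $invoice->getDueDate()->format('d.m.Y');
$total   = number_format($invoice->getTotal(), 2, ',', '.');

echo "Fällig am $dueDate, Betrag: $total\n";
```

The consumer could just as well hand the same raw values to the intl extension or a framework helper; the library never needs to know.
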
Whittier answered 13/12, 2013 at 18:0 Comment(1)
Thanks for answering, but I’m especially interested in the consequences for users of my library down the path. What are the pitfalls these solutions might create? Apart from that, I fail to see how your concept is different from what I provided in the question or @Warnerwarning already mentioned. I’ll also update the question to extend to more i18n issues. – Viewpoint

There's a fundamental problem here: the code as it stands in your question is not well suited to internationalization.

Let me explain. The first translator is probably a programmer. The second and third might be too, but eventually you want translations into any language, including ones provided by non-programmers. That ought to be easy for them; hunting through classes, functions, etc. is definitely not okay for non-programmers.

So I propose this: keep your source sentences (English) in an agnostic format that is easy for everyone to understand. This might be an XML file, a database or any other form you see fit. Then use your translations where you need them. You can do it like this:

class Example {
  // Fetch them as you prefer and store them in $messages.
  protected $messages = array(
    'en' => array(
      "message"  => "I'd like to be translated in your client's language.",
      "greeting" => "Hi %s, nice to meet you!"
    )
  );

  protected $lang;

  public function __construct($lang = 'en') {
    $this->lang = $lang;
  }

  // Translation hook. The original answer does not show it; as an assumption
  // it returns the string unchanged here and can be overridden in a subclass.
  protected function translator($message) {
    return $message;
  }

  protected function get($key, $args = array()) {
    // Look up the source string and translate it
    $string = $this->translator($this->messages[$this->lang][$key]);
    // Accept a single value or an array of sprintf arguments
    $args = (array) $args;
    return $args ? vsprintf($string, $args) : $string;
  }

  public function callMe() {
    return $this->get("message");
  }

  public function callMeToo($user) {
    return $this->get("greeting", $user);
  }
}

Furthermore, if you want to use a small translation script I wrote, you can simplify this even further. It uses a database, so it might not be as flexible as you're looking for. You need to inject it, and the language is set at initialization. Note that the text is automatically added to the database if not present.

class Example {
  protected $translator;

  // The translator is assumed to be a callable (closure or invokable object)
  // that already knows the language to translate the text to
  public function __construct($Translator) {
    $this->translator = $Translator;
  }

  public function callMe() {
    return call_user_func($this->translator, "I'd like to be translated in your client's language.");
  }

  public function callMeToo($user) {
    return sprintf(call_user_func($this->translator, "Hi %s, nice to meet you!"), $user);
  }
}

It could easily be modified to use an XML file or any other source for translated strings.

Notes for the second method:

  • This is different from your proposed solution since it does the work at output time rather than at initialization, so there is no need to keep track of every string.

  • You only need to write your sentences once, in English. The class I wrote will put it in the database provided it's correctly initialized, making your code extremely DRY. That's exactly why I started it, instead of just using gettext (and the ridiculous size of gettext for my simple requirements).

  • Con: it's an old class. I didn't know a lot back then. Now I'd change a couple of things: making a language field, rather than en, es, etc, throwing some exceptions here and there and uploading some of the tests I did.

Darin answered 14/12, 2013 at 20:5 Comment(4)
Thanks for the answer. I looked at the code on Github, too. Unfortunately this solution doesn't work for me. I want to restrict my own requirements as much as possible while still allowing the consuming user to fully localize what is emitted by my code. I also updated the question to include things like date formatting. I fear it's way more complicated than whether to translate on output or during initialization. – Viewpoint
With your update, I think it's too broad for SO actually. Either the code would require hundreds of lines or it's more of a conceptual question that is better asked on Programmers. However, feel free to extend (and push back to github if you want) the class I pointed out to handle better localization. – Darin
Another interesting question is, how would you handle the currency? Since that varies over time, you would need to use an external API, so it becomes more and more inconvenient to meet all of your requirements and it's not going to be simple. – Darin
I'm really starting to question whether refusing to add third-party libs is a viable way. In Python, I'd just throw in pybabel and pytz as requirements, and all number and currency formatting woes are gone. (For the price of added dependencies in the client's code base.) However, it's well apparent to me that it's not simple to do that, and I certainly don't want to re-invent any wheels in my lib, hence the question :-) It should simply fathom current practices on i18n in PHP libs. – Viewpoint
