php sprintf() with foreign characters?
F

4

27

It seems that sprintf() has a problem with foreign characters, or am I doing something wrong? It works when I remove characters like åäö from the string, but should that be necessary?

I want the following lines to be aligned correctly for a report:

2011-11-27   A1823    -Ref. Leif  -           12 873,00    18.98
2011-11-30   A1856    -Rättat xx -            6 594,00    19.18

I'm using sprintf() like this: %-12s %-8s -%-10s -%20s %8.2f

Using: php-5.3.23-nts-Win32-VC9-x86

Folkmoot answered 14/4, 2013 at 19:42 Comment(4)
This problem (that different characters consist of different numbers of bytes and different grapheme clusters consist of different numbers of characters) is somewhat similar to (but not the same as) #9167198. The bottom line is that it might be easiest to put the data in an HTML table instead.Alejandroalejo
Yeah this is definitely not a duplicate, this question is about multibyte characters is sprintf(), the other one is about font display widths.Myra
This was not a duplicate question at all... You can do the trick with: utf8_encode(sprintf('format', utf8_decode($yourstring))); Of course, you'll have to decode every argument if several are given.Unyielding
This question is about characters with a Unicode code point above 127, which use more than one byte when encoded with UTF-8. Unfortunately, sprintf and printf don't handle that. When printing a 2-character string that uses 6 bytes when encoded with UTF-8, %8s prints the wrong number of spaces (8-6=2) instead of (8-2=6). This has NOTHING to do with the font used, unlike the question that this one is supposed to be a duplicate of. This question is about PHP's lack of support for multibyte characters.Waynant
S
14

I was actually trying to find out if PHP ^7 finally has a native mb_sprintf() but apparently no xD.

For the sake of completeness, here is a simple solution I've been using in some old projects. It just adds the difference between strlen() and mb_strlen() to the desired $targetLength. The non-multibyte example is only there for easy comparison =).

$text = "Gultigkeitsprufung ist fehlgeschlagen: %{errors}";
$mbText = "Gültigkeitsprüfung ist fehlgeschlagen: %{errors}";
$mbTextRussian = "Проверка не удалась: %{errors}";

$targetLength = 60;
$mbTargetLength = strlen($mbText) - mb_strlen($mbText) + $targetLength;
$mbRussianTargetLength = strlen($mbTextRussian) - mb_strlen($mbTextRussian) + $targetLength;

printf("%{$targetLength}s\n", $text);
printf("%{$mbTargetLength}s\n", $mbText);
printf("%{$mbRussianTargetLength}s\n", $mbTextRussian);

result

            Gultigkeitsprufung ist fehlgeschlagen: %{errors}
            Gültigkeitsprüfung ist fehlgeschlagen: %{errors}
                              Проверка не удалась: %{errors}

update 2019-06-12


@flowtron made me give it another thought. A simple mb_sprintf() could look like this.

function mb_sprintf($format, ...$args) {
    $params = $args;

    $callback = function ($length) use (&$params) {
        $value = array_shift($params);
        return strlen($value) - mb_strlen($value) + $length[0];
    };

    $format = preg_replace_callback('/(?<=%|%-)\d+(?=s)/', $callback, $format);

    return sprintf($format, ...$args);
}

echo mb_sprintf("%-10s %-10s %10s\n", 'thüs', 'wörks', 'ök');
echo mb_sprintf("%-10s %-10s %10s\n", 'this', 'works', 'ok');

result

thüs       wörks              ök
this       works              ok

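I only did some happy-path testing here, but it works for PHP >= 5.6 and should be good enough to give people an idea of how to encapsulate the behavior. It does not handle argument-position specifiers, though: a spec such as %1$20s is not matched by the regex and is left unchanged. It also assumes that every argument belongs to a %s spec with a width, because the callback takes one argument per match.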
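One way the position limitation could be lifted is to parse each conversion spec in full, so the callback knows which argument a spec refers to. The following is only a sketch under assumptions: mb_sprintf_positional is a hypothetical name (not a PHP built-in), the input is assumed to be UTF-8, and a precision on %s (for example %10.3s) still truncates by bytes.

```php
<?php
// Hypothetical helper (not a PHP built-in): pads %s fields by character count,
// including positional specs such as %2$-10s. Assumes UTF-8 input.
function mb_sprintf_positional(string $format, ...$args): string
{
    $next = 0; // index of the next argument used by a non-positional spec

    // %[argnum$][flags][width][.precision]specifier, or a literal %%
    $pattern = '/%(?:(\d+)\$)?((?:[-+ 0]|\'.)*)(\d*)((?:\.\d+)?)([a-zA-Z%])/';

    $adjusted = preg_replace_callback(
        $pattern,
        function (array $m) use (&$next, $args): string {
            [$spec, $pos, $flags, $width, $precision, $type] = $m;
            if ($type === '%') {
                return $spec; // a literal percent sign consumes no argument
            }
            $index = $pos !== '' ? (int) $pos - 1 : $next++;
            if ($type !== 's' || $width === '' || !isset($args[$index])) {
                return $spec; // only padded string fields need adjusting
            }
            $value = (string) $args[$index];
            // sprintf counts bytes, so widen the field by the byte surplus.
            $width = (int) $width + strlen($value) - mb_strlen($value, 'UTF-8');
            $prefix = $pos !== '' ? $pos . '$' : '';
            return '%' . $prefix . $flags . $width . $precision . $type;
        },
        $format
    );

    return sprintf($adjusted, ...$args);
}

$line = mb_sprintf_positional('%2$-6s|%1$6s|', 'ök', 'thüs');
echo $line, "\n";
```

Because every spec is matched, not just the %s ones, the argument counter stays in sync when the format also contains specs such as %d or %8.2f.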

Spencerianism answered 30/4, 2019 at 19:2 Comment(2)
I had hoped to find something less hacky, because this is the way I've been doing it too - upvoted since the linked routine in @Martin Prikryl doesn't work (for me).Sexed
you made me give it another thought =)Spencerianism
D
13

Strings in PHP are basically arrays of bytes (not characters). They cannot work natively with multibyte encodings (such as UTF-8).

For details see:
https://www.php.net/manual/en/language.types.string.php#language.types.string.details

Most string functions in PHP have a multibyte equivalent, though (with the mb_ prefix). But sprintf does not.

There's a user comment (by "viktor at textalk dot com") with a multibyte implementation of sprintf on the function's documentation page at php.net. It may work for you:
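To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
$word = 'åäö'; // 3 characters, 6 bytes in UTF-8

var_dump(strlen($word));             // counts bytes
var_dump(mb_strlen($word, 'UTF-8')); // counts characters

// sprintf pads to 8 bytes, so only 2 spaces are added instead of 5.
$padded = sprintf('%-8s|', $word);
var_dump(mb_strlen($padded, 'UTF-8'));
```

The padded string ends up 6 characters long (3 letters, 2 spaces, and the bar), which is why the columns in a report drift as soon as a value contains a multibyte character.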
https://www.php.net/manual/en/function.sprintf.php#89020

Duplication answered 14/4, 2013 at 20:27 Comment(1)
correct explanation, but the linked function does not work for me – even after doing the mb_* function name replacements mentioned in the remarks. I'd hoped for a better solution than @Spencerianism has provided, it's my current hacky solution too.Sexed
V
4

If you're using characters that fit in the ISO-8859-1 character set, you can convert the strings before formatting and convert the result back to UTF-8 when you are done:

utf8_encode(sprintf("%-12s %-8s", utf8_decode($paramOne), utf8_decode($paramTwo)));
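Note that utf8_encode() and utf8_decode() are deprecated as of PHP 8.2. The same round trip can be sketched with mb_convert_encoding() instead. The helper name latin1_sprintf is hypothetical, it accepts string arguments only, and it needs PHP 7.4 or later for the arrow function. Characters outside ISO-8859-1 are replaced with a substitute character during conversion, so this only suits Western European text.

```php
<?php
// Hypothetical helper: format in ISO-8859-1 (one byte per character),
// then convert the finished string back to UTF-8.
function latin1_sprintf(string $format, string ...$args): string
{
    $toLatin1 = fn(string $s): string => mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');

    // In ISO-8859-1 every character is one byte, so sprintf's byte-based
    // padding matches the character count.
    $formatted = sprintf($toLatin1($format), ...array_map($toLatin1, $args));

    return mb_convert_encoding($formatted, 'UTF-8', 'ISO-8859-1');
}

$row = latin1_sprintf('%-10s|%10s|', 'Rättat xx', 'åäö');
echo $row, "\n";
```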
Vallee answered 26/9, 2018 at 7:48 Comment(0)
O
0

Problem

There are no multibyte format functions.

Idea

You can't convert the input strings, so you should change the lengths in the format instead. A format of %4s means a display width of 4 (not 4 characters, see the footnote). But the PHP format functions count bytes. So you should add the difference (bytes - width) of each argument to its format length.

Implementations

from @nimmneun

function mb_sprintf($format, ...$args) {
    $params = $args;
    $callback = function ($length) use (&$params) {
        $value = array_shift($params);
        return $length[0] + strlen($value) - mb_strwidth($value);
    };
    $format = preg_replace_callback('/(?<=%|%-)\d+(?=s)/', $callback, $format);
    return sprintf($format, ...$args);
}

And don't forget another option, str_pad($input, $length, $pad_char=' ', STR_PAD_RIGHT), which can be wrapped the same way:

function mb_str_pad(...$args) {
    $args[1] += strlen($args[0]) - mb_strwidth($args[0]);
    return str_pad(...$args);
}

Footnote

Many East Asian characters take 3 bytes in UTF-8, have a display width of 2, and count as 1 character. If your format is %4s and the input is one such character, you need two spaces of padding, not three.
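The three measures from the footnote can be compared side by side in a small sketch. The name width_str_pad is hypothetical, chosen here because PHP 8.3 and later ship a native mb_str_pad(), so redeclaring that name would fail there. UTF-8 input and a monospaced terminal are assumed.

```php
<?php
// Hypothetical helper: pads to a display width rather than a byte count.
function width_str_pad(string $input, int $width, string $pad = ' ', int $type = STR_PAD_RIGHT): string
{
    // str_pad counts bytes, so extend the target by the bytes
    // that do not occupy a column on screen.
    $target = $width + strlen($input) - mb_strwidth($input, 'UTF-8');
    return str_pad($input, $target, $pad, $type);
}

foreach (['abc', 'åäö', '日本'] as $s) {
    printf(
        "%s| bytes=%d chars=%d width=%d\n",
        width_str_pad($s, 6),
        strlen($s),
        mb_strlen($s, 'UTF-8'),
        mb_strwidth($s, 'UTF-8')
    );
}
```

Using mb_strwidth() instead of mb_strlen() is what makes the bars line up for the last row, where each character occupies two columns.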

Octo answered 17/4, 2021 at 15:0 Comment(0)
