PHP function imagettftext() and unicode

I'm using the PHP function imagettftext() to convert text into a GIF image. The text I am converting has Unicode characters including Japanese. Everything works fine on my local machine (Ubuntu 7.10), but on my webhost server, the Japanese characters are mangled. What could be causing the difference? Everything should be encoded as UTF-8.

Broken Image on webhost server: http://www.ibeni.net/flashcards/imagetest.php

Copy of correct image from my local machine: http://www.ibeni.net/flashcards/imagetest.php.gif

Copy of phpinfo() from my local machine: http://www.ibeni.net/flashcards/phpinfo.php.html

Copy of phpinfo() from my webhost server: http://example5.nfshost.com/phpinfo

Code:

mb_language('uni');
mb_internal_encoding('UTF-8');

header('Content-type: image/gif');

$text = '日本語';
$font = './Cyberbit.ttf';

// Create the image
$im = imagecreatetruecolor(160, 160);

// Create some colors
$white = imagecolorallocate($im, 255, 255, 255);
$black = imagecolorallocate($im, 0, 0, 0);

// Fill the background with white
imagefilledrectangle($im, 0, 0, 159, 159, $white);

// Add the text
imagettftext($im, 12, 0, 20, 20, $black, $font, $text);
imagegif($im);
imagedestroy($im);
Inherence answered 13/10, 2008 at 15:33 Comment(0)

Here's the solution that finally worked for me:

$text = "你好";
// Convert UTF-8 string to HTML entities
$text = mb_convert_encoding($text, 'HTML-ENTITIES',"UTF-8");
// Decode entities back to ISO-8859-1 characters (entities for characters outside ISO-8859-1 stay as they are)
$text = html_entity_decode($text,ENT_NOQUOTES, "ISO-8859-1");
// Convert characters > 127 into decimal numeric entities (&#NNN;)
$out = "";
for($i = 0; $i < strlen($text); $i++) {
    $letter = $text[$i];
    $num = ord($letter);
    if($num>127) {
      $out .= "&#$num;";
    } else {
      $out .=  $letter;
    }
}

Converting the string to HTML entities works except that the function imagettftext() doesn't accept named entities. For example,

&#26085;&#26412;&#35486;

is OK, but

&ccedil;

is not. Converting back to ISO-8859-1 turns the named entities back into characters, but that creates a second problem: imagettftext() doesn't support characters with a value greater than 127. The final for loop encodes these characters as decimal numeric entities. This solution is working for me with the text that I am using (which includes Japanese, Chinese and accented Latin characters for Portuguese), but I'm not 100% sure it will work in all cases.
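The final loop produces the right code points because ISO-8859-1 byte values are identical to Unicode code points 0 to 255, so the byte 0xE7 (ç) becomes exactly &#231;. Entities that html_entity_decode() could not map to ISO-8859-1 (the CJK ones) are plain ASCII and pass through untouched. A minimal sketch of just that step as a standalone function; the name latin1BytesToEntities is hypothetical, not part of PHP:

```php
<?php
// Hypothetical helper wrapping the answer's final loop: every byte above 127
// in an ISO-8859-1 string becomes a decimal numeric entity. This is correct
// because ISO-8859-1 byte values coincide with Unicode code points 0-255.
function latin1BytesToEntities(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $num = ord($latin1[$i]);
        // ASCII (including already-encoded "&#26085;" entities) passes through.
        $out .= $num > 127 ? "&#$num;" : $latin1[$i];
    }
    return $out;
}

// "\xE7" is ç in ISO-8859-1 (U+00E7 = 231).
echo latin1BytesToEntities("Fran\xE7a"), "\n"; // Fran&#231;a
```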

All of these gymnastics are needed because imagettftext() doesn't really accept UTF-8 strings on my server.

Inherence answered 14/10, 2008 at 16:2 Comment(4)
Why the UTF-8 > HTML entities > ISO-8859 conversion instead of simply UTF-8 > ISO-8859? – Prepotency
+1 Just as deceze mentioned, I would probably go with iconv('UTF-8', 'ISO-8859-1', $text) instead of the entity approach, but other than that, the conversion to numeric entities is the way to go! Thanks for the tip! – Trawl
And converting UTF-8 to ISO-8859-1 has a dedicated function: utf8_decode(). – Handcraft
@Prepotency because it’s not about the charset conversion – Thirddegree

I have been having the same problem with a script that renders text in an image and outputs it. The problem was that, due to different browsers (or code hardiness/paranoia, whichever way you want to think of it), I had no way of knowing what encoding was being put into the $_GET array.

Here is how I solved the problem.

$item_text = $_GET['text'];

// Detect whether the string was passed in as UTF-8
$text_encoding = mb_detect_encoding($item_text, 'UTF-8, ISO-8859-1');
// Make sure it's UTF-8
if ($text_encoding !== 'UTF-8') {
    $item_text = mb_convert_encoding($item_text, 'UTF-8', $text_encoding);
}

// HTML numerically-escape everything (&#[dec];)
$item_text = mb_encode_numericentity($item_text,
    [0x0, 0xffff, 0, 0xffff], 'UTF-8');

This works around imagettftext() not being able to handle characters above 127 by simply converting ALL the characters (including multibyte Unicode characters) into HTML numeric character references, "&#65;" for "A", "&#66;" for "B", etc., which the manual page says are supported.
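If mbstring is not available, the same idea can be sketched in plain PHP (7+). This is an alternative under stated assumptions, not the answer's code: it treats any input that is not valid UTF-8 as ISO-8859-1 (the same two candidates passed to mb_detect_encoding() above), and unlike the [0x0, 0xffff, 0, 0xffff] convmap it also encodes code points above U+FFFF. All function names are hypothetical.

```php
<?php
// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8 by hand.
// Each byte value is already the Unicode code point (0-255).
function latin1ToUtf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80
            ? chr($b)
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Hypothetical helper: decode one UTF-8 encoded character to its code point.
function utf8CodePoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Turn every character into a decimal numeric entity (&#NNN;).
function toNumericEntities(string $text): string
{
    // preg_match() with /u returns false for malformed UTF-8; such input is
    // assumed to be ISO-8859-1 and converted first.
    if (preg_match('//u', $text) !== 1) {
        $text = latin1ToUtf8($text);
    }
    // Split into whole UTF-8 characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $out = '';
    foreach ($chars as $ch) {
        $out .= '&#' . utf8CodePoint($ch) . ';';
    }
    return $out;
}

echo toNumericEntities("\u{65E5}\u{672C}\u{8A9E}"), "\n"; // &#26085;&#26412;&#35486;
echo toNumericEntities("Fran\xE7a"), "\n"; // &#70;&#114;&#97;&#110;&#231;&#97;
```

The heuristic shares mb_detect_encoding()'s limitation: an ISO-8859-1 string that happens to be valid UTF-8 is taken as UTF-8.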

Flogging answered 24/12, 2009 at 2:23 Comment(2)
Worked for me too. I was trying to get the TM character to print. It only worked with certain fonts, though, even though all of the fonts I tried contained the character. – Turnage
Using mb_convert_encoding() with the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2, as are utf8_encode() and utf8_decode(): php.watch/versions/8.2/utf8_encode-utf8_decode-deprecated. Possible suggested fix - github.com/Kristories/symfony/commit/… – Drumm

I had the same problem. Converting the font from OTF to TTF helped. You can use FontForge (available in the standard repositories) to do the conversion.
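The answer doesn't say why the conversion helped; one guess (an assumption, not confirmed here) is that the server's GD/FreeType build handled OpenType fonts with CFF outlines poorly. A .otf extension alone doesn't tell you which outline type a font uses, so a quick look at the file's first four bytes (the sfnt version tag) can show whether conversion is even relevant. A sketch with a hypothetical helper name:

```php
<?php
// Hypothetical helper: reads the 4-byte sfnt version tag at the start of a
// font file. 'OTTO' marks OpenType with CFF (PostScript) outlines, while
// 0x00010000 or 'true' marks TrueType outlines and 'ttcf' a font collection.
function fontOutlineType(string $path): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $tag = fread($fh, 4);
    fclose($fh);

    if ($tag === "\x00\x01\x00\x00" || $tag === 'true') {
        return 'truetype';
    }
    if ($tag === 'OTTO') {
        return 'cff';
    }
    if ($tag === 'ttcf') {
        return 'collection';
    }
    return 'unknown';
}

// Usage, e.g.: echo fontOutlineType('./Cyberbit.ttf'), "\n";
```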

Valdavaldas answered 13/10, 2010 at 8:54 Comment(2)
This comment just saved me many hours of debugging. A tip if you don't want to use FontForge: http://www.freefontconverter.com/ – Brokenhearted
Clearly the best solution here! – Liana

My prime suspect is the font you are using for rendering.

According to http://fr3.php.net/imagettftext, different versions of the GD library used by PHP can behave differently.

  • GD Version on your local machine: 2.0 or higher
  • GD Version on your webhost server: bundled (2.0.34 compatible)
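To compare the two builds beyond the version string, here is a sketch that pulls the text-related entries out of gd_info() so they can be printed on both machines. Which keys appear depends on the PHP version and build. 'JIS-mapped Japanese Font Support' is included only because it is a GD build option that concerns Japanese fonts and is therefore worth comparing, not because it is known to be the cause here. The function name is hypothetical.

```php
<?php
// Sketch: collect the gd_info() entries that matter for imagettftext() so
// the two servers can be compared side by side. Returns [] if GD is absent.
function gdTextSupport(): array
{
    if (!function_exists('gd_info')) {
        return [];
    }
    $info = gd_info();
    $keys = ['GD Version', 'FreeType Support', 'FreeType Linkage',
             'JIS-mapped Japanese Font Support'];
    $result = [];
    foreach ($keys as $key) {
        // Not every key exists in every PHP build, so missing ones are marked.
        $result[$key] = $info[$key] ?? '(not reported)';
    }
    return $result;
}

print_r(gdTextSupport());
```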

Edit: Another idea: can you verify that $text = '日本語'; is really saved like this on your production server? Maybe there is an encoding problem with your script.
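One way to do that check, sketched with a hypothetical helper: report whether the literal is valid UTF-8 and dump its bytes. Saved as UTF-8, '日本語' is the nine bytes e6 97 a5 e6 9c ac e8 aa 9e; anything else suggests the script file was saved in another encoding on the server.

```php
<?php
// Sketch: report whether a string is valid UTF-8 and show its raw bytes.
// preg_match() with the /u modifier returns false on malformed UTF-8.
function describeBytes(string $s): string
{
    $isUtf8 = preg_match('//u', $s) === 1;
    return ($isUtf8 ? 'valid UTF-8' : 'NOT valid UTF-8') . ': ' . bin2hex($s);
}

// In the real script, call describeBytes($text) on the '日本語' literal.
// The \u{...} escapes below build the expected UTF-8 bytes independently
// of how this snippet itself is saved.
echo describeBytes("\u{65E5}\u{672C}\u{8A9E}"), "\n"; // valid UTF-8: e697a5e69cace8aa9e
```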

Next edit: BKB already proposed that. So in case this is the cause: he was first with the answer ;-)

Blight answered 13/10, 2008 at 15:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.