dompdf character encoding UTF-8
H

12

32

I'm trying to create a PDF with the correct characters, but some of them come out as "?". I created a test PHP file to find the best solution. When I open the HTML in the browser, it looks OK:

UTF-8 --> UTF-8 : X Ponuka číslo € černý Češký 

But when I look into the pdf I see this

UTF-8 --> UTF-8 : X Ponuka ?íslo € ?erný ?ešký 

Here is all my code:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>č s š Š</title>
</head>
<body>
<?php 

require_once("dompdf/dompdf_config.inc.php");
$tab = array("UTF-8", "ASCII", "Windows-1250", "ISO-8859-2", "ISO-8859-1", "ISO-8859-6", "CP1256"); 
$chain = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <style></style><title>č s š Š</title></head><body>';
foreach ($tab as $i) 
    { 
        foreach ($tab as $j) 
        { 
            $chain .= "<br> $i --> $j : ".iconv($i, $j, 'X Ponuka číslo € černý Češký <br>'); 
        } 
    } 
$chain .= '<p style="font-family: firefly, verdana, sans-serif;">??????X Ponuka číslo € černý Češký <br></p></body></html>';
echo $chain; 
echo 'X Ponuka číslo € černý Češký <br>'; 

$filename = 'pdf/_1.pdf';
$dompdf = new DOMPDF();
$dompdf->load_html($chain, 'UTF-8');
$dompdf->set_paper('a4', 'portrait'); // change these if you need to
$dompdf->render();
file_put_contents($filename, $dompdf->output());

?> 
</body>
</html>

What am I doing wrong? I have tried many of the options I found :( Any ideas?

Houck answered 5/5, 2013 at 12:27 Comment(3)
Most libraries do not allow you to load data in a different encoding than the one you explicitly tell the library to load. This often results in question marks. So I actually wonder why you think this should be different with DOMPDF. Also, trying all the options can be okay for playing around, but if that does not give results quickly, you need a different strategy to understand what is going on. - Squad
I tried several options because it was hard to find out how it works. There is no usable info about the ISO-8859-2 charset; I googled a lot, and I wanted UTF-8, where every char is OK! - Houck
Yes, UTF-8 is a good choice if you want to support all characters known on computer systems. However, in your code above you apply multiple encodings to the same string. That can never work out well. Instead, find out which encoding your strings originally have, and then convert from that specific encoding into UTF-8. You should only do a single re-encoding here. This answer might be interesting for you as well: https://mcmap.net/q/453799/-dompdf-special-characters - Squad
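
The "single re-encoding" advice in the comment above could be sketched roughly like this. It is only an illustration: to_utf8_once() is a hypothetical helper name, and the ISO-8859-2 default is an assumption that you would replace with whatever encoding your data really has. The validity check is a heuristic, since some legacy byte sequences also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper: make sure a string is converted to UTF-8 at most once.
// $assumedSource is an assumption; use the encoding your data really has.
function to_utf8_once(string $text, string $assumedSource = 'ISO-8859-2'): string
{
    // Already valid UTF-8: return it untouched so it is never re-encoded.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise do the one and only conversion from the assumed source encoding.
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

$legacy = "\xE8\xEDslo";          // the word "číslo" as ISO-8859-2 bytes
$utf8   = to_utf8_once($legacy);  // converted once
$again  = to_utf8_once($utf8);    // second call changes nothing
```

Calling the helper on every input before building the HTML keeps already-correct strings intact instead of running them through iconv() a second time.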
P
55

You should read over the Unicode How-to again. The main problem is that you don't specify a font that supports your characters. It looks like you've read the how-to, because you're using the font example from that document. However, the example was not meant to apply globally to any document: dompdf doesn't include firefly (a Chinese character font) or Verdana by default.

If you do not specify a font then dompdf falls back to one of the core fonts (Helvetica, Times Roman, Courier) which only support Windows ANSI encoding. So always be sure to style your text with a font that supports Unicode encoding and has the characters you need to display.
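
To get a feel for which characters the core fonts cannot show, one can check what falls outside Windows ANSI. The sketch below assumes "Windows ANSI" means the Windows-1252 code page and uses a hypothetical helper name, chars_outside_cp1252(); it only inspects the string and does not call dompdf.

```php
<?php
// Hypothetical helper: list the characters of a UTF-8 string that do not
// survive a round trip through Windows-1252 (assumed here to be what
// "Windows ANSI" refers to).
function chars_outside_cp1252(string $utf8): array
{
    $missing = [];
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $ch) {
        $cp1252    = mb_convert_encoding($ch, 'Windows-1252', 'UTF-8');
        $roundTrip = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');
        if ($roundTrip !== $ch) {
            $missing[$ch] = true; // array keys keep the list unique
        }
    }
    return array_keys($missing);
}

// For the test string from the question only the two carons on "c" are
// reported, which matches the "?" positions seen in the PDF.
$missing = chars_outside_cp1252("Ponuka \u{010D}\u{00ED}slo \u{20AC} \u{010D}ern\u{00FD} \u{010C}e\u{0161}k\u{00FD}");
```

Anything this reports needs a Unicode font such as DejaVu Sans to render.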

With dompdf 0.6.0 you can use the included DejaVu fonts. So the following should work (just the HTML):

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<style>
  body { font-family: DejaVu Sans, sans-serif; }
</style>
<title>č s š Š</title>
</head>
<body>
  <p>??????X Ponuka číslo € černý Češký <br></p>
</body>
</html>
Pochard answered 6/5, 2013 at 2:53 Comment(8)
What version of dompdf? The DejaVu fonts were only included starting with 0.6.x. Also, multiple things can affect the output. E.g., your document should actually be encoded as UTF-8 as well as specifying that encoding in the header. - Pochard
The version was 0.6.1, with <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>, and the font was set within the CSS on the body tag: font-family: Helvetica,"Times New Roman", serif; - Thunell
@andreas-manusm you'll need to use the DejaVu fonts if you use the character directly. The built-in fonts should be able to display the character if you encode it as &#0128; (the Windows ANSI character position). - Pochard
Meanwhile I fixed this by writing "Euro"; I was using '&#8364;' before. - Thunell
Thanks for pointing me to the DejaVu fonts. This time I had a precise template to fulfill. Best practice for the next project is to create a template/design based on the DejaVu font. - Thunell
This is working fine in the latest dompdf (v0.7.0-beta2) downloaded from github.com/dompdf/dompdf/tags. - Coattail
I was searching for a solution for 3 days before I found this, and now it works perfectly. THANK YOU SO MUCH !!! - Dermato
@Pochard How can I add my own font? - Sewerage
O
44

I got UTF-8 characters working with this combination. Before you pass the HTML to dompdf, convert its encoding like this:

$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');

Use DejaVu font in your css

*{ font-family: DejaVu Sans; font-size: 12px;}

Make sure you have set utf-8 encoding in HTML <head> tag

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Now all special characters are working "ľ š č ť ž ý á í é"
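
Note that the 'HTML-ENTITIES' pseudo-encoding used above is deprecated as of PHP 8.2. A possible replacement, sketched under the assumption that decimal numeric entities are acceptable for your document, is mb_encode_numericentity(). The helper name utf8_to_numeric_entities() is hypothetical, and the output is not byte-for-byte identical to the old call (which could also emit named entities).

```php
<?php
// Hypothetical helper; mb_encode_numericentity() itself is part of mbstring.
function utf8_to_numeric_entities(string $html): string
{
    // Map every code point from U+0080 to U+10FFFF to a decimal entity.
    // ASCII, including the HTML markup itself, is left unchanged.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($html, $map, 'UTF-8');
}

$out = utf8_to_numeric_entities("<p>\u{010D}\u{00ED}slo \u{20AC}</p>");
// $out is now "<p>&#269;&#237;slo &#8364;</p>"
```

The result can be passed to dompdf in place of the mb_convert_encoding() output; a font that contains the characters is still required.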

Overabundance answered 2/2, 2015 at 10:17 Comment(1)
For me, specifying <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> worked. - Nimitz
U
27

Just add

  <style>
    *{ font-family: DejaVu Sans !important;}
  </style>

before </head>. It works for me.

Ungrudging answered 17/1, 2017 at 16:42 Comment(1)
Also change def("DOMPDF_ENABLE_HTML5PARSER", false); to def("DOMPDF_ENABLE_HTML5PARSER", true); in the dompdf_config.inc.php file. - Ungrudging
N
3

Dompdf does not support fallback fonts, so you can't use your favorite font if it does not support your characters, and you also can't set another font, such as Droid Sans Fallback, to be the fallback for those characters.

What you can do instead is take advantage of regex unicode script ranges: https://www.regular-expressions.info/unicode.html to wrap those blocks of text into spans and give them the fallback font.

Example:

$body = 'test 简化字 彝語/彝语 test číslo € černý Češký';

$cjk_scripts = 'Bopomofo|Han|Hiragana|Katakana';
$cjk_scripts = preg_replace('/[a-zA-Z_]+/', '\\p{$0}', $cjk_scripts);

// wrap the CJK characters into a span with its own font
$body = preg_replace("/($cjk_scripts)+/isu", '<span class="cjk">$0</span>', $body);

// a font that supports CJK characters
$cjk_font_path = APP_PATH.'/fonts/DroidSansFallbackFull.ttf';

$html = <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<style type="text/css">
@font-face {
    font-family: 'DroidSansFallbackFull';
    font-style: normal;
    font-weight: 400;
    src: url('$cjk_font_path') format('truetype');
}
body {
    font-family: DejaVu Sans, sans-serif;
}
.cjk {
    font-family: DroidSansFallbackFull, sans-serif;
}
</style>
</head>
<body>$body</body>
</html>
HTML;

$dompdf = new \DOMPDF();
$dompdf->set_paper('A4');
$dompdf->load_html($html);
$dompdf->render();

$dompdf->stream('test.pdf', ['Attachment'=>0]);

Related: https://github.com/dompdf/dompdf/issues/1508

Nagual answered 27/12, 2018 at 15:34 Comment(0)
W
2

utf8_decode() did the trick for me with some German translations like ä and ü.

echo utf8_decode('X Ponuka číslo € černý Češký <br>');
Wu answered 16/4, 2018 at 16:43 Comment(0)
R
2

Chinese characters sometimes cause problems. The important part is to have a good font; here is a list of fonts you can download.

I chose the first one, named "Kai Bold Font"; here is a download page.

Then put it on your hosting service in a public folder. I put it into

http://192.168.10.10/fonts/pdf/wts11.ttf

and here is my html example

$html = <<<EOT
<!DOCTYPE html>
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
   <style>
    @font-face {
      font-family: chinese;
        src: url('http://192.168.10.10/fonts/pdf/wts11.ttf') format('truetype');
    }
    .chineseLanguage { font-family: chinese; }
      body {font-family: DejaVu Sans, sans-serif;}
   </style>
</head>
<body>
    Chinese
    <div class='chineseLanguage'>
        忠烈祠
        中文 - 这工作<br> 
    </div>
    hello world <br> 
    Russian - русский текст <br>
    Greek - α,β,γ,δ,ε <br>
    chars - !@#$%^&* -=- €   <br><br>
    <br>
    Hebrew (iw)<br><br>
    דג סקרן שט בים מאוכזב ולפתע מצא לו חברה איך הקליטה<br>
    <br>    
</body>
</html>
EOT;

P.S. There is a small chance you might need this setting:

ini_set("allow_url_fopen", true);
Rifkin answered 27/4, 2018 at 5:18 Comment(0)
E
1

None of the answers mentioned here helped me. After hours of struggle I switched to niklasravnsborg/laravel-pdf, which has nearly the same syntax and usage, and everything is working all right.

Ethics answered 2/3, 2017 at 0:19 Comment(0)
P
1

If you don't mind having only one charset, you can change every charset in dompdf_font_family_cache.dist.php,

like this:

<?php
$distFontDir = $rootDir . DIRECTORY_SEPARATOR . 'lib' . DIRECTORY_SEPARATOR . 'fonts' . DIRECTORY_SEPARATOR;
return array(
    'sans-serif' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'times' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'times-roman' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'courier' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'helvetica' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'zapfdingbats' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'symbol' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'serif' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'monospace' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'fixed' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'dejavu sans' =>
    array(
        'bold' => $distFontDir . 'DejaVuSans-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSans-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSans-Oblique',
        'normal' => $distFontDir . 'DejaVuSans'
    ),
    'dejavu sans mono' =>
    array(
        'bold' => $distFontDir . 'DejaVuSansMono-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSansMono-BoldOblique',
        'italic' => $distFontDir . 'DejaVuSansMono-Oblique',
        'normal' => $distFontDir . 'DejaVuSansMono'
    ),
    'dejavu serif' =>
    array(
        'bold' => $distFontDir . 'DejaVuSerif-Bold',
        'bold_italic' => $distFontDir . 'DejaVuSerif-BoldItalic',
        'italic' => $distFontDir . 'DejaVuSerif-Italic',
        'normal' => $distFontDir . 'DejaVuSerif'
    )
)
?>

I know it's not the best way, but it saves a lot of time.

Ponton answered 19/3, 2017 at 12:22 Comment(1)
That was my problem: I hadn't set $rootDir properly! The fonts were not read. - Eward
S
0

I had a similar problem and ended up using TCPDF. I hope this is helpful: http://www.tcpdf.org/
The problem was the font I was using. I was able to get the correct output using the font 'freeserif'. I guess it might be possible to get the same output using this font with dompdf.

$pdf->SetFont('freeserif', '', 12);

Here is the sample I used (the TCPDF UTF-8 sample):

<?php
header('Content-type: text/html; charset=UTF-8') ;//chrome
require_once('tcpdf_include.php');

// create new PDF document
$pdf = new TCPDF(PDF_PAGE_ORIENTATION, PDF_UNIT, PDF_PAGE_FORMAT, true, 'UTF-8', false);

$pdf->setFontSubsetting(true);

$pdf->SetFont('freeserif', '', 12);

$pdf->AddPage();

$utf8text = '
<html><head>  
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body>
<b>Ponuka číslo € černý Češký </b><br>
සිංහල  <br>
<u>தேமல </u> <br>
</body></html>';

$pdf->SetTextColor(0, 63, 127);

$pdf->writeHTML($utf8text, true, 0, true, true);

$pdf->Output('example_008.pdf', 'I');

?>
Scheck answered 11/9, 2013 at 12:6 Comment(0)
F
0

I had the same problem and solved it very simply. Just import Google Fonts with the required language subset in the CSS file that is used when generating the HTML. Specify UTF-8 in your HTML file and it works:

@import url('https://fonts.googleapis.com/css?family=Roboto:400,700&subset=latin-ext');
body {font-family: 'Roboto', sans-serif;}
Formenti answered 11/7, 2018 at 12:27 Comment(0)
R
0

There are lots of answers here, but I struggled to get any of them to provide cross-language support reliably. I believe that for those of us making distributed software, there are also server settings that block some functionality, such as @import and src: url(), from automatically embedding a font in dompdf.

The following solution has worked across many servers & locally hosted sites, and requires no command line access:

  1. Get the font you want to use as a .ttf file (for language support including Cyrillic, Greek, Devanagari, Latin, and Vietnamese, we used Noto Sans with all optional languages checked).
  2. Include the following script and call PDFBuilder_install_font_family() ONCE only (a one-time install).

Gist for PDFBuilder_install_font_family(): https://gist.github.com/woodyhayday/f8dc36cc7ec922bc1894f33eb2b0e928

Rhetorician answered 16/1, 2020 at 11:7 Comment(0)
L
0

You can set these options:

use Dompdf\Dompdf;
use Dompdf\Options;

    $options = new Options();
    $options->set('isHtml5ParserEnabled', true);
    $options->set('isPhpEnabled', true);

    $dompdf = new Dompdf($options);

    $dompdf->loadHtml($message);
    $dompdf->setHttpContext('utf8');
Lafave answered 5/12, 2023 at 7:35 Comment(0)
