pdftk + xfdf + php can't handle umlauts
I'm using XFDF files to fill out PDF forms server-side with PHP and pdftk, but my problem is that non-English characters (ä, ö, å etc.) are not printed to the form fields.

Here is the function I use to generate the XFDF file:

function createFDF($file,$info,$enc='UTF-8'){ 
$data='<?xml version="1.0" encoding="'.$enc.'"?>'."\n". 
    '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">'."\n". 
    '<fields>'."\n"; 
foreach($info as $field => $val){ 
    $data.='<field name="'.$field.'">'."\n"; 
    if(is_array($val)){ 
        foreach($val as $opt) 
            $data.='<value>'.htmlentities($opt,ENT_COMPAT,$enc).'</value>'."\n"; 
    }else{ 
        $data.='<value>'.htmlentities($val,ENT_COMPAT,$enc).'</value>'."\n"; 
    } 
    $data.='</field>'."\n"; 
} 
$data.='</fields>'."\n". 
    '<ids original="'.md5($file).'" modified="'.time().'" />'."\n". 
    '<f href="'.$file.'" />'."\n". 
    '</xfdf>'."\n"; 
return $data; 
}

And the resulting XFDF file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<fields>
<field name="loadman-pudotuspainolaitteen-mittaustulosten-tallenne">
<value>1201</value>
</field>
<field name="tutkittavarakenne-rivi1">
<value>a</value>
</field>
<field name="tutkittavarakenne-rivi2">
<value></value>
</field>
<field name="tutk-pvm">
<value>11.12.2012</value>
</field>
<field name="mittauksen_suorittaja">
<value>o</value>
</field>
<field name="vast-tyonjohtaja">
<value>&ouml;</value>
</field>
<field name="rakennemateriaali">
<value>&auml;</value>
</field>
<field name="laatuvaatimukset">
<value>&aring;</value>
</field>
<field name="mittauspaikan_tiivistysmenetelma">
<value>&aacute;</value>
</field>
<field name="pohjalevy">
<value>&eacute;</value>
</field>
<field name="pohjamaa-alusrakenne">
<value>&iacute;</value>
</field>
<field name="mittauspaikan-tiivistysmenetelma">
<value>&egrave;</value>
</field>
<field name="emoduli">
<value>&ouml;</value>
</field>
<field name="tiiveys">
<value>&ouml;&auml;</value>
</field>
<field name="huomautukset_ja_loppupaatelmat1">
<value>&ouml;&auml;</value>
</field>
<field name="huomautukset_ja_loppupaatelmat2">
<value>&ouml;&auml;</value>
</field>
<field name="huomautukset_ja_loppupaatelmat3">
<value>&ouml;&auml;</value>
</field>
<field name="empa1">
<value>&ouml;</value>
</field>
<field name="empa1-e">
<value>&ouml;</value>
</field>
<field name="empa2">
<value>&ouml;</value>
</field>
<field name="empa2-e">
<value>&ouml;</value>
</field>
<field name="allekirjoitus">
<value>Einomies Porkkakoski</value>
</field>
</fields>
<ids original="84b0ff7a04b017303be186faa0d1254a" modified="1343290963" />
<f href="assets/loadman.pdf" />
</xfdf>

The fields with English letters print perfectly, but letters with acute or grave accents or Scandinavian diacritics won't transfer to the PDF file. EXCEPT that, for some reason,

<field name="huomautukset_ja_loppupaatelmat1">
<value>&ouml;&auml;</value>
</field>

works perfectly and prints öä!

The command I run is

pdftk <pdf-file> fill_form <xfdf-file> output <output file> flatten

This does not produce any errors.
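
For reference, a rough sketch of how that command might be run from PHP so that a non-zero exit status or any pdftk output is not silently lost. buildPdftkCommand and fillForm are hypothetical names made up for this example, and it assumes a Unix-like shell with pdftk on the PATH:

```php
<?php
// Hypothetical helper: assembles the pdftk command line shown above.
// escapeshellarg() quotes each path so spaces or shell metacharacters are safe.
function buildPdftkCommand($pdf, $xfdf, $out)
{
    return 'pdftk ' . escapeshellarg($pdf)
        . ' fill_form ' . escapeshellarg($xfdf)
        . ' output ' . escapeshellarg($out)
        . ' flatten 2>&1'; // merge stderr into stdout so exec() captures it
}

// Hypothetical helper: runs pdftk and throws if it reports a failure.
function fillForm($pdf, $xfdf, $out)
{
    exec(buildPdftkCommand($pdf, $xfdf, $out), $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("pdftk failed ($status): " . implode("\n", $lines));
    }
    return $out;
}

// Example call (paths are placeholders):
// fillForm('assets/loadman.pdf', '/tmp/data.xfdf', '/tmp/filled.pdf');
```

In a case like this one, where pdftk exits cleanly, the check will not reveal anything new, but it rules out a swallowed error.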

I'm using Debian 6.0, PHP 5.3.3-7+squeeze13 and the pdftk version is 1.44-5

UPDATE I noticed that if I don't flatten the generated file and open it, the characters are shown correctly when the field is activated but hidden again when the field loses focus. If I manually type anything into the file, the special characters show up as well. A saved and reopened file, however, doesn't show the text unless some text is added again.

UPDATE 2 Got the damn thing fixed. Originally the forms were made with Adobe Acrobat Pro on OSX Snow Leopard. Now I remade the forms with LibreOffice + Oracle PDF Import plugin and everything seems to be working!

Isoniazid answered 26/7, 2012 at 8:34 Comment(1)
Having the same issue filling a PDF (created in LibreOffice Writer) with pdftk. In the exported PDF I cannot write a non-Latin character; I exported with the built-in PDF exporter. - Entablature
C
2

I think you will have more luck if you use the following list:

  • &#196; for Ä (instead of &Auml;)
  • &#197; for Å (instead of &Aring;)
  • &#214; for Ö (instead of &Ouml;)
  • &#220; for Ü (instead of &Uuml;)
  • &#223; for ß (instead of &szlig;)
  • &#228; for ä (instead of &auml;)
  • &#229; for å (instead of &aring;)
  • &#246; for ö (instead of &ouml;)
  • &#252; for ü (instead of &uuml;)

I'll let you yourself find out how to extend that list until it reaches completeness :-)
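
One way to extend that list without typing it out by hand is to generate the numeric references programmatically. A minimal sketch, assuming the input string is valid UTF-8; the names utf8CodePoint and toNumericRefs are made up for illustration and are not part of PHP or pdftk:

```php
<?php
// Hypothetical helper: returns the Unicode code point of one UTF-8 encoded character.
function utf8CodePoint($ch)
{
    $bytes = array_values(unpack('C*', $ch));
    $n = count($bytes);
    if ($n === 1) {
        return $bytes[0]; // plain ASCII
    }
    // Drop the length-marker bits from the lead byte, then append
    // the low 6 bits of every continuation byte.
    $cp = $bytes[0] & (0xFF >> ($n + 1));
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// Hypothetical helper: escapes the XML special characters, then turns every
// non-ASCII character into a decimal numeric character reference.
function toNumericRefs($text)
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function ($m) {
            return '&#' . utf8CodePoint($m[0]) . ';';
        },
        $escaped
    );
}

echo toNumericRefs('öä'), "\n"; // prints &#246;&#228;
```

In createFDF this would take the place of the htmlentities() call; the rest of the function can stay as it is.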

Cheloid answered 26/7, 2012 at 13:56 Comment(1)
Funnily enough there recently was a discussion on the Unicode mailing list coming to the conclusion that character references are largely unnecessary nowadays ;-). But indeed, HTML entities are definitely wrong for an XML format. - Declivous
H
2

It's because you use htmlentities in your PHP script. That converts the accented characters to HTML named entities such as &ouml;, which XML does not define.

Set your XML encoding to ISO-8859-1 or Windows-1252 and leave out the htmlentities call in your PHP script.

Another thing to try is to use utf8_encode instead of htmlentities (without modifying the XML encoding).
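
If the data really is UTF-8, as the XML declaration claims, the characters can be written as-is and only the XML special characters need escaping. A sketch under that assumption; xfdfField is a hypothetical name, not the asker's function:

```php
<?php
// Hypothetical helper: builds one <field> element for an XFDF document.
// Assumes $name and $value are valid UTF-8 and the XML declaration says UTF-8.
function xfdfField($name, $value)
{
    // htmlspecialchars() only rewrites & < > " ' and leaves ö, ä, å untouched,
    // unlike htmlentities(), which emits HTML-only names such as &ouml;.
    $esc = function ($s) {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    };
    return '<field name="' . $esc($name) . '">' . "\n"
         . '<value>' . $esc($value) . '</value>' . "\n"
         . '</field>' . "\n";
}

echo xfdfField('vast-tyonjohtaja', 'ö & ä');
```

This only makes the XFDF itself well-formed; whether pdftk then renders the characters may still depend on how the form fields were authored.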

Herschel answered 26/7, 2012 at 14:18 Comment(2)
So I changed the encoding to ISO-8859-1 and removed the htmlentities. Now ä results in <value>Ã¤</value> in the XFDF file and the value is printed as just ¤. What am I not understanding? - Isoniazid
I used utf8_decode() to change <value>Ã¤</value> to <value>ä</value> but now again the character won't print to the field. - Isoniazid
C
0

To support any UTF-8 characters, I wrote PdfFormFillerUTF-8: http://sourceforge.net/projects/pdfformfiller2/

Cadmus answered 22/1, 2014 at 23:20 Comment(1)
Hi Nikolay, thanks for your answer and welcome to Stack Overflow. A small tip when writing answers: it's usually better to include an explanation of your answer rather than just a link. That way, if the link ever moves or breaks, the answer will still be useful. - Quattlebaum
