Flatten FDF / XFDF forms to PDF in PHP with utf-8 characters
D

13

12

My scenario:

  • A PDF template with formfields: template.pdf
  • An XFDF file that contains the data to be filled in: fieldData.xfdf

Now I need to have these two files combined and flattened. pdftk does the job easily from within PHP:

exec("pdftk template.pdf fill_form fieldData.xfdf output flatFile.pdf flatten");
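
A minimal sketch of the same call with shell-escaped paths and an exit-code check, in case the file names ever come from user input. The helper names `buildFillFormCommand` and `fillForm` are hypothetical, not part of pdftk or PHP; only the pdftk arguments shown in the command above are used.

```php
<?php
// Hypothetical helper: builds the pdftk command line with every path shell-escaped.
function buildFillFormCommand(string $template, string $data, string $output, array $options = ['flatten']): string
{
    $parts = [
        'pdftk',
        escapeshellarg($template),
        'fill_form',
        escapeshellarg($data),
        'output',
        escapeshellarg($output),
    ];
    // Options such as "flatten" are fixed keywords chosen by the caller, not user input.
    foreach ($options as $option) {
        $parts[] = $option;
    }
    return implode(' ', $parts);
}

// Hypothetical helper: runs the command and throws if pdftk exits with a non-zero status.
function fillForm(string $template, string $data, string $output): void
{
    $command = buildFillFormCommand($template, $data, $output) . ' 2>&1';
    exec($command, $lines, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("pdftk failed ($exitCode): " . implode("\n", $lines));
    }
}
```

Calling `fillForm('template.pdf', 'fieldData.xfdf', 'flatFile.pdf')` would run the command from the question. This only makes failures visible; it does not change how pdftk handles the characters.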

Unfortunately, this does not work with full UTF-8 support. For example, Cyrillic and Greek letters get scrambled. I used Arial for this, with a Unicode character set.

  • How can I flatten my Unicode files?
  • Is there any other PDF tool that offers Unicode support?
  • Does pdftk have a Unicode switch that I am missing?

EDIT 1: As this question has not been solved for more than 9 months, I decided to start a bounty for it. In case there are options to sponsor a feature or a bugfix in pdftk, I'd be glad to donate.

EDIT 2: I am not working on this project anymore, so I cannot verify new answers. If anyone has a similar problem, I would be glad if they could verify the answers on my behalf.

Diena answered 19/10, 2010 at 22:25 Comment(2)
Have you tried using the iText library directly for this? Fellowship
Take a look at #6048470; it solved my issue. Henke
L
3

I started from Jon's template but built the file with DOMDocument instead. With that, the numeric encoding was handled for me and it worked well. My slight variation is below:

$xml = new DOMDocument( '1.0', 'UTF-8' );

$rootNode = $xml->createElement( 'xfdf' );
$rootNode->setAttribute( 'xmlns', 'http://ns.adobe.com/xfdf/' );
$rootNode->setAttribute( 'xml:space', 'preserve' );
$xml->appendChild( $rootNode );

$fieldsNode = $xml->createElement( 'fields' );
$rootNode->appendChild( $fieldsNode );

foreach ( $fields as $field => $value )
{
    $fieldNode = $xml->createElement( 'field' );
    $fieldNode->setAttribute( 'name', $field );
    $fieldsNode->appendChild( $fieldNode );

    $valueNode = $xml->createElement( 'value' );
    $valueNode->appendChild( $xml->createTextNode( $value ) );
    $fieldNode->appendChild( $valueNode );
}

$xml->save( $file );
Lynnette answered 23/1, 2013 at 16:20 Comment(0)
C
1

You could try the trial version of http://www.adobe.com/products/livecycle/designer/ and see what PDF files it generates.

Another commercial software you could try is http://www.appligent.com/fdfmerge. See page 16 in http://146.145.110.1/docs/userguide/FDFMergeUserGuide.pdf for how it handles xFDF with UTF-8.

I also had a look at the XFDF specification, http://partners.adobe.com/public/developer/en/xml/xfdf_2.0.pdf. On page 12 it states:

Although XFDF is encoded in UTF-8, double byte characters are encoded as character references when 
exported from Acrobat. 
For example, the Japanese double byte characters あ, い, and う are exported to XFDF using 
three character references. Here is an example of double byte characters in a form field: 
  ...
<fields>  
  <field name="Text1"> 
     <value>Here are 3 UTF-8 double byte  
        characters: &#x3042;&#x3044;&#x3046;
</value>  
  </field>  
</fields> ... 

I looked through pdftk-1.44-dist/java/com/lowagie/text/pdf/XfdfReader.java. It doesn't seem to do anything special with the input.

Maybe pdftk will do what you want, when you encode the weird characters as character references in your xFDF input.
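
A minimal PHP sketch of that idea, assuming the mbstring extension is available. The helper name `toXfdfValue` is hypothetical. It escapes the XML special characters first and then replaces every non-ASCII character with a decimal character reference. Whether pdftk then renders the result correctly is a separate question.

```php
<?php
// Hypothetical helper: escapes XML specials, then turns every non-ASCII
// character into a decimal numeric character reference.
function toXfdfValue(string $value): string
{
    // ENT_XML1 keeps the output valid for an XML document such as XFDF.
    $escaped = htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
    // Convert map: [first code point, last code point, offset, mask].
    // The range 0x80..0x10FFFF covers everything outside ASCII.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}
```

The returned string can be placed between `<value>` and `</value>` in the XFDF input.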

Collings answered 3/8, 2011 at 13:4 Comment(3)
Thanks, I will try that character reference thing later.Diena
Unfortunately, my installation of the codebase this is from got corrupt. This just means some more days of delay.Diena
Unfortunately not - but I admit I didn't spend much time on it... so I will give it another try.Diena
S
1

Using pdftk 1.44 on a Win7 machine, I encounter the same problems with XFDF files, whereas FDF works fine. I made an XFDF file without any special characters (only ANSI), but pdftk crashed again. I mailed the developer. Unfortunately, there has been no answer so far.

Seep answered 12/8, 2011 at 18:33 Comment(0)
M
1

Unfortunately, non-ASCII characters do not work with either decimal or hexadecimal character references in the source .xfdf file. Tested with PDFTK v. 1.44.

Margaritamargarite answered 17/2, 2012 at 14:34 Comment(1)
Take a look at #6048470. Henke
Q
1

I made some progress on this. Starting with code from http://koivi.com/fill-pdf-form-fields/, I modified the value encoding to output numeric codes for any characters outside the ascii range.

Now with pitulski's special strings:

Poznań Śródmieście Ćwiartka Ósma outputs Pozna ródmiecie wiartka Ósma with some box shapes superimposed

ęóąśłżźćńĘÓĄŚŁŻŹĆŃ outputs óÓ with more box shapes. I think it may be that the box shapes are characters my server doesn't recognize.

I tried it with some French characters: ùûüÿ€’“”«»àâæçéèêëïôœÙÛÜŸÀÂÆÇÉÈÊËÏÎÔ and they all came out OK, but some of them were overlapping.

--edit-- I just tried entering these manually into the form and got the same result minus the box shapes (using Evince). I then tried with a different form (created by someone else) - after entering ęóąśłżźćńĘÓĄŚŁŻŹĆŃ, ółÓŁ was displayed. It looks like it depends which characters are included in the document's embedded fonts.

/*
KOIVI HTML Form to FDF Parser for PHP (C) 2004 Justin Koivisto
Version 1.2.?
Last Modified: 2013/01/17 - Jon Hulka(jon dot hulka at gmail dot com)
  - changed character encoding, all non-ascii characters get encoded as numeric character references

    This library is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 2.1 of the License, or (at
    your option) any later version.

    This library is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
    or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
    License for more details.

    You should have received a copy of the GNU Lesser General Public License
    along with this library; if not, write to the Free Software Foundation,
    Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 

    Full license agreement notice can be found in the LICENSE file contained
    within this distribution package.

    Justin Koivisto
    justin dot koivisto at gmail dot com
    http://koivi.com
*/

/**
 * createXFDF
 * 
 * Takes values passed via associative array and generates XFDF file format
 * with that data for the pdf address supplied.
 * 
 * @param string $file The pdf file - url or file path accepted
 * @param array $info data to use in key/value pairs no more than 2 dimensions
 * @param string $enc default UTF-8, match server output: default_charset in php.ini
 * @return string The XFDF data for acrobat reader to use in the pdf form file
 */
function createXFDF($file,$info,$enc='UTF-8'){
    $data=
'<?xml version="1.0" encoding="'.$enc.'"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
    <fields>';
    foreach($info as $field => $val){
        $data.='
        <field name="'.$field.'">';
        if(is_array($val)){
            foreach($val as $opt)
//2013.01.17 - Jon Hulka - all non-ascii characters get character references
            $data.='
            <value>'.mb_encode_numericentity(htmlspecialchars($opt),array(0x0080, 0xffff, 0, 0xffff), 'UTF-8').'</value>';
//                $data.='<value>'.htmlentities($opt,ENT_COMPAT,$enc).'</value>'."\n";
        }else{
            $data.='
            <value>'.mb_encode_numericentity(htmlspecialchars($val),array(0x0080, 0xffff, 0, 0xffff), 'UTF-8').'</value>';
//            $data.='<value>'.htmlentities($val,ENT_COMPAT,$enc).'</value>'."\n";
        }
        $data.='
        </field>';
    }
    $data.='
    </fields>
    <ids original="'.md5($file).'" modified="'.time().'" />
    <f href="'.$file.'" />
</xfdf>';
    return $data;
}
Qintar answered 17/1, 2013 at 9:42 Comment(0)
T
1

While pdftk doesn't appear to support UTF-8 in the FDF file, I found that with

iconv -f utf-8 -t ISO_8859-1

in the pipeline converting that FDF file to ISO-Latin-1, then at least those characters that are in the Latin-1 code page will still be represented properly.
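
The same conversion can be sketched in PHP with the built-in `iconv()` function, assuming the iconv extension is enabled. The helper name `fdfToLatin1` is hypothetical. How characters outside Latin-1 are handled depends on the underlying iconv implementation, so this sketch only reports a failure and does not try to substitute anything.

```php
<?php
// Hypothetical helper: converts FDF text from UTF-8 to ISO-8859-1.
// Returns null when iconv() reports that the conversion failed.
function fdfToLatin1(string $utf8Fdf): ?string
{
    // The @ suppresses the notice iconv() raises for characters it cannot convert.
    $converted = @iconv('UTF-8', 'ISO-8859-1', $utf8Fdf);
    return $converted === false ? null : $converted;
}
```

The converted string would then be written to the FDF file that is passed to pdftk.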

Treponema answered 27/4, 2020 at 17:6 Comment(0)
P
1

I struggled with this issue for a long time and have finally found a solution!

so, let's start.

  1. download and install the latest version of pdftk
# PDFTK
RUN apk add openjdk8 \
    && cd /tmp \
    && wget https://gitlab.com/pdftk-java/pdftk/-/jobs/1507074845/artifacts/raw/build/libs/pdftk-all.jar \
    && mv pdftk-all.jar pdftk.jar \
    && echo '#!/usr/bin/env bash' > pdftk \
    && echo 'java -jar "$0.jar" "$@"' >> pdftk \
    && chmod 775 pdftk* \
    && mv pdftk* /usr/local/bin \
    && pdftk -version
  2. Open your PDF form in Adobe Acrobat Reader and look at the field options to find out which font the fields use (for example, Helvetica), then download that font.
  3. Fill the form with the flatten option
/usr/local/bin/pdftk A=form.pdf fill_form xfdf.xml output out.pdf drop_xfa need_appearances flatten replacement_font /path/to/font/HelveticaRegular.ttf

xfdf.xml example:

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
    <fields>
        <field name="Check Box 136">
            <value>Your value | Значение (Cyrillic)</value>
        </field>
    </fields>
</xfdf>

Enjoy :)

Pyrrolidine answered 1/9, 2021 at 15:32 Comment(0)
M
0

Which version of PDFTK are you using? I tried the same thing with Polish characters (utf-8).

Does not work for me.

pdftk.exe, libiconv2.dll from: http://www.pdflabs.com/docs/install-pdftk/

Windows 7, cmd, file.pdf + file.fdf -> new.pdf

pdftk file.pdf fill_form file.xfdf output new.pdf flatten

Unhandled Java Exception:
java.lang.NoClassDefFoundError: gnu.gcj.convert.Input_UTF8 not found in [file:.\, core:/]
   at 0x005a3abe (Unknown Source)
   at 0x005a3fb2 (Unknown Source)
   at 0x006119f4 (Unknown Source)
   at 0x00649ee4 (Unknown Source)
   at 0x005b4c44 (Unknown Source)
   at 0x005470a9 (Unknown Source)
   at 0x00549c52 (Unknown Source)
   at 0x0059d348 (Unknown Source)
   at 0x007323c9 (Unknown Source)
   at 0x0054715a (Unknown Source)
   at 0x00562349 (Unknown Source)

With an FDF file with the same content, pdftk ran without crashing, but the characters in new.pdf are garbled.

pdftk file.pdf fill_form file.fdf output new.pdf flatten

---FDF---

%FDF-1.2
%âãÏÓ
1 0 obj<</FDF<</F(file.pdf)
/Fields[
<</T(Miejsce)/V(666 Poznań Śródmieście Ćwiartka Ósma)>>
<</T(Nr)/V(ęóąśłżźćńĘÓĄŚŁŻŹĆŃ)>>
]>>>>
endobj
trailer
<</Root 1 0 R>>
%%EOF

---XFDF---

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="file.pdf"/>
<fields>
<field name="Miejsce">
<value>666 Poznań Śródmieście Ćwiartka Ósma</value>
</field>
<field name="Nr">
<value>ęóąśłżźćńĘÓĄŚŁŻŹĆŃ</value>
</field>
</fields>
</xfdf>

---PDF---

Miejsce: 666 PoznaÅ— ÅıródmieÅłcie ăwiartka Ãfisma
Nr: ÄŽÃ³Ä–ÅłÅ‡Å¼ÅºÄ⁄Å—ÄŸÃfiÄ—ÅıņŻŹăÅ
Missus answered 17/11, 2010 at 16:46 Comment(2)
That's more or less the same scenario. I tried with version 1.41-3 and with 1.43. As I see, 1.44 has been out since Oct. 28, 2010. I will give it a try.Diena
I am getting the same exception as above with fdf too.Urfa
S
0

You can include UTF-8 characters by writing their Unicode codes as octal escapes of the form \ddd.

Saul answered 27/11, 2013 at 11:50 Comment(0)
C
0

To solve this, I wrote PdfFormFillerUTF-8: http://sourceforge.net/projects/pdfformfiller2/

Carty answered 22/1, 2014 at 22:49 Comment(1)
Links are not answers. Answers on SO are expected to be self-contained. Please review this meta question and add enough detail to your question to make it not completely reliant on an external link. Perhaps you should add a code sample to show how this library will solve the problem?Statfarad
S
0

There is a drop-in replacement for the pdftk tool

Mcpdf: https://github.com/m-click/mcpdf

that solves unicode issues when filling forms. Works for me with CP1250 characters (Central Europe).

From project page:

the following command fills in form data from DATA.xfdf into FORM.pdf and writes the result to RESULT.pdf. It also flattens the document to prevent further editing:

java -jar mcpdf.jar FORM.pdf fill_form - output - flatten < DATA.xfdf > RESULT.pdf

This corresponds exactly to the usual PDFtk command:

pdftk FORM.pdf fill_form - output - flatten < DATA.xfdf > RESULT.pdf

Note that you need to have JRE installed.

Siusan answered 15/4, 2015 at 17:7 Comment(1)
The gitrepo does not provide sample pdf files. For my pdf files it doesn't work even for their use-case of word "Łódź" (some chars are missing. Different chars in different pdf forms I tried. Yes, I also tried to generate form from LibreOffice). And it doesn't work for Russian chars at all in my tests, as well as in others. mcpdf probably works for the author's pipeline, apart from that alas it seems to be broken. Although the idea to wrap around iText is sound.Thereafter
C
0

I managed to make it work with pdftk by creating an XFDF file with UTF-8 encoding.

It took several tries, but what made it work as expected was adding 'need_appearances'.

here is an example:

pdftk source.pdf fill_form data.xfdf output output.pdf need_appearances
Canzona answered 26/8, 2018 at 5:7 Comment(0)
B
-2

pdftk supports encoding in UTF-16BE. It's not that difficult to convert from UTF-8 to UTF-16BE.

See: Weird characters when filling PDF with PDFTk
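
A minimal sketch of that conversion for a single FDF field value, assuming the mbstring extension is available. The helper name `fdfUtf16String` is hypothetical. It prefixes the UTF-16BE bytes with the byte order mark FE FF, which is how a PDF text string is marked as UTF-16BE, and writes every byte as an octal escape so the FDF file itself stays plain ASCII. Whether a given pdftk version renders the result correctly still depends on the fonts in the form.

```php
<?php
// Hypothetical helper: encodes a UTF-8 string as an FDF literal string in
// UTF-16BE with a byte order mark, writing every byte as an octal escape.
function fdfUtf16String(string $utf8): string
{
    // FE FF is the byte order mark that identifies UTF-16BE text.
    $bytes = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    $escaped = '';
    foreach (str_split($bytes) as $byte) {
        // Each byte becomes a backslash followed by three octal digits.
        $escaped .= sprintf('\\%03o', ord($byte));
    }
    // Literal strings in FDF are wrapped in parentheses.
    return '(' . $escaped . ')';
}
```

The result would replace the parenthesised value after /V in a field entry such as `<</T(Miejsce)/V(...)>>`.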

Boabdil answered 4/10, 2013 at 8:14 Comment(0)
