Weird characters when filling PDF with PDFTk
I'm using PHP with pdftk on Ubuntu. When filling a PDF with data, I get weird characters for these accented letters: á, ó, í. I'm using UTF-8 encoding: I checked with echo mb_check_encoding($var, 'UTF-8'), which outputs 1 (TRUE). Any idea what I can do?

I also tried converting to ISO-8859-1 with utf8_decode(), but still no luck.

Thanks

Supersensitive answered 18/5, 2011 at 16:30 Comment(2)
See PDF Reference 1.7, page 157, about text strings. Then you will know that UTF-8 is possibly wrong. – Ratel
Maybe my solution will help someone: https://mcmap.net/q/1020377/-missing-characters-in-filled-pdf-using-pdftk-with-encoding-utf-8 – Claudette
Solved with utf8_decode(). I guess there were some caching problems and the wrong characters were still showing.

Supersensitive answered 27/5, 2011 at 11:56 Comment(2)
I have the same issue with characters like ØÆÅ. Can you explain how you solved it? – Nilson
Apply the function to the variable where you need it, for example: $fdf_data_strings = array('pdf_string' => utf8_decode($form_state['values']['test'])); – Supersensitive
You're right, utf8_decode() will work for characters which can be encoded as Windows-1252 (i.e. U+0000–U+00FF).

However it won't work for characters which can't be encoded in Windows-1252.

You can always encode characters using UTF-16BE, though. You can do this for a single field only, e.g. to encode the word "özil":

<<
/V (þÿ^@ö^@z^@i^@l)
/T (name)
>>

(Here the "^@" indicates a NUL character (U+0000). This is how it looks in my editor (vim), if the file is encoded in Windows-1252 (latin1).)

Note that you need to use a byte order mark (which will appear as "þÿ" if your file is encoded in Windows-1252) and you'll need to encode the entire string (between the two parentheses) in UTF-16.

If you're generating the FDF in a PHP script you can do something like this:

<<
/V (<?php echo chr(0xfe) . chr(0xff) . str_replace(array('\\', '(', ')'), array('\\\\', '\(', '\)'), mb_convert_encoding("özil", 'UTF-16BE')); ?>)
/T (name)
>>
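The inline snippet above can be wrapped in a small reusable helper. This is only a sketch; the function name fdf_utf16_literal is my own, not part of pdftk or any library:

```php
<?php
// Hypothetical helper: build a UTF-16BE FDF literal string from UTF-8 input,
// with the byte order mark prepended and \ ( ) escaped, as described above.
function fdf_utf16_literal(string $utf8): string
{
    // Prepend the UTF-16BE byte order mark (0xFE 0xFF), then convert the text.
    $bytes = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    // Escape backslashes first, then the parentheses that delimit the string.
    $bytes = str_replace(['\\', '(', ')'], ['\\\\', '\(', '\)'], $bytes);
    return '(' . $bytes . ')';
}

echo '/V ' . fdf_utf16_literal('özil') . "\n";
```

Note the replacement order matters: escaping backslashes after the parentheses would double the backslashes just inserted.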

You can also write out the hex codes like this (i.e. enclosed in angle brackets rather than parentheses):

<<
/V <FEFF00F6007A0069006C>
/T (name)
>>

This has exactly the same result (the string "özil"). It's less efficient in terms of characters, but it actually seems to be more reliable in pdftk, which has some bugs I've found (in version 2.02).
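Generating the hex form from PHP is straightforward with bin2hex(). Again a sketch with an invented helper name, not an official API:

```php
<?php
// Hypothetical helper: emit the field value as an FDF hex string,
// e.g. <FEFF00F6007A0069006C> for "özil". Hex strings need no
// parenthesis/backslash escaping, which sidesteps the pdftk quirks
// mentioned above.
function fdf_utf16_hex(string $utf8): string
{
    $bytes = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    return '<' . strtoupper(bin2hex($bytes)) . '>';
}

echo fdf_utf16_hex('özil') . "\n"; // <FEFF00F6007A0069006C>
```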

Finally, you can also write out the Unicode code point for any character in octal notation (\ddd). For example, ö has codepoint U+00F6, which in octal is 366, so you can write:

<<
/V (\366zil)
/T (name)
>>

However, this only works up to U+00FF (octal 377). Beyond that, you'd have to use UTF-16.

The PDF standard allows you to set the encoding to UTF-8 for the whole FDF document. I tried this and it didn't work with pdftk, however in theory it would be done like this:

%FDF-1.2
1 0 obj
<<
/Version /1.3
/Encoding /utf_8
/FDF

(You would presumably have to set the FDF version to 1.3 (or more) in the header too, according to the standard.)

You can also do this at the field level:

<<
/V (özil)
/T (name)
/Encoding /utf_8
>>

But as I said, I didn't manage to get any of this to work. pdftk just seems to ignore it.

Burier answered 3/10, 2013 at 22:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.