Prawn::Errors::IncompatibleStringEncoding: Your document includes text that's not compatible with the Windows-1252 character set

Below is my Prawn code to render a name on the PDF:

def initialize(opportunity_application)
  pdf = Prawn::Document.new(:page_size => [1536, 2048], :page_layout => :landscape)
  cell_1 = pdf.make_cell(content: "Eylül Çamcı".force_encoding('iso-8859-1').encode('utf-8'), borders: [], size: 66, :text_color => "000000", padding: [0,0,0,700], font: "app/assets/fonts/opensans.ttf")

  t = pdf.make_table [[cell_1]]
  t.draw
  pdf.render_file "tmp/mos_certificates/application_test.pdf"
end

When rendering the Turkish name Eylül Çamcı, I get the following error:

Prawn::Errors::IncompatibleStringEncoding: Your document includes text that's not compatible with the Windows-1252 character set.
If you need full UTF-8 support, use TTF fonts instead of PDF's built-in fonts.

I'm already using a TTF font that supports the characters in that name. What can I do to print the name correctly?

Naima answered 7/9, 2017 at 4:25 Comment(3)
Are you following these instructions? #37287476 - Basque
I tried this as well, and it produced the same error. Here is the gist of what I tried: gist.github.com/mikevic/e1617641704aed9d8642b54fb5ea0351 - Naima
Aren't you missing font "Opensans"? I checked your gist. In the following post they first update the font families, adding a new one for "Arial" => { :normal => "/assets/fonts/Arial.ttf", :italic => "/assets/fonts/Arial Italic.ttf", }, and then they tell Prawn to use that font family with font "Arial". - Basque
H
12

It seems Turkish is missing in iso-8859-1.

On the other hand iso-8859-9 should work.

So you could try changing your code like this (note the ISO number that I changed):

...
cell_1 = pdf.make_cell(content: "Eylül Çamcı".force_encoding('iso-8859-9').encode('utf-8'), borders: [], size: 66, :text_color => "000000", padding: [0,0,0,700], font: "app/assets/fonts/opensans.ttf")
...

And here is a fun link that covers not only the character set but also other internationalisation differences for Turkey.


Edit 1: I ran a basic check, and it seems the text is already in UTF-8. So why convert it to iso-8859 and then back to UTF-8?

Can you please try "Eylül Çamcı".force_encoding('utf-8') alone?

irb(main):013:0> "Eylül Çamcı".encoding
=> #<Encoding:UTF-8>
irb(main):014:0> "Eylül Çamcı".force_encoding('UTF-8')
=> "Eylül Çamcı"
irb(main):015:0>
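
To illustrate why the force_encoding('iso-8859-1').encode('utf-8') chain from the question does not help, here is a small sketch in plain Ruby (no Prawn involved). It assumes the source file is saved as UTF-8, which is Ruby's default; the variable names are only illustrative.

```ruby
# The literal is already UTF-8 (Ruby's default source encoding).
original = "Eylül Çamcı"

# force_encoding only relabels the existing bytes. encode then converts each
# byte, now read as a separate ISO-8859-1 character, into UTF-8.
mangled = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts original.encoding                    # UTF-8
puts mangled.length == original.bytesize  # true: one character per original byte
puts mangled == original                  # false: the text has changed

# Reversing the two steps recovers the original text.
restored = mangled.encode("ISO-8859-1").force_encoding("UTF-8")
puts restored == original                 # true
```

So the round trip through ISO-8859-1 turns every multi-byte character into several unrelated characters instead of leaving the name intact.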

Edit 2: Can you also check your font path? Does the font file exist, and is the path correct?

#Rails.root.join('app/assets/fonts/opensans.ttf')
cell_1 = pdf.make_cell(content: "Eylül Çamcı".force_encoding('utf-8'), borders: [], size: 66, :text_color => "000000", padding: [0,0,0,700], font: Rails.root.join('app/assets/fonts/opensans.ttf'))
Haemocyte answered 11/9, 2017 at 6:25 Comment(3)
Sorry, but I'm still getting the same error when I change the ISO code as well :/ - Naima
Sorry, it seems I misled you. Please check my edited answer. It may not solve the problem, but it may give us a clue. - Haemocyte
One more edit. If that does not solve it either, I am sorry, friend. The rest of the code seems OK to me. - Haemocyte
C
5

I'm not sure I remember how Prawn works, but PDF files don't support UTF-8, which is the default Ruby encoding for String objects.

In fact, PDF files only support ASCII encoding using internal fonts - any other encoding requires that you bring your own font (which is also recommended for portability).

The workaround is to use character maps (CMaps), either custom CMaps or pre-defined ones (BYO font).

Generally, PDF files include an embedded font (or a subset of a font) and a CMap that maps the value of a byte (or a number of bytes) to a desired font glyph, e.g. mapping 97, which is 'a' in ASCII, to the å glyph when using the specified font.

Last time I used Prawn, I think it supported TTF fonts and created font maps automatically using UTF-8 Strings for the text input, but you have to load an appropriate font into Prawn and remember to use it!

You can see an example in this answer.
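
A minimal sketch of "load a font and remember to use it", along the lines of the font-family approach mentioned in the comments on the question. The helper name render_name, the family name "OpenSans" and the paths are assumptions for illustration; font_families, font, text and render_file are existing Prawn::Document methods. Whether this alone resolves the error inside table cells has not been verified here.

```ruby
# Hypothetical helper; the family name "OpenSans" and the paths are
# assumptions for illustration, not values Prawn requires.
def render_name(text, font_path, out_path)
  require "prawn" # required lazily, so the gem is only needed when this runs

  pdf = Prawn::Document.new
  # Register the TTF file under a family name, then make it the current font.
  pdf.font_families.update("OpenSans" => { normal: font_path.to_s })
  pdf.font("OpenSans")
  pdf.text(text, size: 66)
  pdf.render_file(out_path.to_s)
end

# Example call (paths taken from the question; adjust to your project):
# render_name("Eylül Çamcı", "app/assets/fonts/opensans.ttf",
#             "tmp/mos_certificates/application_test.pdf")
```

The point of the sketch is that the TTF font becomes the document's current font before any text is drawn, rather than being passed only as a per-cell option.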

Good Luck!

EDIT

I updated the answer to reflect @mkl's comments.

@mkl pointed out that other encodings are supported or possible (BYO font), including some predefined multibyte encodings (which use pre-defined CMaps).

Carnauba answered 15/9, 2017 at 19:38 Comment(4)
"In fact, PDF files only support ASCII encoding." - this simply is wrong. There is a wide palette of possible encodings for fonts in PDFs, both single byte and multi byte. Merely UTF-8 happens not to be among them. - Kudos
@Kudos - I think you're mistaken. Multi-byte encodings aren't possible in the PDF format and any encoding other than ASCII (with a limited number of built in fonts) requires that you bring your own font and map the glyphs. You might be thinking of the authoring tool rather than the file format. - Carnauba
"requires that you bring your own font" - but what is the problem with that? Embedding fonts actually is a necessity if you want PDFs to be really portable. That being said, even if only considering the standard 14 fonts there is much more than merely ASCII; please have a look at Annex D of the PDF specification ISO 32000-1 (part 2 has been released this year but I could not compare yet). And beyond those standard 14 fonts, PDF supports many predefined multi-byte encodings (cf. e.g. section 9.7.5 in ISO 32000-1) and an option to build your own encodings. - Kudos
@Kudos - I updated my answer to reflect your comments. Let me know if you have further input. - Carnauba
B
2

From this answer about forcing strings to UTF-8 from any encoding:

"Forcing" an encoding is easy, however it won't convert the characters just change the encoding:

str = str.force_encoding("UTF-8")
str.encoding.name # => 'UTF-8'

If you want to perform a conversion, use encode

Indeed, as @MehmetKaplan said:

It seems Turkish is missing in iso-8859-1.

On the other hand iso-8859-9 should work.

Therefore, you won't need force_encoding anymore, just encode:

[37] pry(main)> "Eylül Çamcı".encode('iso-8859-1')
Encoding::UndefinedConversionError: U+0131 from UTF-8 to ISO-8859-1
from (pry):39:in `encode'
[38] pry(main)> "Eylül Çamcı".encode('iso-8859-9')
=> "Eyl\xFCl \xC7amc\xFD"

This means you have to drop UTF-8 entirely in your code.

content: "Eylül Çamcı".encode('iso-8859-9'),
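
The check from the pry session can be wrapped in a small helper to see which single-byte encodings can represent a given string. This is only a sketch: representable_in? is a hypothetical name, not part of Ruby or Prawn. Windows-1252 is included because it is the character set named in the Prawn error message.

```ruby
# Hypothetical helper (not part of Ruby or Prawn): returns true when every
# character of `text` can be converted to the encoding `encoding_name`.
def representable_in?(text, encoding_name)
  text.encode(encoding_name)
  true
rescue Encoding::UndefinedConversionError
  false
end

name = "Eylül Çamcı"
%w[ISO-8859-1 ISO-8859-9 Windows-1252].each do |encoding_name|
  puts "#{encoding_name}: #{representable_in?(name, encoding_name)}"
end
# ISO-8859-1: false
# ISO-8859-9: true
# Windows-1252: false
```

The dotless ı (U+0131) is the character that ISO-8859-1 and Windows-1252 cannot represent.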
Ballance answered 12/9, 2017 at 11:13 Comment(1)
I'm still getting the same error :/ Do you think it has something to do with the fonts? I have checked on Google Fonts and Opensans supports the string I am trying with. - Naima