wicked_pdf shows unknown character on unicode pdf conversion (ruby)

I'm trying to create a PDF from an HTML page using the wicked_pdf (version 1.1) and wkhtmltopdf-binary gems. My HTML page contains a calendar emoji that displays well in the browser, whatever font I use:

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta http-equiv='content-type' content='text/html; charset=utf-8' />
  <style>
  unicode {
     font-family: 'OpenSansEmoji', sans-serif;
  }
  @font-face {
     font-family: 'OpenSansEmoji';
     src: url(data:font/truetype;charset=utf-8;base64,<-- encoded_font_base64_string-->) format('truetype');
  }
 </style>
 </head>
 <body>
 <div><unicode>&#128197;</unicode></div>
 </body>
 </html>

However, when I try to generate the PDF using the WickedPdf.new.pdf_from_html_file method of the gem in the rails console,

File.open(File.expand_path('~/<--pdf_filename-->.pdf'), 'wb+') { |f| f.write WickedPdf.new.pdf_from_html_file('<--absolute_path_of_html_file-->') }

I get the following result:

[Image: PDF result with unknown character]

As you can see, the first calendar icon is displayed properly. However, it is followed by a second, unknown character, and we do not know where it comes from.

I have investigated UTF-8 and UTF-16 encoding and surrogate pairs, as suggested by this related post (stackoverflow_emoji_wkhtmltopdf), and looked at this issue (wkhtmltopdf_git_issue), but I still can't make this character disappear!

If you have any clue, it's more than welcome.

Thanks in advance for your help!

EDIT

Following the comments from Eric Duminil and petkov.np, I can confirm that the code above works properly for me on Linux. This seems to be a Linux vs. macOS issue. Can anyone suggest what the core of the issue is in the macOS build, and whether it can be fixed?

Toile answered 10/1, 2017 at 13:40 Comment(2)
It works just fine with your html and ruby code. gem list | grep pdf : pdf-core (0.6.1) pdf-inspector (1.2.1) pdf-reader (1.4.0) wicked_pdf (1.1.0) wkhtmltopdf-binary (0.12.3.1) ruby -v: ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux] On linux mint 17 – Retired
@EricDuminil I have tested on Linux environment and it works. Just edited my question – Toile

I've edited this answer several times, please see the notes at the end as well as the comments.

I'm using macOS 10.12.2 and have the same issue. I'm listing all the browser etc. versions, although I suspect the biggest factor is the OS/wkhtmltopdf build.

  • Chrome: Version 55.0.2883.95 (64-bit)
  • Safari: Version 10.0.2 (12602.3.12.0.1)
  • wkhtmltopdf: 0.12.3 (with patched qt)

I'm using the following example snippet:

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html" charset="utf-8">
    <style type="text/css">
      p {
        font-family: 'EmojiSymbols', sans-serif;
      }
      @font-face {
        font-family: 'EmojiSymbols';
        src: local('EmojiSymbols-Regular.woff'), url('EmojiSymbols-Regular.woff') format('woff');
      }

      span:before {
        content: '\01F60B';
      }
    </style>
  </head>
  <body>
    <p>
      😋
      <span></span>
      &#x1F60B;
      &#128523;
      &#xf0;&#x9f;&#x98;&#x8b;
    </p>
  </body>
</html>

I'm calling wkhtmltopdf with the --encoding 'UTF-8' option.
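
For reference, here is a small Ruby sketch of how the different forms in the paragraph above relate to each other. It only uses core Ruby; the variable names are illustrative. Note that numeric character references such as `&#x1F60B;` name a code point, while `&#xf0;&#x9f;&#x98;&#x8b;` names four separate code points that merely share their values with the UTF-8 bytes, so the last line of the paragraph is not expected to produce the emoji.

```ruby
# U+1F60B is the emoji used in the snippet above.
code_point = 0x1F60B
emoji = [code_point].pack("U") # UTF-8 encoded string holding one character

# The two numeric character references used in the HTML.
decimal_ref = format("&#%d;", code_point)  # "&#128523;"
hex_ref     = format("&#x%X;", code_point) # "&#x1F60B;"

# The raw UTF-8 bytes and the UTF-16 surrogate pair for the same code point.
utf8_bytes  = emoji.bytes.map { |b| format("%02x", b) }
utf16_units = emoji.encode("UTF-16BE").unpack("n*").map { |u| format("%04x", u) }

puts decimal_ref            # prints &#128523;
puts hex_ref                # prints &#x1F60B;
puts utf8_bytes.join(" ")   # prints f0 9f 98 8b
puts utf16_units.join(" ")  # prints d83d de0b
```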

You can see the rendered result here (I'm sorry for the lame screenshot). Some brief conclusions:

  1. Safari seemed at first not to render the 'raw' UTF-8 bytes properly, treating them as a plain byte sequence (last line in the HTML paragraph). On a second look, Safari renders everything fine.
  2. Chrome renders everything fine.
  3. With the above option, wkhtmltopdf renders the raw bytes (sort of) OK, but doesn't render the CSS content property properly. Every 'proper' occurrence of the Unicode symbol is followed by this strange phantom symbol.

I've tried literally everything, but the results are the same. For me, the fact that even Safari doesn't render the raw bytes properly indicates some system-level problem that is macOS specific. It's unclear to me whether this should be reported as a wkhtmltopdf issue or whether there is some misbehaving dependency in the macOS build.

EDIT: Safari seems to work fine, my markup was broken.

EDIT: A CSS workaround may do the trick, please check the comments below.

FINAL EDIT: As shown in the comments, the CSS 'hack' that solves the issue is text-rendering: optimizeLegibility;. This seems to be needed only on macOS/OS X.

From my comment below:

I just found this issue. It seems irrelevant at first glance, but adding text-rendering: optimizeLegibility; to my styles removed the duplicate characters (on macOS). Why this happens is beyond me. As the issue author also uses OS X, it's apparent there is some problem with wkhtmltopdf builds for this OS.
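
A minimal sketch of applying the workaround from Ruby before handing the markup to wicked_pdf. The helper `with_legibility_workaround` and the constant `LEGIBILITY_STYLE` are hypothetical names, not part of wicked_pdf, and putting the rule on `body` is an assumption (the property is inherited, so it should reach the emoji element); adding the declaration directly to your own stylesheet works just as well.

```ruby
# Hypothetical helper, not part of wicked_pdf: injects the workaround rule
# into an HTML string just before the closing </head> tag.
LEGIBILITY_STYLE = "<style>body { text-rendering: optimizeLegibility; }</style>"

def with_legibility_workaround(html)
  if html =~ %r{</head>}i
    # Keep the original closing tag and put the style right in front of it.
    html.sub(%r{</head>}i) { |closing_tag| LEGIBILITY_STYLE + closing_tag }
  else
    # No <head> found: fall back to prepending the style.
    LEGIBILITY_STYLE + html
  end
end

html = '<html><head><meta charset="utf-8"></head>' \
       '<body><p>&#128197;</p></body></html>'
patched = with_legibility_workaround(html)
puts patched

# With the wicked_pdf gem loaded (for example in a Rails console), the patched
# markup could then be rendered like this ('~/out.pdf' is a placeholder path):
#   File.binwrite(File.expand_path('~/out.pdf'),
#                 WickedPdf.new.pdf_from_string(patched))
```

Injecting the rule at render time keeps the browser version of the page untouched, which may be convenient if the duplicate character only shows up in the macOS wkhtmltopdf build.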

Nickell answered 16/1, 2017 at 9:43 Comment(11)
petkov.np can you try putting a Base64 inline like the OP does? E.g.: <%= Base64.strict_encode64(Rails.application.assets['EmojiSymbols-Regular.woff'].source) %> – Simonize
I've tried base64-encoding the font as well, it doesn't make a difference in the rendered pdf. – Nickell
Just tried rendering with wkhtmltopdf 0.12.2.4 on Ubuntu 16.04 and the result is as expected. – Nickell
@Nickell I have just tested my code on Linux environment and it actually works. It might be a MacOS binding issue. Just added this remark to my question – Toile
I just found this issue. It seems irrelevant at first glance, but adding text-rendering: optimizeLegibility; to my styles removed the duplicate characters (on macOS). Why this happens is beyond me. As the issue author also uses osx, it's apparent there is some problem with wkhtmltopdf builds for this os. – Nickell
It solved it! The unknown character has disappeared. I will try to investigate why but thks anyway. – Toile
@Nickell It worked for us !! Can you edit your answer so I can give you the bounty ? I believe it's gonna help a lot of people in the same case. Also, if you have more informations on what's the fix, how you found it, it would be awesome ! Thanks ! – Jaella
Glad I was able to help. wkhtmltopdf issues tend to be really frustrating. – Nickell
Also, if you have more informations on what's the fix, how you found it and how it works, it would be awesome ! Thanks ! – Jaella
I don't have additional info really, just searched through all issues related to Unicode and OS X in the project's repo. I've gathered everything relevant here. – Nickell
the source issue in github, still couldn't find any actual solution for it. – Brazier
