My text is written in a bad direction when I use a template

I want to add text to an existing PDF using Rails, so I did this:

filename = "#{Rails.root}/app/assets/images/sample.pdf"
Prawn::Document.generate("#{Rails.root}/app/assets/images/full_template.pdf", :template => filename) do
  text "Test", :align => :center
end

When I open full_template.pdf, I see my template PDF plus my text "Test", but the text is drawn mirrored, as if it had been written in a mirror.

You can find the two PDF documents here:

Original : http://www.sebfie.com/wp-content/uploads/sample.pdf

Generated : http://www.sebfie.com/wp-content/uploads/full_template.pdf

Cockcroft answered 22/8, 2012 at 15:17 Comment(4)
Seems like your sample.pdf is somehow corrupt.Beefeater
I tried with another PDF, same result.Cockcroft
Try this one: samplepdf.com/sample.pdfBeefeater
@Stefan: The sample.pdf isn't "corrupt" at all. It's perfectly "legal" PDF source code. It's just ..."weird" in the way that code is written. See the answers below.Bathy

Let's see... [switching into PDF debugging mode].

First, I unpack your full_template.pdf with the help of qpdf, a command-line utility "that does structural, content-preserving transformations on PDF files" (self-description):

qpdf --qdf full_template.pdf qdf---test.pdf

The result, qdf---test.pdf, is now easier to analyse in a normal text editor, because all streams are unpacked.

Searching for the string "est" finds us this line:

[(T) 120 (est)] TJ

Poking around a bit more (and looking at qpdf's very helpful comments sprinkled into its output!) we find this: the PDF object that holds your mirrored string "Test" in full_template.pdf is number 22. It is a completely separate object from the rest of the file's text, and it is also the only one that uses an un-embedded Helvetica font.

So let's extract that object on its own from full_template.pdf:

qpdf --show-object=22 --filtered-stream-data full_template.pdf 

 q
 /DeviceRGB cs
 0.000 0.000 0.000 scn
 /DeviceRGB CS
 0.000 0.000 0.000 SCN
 1 w
 0 J
 0 j
 [ ] 0 d

 BT
 286.55 797.384 Td
 /F3.0 12 Tf
 [<54> 120 <657374>] TJ
 ET

 Q

OK, here the piece [(T) 120 (est)] TJ appears as [<54> 120 <657374>] TJ. We verify this with the help of the ascii command, which prints a nice ASCII <-> Hex table. That table confirms:

T  54
e  65
s  73
t  74
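
The same check can be done without the ascii command. The following is only a small sketch: the helper name decode_pdf_hex is made up for illustration, and it assumes a simple one-byte-per-character encoding, as with the Helvetica text here.

```ruby
# Decode a PDF hex string literal such as <54> or <657374> into text.
# Hypothetical helper; assumes one byte per character.
def decode_pdf_hex(literal)
  hex = literal.delete("<> \n")
  hex += "0" if hex.length.odd? # PDF treats a missing final digit as 0
  [hex].pack("H*")
end

pieces = ["<54>", "<657374>"]
puts pieces.map { |piece| decode_pdf_hex(piece) }.join
```

The two pieces of the TJ array decode to "T" and "est"; the 120 between them is only a kerning adjustment, not a character.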

What do the other operators mean? We look them up in the official ISO 32000 PDF-1.7 spec, Annex A, "Operator Summary". Here we find the following bits of info:

 q   : gsave
 Q   : grestore
 cs  : setcolorspace for nonstroking ops
 CS  : setcolorspace for stroking ops
 scn : setcolor for nonstroking ops
 SCN : setcolor for stroking ops
 w   : setlinewidth
 j   : setlinejoin
 J   : setlinecap
 d   : setdash
 BT  : begin text object
 Td  : move text position
 Tf  : set text font and size
 TJ  : show text allowing individual glyph positioning
 Tj  : show text
 ET  : end text object

Nothing suspicious so far...

However, looking at the other object, number 5, which holds the original page content, we discover a difference. For example:

1 0 0 -1 -17.2308 -13.485 Tm
<0013001c001200130018001200140015> Tj

Here, before every single Tj (show text) operation, the Tm operator (what is this?!?) comes into play. Let's look up Tm in the PDF spec as well:

 Tm  : set text matrix and text line matrix

What is strange, however, is that this matrix uses 1 0 0 -1 (instead of the more common 1 0 0 1). The negative vertical scale factor flips the text upside down.
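
To make the effect of that -1 concrete, here is a minimal sketch of how a PDF matrix [a b c d e f] maps a point. The helper name apply_matrix and the sample point are only illustrative assumptions; the formula itself is the standard one from the PDF spec.

```ruby
# Apply a PDF matrix [a b c d e f] to a point (x, y):
#   x' = a*x + c*y + e
#   y' = b*x + d*y + f
# Hypothetical helper for illustration only.
def apply_matrix(matrix, x, y)
  a, b, c, d, e, f = matrix
  [a * x + c * y + e, b * x + d * y + f]
end

upright = [1, 0, 0, 1, 0, 0]
flipped = [1, 0, 0, -1, 0, 0]

# A point 10 units above the baseline, e.g. the top of a glyph.
p apply_matrix(upright, 0, 10) # y stays above the baseline
p apply_matrix(flipped, 0, 10) # y ends up below the baseline
```

With 1 0 0 -1, everything above the baseline lands below it, which is the upside-down mirroring described above.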

Wait a minute!?!

The original text content is shown with a mirroring text matrix, yet still appears normal?? And your added text doesn't use any text matrix of its own, yet appears mirrored? What is going on?!

I'm not going to trace this down any further for now. My assumption, however, is that somewhere in the guts of the original PDF the authoring software defined an 'extended graphics state' which causes all drawing operations to be mirrored by default.

It seems you've done nothing wrong, Sebastien -- you've just been unlucky with your choice of a test object, and got blessed with a rather weird one. Try to continue your 'Prawn' experiments with some other PDFs first...

One can "fix" your full_template.pdf by replacing this line in qdf---test.pdf:

286.55 797.384 Td

by this one:

1 0 0 -1 286.55 797.384 Tm

and then running one last qpdf command to fix the PDF cross-reference table and stream lengths (both now corrupted by our editing):

qpdf qdf---test.pdf full_template---fixed.pdf

The console output will show you what it does:

  WARNING: qdf---test.pdf: file is damaged
  WARNING: qdf---test.pdf (file position 151169): xref not found
  WARNING: qdf---test.pdf: Attempting to reconstruct cross-reference table
  WARNING: qdf---test.pdf (object 8 0, file position 9072): attempting to recover stream length
  qpdf: operation succeeded with warnings; resulting file may have some problems

The "fixed" PDF will show the text un-mirrored.
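
The hand edit described above could also be scripted. This is only a sketch under stated assumptions: the method name flip_added_text is made up, the coordinates are hard-coded to the values found in this particular file, and the swap is only equivalent because the Td comes directly after BT.

```ruby
# Swap the "Td" positioning of the added text for a vertically flipped
# text matrix "Tm" at the same position. Hypothetical helper; the
# coordinates belong to this one file and will differ in other PDFs.
def flip_added_text(content)
  content.sub(/^([ \t]*)286\.55 797\.384 Td$/) do
    "#{$1}1 0 0 -1 286.55 797.384 Tm"
  end
end

# A shortened copy of the text object from object 22.
stream = <<~PDF
  BT
  286.55 797.384 Td
  /F3.0 12 Tf
  [<54> 120 <657374>] TJ
  ET
PDF

puts flip_added_text(stream)
```

On a real QDF file the content would have to be read and written in binary mode (for example with File.binread and File.binwrite), and the final qpdf run is still needed to repair the cross-reference table and stream lengths.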

Bathy answered 23/8, 2012 at 10:6 Comment(9)
Oh, thank you for your full answer. I will try to understand everything; it seems very complex to me, as I have never worked much with PDFs. I will ask you if I have any questions.Cockcroft
I just want to thank you one more time, very nice answer !!!! And you were right, my PDF was corrupted. I tried with another one, perfect !!!Cockcroft
Do you have any idea which command I can run to fix this problem on each PDF my users upload?Cockcroft
I wouldn't say your initial PDF is corrupted. Its construction is completely 'legal', but... weird.Bathy
Are you on Twitter, so I can contact you for help? I have some questions about PDF. Is that possible? I'm @sebfie.Cockcroft
@Sebastien: No, sorry, I'm not on twitter.Bathy
So, can you give me another way to contact you?Cockcroft
Thanks @KurtPfeifle, that was a very helpful investigation. Now I just need to figure out how to support these transformations in the prawn and prawn-templates libraries.Panicle
FWIW, this issue happens for all PDFs exported (printed) by Google Chrome, and downloaded from Google Docs. I'm surprised it hasn't received more attention.Panicle

My Pull Request has been merged, so the issue is now fixed in the prawn-templates gem. The fix was to reset the graphics state before adding any content to the PDF.

This was happening because Google Chrome and Google Docs export PDFs with a transformation matrix that vertically flips all of the content. By default, the origin of the PDF coordinate system is the bottom-left corner of the page. Google's custom transformation means that they can calculate coordinates from the top-left corner instead, which does make more sense to me.
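
This also explains the puzzle from the earlier answer: two flips cancel out. Below is a rough sketch with made-up numbers. The page height of 842, the sample coordinates and the exact form of the page-level flip are assumptions, not values read from the actual file, and it assumes the flip is still active when the new text is drawn.

```ruby
# Multiply two PDF matrices [a b c d e f]; "first" is applied before "second".
def multiply(first, second)
  a1, b1, c1, d1, e1, f1 = first
  a2, b2, c2, d2, e2, f2 = second
  [
    a1 * a2 + b1 * c2,      a1 * b2 + b1 * d2,
    c1 * a2 + d1 * c2,      c1 * b2 + d1 * d2,
    e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2
  ]
end

page_height = 842                      # assumed, roughly A4 in points
page_flip   = [1, 0, 0, -1, 0, page_height]

template_text = [1, 0, 0, -1, 100, 50] # text matrix with its own flip
added_text    = [1, 0, 0, 1, 100, 50]  # plain positioning, no flip

p multiply(template_text, page_flip)   # vertical scale is +1: upright
p multiply(added_text, page_flip)      # vertical scale is -1: mirrored
```

The template's text carries its own 1 0 0 -1 to undo the page-level flip, while text appended without one inherits the flip and comes out mirrored, which is why resetting the graphics state before adding content helps.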

P.S. Thanks very much to @KurtPfeifle for the very helpful answer! I wouldn't have got this far without that information.

Panicle answered 3/10, 2017 at 10:57 Comment(2)
Thanks for the "thanks".... Also, now that you fully solved the original problem, @Cockcroft should award the "Accepted" flag to this answer.Bathy
@ManuelMeurer Oh good point, not sure why I posted that it was merged! Maybe I was thinking of a different PR. But thanks for your comment, hopefully it will get more attention!Panicle
