How do you debug PDF files? [closed]

Many times I create a PDF programmatically and there turns out to be a problem with it, e.g. some specific letter doesn't show up correctly, or I have encoding issues, etc.

Is there some way to debug a PDF? E.g. see its detailed structure?

Pharmacology answered 23/12, 2010 at 23:42 Comment(2)
Also see superuser.com/q/256997/78897 – Evermore
mutool will output diagnostic messages as it tries to repair a file, e.g. "name too long" (which in my case helped me narrow down the cause of the corruption on my LaTeX/Ubuntu system). – Beachhead

There are a number of free tools that'll let you look at the guts of a PDF, uncompressed and decrypted (given the password).

RUPS for iText springs to mind (but I'm biased). I'm not aware of an iTextSharp equivalent. It's a GUI that shows the PDF objects in a tree view (something ALL these apps have).

Some will let you edit the PDF within that tree, but not many. I believe Windjack's PDF CanOpener will (along with several other spiffy features you'd expect from a commercial Acrobat plugin).

And in a pinch, <insert favorite text editor here> works... but don't try to change anything. PDF is a binary format: the cross-reference table stores the byte offset of every object, so those offsets have to stay exact. If your text editor changes \n to \r\n (or tries to interpret the file as UTF-8, or, or, or), your PDF will be Horribly Broken. Don't do that.
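
To see why the offsets matter: the number after "startxref" and every in-use entry of a classic xref table are absolute byte positions, and a reader jumps straight to "N G obj" at each of them. The Python sketch below checks that. It assumes a classic text xref table, not the cross-reference streams of PDF 1.5+. build_tiny_pdf and check_xref_offsets are hypothetical helpers written for this illustration. The sketch builds a tiny valid file, confirms its offsets, and then simulates an editor converting LF to CRLF.

```python
import re


def build_tiny_pdf() -> bytes:
    """Build a minimal one-page PDF, recording each object's byte offset."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # absolute byte position of "num 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # fixed-width 20-byte entries
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def check_xref_offsets(data: bytes) -> list[tuple[int, int, bool]]:
    """Return (object number, offset, ok) for each in-use entry of the last
    classic xref table; ok means "num gen obj" really starts at that offset."""
    start = data.rfind(b"startxref")
    if start < 0:
        raise ValueError("no startxref keyword found")
    xref_pos = int(data[start + len(b"startxref"):].split()[0])
    if data[xref_pos:xref_pos + 4] != b"xref":
        raise ValueError("startxref does not point at an 'xref' keyword")
    trailer_pos = data.find(b"trailer", xref_pos)
    if trailer_pos < 0:
        raise ValueError("no trailer after the xref table")
    tokens = data[xref_pos + 4:trailer_pos].split()
    results = []
    i = 0
    while i < len(tokens):
        # each subsection starts with "first_object_number count"
        first, count = int(tokens[i]), int(tokens[i + 1])
        i += 2
        for k in range(count):
            offset, gen, kind = tokens[i:i + 3]
            i += 3
            if kind != b"n":  # 'f' entries are free objects
                continue
            num = first + k
            header = re.compile(rb"%d\s+%d\s+obj\b" % (num, int(gen)))
            ok = header.match(data, int(offset)) is not None
            results.append((num, int(offset), ok))
    return results


if __name__ == "__main__":
    pdf = build_tiny_pdf()
    print(all(ok for _, _, ok in check_xref_offsets(pdf)))  # True
    # Simulate an editor that "helpfully" converts LF to CRLF:
    try:
        check_xref_offsets(pdf.replace(b"\n", b"\r\n"))
    except ValueError as exc:
        print("broken:", exc)
```

Every extra \r shifts all later objects by one byte, so the stored offsets (starting with startxref itself) no longer point where they should.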

I end up doing a lot of searching for a given object number to look up indirect references. It's always a pain to look up a single-digit reference, because "4 0 obj" also shows up at the end of every tenth object header (14, 24, 34, 1234, etc.). A regex search for "beginning of line, 4 0 obj, end of line" would be great, but I generally use Notepad, so that's out (and I'm not much of a regex guy anyway).
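
A regex does fix the single-digit problem. In an editor with regex search, something like ^4 0 obj usually works. The Python sketch below is a minimal version of the same idea; find_object is a hypothetical helper. Instead of a strict beginning-of-line anchor, it uses a negative lookbehind that rejects a match preceded by a digit. That assumption also covers files that use bare CR line endings.

```python
import re


def find_object(data: bytes, num: int, gen: int = 0) -> list[int]:
    """Return the byte offsets of every "num gen obj" header in data.

    The negative lookbehind (?<![0-9]) rejects a match preceded by a digit,
    so searching for object 4 skips "14 0 obj" and "1234 0 obj".
    """
    pattern = re.compile(rb"(?<![0-9])%d\s+%d\s+obj\b" % (num, gen))
    return [m.start() for m in pattern.finditer(data)]


if __name__ == "__main__":
    sample = b"%PDF-1.4\n4 0 obj\n<< >>\nendobj\n14 0 obj\n<< >>\nendobj\n"
    print(find_object(sample, 4))   # [9]
    print(find_object(sample, 14))  # [30]
```

With a real file, pass in open("file.pdf", "rb").read(). Reading in binary mode matters for the same reason as above: no newline translation.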

PS: Even with a spiffy Acrobat plugin (not CanOpener; a home-grown one from way back), I still need to crack open a text editor from time to time.

Acrobat sometimes makes changes as it loads a PDF (mostly to fix things), so if you want to know What's Really There, you need to look at the PDF some other way. And when you're trying to debug a broken PDF, Acrobat being helpful is the last thing you need.

PPS: Acrobat also has a spiffy "PDF syntax check" among its Advanced > Preflight profiles. It also has checks for various PDF/* standards (PDF/X, PDF/A-1 [a and b], etc.), accessibility, and so forth. They're invaluable when you're trying to Be Compliant. Not quite the debugging tool you were asking about, but Very Handy nonetheless.

PPPS: "diff"ing two PDFs is all but impossible without writing a custom tool to do it for you. I wrote something that listed all the pages (with sizes) and fields (with types, flags, etc.) in a predictable order and dumped it to a text file so I could diff the files... but directly diffing two PDFs is pointless. There are too many ways for "identical" files to differ (object order, dictionary key order, compression levels, etc.).
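
Here is a minimal sketch of that kind of custom tool in Python, using only the standard library; canonical_dump and diff_pdfs are hypothetical names. It sorts objects by number and collapses whitespace. It hashes stream contents after inflating them, so different compression levels compare equal, and it drops /Length. It does NOT reorder dictionary keys, look inside object streams or understand incremental updates, so treat it as a starting point rather than a real PDF diff.

```python
import difflib
import hashlib
import re
import zlib

# "num gen obj ... endobj"; the lookbehind stops "14 0 obj" matching as "4 0 obj".
OBJ_RE = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
# The stream keyword, its EOL, the raw data, and the closing endstream.
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)
# /Length (direct, or an indirect "n g R") changes with compression, so drop it.
LENGTH_RE = re.compile(rb"/Length\s+\d+(?:\s+\d+\s+R)?")


def _stream_digest(raw: bytes) -> bytes:
    """Short hash of a stream's content, inflated first if it is zlib data."""
    try:
        content = zlib.decompressobj().decompress(raw)
    except zlib.error:
        content = raw.rstrip(b"\r\n")  # not Flate data: hash the raw bytes
    return hashlib.sha1(content).hexdigest()[:12].encode()


def canonical_dump(data: bytes) -> list[str]:
    """One text line per indirect object, sorted by (number, generation)."""
    entries = []
    for m in OBJ_RE.finditer(data):
        num, gen, body = int(m.group(1)), int(m.group(2)), m.group(3)
        body = STREAM_RE.sub(
            lambda s: b"stream<" + _stream_digest(s.group(1)) + b">", body)
        body = LENGTH_RE.sub(b"", body)
        text = b" ".join(body.split()).decode("latin-1")  # collapse whitespace
        entries.append(((num, gen), f"{num} {gen} obj {text}"))
    return [line for _, line in sorted(entries)]


def diff_pdfs(a: bytes, b: bytes) -> str:
    """Unified diff of the canonical dumps; an empty string means 'same here'."""
    return "\n".join(difflib.unified_diff(
        canonical_dump(a), canonical_dump(b),
        fromfile="a.pdf", tofile="b.pdf", lineterm=""))
```

Usage would be something like print(diff_pdfs(open("a.pdf", "rb").read(), open("b.pdf", "rb").read())). Dumping the dictionaries with sorted keys would need a real PDF object parser, which is where a library earns its keep.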

Ruggles answered 27/12, 2010 at 18:8 Comment(5)
I'm looking at RUPS at github.com/itext/rups (it's one of the few options I can find for Mac), but it seems more like a library than a stand-alone application. (It doesn't call LicenseKey.loadLicenseFile() anywhere.) From your description, I thought it was stand-alone, provided I had an iText license. What am I missing? – Marileemarilin
An email from iText says specifically that there's no GUI, contradicting your second paragraph. Perhaps it used to offer a GUI? I've seen screenshots and demos, but nothing current. If you have a link to a RUPS site offering the "GUI with a tree view", I'd love to see it. – Marileemarilin
packages.debian.org/sid/java/libitext-rups-java promises a GUI, but I'm not on a Debian system, and I'm unclear whether it's the same iText RUPS as my initial find at github.com/itext/rups. – Marileemarilin
@SarahMesser RUPS comes with a GUI. You can download it from github.com/itext/rups/releases; the zip contains several jars (and also an exe, which I haven't tried). To run it, use: java -jar itext-rups-5.5.9-jar-with-dependencies.jar. – Sciamachy
@SarahMesser RUPS is also packaged on Flathub, so it's easy to install no matter your Linux distro: flathub.org/apps/details/com.itextpdf.RUPS – Guidotti

Well, the other day I wanted to debug some PDF files I was generating with pdfLaTeX, and I found that Adobe [Acrobat] Reader was not very helpful: the slightly invalid PDFs I was producing would open as if there were no problem; they only failed to close. This made the TeX/View/Edit cycle a bit of a pain, since I had to terminate the entire Reader process before I could TeX again.

I had better results with Ghostscript. In my case this was by way of GSview, since I was using Windows; had I been using Linux, I would have used gv instead. Not only did it let me re-TeX the file (even while it was still open), it was also nice enough to produce nigh-incomprehensible error messages rather than pretending everything was okay. With a bit of squinting, these let me see what I'd messed up in my PDF code and finally produce the example given in this tex.SE answer of mine.

It would have been nice if I could have figured out how to tell Ghostscript to include slightly more detail in the error message (well, I probably could have, if I'd looked at the right part of the manual for long enough, actually), but it wasn't that hard to figure out what I'd messed up by comparing the PDF with the Ghostscript error message and with Adobe's PDF reference. (I link to the archive page because the PDF references there were produced entirely by Adobe, and are of much higher typographic quality as well as much smaller size than the ISO standard for PDF that is on the main page.)

Of course, to make any sense of the file in your text editor, it will probably be important that the page streams not be compressed. So I would suggest figuring out how to instruct your software not to compress them, or finding something with which to decompress them afterwards.
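
If you can't turn compression off at the source, FlateDecode (zlib) streams can be inflated after the fact. The Python sketch below is rough and makes several assumptions. It expects the classic "N G obj << ... >> stream" layout, and flate_streams is a hypothetical helper. It ignores /DecodeParms predictors and every filter other than FlateDecode, and it only prints the decoded content. Writing that content back would invalidate /Length and the xref offsets, so when you need an uncompressed PDF that still opens, a dedicated tool (e.g. qpdf's QDF mode or mutool clean -d) is the better choice.

```python
import re
import zlib

# "num gen obj << dict >> stream<EOL>". The tempered (?!endobj) part keeps the
# dictionary from running across into the next object; the non-greedy match
# backtracks past nested "<< >>" until it is followed by the stream keyword.
OBJ_STREAM_RE = re.compile(
    rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream\r?\n",
    re.DOTALL)


def flate_streams(data: bytes):
    """Yield (object number, generation, decoded bytes) per FlateDecode stream."""
    for m in OBJ_STREAM_RE.finditer(data):
        if b"/FlateDecode" not in m.group(3):
            continue  # other filters (DCTDecode images etc.) are left alone
        end = data.find(b"endstream", m.end())
        raw = data[m.end():end if end >= 0 else len(data)]
        try:
            # decompressobj stops at the end of the zlib data, so the EOL
            # before "endstream" ends up in unused_data and is ignored
            decoded = zlib.decompressobj().decompress(raw)
        except zlib.error as exc:
            decoded = f"<could not inflate: {exc}>".encode()
        yield int(m.group(1)), int(m.group(2)), decoded


if __name__ == "__main__":
    content = b"BT /F1 24 Tf 72 720 Td (Hello) Tj ET"
    packed = zlib.compress(content)
    pdf = (b"%PDF-1.4\n5 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n"
           % len(packed) + packed + b"\nendstream\nendobj\n")
    for num, gen, text in flate_streams(pdf):
        print(num, gen, text.decode("latin-1"))
    # prints: 5 0 BT /F1 24 Tf 72 720 Td (Hello) Tj ET
```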

So, in short:

  1. Don't use Adobe [Acrobat] Reader (until you think your PDF is good, anyway).

  2. Do use Ghostscript (typically through GSview or gv).

  3. Do try to instruct your software to refrain from compressing page streams.

  4. Do use a text editor to look at the PDF (preferably set to "PostScript" mode, as the syntax is closely related).

  5. Do use the PDF reference.

Kolyma answered 28/12, 2010 at 4:47 Comment(0)

You can see the structure of a PDF using a tool like CanOpener, PDFedit or Acrobat (I wrote a blog article on the subject at http://www.jpedal.org/PDFblog/2010/09/useful-pdf-tools-pdfedit/).

Vasily answered 24/12, 2010 at 12:31 Comment(1)
...though I don't use it myself. I have no affiliation with Windjack (the company that sells it). Lots of slick little features if you have Acrobat. – Ruggles

This is what I usually do in Linux:

Guidotti answered 4/11, 2018 at 14:4 Comment(0)

How about Didier Stevens' PDF tools (http://blog.didierstevens.com/programs/pdf-tools/) or PoDoFo (http://podofo.sourceforge.net/about.html)?

For a list of PDF tools and libraries, see http://en.wikipedia.org/wiki/List_of_PDF_software; you may find other tools there that fit your needs.

Verrocchio answered 24/12, 2010 at 1:26 Comment(0)

Another tool is PDFStreamDumper:
https://github.com/dzzie/pdfstreamdumper

It's actually quite intuitive to work through. It was made to analyze JavaScript / AS3 code etc., and has quite a few things built in (hex viewer, refactoring tools/deobfuscators, etc.).

Creatine answered 3/9, 2014 at 13:35 Comment(0)

You can also use the PDFBox app jar to debug a PDF file:

java -jar pdfbox-app.*.jar PDFDebugger file.pdf
Stargell answered 30/1, 2023 at 13:22 Comment(0)

Just open it in a text editor. PDF syntax is actually ASCII text (though a file can also contain embedded binary data).

Genius answered 23/12, 2010 at 23:54 Comment(4)
As far as I know, most PDFs have compressed page streams nowadays, and you probably won't see anything useful when you open such a file in a text editor. – Baines
Just try opening some of them. Or do you want me to send you some PDFs? You'd be surprised :) – Genius
What I wanted to say is: someone asked a question, and it doesn't hurt to try opening the file in a text editor :) Maybe it can be read that way; I have a lot of such files :) – Genius
Adobe is downright retentive about using object streams. Third-party PDF generators often default to "regular" objects, if they support object streams at all. Since most folks around here aren't using the Acrobat SDK, I see text editors as a perfectly valid option. Not the only one by a long shot, but a perfectly valid one... even when you have "real" alternatives (see my answer for an explanation). – Ruggles
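
Here is a quick way to check which case you're in before reaching for the text editor. The Python sketch below counts object streams (/Type /ObjStm), cross-reference streams (/Type /XRef), FlateDecode filters and top-level object headers; compression_report is a hypothetical helper. The counts are approximate because it is a plain byte search, and anything packed inside a compressed stream is invisible to it.

```python
import re


def compression_report(data: bytes) -> dict[str, int]:
    """Rough counts of the markers that decide how readable a PDF is as text.

    This is a plain byte search, so anything inside a compressed stream
    (including objects packed into object streams) is invisible to it.
    """
    return {
        # PDF 1.5+ object streams: whole objects stored inside a compressed stream
        "object_streams": len(re.findall(rb"/Type\s*/ObjStm\b", data)),
        # cross-reference streams replace the plain-text "xref" table
        "xref_streams": len(re.findall(rb"/Type\s*/XRef\b", data)),
        # streams (content, fonts, images...) compressed with zlib
        "flate_filters": len(re.findall(rb"/FlateDecode\b", data)),
        # "num gen obj" headers you can actually find with a text search
        "plain_objects": len(re.findall(rb"(?<![0-9])\d+\s+\d+\s+obj\b", data)),
    }


if __name__ == "__main__":
    sample = (b"1 0 obj\n<</Type/ObjStm/N 2/First 9/Filter/FlateDecode>>\n"
              b"stream\n...\nendstream\nendobj\n")
    print(compression_report(sample))
    # {'object_streams': 1, 'xref_streams': 0, 'flate_filters': 1, 'plain_objects': 1}
```

Lots of object streams and a cross-reference stream mean the text editor will mostly show you compressed blobs. Zero object streams and only a few Flate filters means it's a perfectly usable view.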
