Preflight program for PDFs using PoDoFo or anything else open source? [closed]

I have to automate a preflight check on PDF documents. The preflight consists of:

  1. Detect the resolution of images in an existing document and change them to 300dpi if they are not already at that resolution.
  2. Detect the colorspace of images and if not in CMYK, then convert them to CMYK using color profiles.
  3. Detect whether or not fonts are embedded in an existing PDF document, and correct this problem by substituting fonts. (or drawing font outlines — I'm not sure about this part).

I'm wondering whether this can be done using PoDoFo or any other open source project out there, or whether I really need to order proprietary software costing between $2K and $6K. My hosting environment is Linux and supports PHP, Perl, Python, Ruby, and Java.

Any ideas?

Lindsey answered 30/9, 2012 at 12:0 Comment(0)

I'm not aware of any ready-made Open Source software which meets your requirements.

Part of it, however, can be solved by writing your own shell script (or other program).

  1. Detect resolution of images.

    Run pdfimages -list some.pdf to output a list of the images contained in the PDF, along with their dimensions. What is not obvious is that these dimensions are those of the raw image as embedded in the PDF. This could be 720x720 pixels. However, if rendered onto a 10x10 inch square of the page, this image will be 72 DPI on the page. If rendered onto a 1x1 inch square, it will be 720 DPI. Both types of 'rendering' inside a PDF can be made from the same embedded raw image, and it is the context of the current 'graphics state' which determines which one is applied. So determining the actual DPI of an image as it appears on the page requires some additional PDF parsing...

    In any case, you can tell Ghostscript to re-sample images to 300 dpi, and to use a 'threshold' for this. (Ghostscript will never "upsample" an image; it only downsamples those that overshoot the threshold. Upsampling almost never makes sense: it only blows up the file size with no gain in quality.)

  2. Convert colors to colorspace CMYK using ICC profiles.

    The most recent versions of Ghostscript can do that. See also the most recent Ghostscript documentation describing its support for ICC.

  3. Embed un-embedded fonts.

    Running (and evaluating the results of) pdffonts some.pdf will show you which fonts are not embedded.

    Ghostscript can embed un-embedded fonts.
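The effective-resolution arithmetic from point 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, the placed size is assumed to be already known in PDF points (1/72 inch, the default user unit), and rotation or skew is ignored. Obtaining the placed size from a real PDF still requires parsing the content stream, which this sketch does not do.

```python
POINTS_PER_INCH = 72.0  # default PDF user unit: 1 point = 1/72 inch


def effective_dpi(pixel_width, pixel_height, placed_width_pt, placed_height_pt):
    """Return (horizontal, vertical) DPI of an image as placed on the page.

    Hypothetical helper: the placed size in points is assumed to be known.
    """
    if placed_width_pt <= 0 or placed_height_pt <= 0:
        raise ValueError("placed size must be positive")
    return (
        pixel_width / (placed_width_pt / POINTS_PER_INCH),
        pixel_height / (placed_height_pt / POINTS_PER_INCH),
    )


def exceeds_threshold(dpi, target_dpi=300, threshold=2.0):
    """True if dpi is above target_dpi * threshold (a downsampling candidate)."""
    return dpi > target_dpi * threshold


# The 720x720 pixel image from the text, placed on a 10x10 inch square
# (720x720 pt) and on a 1x1 inch square (72x72 pt).
print(effective_dpi(720, 720, 720, 720))
print(effective_dpi(720, 720, 72, 72))
```

With a target of 300 dpi and a threshold of 2, only the second placement (720 DPI) would count as a downsampling candidate; the first (72 DPI) is simply low resolution and cannot be improved by re-sampling.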

So one Ghostscript command that would cover most of your requirements is this:

gs                                     \
  -o cmyk.pdf                          \
  -sDEVICE=pdfwrite                    \
  -sColorConversionStrategy=CMYK       \
  -sProcessColorModel=DeviceCMYK       \
  -sOutputICCProfile=/path/to/your.icc \
  -dDownsampleColorImages=true         \
  -dColorImageDownsampleThreshold=2    \
  -dColorImageDownsampleType=/Bicubic  \
  -dColorImageResolution=300           \
  -dDownsampleGrayImages=true          \
  -dGrayImageDownsampleThreshold=2     \
  -dGrayImageDownsampleType=/Bicubic   \
  -dGrayImageResolution=300            \
  -dDownsampleMonoImages=true          \
  -dMonoImageDownsampleThreshold=2     \
  -dMonoImageDownsampleType=/Bicubic   \
  -dMonoImageResolution=1200           \
  -dSubsetFonts=true                   \
  -dEmbedAllFonts=true                 \
  -dCannotEmbedFontPolicy=/Error       \
  -c "<</NeverEmbed [ ]>> setdistillerparams" \
  -f some.pdf

This command downsamples every image whose resolution is more than twice the target resolution (*ImageDownsampleThreshold=2). It also applies all of these settings to any input file unconditionally, unlike dedicated PDF preflighting software, which applies selective 'fixups' based on the results of 'checks' for specific properties.
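A script can approximate the 'check' half of that check-then-fixup idea by inspecting pdffonts output before deciding whether to run Ghostscript at all. The sketch below is an assumption-laden illustration: the function name is hypothetical, the sample text is made up to resemble pdffonts output, and the parser relies on each data row ending with the emb, sub, and uni columns followed by the two-number object ID.

```python
def unembedded_fonts(pdffonts_output):
    """Return the names of fonts whose 'emb' column reads 'no'.

    Rows are parsed from the right because the 'type' column may contain
    spaces (for example 'Type 1C'), while the trailing columns are assumed
    to be: emb sub uni object-number generation.
    """
    names = []
    for line in pdffonts_output.splitlines():
        tokens = line.split()
        if len(tokens) < 7:
            continue
        emb, obj_num, gen = tokens[-5], tokens[-2], tokens[-1]
        # Skip the header and the dashed separator: they have no yes/no flag.
        if emb not in ("yes", "no") or not (obj_num.isdigit() and gen.isdigit()):
            continue
        if emb == "no":
            names.append(tokens[0])
    return names


# Made-up sample text, for illustration only.
SAMPLE = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ABCDEF+Helvetica                     Type 1C           WinAnsi          yes yes no       8  0
Times-Roman                          Type 1            Standard         no  no  no      12  0
"""

print(unembedded_fonts(SAMPLE))
```

An empty result would mean the font-embedding part of the Ghostscript run is unnecessary for that file; a non-empty one names the fonts that have to be available on the machine for embedding to succeed.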

Lastly, I cannot see what made you think you would have to spend $2k to $6k if you had to resort to closed-source, commercial preflighting software. (My favorite in this field is the very powerful callas pdfToolbox6, which even has a version that runs as a CLI on Linux; its basic version costs 500 €.)

Inspector answered 30/9, 2012 at 14:40 Comment(1)
Why, thanks so much for this info! I will try it out! As for closed-source, I was actually referring to Callas too. But the version I would need is pdfToolbox CLI for unix, and as I was told by one of the resellers: "Pdf Toolbox Server CLI (8 Instances) including first year SMA is Euro4,798.8", which makes it over $6K! Maybe I need to go back and ask for one instance. - Lindsey

My background is in printing, so please keep this in mind when reading my answer. The items you propose seem fairly straightforward, but once you get into the nitty-gritty, a lot of print-industry knowledge goes into these operations.

Here's some quick feedback to your bullet points:

  1. You won't want to upsample a low-res image to 300 dpi, as it will decrease image quality (via re-interpolation) and increase file size.

  2. You need to be careful with color conversions. There may be certain builds of RGB which you'd want to convert to black only. And what happens if someone supplies a file that is already CMYK but tagged with the wrong profile?

  3. Font detection: substituting fonts is very complicated. If you don't have the exact same font as the originator, you could end up with text reflow problems. To own that font, you'll have to pay for a license. You also can't convert fonts to outlines unless they are embedded.

My recommendation is to look at a commercial package for preflighting. These developers have invested years into developing their programs and are experts in the field of printing. The challenging part will be finding ones that are Unix-based and in your price range. Most are designed for Windows or Mac. Callas has a Linux command-line version, but not at the price listed; you'd need the server version.
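To make the "black only" case in point 2 concrete: neutral RGB values (equal red, green, and blue, as in black text or gray rules) are a common candidate for conversion to the K channel alone rather than to a four-color mix. The sketch below is a deliberately naive illustration with hypothetical function names; it ignores ICC profiles, dot gain, and everything else a real conversion has to account for.

```python
def is_neutral(r, g, b, tolerance=0):
    """True if the 8-bit RGB value is gray within the given tolerance."""
    return max(r, g, b) - min(r, g, b) <= tolerance


def rgb_to_black_only(r, g, b, tolerance=0):
    """Map a neutral 8-bit RGB value to K-only CMYK as fractions in 0..1.

    Naive assumption: K is simply the inverse of the average gray level.
    """
    if not is_neutral(r, g, b, tolerance):
        raise ValueError("not a neutral color; needs a real color conversion")
    gray = (r + g + b) / 3 / 255
    return (0.0, 0.0, 0.0, 1.0 - gray)


print(rgb_to_black_only(0, 0, 0))        # pure RGB black
print(rgb_to_black_only(255, 255, 255))  # pure RGB white
```

Anything that is not neutral raises an error here on purpose, since those colors are exactly where profile-based conversion and print expertise come in.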

What type of volume are you planning to run through it?

Shanel answered 1/10, 2012 at 20:49 Comment(2)
Thanks. I've also been looking at Jfpdfprocess from Qoppa software (about $1300 to use their API), Apago Inc's PDF Enhancer ($2K for the CLI server edition), PDFLib, Enfocus Pit Stop Server ($4000), and PStill with PDFCheck. Our plan is to set up a proper web2print shop with thousands of designs for user customization. It will be heavily used. - Lindsey
I do need a Linux CLI version though, because we plan to automate the preflights, not work on them manually. - Lindsey

Did you try Enfocus PitStop Pro? Contact their support department with your specific request. They have tons of PDF preflight examples and will be happy to help you out.

Hypaethral answered 8/10, 2012 at 7:14 Comment(1)
Can you add some more information on this application? Does it solve the OP's needs? Is it proprietary or open source? - Ambivalence
