Processing PDFs to reduce file size and/or complexity

6

12

I have PDF files I need to prepare for viewing on mobile devices. The worst case would be ~50 pages, with lots of full-color images and vector art, and a file size of approx. 40MB. This is acceptable for PC viewing on broadband, but not great for mobile viewing due to long download times and very laggy scrolling (at least on my overclocked Droid). Are there any tools or libraries for processing the files to simplify the vector stuff, downsample/recompress the images, that sort of thing?

Output in pdf format is not absolutely essential, but it needs to be something readable on android and iOS devices without software downloads.

Lordly answered 31/12, 2010 at 19:33 Comment(3)
Do you have control over the source documents? I would think PDF is going to be a real PAIN to work with if you have to manipulate images. - Quiroz
Not really. They come out of our publishing system. - Lordly
Have you looked at PDF optimizer (help.adobe.com/en_US/Acrobat/8.0/Professional/…)? - Dar
8

There are a few main things that can blow up the size of a PDF on mobile devices:

  • hi-resolution pictures (where lo-res would suffice)
  • embedded fonts (where content would still be readable "good enough" without them)
  • PDF content not required any more for the current version/view (older version of certain objects)
  • embedded ICC profiles
  • embedded third-party files (using the PDF as a container)
  • embedded job tickets (for printing)
  • embedded Javascript
  • and a few more

FOSS software: Ghostscript can try to size down your PDFs, mainly by re-sampling the pictures used and by removing older versions ("generations") of PDF objects which were replaced by newer ones:

gswin32c.exe ^
  -o sized-down.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/ebook ^
  -dEmbedAllFonts=false ^
  -c ".setpdfwrite <</AlwaysEmbed [ ]>> setdistillerparams" ^
  -f blown-up.pdf

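You can add more parameters to the above command line to size down certain PDFs even more (e.g. by setting a lower maximum resolution). Note that newer Ghostscript releases have deprecated and later removed the .setpdfwrite operator; if it raises an error, drop it and keep the rest of the -c string. Here is an example that enforces downsampling of color and grayscale images to 72dpi: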

gswin32c.exe ^
  -o sized-down.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/ebook ^
  -dEmbedAllFonts=false ^
  -dColorImageDownsampleThreshold=1.0 ^
  -dColorImageDownsampleType=/Average ^
  -dColorImageResolution=72 ^
  -dGrayImageDownsampleThreshold=1.0 ^
  -dGrayImageDownsampleType=/Average ^
  -dGrayImageResolution=72 ^
  -c ".setpdfwrite <</AlwaysEmbed [ ]>> setdistillerparams" ^
  -f blown-up.pdf
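
For a large volume of files, the same Ghostscript invocation can be driven from a script. The following is a minimal Python sketch, not a definitive implementation: the function names (build_gs_args, shrink_all) and the directory layout are hypothetical, the executable name "gs" is an assumption (on Windows it would be gswin32c or gswin64c), and the -c PostScript snippet from the commands above is left out for brevity.

```python
import subprocess
from pathlib import Path

# Assumption: "gs" on Linux/macOS; use "gswin32c" or "gswin64c" on Windows.
GS_EXECUTABLE = "gs"


def build_gs_args(src, dst, dpi=None, preset="/ebook"):
    """Return the Ghostscript argument list for shrinking one PDF.

    If dpi is given, color and grayscale images are forced down to that
    resolution, mirroring the second command line above.
    """
    args = [
        GS_EXECUTABLE,
        "-o", str(dst),
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS={preset}",
        "-dEmbedAllFonts=false",
    ]
    if dpi is not None:
        for kind in ("Color", "Gray"):
            args += [
                f"-d{kind}ImageDownsampleThreshold=1.0",
                f"-d{kind}ImageDownsampleType=/Average",
                f"-d{kind}ImageResolution={dpi}",
            ]
    args += ["-f", str(src)]
    return args


def shrink_all(in_dir, out_dir, dpi=72):
    """Run Ghostscript on every *.pdf in in_dir, writing results to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.pdf")):
        dst = out_dir / src.name
        # check=True raises CalledProcessError if Ghostscript fails.
        subprocess.run(build_gs_args(src, dst, dpi), check=True)
        print(f"{src.name}: {src.stat().st_size} -> {dst.stat().st_size} bytes")
```

Calling shrink_all("incoming", "mobile", dpi=72) would write a reduced copy of every PDF found in "incoming" to "mobile" and print the size before and after, which makes it easy to spot files that did not shrink.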

Commercial+closed source software: callas pdfToolbox4 is able to reduce file sizes even more by applying a custom profile to the PDF downsizing process (it can even un-embed fonts and ICC profiles).


Update 2: See also the following (new) question with the answer:

It provides some sample PostScript code which completely removes all (raster) images from the PDF, leaving the rest of the page layout unchanged. This is useful in cases where you do not want the (raster) images, but only the text parts in order to reduce file size.

Haymaker answered 2/1, 2011 at 18:35 Comment(0)
4

Adobe Acrobat Professional has two built-in tools for optimizing PDF files:

"PDF Optimizer" - http://www.adobe.com/designcenter/acrobat/articles/acr7optimize.html, which will simplify vectors and remove unneeded content (among other things)

and

"Optimize Scanned PDF" - http://help.adobe.com/en_US/Acrobat/9.0/Standard/WS58a04a822e3e50102bd615109794195ff-7f71.w.html#WS0BEFAC0B-47D9-47b8-9AF8-4DE2FE9C9736.w, which will downsample and compress embedded raster images.

Both are the best tools I have used for what they do. However, the focus of most PDF optimization tools is to reduce file size, not to improve rendering speed.

If you want to drastically improve rendering performance on your device you should consider pre-rendering the PDFs to bitmap images. If you scale them up a bit before rasterizing (to allow for on-device zooming) and stick to an indexed color scheme you should be able to produce rasters for each page that are an acceptable file size and resolution. They will draw much more quickly on the device than vector content would.
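
One way to try the pre-rendering idea is with Ghostscript's png256 device, which writes 8-bit indexed-color PNG files. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the executable is assumed to be "gs", and the example numbers (an 8.5 inch wide page, a 480 pixel wide screen, a 2x zoom allowance) are made up to show the arithmetic for choosing a resolution.

```python
import math
import subprocess

# Assumption: "gs" on Linux/macOS; use "gswin32c" or "gswin64c" on Windows.
GS_EXECUTABLE = "gs"


def render_dpi(page_width_in, screen_width_px, zoom=2.0):
    """Resolution at which the page is `zoom` times as wide as the screen.

    pixels across the page = page width in inches * dpi, so
    dpi = screen width * zoom / page width, rounded up.
    """
    return math.ceil(screen_width_px * zoom / page_width_in)


def build_render_args(src, out_pattern, dpi):
    """Ghostscript arguments that rasterize every page to an indexed PNG."""
    return [
        GS_EXECUTABLE,
        "-o", out_pattern,   # e.g. "page-%03d.png": one numbered file per page
        "-sDEVICE=png256",   # 8-bit indexed-color PNG output
        f"-r{dpi}",          # rendering resolution in dots per inch
        "-f", str(src),
    ]


def render_pages(src, out_pattern="page-%03d.png",
                 page_width_in=8.5, screen_width_px=480, zoom=2.0):
    """Rasterize src at a resolution chosen for the given screen and zoom."""
    dpi = render_dpi(page_width_in, screen_width_px, zoom)
    subprocess.run(build_render_args(src, out_pattern, dpi), check=True)
    return dpi
```

With the assumed numbers, the page needs 480 * 2 = 960 pixels across 8.5 inches, so render_dpi(8.5, 480, 2.0) gives 113 dpi. Raising the zoom factor trades file size for sharper on-device zooming.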

Diphyodont answered 10/1, 2011 at 15:10 Comment(0)
0

There are options in Acrobat to reduce image size and improve PDF file size/speed. Have you looked at these options?

Bernardinebernardo answered 2/1, 2011 at 17:37 Comment(2)
These PDFs are generated by a proprietary publishing system, and manual intervention isn't practical due to the volume of files we're dealing with. - Lordly
You might also want to see if they are actually being created for print. We did a lot of work on PDF files with various publishers, and the PDF files produced were CMYK, which is going to be slow/large. - Bernardinebernardo
0

Are you planning on the user having the PDF files stored on their phone for viewing offline? If not, could you batch convert the PDF files into HTML? You could also post-process any images to lower the quality/filesize.

Some options for converters include:

I'm sure there are even more options for performing the conversion.

As an outside bet, have you tried viewing your PDFs from your phone using the Google online reader?

Accra answered 7/1, 2011 at 16:21 Comment(0)
0

Some time ago (a few years) I used to reduce the size of PDFs by converting them to djvu (say, through http://any2djvu.djvuzone.org/ or the locally-installed free command-line tools). The results were very nice (small).

At that time, AFAIK, PDF did not support encodings as size-efficient as djvu's, but I have since been told that the PDF format now includes encodings that are just as good. So there must be tools that do a similarly good optimization for PDF. Look for them.

Or you could distribute djvus, but I'm not sure djvu-reading software is pre-installed on your target OSes.

Carnassial answered 11/1, 2011 at 1:10 Comment(0)
0

it needs to be something readable on android and iOS devices without software downloads.

You can pre-process your PDF with a tool like k2pdfopt.

It changes a page like the first image into the second:

Original: https://static.mcmap.net/file/mcmap/ZG-AbGLDKwf1c1bQcRlsaGyxKmMva3/k2pdfopt/examples/original/ieee_twocolumn_template.png
Converted: https://static.mcmap.net/file/mcmap/ZG-AbGLDKwf1c1bQcRlsaGyxKmMva3/k2pdfopt/examples/kindle/ieee_twocolumn_template_k2opt_v127.png

From its sources, the project started in 2012.

Kinzer answered 21/8, 2019 at 16:13 Comment(0)
