Does iTextSharp Handle PDF Compression?
Can iTextSharp compress PDF files? I am looking for a PDF library that can be used in development to compress PDF files. Essentially, I have a list of folders that contain many PDF files ranging from 1 MB to 10 MB in size, and the number of these folders grows every day. To save disk space, I would like to read in a PDF file once it has been processed, compress it, and then save it to the designated folder location.

If iTextSharp does not support compression, does anyone have suggestions for other .NET PDF libraries that do? Purchasing a library wouldn't be a problem. I looked at many of the free ones, such as PDFSharp, which in my opinion is very good at creating PDFs but cannot render or compress them.

There is a great answer by Chris Haas that I read on Stack Overflow:

PdfStamper is a helper class that ultimately uses another class called PdfStamperImp to do most of the work. PdfStamperImp is derived from PdfWriter and when you use stamper.Writer you are actually getting back this implementation class. Many of the properties on PdfStamper also pass directly through to the implementation class. So these two calls actually do the same thing.

stamper.SetFullCompression();
stamper.Writer.SetFullCompression();

Another point of confusion is that SetFullCompression and the CompressionLevel aren't actually related at all. "Full compression" represents a feature that was added in PDF 1.5 called "Object Streams" that allows grouping PDF objects together to potentially allow for greater compression. There's actually no requirement that what we think of as "compression" actually occurs but in reality I think it would always happen. (Possibly a super simple document might get larger with this enabled, not sure and don't feel like testing.)

The CompressionLevel is actually what you normally think of as compression, a number from 0 to 9 or -1 to mean default (which currently equals six I think). This property is actually part of the PdfStream class which many classes ultimately derive from. This setting doesn't "trickle down", however. Since you are importing a stream from another location via GetPageContent() and SetPageContent() that specific stream has its own compression settings unrelated to the Writer's compression settings. There's actually a third parameter that you can pass to SetPageContent() to set your specific compression level if you want.

reader.SetPageContent(1, reader.GetPageContent(1), PdfStream.BEST_COMPRESSION);

https://mcmap.net/q/1327937/-itextsharp-returning-same-size-pdf-when-i-39-m-trying-to-compress-pdf-file-with-different-levels

Any help or suggestions will greatly be appreciated.

Thank you.

Birth answered 19/5, 2016 at 13:33 Comment(1)
Have you tried the suggestion from @Chris' answer? Has its compression effect not been good enough? – Camaraderie

Yes, iText and iTextSharp support compression.

  • From PDF 1.0 (1993) to PDF 1.1 (1994), PDF syntax stored in content streams wasn't compressed.
  • From PDF 1.2 (1996) on, PDF syntax stored in content streams could be compressed. The standard filter is /FlateDecode. This algorithm is similar to the ZIP algorithm, and you can set different levels of compression (from 0 to 9; choosing -1 uses whatever your programming language considers to be the default).
  • From PDF 1.5 (2003) on, the indirect objects can be stored in a compressed object stream. Additionally, the cross-reference table can be compressed and stored in a stream. Before PDF 1.5, this wasn't possible (viewers that only support PDF 1.4 and earlier can't open "fully compressed" PDFs).

iText supports all of the above and Chris' answer already fully answers your question. Since PDF 1.1 dates from a really long time ago (1994), I wouldn't worry about changing the compression levels of content streams, so you can safely forget about:

reader.SetPageContent(1, reader.GetPageContent(1), PdfStream.BEST_COMPRESSION);

Using this line won't reduce the file size much.

Using "full compression" (which will cause the cross-reference table to be compressed) should have an effect on the file size for PDFs with many indirect objects. A minimal "Hello World" file could increase in file size when you use "full compression".
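As a concrete illustration, re-saving an existing file with full compression enabled takes only a few lines with iTextSharp 5's PdfStamper (the file names here are placeholders):

```csharp
using System.IO;
using iTextSharp.text.pdf;

// Re-save an existing PDF with "full compression": indirect objects are
// grouped into object streams and the cross-reference table is stored as a
// compressed stream. This requires PDF 1.5, so the version is bumped if the
// source document is older.
PdfReader reader = new PdfReader("input.pdf");
using (FileStream fs = new FileStream("compressed.pdf", FileMode.Create))
{
    PdfStamper stamper = new PdfStamper(reader, fs, PdfWriter.VERSION_1_5);
    stamper.SetFullCompression();
    stamper.Close();
}
reader.Close();
```

Note that viewers which only support PDF 1.4 or earlier will not be able to open the result, as mentioned above.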

All of the above won't help you much, because good PDF creators already compress whatever can be compressed. Files produced by bad PDF creators, however (or by people using good PDF creators incorrectly), can contain redundant objects. For instance: some people don't know how to add a logo as an image to each page in a PDF using iTextSharp, so out of ignorance they add the image as many times as there are pages. PDF compression won't help you in this case, but if you pass such a "bad" PDF through iTextSharp's PdfSmartCopy, PdfSmartCopy will detect the redundant objects and reorganize the file so that objects repeated over and over (for instance: every page referring to a different object with the same image bytes) are reused (for instance: every page referring to the same object with the image bytes).
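A minimal sketch of that de-duplication pass with PdfSmartCopy might look like this (file names are placeholders):

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Copy every page through PdfSmartCopy; identical streams (e.g. the same
// logo image embedded once per page) are written to the output only once.
PdfReader reader = new PdfReader("bloated.pdf");
Document document = new Document();
using (FileStream fs = new FileStream("deduplicated.pdf", FileMode.Create))
{
    PdfSmartCopy copy = new PdfSmartCopy(document, fs);
    copy.SetFullCompression();   // combine with object/xref streams
    document.Open();
    copy.AddDocument(reader);    // imports all pages, de-duplicating as it goes
    document.Close();            // also closes the underlying writer
}
reader.Close();
```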

Depending on the version of iTextSharp you're using, reader.RemoveUnusedObjects() will also help you (recent versions remove unused objects by default).
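For older versions where this isn't automatic, pruning before re-saving might look like this (file names are placeholders):

```csharp
using System.IO;
using iTextSharp.text.pdf;

// Drop objects that are no longer referenced from the document tree,
// then re-save the file. Recent iTextSharp versions do this by default.
PdfReader reader = new PdfReader("input.pdf");
reader.RemoveUnusedObjects();
using (FileStream fs = new FileStream("pruned.pdf", FileMode.Create))
{
    PdfStamper stamper = new PdfStamper(reader, fs);
    stamper.Close();
}
reader.Close();
```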

Blank answered 19/5, 2016 at 13:53 Comment(4)
Thanks @Bruno Lowagie, this is a very concise response. I'm reading that versions before 5 are free but not recommended due to possible bugs and no support. I'm thinking of getting the commercially licensed 5.4 version. And unfortunately we don't have control of the PDF generator where these files are being created, so there may very well be redundancy within them. They contain lots of images and text (all of which is exported as one single image on the page within the PDF). I will look into it more and see if I can tweak it down to a reasonable file size. Thanks. – Birth
Also, just to throw this out there because I deal with this at my day job, there's a third type of "compression" that iText does not directly handle, and that's lossy compression of images, which will often give the most dramatic reduction of file size if you are willing to sacrifice quality. There's a very high level example of it here. Basically you use iText to find and extract all images, perform your own reduction logic, and then add the images back using iText. This is a destructive change, but it might be acceptable in your environment. – Muchness
@ChrisHaas True, but this can be dangerous. Suppose that you have scanned text that is legible at its current resolution. If you let a machine decide whether or not to reduce the resolution, you'll never be certain the text will still be legible after "compression". You need a human being to make that decision. – Blank
@BrunoLowagie, I absolutely agree that changing an image in any way could destroy the original intention of the image and make it not usable. If the OP needs the exact image, then this path shouldn't be pursued. However, what I deal with are PDFs coming from design programs where someone places an uncompressed image and creates PDFs that are 100MB+ for just one page. For print, maybe that's fine, but for review we can scan the images, find the effective DPI, drop it to something screen-worthy, apply ~60% JPEG compression, and get a file that's only 2MB. – Muchness
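The extract-reduce-replace loop described in these comments can be sketched against iTextSharp 5's PRStream API roughly as follows. This is a lossy, destructive operation; the file names and the unconditional JPEG re-encode are assumptions, and real code would also check each image's original filter and color space, downscale, and set a JPEG quality level:

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

PdfReader reader = new PdfReader("input.pdf");
// Walk the cross-reference table looking for image XObjects.
for (int i = 1; i < reader.XrefSize; i++)
{
    PdfObject obj = reader.GetPdfObject(i);
    if (obj == null || !obj.IsStream()) continue;
    PRStream stream = (PRStream)obj;
    if (!PdfName.IMAGE.Equals(stream.GetAsName(PdfName.SUBTYPE))) continue;

    // Decode the embedded image, re-encode it as JPEG, overwrite the stream.
    PdfImageObject image = new PdfImageObject(stream);
    using (Bitmap bmp = new Bitmap(new MemoryStream(image.GetImageAsBytes())))
    using (MemoryStream ms = new MemoryStream())
    {
        bmp.Save(ms, ImageFormat.Jpeg);  // real code would downscale and set quality here
        stream.Clear();
        stream.SetData(ms.ToArray(), false, PRStream.NO_COMPRESSION);
        stream.Put(PdfName.TYPE, PdfName.XOBJECT);
        stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
        stream.Put(PdfName.FILTER, PdfName.DCTDECODE);
        stream.Put(PdfName.WIDTH, new PdfNumber(bmp.Width));
        stream.Put(PdfName.HEIGHT, new PdfNumber(bmp.Height));
        stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
        stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
    }
}
using (FileStream fs = new FileStream("reduced.pdf", FileMode.Create))
{
    PdfStamper stamper = new PdfStamper(reader, fs);
    stamper.SetFullCompression();
    stamper.Close();
}
reader.Close();
```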

iTextSharp allows you to navigate over PDF pages and edit the objects inside them (along with many other features). Compressing stream objects (mainly images) can help you decrease the overall PDF size.

I investigated the compression of PDF files, mainly the images inside them, in some depth, and ended up with a lightweight library that could be used as a starting point for your particular compression cases.

https://github.com/rock-walker/PdfCompression

Heroine answered 31/10, 2017 at 9:43 Comment(3)
This helped me! – Discomfiture
Great idea, but it needs a little more polish... images look totally ugly, and some of them remained untouched... – Windswept
You could play with the parameters (compression, quality) – sometimes it helps. – Heroine

© 2022 - 2024 — McMap. All rights reserved.