There is an example on how to find and replace images in an existing PDF by the creator of iText. It's actually a small excerpt from his book. Since it's in Java, here's a simple replacement:
public void ReduceResolution(PdfReader reader, long quality) {
int n = reader.XrefSize;
for (int i = 0; i < n; i++) {
PdfObject obj = reader.GetPdfObject(i);
if (obj == null || !obj.IsStream()) {continue;}
PdfDictionary dict = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName subType = (PdfName)PdfReader.GetPdfObject(
dict.Get(PdfName.SUBTYPE)
);
if (!PdfName.IMAGE.Equals(subType)) {continue;}
PRStream stream = (PRStream )obj;
try {
PdfImageObject image = new PdfImageObject(stream);
PdfName filter = (PdfName) image.Get(PdfName.FILTER);
if (
PdfName.JBIG2DECODE.Equals(filter)
|| PdfName.JPXDECODE.Equals(filter)
|| PdfName.CCITTFAXDECODE.Equals(filter)
|| PdfName.FLATEDECODE.Equals(filter)
) continue;
System.Drawing.Image img = image.GetDrawingImage();
if (img == null) continue;
var ll = image.GetImageBytesType();
int width = img.Width;
int height = img.Height;
using (System.Drawing.Bitmap dotnetImg =
new System.Drawing.Bitmap(img))
{
// set codec to jpeg type => jpeg index codec is "1"
System.Drawing.Imaging.ImageCodecInfo codec =
System.Drawing.Imaging.ImageCodecInfo.GetImageEncoders()[1];
// set parameters for image quality
System.Drawing.Imaging.EncoderParameters eParams =
new System.Drawing.Imaging.EncoderParameters(1);
eParams.Param[0] =
new System.Drawing.Imaging.EncoderParameter(
System.Drawing.Imaging.Encoder.Quality, quality
);
using (MemoryStream msImg = new MemoryStream()) {
dotnetImg.Save(msImg, codec, eParams);
msImg.Position = 0;
stream.SetData(msImg.ToArray());
stream.SetData(
msImg.ToArray(), false, PRStream.BEST_COMPRESSION
);
stream.Put(PdfName.TYPE, PdfName.XOBJECT);
stream.Put(PdfName.SUBTYPE, PdfName.IMAGE);
stream.Put(PdfName.FILTER, filter);
stream.Put(PdfName.FILTER, PdfName.DCTDECODE);
stream.Put(PdfName.WIDTH, new PdfNumber(width));
stream.Put(PdfName.HEIGHT, new PdfNumber(height));
stream.Put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
stream.Put(PdfName.COLORSPACE, PdfName.DEVICERGB);
}
}
}
catch {
// throw;
// iText[Sharp] can't handle all image types...
}
finally {
// may or may not help
reader.RemoveUnusedObjects();
}
}
}
You'll notice it's only handling JPEG. The logic is reversed (instead of explicitly handling only DCTDECODE
/JPEG) so you can uncomment some of the ignored image types and experiment with the PdfImageObject
in the code above. In particular, most of the FLATEDECODE
images (.bmp, .png, and .gif) are represented as PNG (confirmed in the DecodeImageBytes
method of the PdfImageObject
source code). As far as I know, .NET does not support PNG encoding. There are some references to support this here and here. You can try a stand-alone PNG optimization executable, but you also have to figure out how to set PdfName.BITSPERCOMPONENT
and PdfName.COLORSPACE
in the PRStream
.
For completeness sake, since your question specifically asks about PDF compression, here's how you compress a PDF with iTextSharp:
PdfStamper stamper = new PdfStamper(
reader, YOUR-STREAM, PdfWriter.VERSION_1_5
);
stamper.Writer.CompressionLevel = 9;
int total = reader.NumberOfPages + 1;
for (int i = 1; i < total; i++) {
reader.SetPageContent(i, reader.GetPageContent(i));
}
stamper.SetFullCompression();
stamper.Close();
You might also try and run the PDF through PdfSmartCopy to get the file size down. It removes redundant resources, but like the call to RemoveUnusedObjects()
in the finally
block, it may or may not help. That will depend on how the PDF was created.
IIRC iText[Sharp] doesn't deal well with JBIG2DECODE
, so @Alasdair's suggestion looks good - if you want to take the time learning the Jasper library and using the brute-force approach.
Good luck.
EDIT - 2012-08-17, comment by @Craig:
To save the PDF after compressing the jpegs using the ReduceResolution()
method above:
a. Instantiate a PdfReader
object:
PdfReader reader = new PdfReader(pdf);
b. Pass the PdfReader
to the ReduceResolution()
method above.
c. Pass the altered PdfReader
to a PdfStamper
. Here's one way using a MemoryStream
:
// Save altered PDF. then you can pass the btye array to a database, etc
using (MemoryStream ms = new MemoryStream()) {
using (PdfStamper stamper = new PdfStamper(reader, ms)) {
}
return ms.ToArray();
}
Or you can use any other Stream
if you don't need to keep the PDF in memory. E.g. use a FileStream
and save directly to disk.
MemoryStream
and then back to the document as appending it or adding a page with thatStream
correct? – Associate