Determine the max resolution (DPI) on a PDF page

I am using GhostScript.Net to rasterize PDF to page images before sending the page images to the printer. I am doing this so that I can always rasterize to 300dpi. This allows me to print the PDF in a reasonable amount of time regardless of the size of any image in the PDF (mainly scanned PDFs).

However, it strikes me that in some cases there will not be a need to rasterize as high as 300dpi. It may be possible to rasterize to 200dpi or even 100dpi depending on the content of the page.

Has anyone attempted to determine the maximum DPI for the content of a PDF page? Perhaps using iTextSharp?

My current code is this:

        var dpiList = new List<int> {50, 100, 150, 200, 250, 300, 350, 400, 450, 500};

        string inputPdfPath = @"C:\10page.pdf";
        string outputPath = @"C:\Print\";

        var lastInstalledVersion =
            GhostscriptVersionInfo.GetLastInstalledVersion(
                    GhostscriptLicense.GPL | GhostscriptLicense.AFPL,
                    GhostscriptLicense.GPL);

        var rasterizer = new GhostscriptRasterizer();

        rasterizer.Open(inputPdfPath, lastInstalledVersion, true);

        var imageFiles = new List<string>();

        for (int pageNumber = 1; pageNumber <= 10; pageNumber++)
        {
            foreach (var dpi in dpiList)
            {
                string pageFilePath = System.IO.Path.Combine(outputPath,
                    string.Format("{0}-{1}-{2}.png", pageNumber, Guid.NewGuid().ToString("N").Substring(0, 8), dpi));

                System.Drawing.Image img = rasterizer.GetPage(dpi, dpi, pageNumber);
                img.Save(pageFilePath, ImageFormat.Png);
                imageFiles.Add(pageFilePath);

                Console.WriteLine(pageFilePath);
            }
        }

        var imageCount = 0;

        var pd = new PrintDocument();
        pd.PrintPage += delegate(object o, PrintPageEventArgs args)
        {
            // Dispose the image after drawing so the file handle is released.
            using (var i = System.Drawing.Image.FromFile(imageFiles[imageCount]))
            {
                var pageBounds = args.PageBounds;
                var margin = 48;

                var imageBounds = new System.Drawing.Rectangle
                {
                    Height = pageBounds.Height - margin,
                    Width = pageBounds.Width - margin,
                    Location = new System.Drawing.Point(margin / 2, margin / 2)
                };

                args.Graphics.DrawImage(i, imageBounds);
            }

            imageCount++;

            // Stay in the same print job until every image has been drawn.
            args.HasMorePages = imageCount < imageFiles.Count;
        };

        // One print job for all images, instead of one job per image.
        pd.Print();
Coppins answered 6/8, 2014 at 18:19 Comment(1)
In case of scanned PDFs, each page one image, you might consider extracting those images and scaling them if need be. The result should be better than rasterizing the pages.Otranto

PDF pages don't have a resolution. Images within them can be considered to have a resolution, which is given by the number of image samples in the x direction divided by the width of the image on the page, and the number of image samples in the y direction divided by the height of the image on the page.
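Expressed in dots per inch, that is the sample count along an axis divided by the placed size in inches. A minimal sketch of the arithmetic follows; the class and method names are hypothetical, and it assumes the page uses the default PDF user space unit of 1/72 inch (no `/UserUnit` override).

```csharp
using System;

public static class ImageResolution
{
    // Default PDF user space has 72 units (points) per inch.
    // Assumption: the page does not set a /UserUnit other than 1.
    public const double PointsPerInch = 72.0;

    // Samples per inch along one axis:
    // sample count divided by the placed size in inches.
    public static double Dpi(int samples, double sizeOnPagePoints)
    {
        if (samples <= 0)
            throw new ArgumentOutOfRangeException(nameof(samples));
        if (sizeOnPagePoints <= 0)
            throw new ArgumentOutOfRangeException(nameof(sizeOnPagePoints));

        return samples / (sizeOnPagePoints / PointsPerInch);
    }
}
```

For example, `ImageResolution.Dpi(2550, 612)` describes 2550 samples spread across 612 points (8.5 inches), which works out to 300 dpi.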

So this leaves calculating the width and height of the image on the page. This is given by the image matrix, modified by the Current Transformation Matrix. So in order to work out the width and height on the page, you need to interpret the content stream up to the point where the image is rendered, tracking the graphics state CTM.
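Once the CTM in effect at the image is known, the placed size follows directly. In PDF an image is painted into the unit square, so its horizontal edge maps to the vector (a, b) and its vertical edge to (c, d) of the matrix `[a b c d e f]`. The sketch below is an illustration with hypothetical names, not part of any PDF library; it assumes the default 1/72 inch user space unit and leaves the job of tracking the CTM through the content stream to whatever interpreter you use.

```csharp
using System;

public static class PlacedImageResolution
{
    // Assumption: /UserUnit is 1, so one user space unit is 1/72 inch.
    private const double PointsPerInch = 72.0;

    // ctm holds the six PDF matrix operands [a b c d e f] in effect
    // when the image is painted. Returns the sample density along the
    // image's own horizontal and vertical axes.
    public static (double X, double Y) Dpi(int samplesWide, int samplesHigh, double[] ctm)
    {
        if (ctm == null || ctm.Length != 6)
            throw new ArgumentException("Expected six matrix operands.", nameof(ctm));

        // The image occupies the unit square, so its edges map to (a, b) and (c, d).
        double widthPoints = Math.Sqrt(ctm[0] * ctm[0] + ctm[1] * ctm[1]);
        double heightPoints = Math.Sqrt(ctm[2] * ctm[2] + ctm[3] * ctm[3]);
        if (widthPoints == 0 || heightPoints == 0)
            throw new ArgumentException("Degenerate matrix: the image has no area.", nameof(ctm));

        return (samplesWide / (widthPoints / PointsPerInch),
                samplesHigh / (heightPoints / PointsPerInch));
    }
}
```

Using edge lengths rather than only `a` and `d` keeps the result correct for rotated images. For skewed images the two figures are still sample densities along the image's edges, but they no longer correspond to the page's x and y directions.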

For general PDF files, the only way to know this is to use a PDF interpreter. In the strictly limited case where the whole page content is a single image, you can gamble that there is no scaling taking place (that is, the image exactly fills the media) and simply divide the image width in samples by the media width in inches, and the image height in samples by the media height in inches, to give the x and y resolutions.

However this definitely won't work in the general case.
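Where per-image resolutions are available (from an interpreter, or from the single-image gamble above), one possible way to turn them into a rasterization setting is to pick the smallest candidate DPI that is not below the sharpest image, capped at a maximum. This is a sketch only: the class name, the cap, and the fallback for pages without images are assumptions, not something the question or this answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RasterDpiChooser
{
    // Returns the smallest candidate that is at least the highest image
    // resolution on the page, never exceeding maxDpi.
    public static int Choose(IEnumerable<double> imageDpis, IEnumerable<int> candidates, int maxDpi)
    {
        var allowed = candidates.Where(c => c <= maxDpi).OrderBy(c => c).ToList();
        if (allowed.Count == 0)
            throw new ArgumentException("No candidate is at or below maxDpi.", nameof(candidates));

        // Assumption: a page with no images (text or vector content only) has no
        // image resolution to match, so fall back to the highest allowed candidate.
        var dpis = imageDpis.ToList();
        if (dpis.Count == 0)
            return allowed[allowed.Count - 1];

        double needed = dpis.Max();
        foreach (int candidate in allowed)
        {
            if (candidate >= needed)
                return candidate;
        }

        // Every image is sharper than the cap allows; use the highest allowed candidate.
        return allowed[allowed.Count - 1];
    }
}
```

With the candidate list from the question and a cap of 300, a page whose only image works out to roughly 143 dpi would be rasterized at 150, while a 600 dpi scan would still be rasterized at 300.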

Pitfall answered 6/8, 2014 at 19:9 Comment(4)
Yes. Furthermore the CTM is not restricted to scaling but instead may also skew and rotate the image. This makes the notion of dpi somewhat meaningless.Otranto
@mkl: My guess is that 99% of images on PDF pages are neither skewed nor rotated. So in 99% of cases it is still meaningful to calculate PPI/DPI (separately for X and Y directions, as about 10% of images are scaled without preserving aspect ratio).Bordelaise
@KurtPfeifle 99% - Still one should tell the OP. If he is sure that such images won't be an issue for him, so much the better. But probably he is in a situation where a 100% correct rate is required. Or in a situation with PDFs which play around with skewed and rotated images a lot...Otranto
@mkl: Sure, you are right about pointing to skews and rotations :-) I'm just trying to put it into a (personally colored) perspective. :-)Bordelaise
