How to check PDF pages for resolution (DPI) of embedded images?

Is there any free library that can be used to get the resolution (in DPI) of the images contained in a PDF file?

I've tried the following code using PDFsharp, but the DPI it returns is not correct. For example, it shows 96 dpi when it should be 150 dpi:

using (PdfDocument pdf = PdfReader.Open(sourcePdf))
{
    for (int i = 0; i < pdf.Pages.Count; i++)
    {
        XGraphics xGraphics = XGraphics.FromPdfPage(pdf.Pages[i]);
        float dpi = xGraphics.Graphics.DpiX; 
    }
}
Centreboard answered 14/1, 2015 at 8:25 Comment(4)
Possible duplicate with answer: #25167875 - Fjord
Your code does not access any image. To get the DPI of an image, you first have to locate the image. PDFsharp was not designed for that kind of task because PDFsharp cannot render PDF files. - Desouza
OK, so it's impossible to get the DPI using PDFsharp? If so, what can I use instead? Unfortunately I can only use libraries that are free for commercial use, so I can't use iTextSharp :( - Centreboard
I don't understand what you are trying to achieve. A single image in a PDF can be drawn multiple times in the PDF file with different DPI. And with transformations, the top of the image may have a different DPI than the bottom. What's the purpose of getting the DPI value? Should it work with any PDF file or only with PDF files created by a specific application? - Desouza

You can use a command line tool to get the info you need: pdfimages.

However, you need a recent version of pdfimages that is based on the Poppler library (NOT the 'pdfimages' that is based on Xpdf!)

Recent Poppler versions let you use the -list option:

pdfimages -list -f 2 -l 4 my.pdf

The output of the above example command shows all images in the page range from 2 (-f: first page to show) to 4 (-l: last page to show).

Here is the output for the above command, using an example PDF file I prepared specifically for this question (scroll horizontally to see all columns):

page num  type width height color comp bpc  enc interp object ID x-ppi y-ppi size ratio
---------------------------------------------------------------------------------------
   2   0 image   697  1238  gray    1   8  jpeg   no       16  0   320   320  142K  17%
   3   1 image   697  1238  gray    1   8  jpeg   no       16  0   151   151  142K  17%
   4   2 image   697  1238  gray    1   8  jpeg   no       16  0    84   115  142K  17%

The output shows the following:

  1. There are three images on the three pages 2-4 (as indicated by columns 1+2, headed page and num).

  2. The PDF object IDs for all three images are identical: 16 0 (as indicated by columns 11+12, headed object + ID). This means the PDF defines only one distinct image object but shows it three times (i.e., the image is embedded only once, but appears on 3 pages).

  3. The image's width is 697 pixels, its height is 1238 pixels, its colorspace is gray, its number of color channels/components is 1, its image depth (bits per component) is 8, its compression scheme is jpeg, its byte size (as embedded) is 142K, and its compression ratio is 17% (as indicated by columns 4-9 and 15+16, headed width, height, color, comp, bpc, enc, size and ratio).

  4. However, the same image appears on different pages in different resolutions (given as PPI, pixels per inch, not DPI; see columns 13+14, headed x-ppi and y-ppi):

    • page 2 shows it with a PPI of 320 in both directions,

    • page 3 shows it with a PPI of 151 in both directions,

    • while page 4 shows it with a PPI of 84 in horizontal (X) direction and 115 PPI in vertical (Y) direction.
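
If you want to consume this listing from a program rather than read it by eye, a minimal sketch in Python could look like the following. It assumes the 16-column layout shown in the sample output above; the names ImageInfo, parse_pdfimages_list and list_images are made up for this illustration and are not part of Poppler. Rows that do not have exactly 16 whitespace-separated fields (the header rule, or rows where the object ID column is not a number pair) are simply skipped, which may be too strict for some files.

```python
import subprocess
from typing import NamedTuple


class ImageInfo(NamedTuple):
    """One row of the `pdfimages -list` table (only the columns used here)."""
    page: int
    num: int
    width: int
    height: int
    object_id: str
    x_ppi: int
    y_ppi: int


def parse_pdfimages_list(text: str) -> list[ImageInfo]:
    """Parse the table printed by `pdfimages -list` into records."""
    images = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 16 fields and start with a page number; this
        # skips the column header and the dashed rule below it.
        if len(fields) != 16 or not fields[0].isdigit():
            continue
        images.append(ImageInfo(
            page=int(fields[0]),
            num=int(fields[1]),
            width=int(fields[3]),
            height=int(fields[4]),
            object_id=f"{fields[10]} {fields[11]}",
            x_ppi=int(fields[12]),
            y_ppi=int(fields[13]),
        ))
    return images


def list_images(pdf_path: str, first: int, last: int) -> list[ImageInfo]:
    """Run the Poppler-based pdfimages on a page range and parse the result."""
    result = subprocess.run(
        ["pdfimages", "-list", "-f", str(first), "-l", str(last), pdf_path],
        capture_output=True, text=True, check=True,
    )
    return parse_pdfimages_list(result.stdout)


# The sample output from this answer, used here instead of a real PDF.
SAMPLE = """\
page num  type width height color comp bpc  enc interp object ID x-ppi y-ppi size ratio
---------------------------------------------------------------------------------------
   2   0 image   697  1238  gray    1   8  jpeg   no       16  0   320   320  142K  17%
   3   1 image   697  1238  gray    1   8  jpeg   no       16  0   151   151  142K  17%
   4   2 image   697  1238  gray    1   8  jpeg   no       16  0    84   115  142K  17%
"""

for img in parse_pdfimages_list(SAMPLE):
    print(img.page, img.object_id, img.x_ppi, img.y_ppi)
```

With a Poppler-based pdfimages on the PATH, list_images("my.pdf", 2, 4) would run the same command as above and return the parsed rows.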


Now, if a command line tool cannot be re-purposed for your goal: the Poppler library, which is the base for the tool shown above, certainly is Free ('free as in liberty', as well as 'free as in beer').


Here is a link to the PDF ("my.pdf") I used to demonstrate the output of the command above.

Equestrienne answered 14/1, 2015 at 12:3 Comment(0)

PDFs do not necessarily use DPI in their definitions. PDFs allow the document creator to define their own user coordinate space, which may or may not map to anything similar to dots per inch.

From here:

Fjord answered 14/1, 2015 at 8:30 Comment(4)
The embedded images still have a definitive size, both in terms of how much of the page they take up and how large their pixel dimensions are. Which means you can calculate DPI from that. - Waterer
OK, I can get the width and height of the pictures, but where do I find the pixel dimensions? - Centreboard
@Waterer "The embedded images still have a definitive size": (a) not necessarily a single one; the same image resource can be used multiple times at different scales. (b) Furthermore, the images may not only be scaled but also rotated and skewed. What should that mean in terms of DPI? - Richburg
This answer is mostly incorrect. I'll not downvote it though; just think about it and take appropriate action (I'd recommend improving/modifying your answer, not deleting it). Look at the example I gave, please! - Equestrienne
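
The calculation discussed in these comments (pixel dimensions divided by the size at which the image is placed) can be sketched in Python as follows. The pixel dimensions are the /Width and /Height entries of the image object; the placed size depends on the transformation matrix in effect when the image is drawn, which is why the same image can yield different values on different pages. The sketch assumes the default user space unit of 1/72 inch and measures the two transformed edges of the image, so pure rotation does not change the result; for skewed images the numbers are only one possible interpretation. The function name effective_ppi and the matrix values are made up for this illustration, chosen so that the 697 x 1238 image from the other answer comes out at 320 PPI.

```python
import math

POINTS_PER_INCH = 72.0  # assumption: default PDF user space, 1 unit = 1/72 inch


def effective_ppi(width_px, height_px, ctm):
    """Return (x_ppi, y_ppi) for an image drawn with matrix [a b c d e f].

    A PDF image is painted into the unit square, so its horizontal edge
    maps to the vector (a, b) and its vertical edge to (c, d).
    """
    a, b, c, d = ctm[:4]
    placed_width_pt = math.hypot(a, b)   # length of the transformed horizontal edge
    placed_height_pt = math.hypot(c, d)  # length of the transformed vertical edge
    x_ppi = width_px * POINTS_PER_INCH / placed_width_pt
    y_ppi = height_px * POINTS_PER_INCH / placed_height_pt
    return x_ppi, y_ppi


# Hypothetical placement: image scaled to 156.825 x 278.55 points, no rotation.
upright = effective_ppi(697, 1238, (156.825, 0, 0, 278.55, 0, 0))
# Same size, rotated by 90 degrees.
rotated = effective_ppi(697, 1238, (0, 156.825, -278.55, 0, 0, 0))

print(round(upright[0]), round(upright[1]))
print(round(rotated[0]), round(rotated[1]))
```

Placing the same image twice as large (doubling the matrix entries) would halve both values, which matches the point made above that one embedded image has no single DPI.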
