How to figure out the resolution (DPI) of images embedded in a PDF document?
V

7

14

I have a PDF document that also contains images.

Now I want to know the resolution of these images.

A first step would be to somehow get the images out of the PDF document. But how?

Is that even possible with something provided in Cocoa?

Valtin answered 24/7, 2012 at 18:22 Comment(2)
I can't swear to it, but I think that a PDF can contain images with different resolutions, so the "resolution of a PDF document" is not well-defined.Athanasia
Oh dear. I hope you are wrong. I'll create a new question for that. Thank you.Valtin
B
16

Have a look at this answer for your other question:

Basically, you can now use the (new) -list parameter of Poppler's pdfimages command-line utility (it will NOT work with XPDF's version of pdfimages!).

It will report the dimensions of each image appearing on the queried pages.

(You can also use it to extract images from a PDF: pdfimages -png -f 3 -l 5 some.pdf prefix--- will extract all images as PNGs from the PDF file, starting with first page 3 and ending with last page 5, using a filename prefix of prefix--- for each image. But this problem seems to not be the main focus of your question...)

Example:

pdfimages -list -f 1 -l 3 /Users/kurtpfeifle/Downloads/ct-magazin-14-2012.pdf

  page   num  type   width height color comp bpc  enc interp  object ID
  ---------------------------------------------------------------------
     1     0 image    1247  1738  rgb     3   8  jpx    no      3053  0
     2     1 image     582   839  gray    1   8  jpeg   no      2080  0
     2     2 image     344   364  gray    1   8  jpx    no      2079  0
     3     3 image     581   838  rgb     3   8  jpeg   no         7  0
     3     4 image    1088   776  rgb     3   8  jpx    no         8  0
     3     5 image       6     6  rgb     3   8  image  no         9  0
     3     6 image       8     6  rgb     3   8  image  no        10  0
     3     7 image       4     6  rgb     3   8  image  no        11  0
     3     8 image     212   106  rgb     3   8  jpx    no        12  0
     3     9 image     150    68  rgb     3   8  jpx    no        13  0
     3    10 image       6     6  rgb     3   8  image  no        14  0
     3    11 image       4     4  rgb     3   8  image  no        15  0

It does not directly report the DPI resolution -- but from the 'width' and 'height' dimensions you can calculate it easily: you measure the width of the picture on your screen with an inch ruler and then divide the width in pixels by the measured width in inches...

You find this strange, because the result is dependent on your current zoom level? Yes, it is!

The concept of the 'resolution' is always dependent on the environment. A so-called 'hi-res' picture basically always has lots of pixels in width and height. This allows for better quality (or 'resolution') if the picture needs to be displayed or printed with higher zoom levels.
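The ruler calculation above can be sketched in a few lines of Python. This is only an illustration: the function name and the measured widths (8.25 and 16.5 inches) are made up for the example; the pixel width 1247 is taken from the first row of the listing.

```python
def pixels_per_inch(pixel_count: int, displayed_inches: float) -> float:
    """Return pixels per inch for an image edge shown at a given physical length."""
    if displayed_inches <= 0:
        raise ValueError("displayed length must be positive")
    return pixel_count / displayed_inches


# The 1247-pixel-wide image from page 1 at two hypothetical zoom levels.
print(round(pixels_per_inch(1247, 8.25)))  # 151
print(round(pixels_per_inch(1247, 16.5)))  # 76
```

Doubling the displayed width halves the result, which is exactly the zoom dependence described above.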


Update

Meanwhile there is a new version of (Poppler's) pdfimages:

$  pdfimages -version
  pdfimages version 0.33.0
  [....]

This reports the resolution of embedded images as well, in PPI (pixels per inch), in horizontal (x-ppi) and vertical (y-ppi) directions:

page num  type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
-------------------------------------------------------------------------------------
   1   0 image  1247  1738  rgb     3   8  jpx    no    3053 0   151   151  228K 3.6%
   2   1 image   582   839  gray    1   8  jpeg   no    2080 0    72    72  319B 0.1%
   2   2 image   344   364  gray    1   8  jpx    no    2079 0   150   150 4325B 3.5%
   3   3 image   581   838  rgb     3   8  jpeg   no       7 0    73    73 1980B 0.1%
   3   4 image  1088   776  rgb     3   8  jpx    no       8 0   150   151  106K 4.3%
   3   5 image     6     6  rgb     3   8  image  no       9 0   150   150  108B 100%
   3   6 image     8     6  rgb     3   8  image  no      10 0   150   150  158B 110%
   3   7 image     4     6  rgb     3   8  image  no      11 0   150   150   73B 101%
   3   8 image   212   106  rgb     3   8  jpx    no      12 0   150   150 2396B 3.6%
   3   9 image   150    68  rgb     3   8  jpx    no      13 0   150   150 1878B 6.1%
   3  10 image     6     6  rgb     3   8  image  no      14 0   150   150   81B  75%
   3  11 image     4     4  rgb     3   8  image  no      15 0   150   150   50B 104%

This new feature appeared first in Poppler version 0.25 (released Wed December 11, 2013). It additionally reports...

  • ...(file) sizes and
  • ...(compression) ratios

...of embedded images.
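If you want to post-process such a listing, a small parser may help. The following is a minimal sketch that assumes exactly the column layout shown above (16 whitespace-separated fields per data row, with the object ID split into object and generation number); other Poppler versions may print different columns. The function name parse_pdfimages_list is my own, not part of Poppler.

```python
def parse_pdfimages_list(text: str) -> list[dict]:
    """Parse the table printed by `pdfimages -list` (layout with x-ppi/y-ppi columns)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 16 fields and start with a page number;
        # this skips the header line and the dashed rule.
        if len(fields) != 16 or not fields[0].isdigit():
            continue
        rows.append({
            "page": int(fields[0]),
            "num": int(fields[1]),
            "width": int(fields[3]),
            "height": int(fields[4]),
            "enc": fields[8],
            "object_id": (int(fields[10]), int(fields[11])),
            "x_ppi": float(fields[12]),
            "y_ppi": float(fields[13]),
        })
    return rows


sample = """\
page num  type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
-------------------------------------------------------------------------------------
   1   0 image  1247  1738  rgb     3   8  jpx    no    3053 0   151   151  228K 3.6%
   2   1 image   582   839  gray    1   8  jpeg   no    2080 0    72    72  319B 0.1%
"""

for row in parse_pdfimages_list(sample):
    print(row["page"], row["width"], row["height"], row["x_ppi"], row["y_ppi"])
```

In practice you would feed the function the captured standard output of pdfimages -list instead of the sample string.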

Limitations of pdfimages -list

Perhaps I should also make you aware of the limitations of the pdfimages utility, and give an example where its output report is not completely correct.

One example is this handcoded PDF from my (recently created) GitHub repository of PDFs to help beginners to study the syntax of PDF source code.

I originally created this PDF in order to demonstrate a bug with Mozilla's PDF.js renderer. Here is a screenshot about how it looks in PDF.js (left) and how it should look when rendered correctly (right, rendered by Ghostscript and Adobe Reader):

 


The PDF file contains a 2x2 pixels image, embedded only once (with object ID 5 0), but displayed on the page multiple times with different settings, where each time the image is placed...

  • ...at a different position,
  • ...with a different scaling,
  • ...with a different rotation,
  • ...even with a different skew.

Under these extreme circumstances pdfimages -list falls flat on its face when trying to determine some of the resolutions for instances of this image:

page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
------------------------------------------------------------------------------------
   1   0 image    2     2  rgb     3   8 image  no        5 0     4     4   13B 108%
   1   1 image    2     2  rgb     3   8 image  no        5 0     5     3   13B 108%
   1   2 image    2     2  rgb     3   8 image  no        5 0     3     5   13B 108%
   1   3 image    2     2  rgb     3   8 image  no        5 0     6     3   13B 108%
   1   4 image    2     2  rgb     3   8 image  no        5 0     3    10   13B 108%
   1   5 image    2     2  rgb     3   8 image  no        5 0     4 72000   13B 108%
   1   6 image    2     2  rgb     3   8 image  no        5 0     4     2   13B 108%
   1   7 image    2     2  rgb     3   8 image  no        5 0     2     4   13B 108%
   1   8 image    2     2  rgb     3   8 image  no        5 0 14401     1   13B 108%
   1   9 image    2     2  rgb     3   8 image  no        5 0     1     2   13B 108%
   1  10 image    2     2  rgb     3   8 image  no        5 0 0.950     4   13B 108%
   1  11 image    2     2  rgb     3   8 image  no        5 0     4 0.950   13B 108%
   1  12 image    2     2  rgb     3   8 image  no        5 0 0.950     4   13B 108%
   1  13 image    2     2  rgb     3   8 image  no        5 0     1     4   13B 108%
   1  14 image    2     2  rgb     3   8 image  no        5 0 0.950     4   13B 108%
   1  15 image    2     2  rgb     3   8 image  no        5 0 0.950     4   13B 108%
   1  16 image    2     2  rgb     3   8 image  no        5 0     4 0.950   13B 108%

pdfimages -list gets most values correct if there is no rotation and no skewing involved. It is no wonder that there are discrepancies if the image is rotated or skewed, because how would you even reliably define an x-ppi and y-ppi value for such cases? That explains the (completely wrong) values of 72000 y-ppi for image no. 5 and 14401 x-ppi for image no. 8.
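One possible way to define a resolution even for rotated or skewed placements is to measure the lengths of the image's two edges after the current transformation matrix (CTM) has been applied. The sketch below illustrates that idea only; it is not necessarily what pdfimages computes, and the matrices are hypothetical, not the ones from the example PDF. It assumes the default user-space unit of 1/72 inch.

```python
import math


def effective_ppi(width_px, height_px, ctm, units_per_inch=72.0):
    """Return (x_ppi, y_ppi) for an image painted with CTM [a b c d e f].

    A PDF image fills the unit square, so its horizontal edge maps to the
    vector (a, b) and its vertical edge to (c, d) in user space.
    """
    a, b, c, d, _e, _f = ctm
    edge_x = math.hypot(a, b)  # length of the transformed horizontal edge
    edge_y = math.hypot(c, d)  # length of the transformed vertical edge
    if edge_x == 0 or edge_y == 0:
        raise ValueError("degenerate matrix: the image has no area on the page")
    return (width_px * units_per_inch / edge_x,
            height_px * units_per_inch / edge_y)


# Hypothetical placements of a 2x2 pixel image at 36 x 36 units (half an inch).
upright = (36, 0, 0, 36, 100, 100)
s = 36 * math.sqrt(0.5)
rotated_45 = (s, s, -s, s, 100, 100)

print(effective_ppi(2, 2, upright))
print(tuple(round(v, 3) for v in effective_ppi(2, 2, rotated_45)))
```

With this definition a pure rotation does not change the reported values; under skew, the two edge lengths still give a usable number per edge, although the image's rows and columns no longer meet at right angles.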

As you can easily see, pdfimages is rather clever for determining other image properties:

  1. It correctly reports the same object ID 5 0 for all instances of the displayed image, indicating that this image is embedded once, but displayed multiple times on the page.
  2. It correctly reports the image dimensions to be 2x2 pixels.
Burlingame answered 28/7, 2012 at 9:30 Comment(0)
D
6

It's not easy, but it's possible. While you cannot do it using PDFDocument, you can instead use the CGPDF* functions in Quartz. Briefly: you will need to use CGPDFPageGetDictionary() to get the dictionary for the page the image is on, then get the information about its XObject (assuming it's not inlined in the stream) from the dictionary. Even this is not straightforward -- you will need to consult the PDF standard to understand how the XObject may be formatted and then use the various CG* routines to drill down to what you need.

I should add that the default user-space unit of a PDF document is 1/72 inch, i.e. 72 units per inch. Also, many graphics in PDFs are vector graphics, so they don't really have a DPI at all.
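To make the dictionary drill-down described above a bit more concrete, here is a language-neutral sketch in Python. Plain dicts stand in for the page dictionary that CGPDFPageGetDictionary() returns; none of the actual Quartz calls are shown. The key names (Resources, XObject, Subtype, Width, Height) come from the PDF specification, while the function name and the sample page are hypothetical.

```python
def image_xobjects(page_dict: dict) -> dict:
    """Return {name: (width, height)} for every image XObject of a page.

    Simplification: a real reader must also handle /Resources inherited
    from parent page-tree nodes and images inlined in the content stream.
    """
    xobjects = page_dict.get("Resources", {}).get("XObject", {})
    return {
        name: (obj["Width"], obj["Height"])
        for name, obj in xobjects.items()
        if obj.get("Subtype") == "Image"
    }


# Hypothetical page with one image XObject and one form XObject.
page = {
    "Resources": {
        "XObject": {
            "Im0": {"Subtype": "Image", "Width": 1247, "Height": 1738},
            "Fm0": {"Subtype": "Form"},
        }
    }
}

print(image_xobjects(page))  # {'Im0': (1247, 1738)}
```

This only yields pixel dimensions; the size at which each image is painted still has to be taken from the page's content stream.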

Dimpledimwit answered 24/7, 2012 at 18:40 Comment(2)
For future record, let me point out that this answer is incomplete and confusing. Please also read the PDF specification if you want to understand all of this or simply read Kurt Pfeifle's answer for a method that gets the resolution of the images in the PDF file (and their pixel dimensions if that is really what you are after).Lungan
@David_van_Driessche The original question was about Cocoa -- and the APIs there in 2012, not command line utilities. Furthermore, width and height in pixels are not the same as resolution. pdfimages is making assumptions to get ppi. (and I assume it's pulling its info from the XObject dictionary just as I wrote.) I stand by the above as (still) being accurate for Cocoa.Dimpledimwit
W
2

The answer is definitely no, because PDF documents don't really have intrinsic resolutions. The resolution ultimately depends on who is handling the document and its elements at the time. It can even vary by the amount of zoom you're using in Adobe Acrobat.

For example, I created a 2D barcode with 16x16 pixel dimensions and scaled it to be an inch wide and an inch tall before adding it to the document. It looks perfectly crisp (i.e., many pixels per square element) in Adobe Acrobat Reader, but when I send the resulting PDF out to a faxing service, it ends up being 100x200 resolution (roughly). When I print that same document on a laser printer, it ends up being more like 400 dpi. When I click on the barcode image in Acrobat Reader and copy/paste it into GIMP, it shows up as a tiny 16x16 bitmap.

Wheaten answered 23/1, 2015 at 0:40 Comment(2)
While "PDF documents don't really have intrinsic resolutions" definitely is true, the OP clarified in his question that he was after the resolution of images contained in PDF documents.Dhobi
I think my answer addresses that as well. You can determine the raw dimensions of the images and their displayed size, but dpi is determined by whatever happens to be rendering/printing the document. It would probably help if we knew OP's purpose. I approached it from the standpoint of recognition of faxed barcodes, but he may be interested in searching documents for high resolution photography or something.Wheaten
P
0

You need the dimensions of the raw image XObject accessed via the Do command.

Pussy answered 25/7, 2012 at 7:23 Comment(0)
D
0

The resolution of each image at the point at which it is used is reported by cpdf -image-resolution <number> where number is the minimum resolution required. So we set a very high resolution, so all images are reported. On Kurt's example PDF:

cpdf -image-resolution 1000000 111_current-transformation-matrix-ctm.pdf 
1, /XOb1, 2, 2, 0.000694, 0.000694
1, /XOb1, 2, 2, 0.000926, 0.000556
1, /XOb1, 2, 2, 0.000545, 0.000958
1, /XOb1, 2, 2, 0.000694, 0.000694
1, /XOb1, 2, 2, 0.000694, 0.000694
1, /XOb1, 2, 2, 0.000491, 0.000694
1, /XOb1, 2, 2, 0.000491, 0.000694
1, /XOb1, 2, 2, 0.000694, 0.000491
1, /XOb1, 2, 2, 0.000139, 0.000098
1, /XOb1, 2, 2, 0.000139, 0.000120
1, /XOb1, 2, 2, 0.000087, 0.000694
1, /XOb1, 2, 2, 0.000694, 0.000087
1, /XOb1, 2, 2, 0.000087, 0.000694
1, /XOb1, 2, 2, 0.000116, 0.000694
1, /XOb1, 2, 2, 0.000087, 0.000694
1, /XOb1, 2, 2, 0.000087, 0.000694
1, /XOb1, 2, 2, 0.000694, 0.000087

The columns are page number, image name, width in pixels, height in pixels, x resolution at point of use, y resolution at point of use.
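Since the output is comma-separated, it is easy to load into a script. A minimal sketch, assuming exactly the six columns listed above; the function name is my own and not part of cpdf:

```python
import csv
import io


def parse_cpdf_image_resolution(text: str) -> list[dict]:
    """Parse lines of the form: page, name, width, height, x-res, y-res."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != 6:
            continue  # skip blank or unexpected lines
        page, name, width, height, x_res, y_res = rec
        rows.append({
            "page": int(page),
            "name": name,
            "width": int(width),
            "height": int(height),
            "x_res": float(x_res),
            "y_res": float(y_res),
        })
    return rows


sample = """\
1, /XOb1, 2, 2, 0.000694, 0.000694
1, /XOb1, 2, 2, 0.000926, 0.000556
"""

for row in parse_cpdf_image_resolution(sample):
    print(row["page"], row["name"], row["x_res"], row["y_res"])
```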

Dapsang answered 24/9, 2023 at 11:52 Comment(0)
I
-2

This answer is intended as an addendum to @Kurt Pfeifle's answer, and works outside of Objective-C.

Alternatively:

If you have a Windows system and do not have a compiler set up, then the following is the easiest method. Download the Windows XPDF binaries; then use pdfimages to extract the images, convert them to a BMP format, and then mspaint will tell you the resolution. The advantages of this method are:

  • You can get an exact resolution without having to estimate it by measuring the image size;

  • It WILL work for XPDF's version of pdfimages.

The disadvantages are:

  • It takes a bit more work, including converting the file to a format you can open without changing the resolution;

  • You have to do this for each file individually, instead of getting a list.

  • It gives you the resolution of the images themselves, not the resolution with which they appeared in the PDF file. (thanks to Kurt Pfeifle's comment)

Infusive answered 12/6, 2015 at 14:57 Comment(6)
Sorry, this answer is as wrong as it can be! Once you extracted the image, you can only see their absolute sizes (width, height) in pixels. What resolution they had when appearing on the PDF depends on the size they get assigned within the contents. You can have the same image appearing multiple times on a page, each time with a different resolution.Burlingame
I dare you: Download this PDF from my GitHub repository of hand-coded PDF files. It's only 30 KiB (mostly comments). It includes a single 2x2 pixel image, used on one page several times and with different resolutions. (The extreme values of this will cause pdfimages to report wrong values...)Burlingame
@KurtPfeifle Hello, Thank you for your comment. I answered this way because since the OP asked about getting the images out of the file, it seemed to me that he was looking for the resolution of the original images, not the resolution with which they were included in the PDF file: this is also what I was looking for when I found this post. You are correct that this procedure does not give you the exact resolution with which they appear in the PDF file.Infusive
There is no such thing as a "resolution of the original images" once extracted from a PDF. There is only the width and the height of the image measured as the number of pixels. There may be an EXIF metadata info telling some "DPI" or "PPI" info. But this is a comment only. It is metadata, not image data! And as such it merely says "Dear renderer or image processor: I prefer to be rendered at 300 DPI, please! Can you do that for me?". But there's no guarantee a renderer will respect this... (Once an image is embedded on a page, or displayed on a screen, the concept of DPI becomes valid...)Burlingame
I totally agree with Kurt, though I have to say this answer is just as confusing as the original question. The basic measure for images is pixels; an image has a width and height. As such speaking about DPI for images is incorrect. You can only meaningfully speak about DPI when that image is placed in some sort of context (such as on a page).Lungan
I've actually voted down your answer, because I don't like that you call this an addendum and point out a method that is much more convoluted than Kurt's answer (which should have been the accepted answer for this question in the first place) while giving worse results.Lungan
E
-2

You can use a vector graphics editor like Inkscape. Import the PDF document and measure the dimensions of the image. Then zoom in close enough to the picture that you can see the individual pixels, and draw a square on any pixel. You can then calculate the resolution of the image from the ratio between the size of that square and the size of the whole image.

Extortionary answered 23/9, 2023 at 9:53 Comment(0)
