The question says it all: are there PDF documents that contain images with different dpi (dots per inch)?
Or is it safe to assume that if I know the dpi of one image, I know it for the whole document?
I upvoted @ypnos' answer, which is completely correct.
But I'd like to complement it by showing a very recent feature of the pdfimages utility.
Previously, pdfimages could only extract images from PDF files (and that was its only useful purpose). Now you can also use it to inspect details of the embedded images without extracting them.
The next command queries the data of all images on pages 7 and 8 of a certain PDF file, using the new -list parameter:
pdfimages -list -f 7 -l 8 ct-magazin-14-2012.pdf

page   num type   width height color comp bpc enc   interp object ID
--------------------------------------------------------------------
   7     0 image    581    838 rgb      3   8 jpeg  no         39  0
   7     1 image      4      4 rgb      3   8 image no         40  0
   7     2 image    314    332 rgb      3   8 jpx   no         44  0
   7     3 image    358    430 rgb      3   8 jpx   no         45  0
   7     4 image      4      4 rgb      3   8 image no         46  0
   7     5 image      4      4 rgb      3   8 image no         47  0
   7     6 image      4      6 rgb      3   8 image no         48  0
   7     7 image    596    462 rgb      3   8 jpx   no         49  0
   7     8 image      4      6 rgb      3   8 image no         50  0
   7     9 image      4      4 rgb      3   8 image no         51  0
   7    10 image      8     10 rgb      3   8 image no         41  0
   7    11 image      6      6 rgb      3   8 image no         42  0
   7    12 image    113     27 rgb      3   8 jpx   no         43  0
   8    13 image    582    839 gray     1   8 jpeg  no       2080  0
   8    14 image    344    364 gray     1   8 jpx   no       2079  0
Note, however: this version of pdfimages is the one from Poppler (the one from XPDF does not (yet?) support this new feature):

pdfimages -version
pdfimages version 0.20.2
Copyright 2005-2012 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC

The -list option appeared for the first time in Poppler v0.19.0, released on March 1st, 2012.
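If a script depends on -list, it may help to check the version banner first. The following is only a sketch: the function names are made up for illustration, the version thresholds are the ones quoted in this answer, and it assumes you have already captured the banner text yourself. Because Xpdf uses a different version numbering, the check also requires the word "Poppler" in the banner.

```python
import re

# Minimum Poppler versions as stated in this answer (not verified elsewhere).
LIST_MIN = (0, 19, 0)  # -list first appeared here
PPI_MIN = (0, 25, 0)   # x-ppi / y-ppi columns appeared here


def parse_poppler_version(banner):
    """Extract a version tuple from text such as 'pdfimages version 0.20.2'."""
    match = re.search(r"pdfimages version (\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError("no version string found")
    return tuple(int(part) for part in match.group(1).split("."))


def is_poppler(banner):
    """Xpdf numbers its releases differently, so require a Poppler banner."""
    return "Poppler" in banner


def supports_list(banner):
    return is_poppler(banner) and parse_poppler_version(banner) >= LIST_MIN


def supports_ppi_columns(banner):
    return is_poppler(banner) and parse_poppler_version(banner) >= PPI_MIN


banner = (
    "pdfimages version 0.20.2\n"
    "Copyright 2005-2012 The Poppler Developers - http://poppler.freedesktop.org\n"
    "Copyright 1996-2011 Glyph & Cog, LLC\n"
)
print(supports_list(banner), supports_ppi_columns(banner))
```

For the 0.20.2 banner shown here, the first check passes and the second does not.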
Now, the above list does not directly tell you the resolution ("dpi") of an image. That value depends on the size at which the image is rendered on the PDF page.
A PDF can easily use the same image at different spots in the file, with a different rendering size each time. The image needs to be embedded into the PDF only once but can be rendered 'by reference' multiple times (inefficiently constructed PDFs may still contain the same image multiple times, but that's a different topic...).
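To make the dependence on rendering size concrete, here is a small sketch of the arithmetic. The two placement widths are hypothetical values chosen for illustration, not taken from the file; only the pixel width of 581 comes from the listing.

```python
POINTS_PER_INCH = 72  # PDF default user space: 1 pt = 1/72 inch


def effective_ppi(pixels, rendered_points):
    """Resolution along one axis: pixel count per rendered inch."""
    return pixels * POINTS_PER_INCH / rendered_points


# The same 581-pixel-wide image, placed at two hypothetical widths (in points).
full_width = effective_ppi(581, 573.0)
thumbnail = effective_ppi(581, 139.44)
print(round(full_width), round(thumbnail))  # 73 300
```

One embedded image, two placements, two different resolutions: the pixel data alone does not determine the dpi.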
Now let's clear up the questions which may arise from looking at the column headings. What do they mean?
page: the number of the page on which the image appears.
num: the number of the image within the listing.
type: the image type: image (an opaque image), mask (a monochrome image mask), smask (a soft-mask image) or stencil (a monochrome mask image used for painting a color or a pattern). Note: Transparency in PDF for images is created by using two separate PDF objects: one for the image and one for the mask or smask. The mask/smask belonging to a transparent image always directly follows the image in the listing.
width: the image width in pixels.
height: the image height in pixels.
color: the color space of the image: gray, rgb, cmyk, lab (L*a*b), icc (ICC based), index (indexed colors), sep (separation) or devn (DeviceN).
comp: the number of color components.
bpc: the number of bits per component.
enc: the encoding: image (a raster image; it may internally use the generic /Flate or /LZW compression, but not a special image encoding), jpeg (JPEG compression), jpx (JPEG2000 compression), jbig2 (JBIG2 compression) or ccitt (Fax compression).
interp: yes if interpolation was requested when scaling up the image.
object ID: the PDF object ID of the image (object number and generation number).
As of Poppler v0.25.0 (released December 11, 2013), the command pdfimages -list includes new columns which show the automatically calculated x-ppi (horizontal) and y-ppi (vertical) resolutions for each embedded image as displayed within the PDF page by the PDF renderer.
In addition, the size (in bytes/kilobytes) of each image as embedded in the PDF and its compression ratio (embedded size relative to the uncompressed size) are shown.
To show the result (using Poppler v0.42.0) for the same file as above:
page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
------------------------------------------------------------------------------------
7 0 image 581 838 rgb 3 8 jpeg no 39 0 73 73 2107B 0.1%
7 1 image 4 4 rgb 3 8 image no 40 0 150 150 54B 112%
7 2 image 314 332 rgb 3 8 jpx no 44 0 150 150 19.0K 6.2%
7 3 image 358 430 rgb 3 8 jpx no 45 0 150 150 15.7K 3.5%
7 4 image 4 4 rgb 3 8 image no 46 0 150 150 62B 129%
7 5 image 4 4 rgb 3 8 image no 47 0 150 150 51B 106%
7 6 image 4 6 rgb 3 8 image no 48 0 150 150 62B 86%
7 7 image 596 462 rgb 3 8 jpx no 49 0 150 150 40.7K 5.0%
7 8 image 4 6 rgb 3 8 image no 50 0 150 150 86B 119%
7 9 image 4 4 rgb 3 8 image no 51 0 150 150 62B 129%
7 10 image 8 10 rgb 3 8 image no 41 0 150 150 157B 65%
7 11 image 6 6 rgb 3 8 image no 42 0 150 150 82B 76%
7 12 image 113 27 rgb 3 8 jpx no 43 0 151 152 1090B 12%
8 13 image 582 839 gray 1 8 jpeg no 2080 0 72 72 319B 0.1%
8 14 image 344 364 gray 1 8 jpx no 2079 0 150 150 4325B 3.5%
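x-ppi: the horizontal resolution of the image, in pixels per inch, as rendered on the PDF page.
y-ppi: the vertical resolution of the image, in pixels per inch, as rendered on the PDF page.
size: the size of the image as embedded in the PDF file (B for bytes, K for kilobytes).
ratio: the compression ratio of the embedded image.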
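With the ppi columns available, the original question can be answered for a given file by collecting the distinct resolutions in the listing. The sketch below parses a few rows copied from the output above; the function names are illustrative, and the parser assumes the 16-field row layout shown here, which other Poppler versions may not share.

```python
SAMPLE = """\
page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
------------------------------------------------------------------------------------
7 0 image 581 838 rgb 3 8 jpeg no 39 0 73 73 2107B 0.1%
7 1 image 4 4 rgb 3 8 image no 40 0 150 150 54B 112%
8 13 image 582 839 gray 1 8 jpeg no 2080 0 72 72 319B 0.1%
"""


def parse_listing(text):
    """Turn `pdfimages -list` output (with ppi columns) into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 16 whitespace-separated fields and start with a page number;
        # the header and the dashed separator line are skipped.
        if len(fields) != 16 or not fields[0].isdigit():
            continue
        rows.append({
            "page": int(fields[0]),
            "num": int(fields[1]),
            "width": int(fields[3]),
            "height": int(fields[4]),
            "comp": int(fields[6]),
            "bpc": int(fields[7]),
            "x_ppi": int(fields[12]),
            "y_ppi": int(fields[13]),
            "size": fields[14],
        })
    return rows


def uncompressed_bytes(row):
    """Raw pixel data size: width * height * components * bits per component / 8."""
    return row["width"] * row["height"] * row["comp"] * row["bpc"] // 8


rows = parse_listing(SAMPLE)
resolutions = {(r["x_ppi"], r["y_ppi"]) for r in rows}
print(sorted(resolutions))          # more than one entry means mixed resolutions
print(uncompressed_bytes(rows[1]))  # 4 * 4 * 3 * 8 / 8 = 48
```

The second value also gives a plausibility check on the ratio column: 54 bytes embedded against 48 bytes of raw pixel data is about 112%, which matches the listed ratio for that image.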
The answer is yes. DPI is independent for each embedded image.
It is merely a common technique of some DTP programs to recalculate the DPI of all images down to an upper bound (if an image's DPI was lower before, it stays as it is). But this is optional. And by the way, you can also embed an (unaltered) PDF into a PDF; at that point you lose any assumptions you could make.
An image is drawn using the Do operator. The operand that is passed to this operator is the name of the image. The image name is looked up in the resources dictionary of the current page. The image resource has a width (number of pixel columns) and a height (number of pixel rows). The physical width and height of the image as it appears on the PDF page are determined by the value of the CTM (current transformation matrix) at the time of the Do operator. If the CTM were the identity matrix, the image would be 1 pt wide and 1 pt high (1 pt equals 1/72 inch). In general, the CTM has a non-identity value that transforms the 1x1 pt square into a larger area. The combination of the number of pixel rows and pixel columns and the physical extent of the image determines the resolution of the image as it appears on the PDF page.
Example: the image resource consists of 300 pixel rows. Each row consists of 400 pixels. The CTM equals [400 0 0 300 100 100]. The image height would be 300 pt and the image width would be 400 pt. So the resolution would be 72 dpi in both directions.
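The example can be checked with a few lines of Python. This is a sketch with an illustrative function name; it assumes the CTM passed in is the full matrix in effect at the Do operator and that the user space unit is the default 1/72 inch. Edge lengths are taken from the transformed unit vectors, so a rotated image works too, while skew is not treated specially.

```python
import math


def image_resolution(width_px, height_px, ctm):
    """Return (x_dpi, y_dpi) for an image drawn with Do under the given CTM.

    ctm is the six-number PDF matrix [a b c d e f]. The image fills the unit
    square, so its edges map to the vectors (a, b) and (c, d).
    """
    a, b, c, d, _e, _f = ctm
    width_pt = math.hypot(a, b)   # rendered width in points
    height_pt = math.hypot(c, d)  # rendered height in points
    return width_px * 72 / width_pt, height_px * 72 / height_pt


# The example above: 400 x 300 pixels, CTM [400 0 0 300 100 100].
print(image_resolution(400, 300, [400, 0, 0, 300, 100, 100]))  # (72.0, 72.0)

# The same pixels drawn at a quarter of the size: four times the resolution.
print(image_resolution(400, 300, [100, 0, 0, 75, 100, 100]))   # (288.0, 288.0)
```

The translation entries (100, 100) only move the image on the page and do not affect its resolution.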
In short: The PDF spec allows a PDF to contain images of various resolutions.
pdfimages 0.26.5 in stable, it also prints fields x-ppi and y-ppi. – Unschooled