Can a PDF document contain images with different DPI?

The question says it all: are there PDF documents that contain images with different DPI (dots per inch)?

Or is it assumed that if I know the DPI of one image, I know it for the whole document?

Tetrafluoroethylene asked 27/7, 2012 at 23:07

I upvoted @ypnos' answer, which is completely correct.

But I'd like to complement it by showing a recent new feature of the pdfimages utility.

pdfimages was previously known as a tool to extract images from PDF files (and that was its only useful purpose). Now, however, you can also use it to investigate details of the images used, without extracting them.

With the following command, I query the data of all images on pages 7 and 8 of a certain PDF file, using the new -list parameter:

pdfimages -list -f 7 -l 8  ct-magazin-14-2012.pdf

  page   num  type   width height color comp bpc  enc interp  object ID
  ---------------------------------------------------------------------
     7     0 image     581   838  rgb     3   8  jpeg   no        39  0
     7     1 image       4     4  rgb     3   8  image  no        40  0
     7     2 image     314   332  rgb     3   8  jpx    no        44  0
     7     3 image     358   430  rgb     3   8  jpx    no        45  0
     7     4 image       4     4  rgb     3   8  image  no        46  0
     7     5 image       4     4  rgb     3   8  image  no        47  0
     7     6 image       4     6  rgb     3   8  image  no        48  0
     7     7 image     596   462  rgb     3   8  jpx    no        49  0
     7     8 image       4     6  rgb     3   8  image  no        50  0
     7     9 image       4     4  rgb     3   8  image  no        51  0
     7    10 image       8    10  rgb     3   8  image  no        41  0
     7    11 image       6     6  rgb     3   8  image  no        42  0
     7    12 image     113    27  rgb     3   8  jpx    no        43  0
     8    13 image     582   839  gray    1   8  jpeg   no      2080  0
     8    14 image     344   364  gray    1   8  jpx    no      2079  0

Note, however: this version of pdfimages is the one from Poppler (the one from XPDF does not (yet?) support this new feature):

pdfimages -version

  pdfimages version 0.20.2
  Copyright 2005-2012 The Poppler Developers - http://poppler.freedesktop.org
  Copyright 1996-2011 Glyph & Cog, LLC

The -list option appeared for the first time in Poppler v0.19.0, released on March 1st, 2012.

Now, the above list does not directly tell you the resolution ("dpi") of the image. That value depends on the size at which the image is rendered on the PDF page.

A PDF can easily use the same image at different spots in the file, with a different rendering size each time. The image needs to be embedded in the PDF only once but can be used/rendered 'by reference' multiple times. (Inefficiently constructed PDFs may still contain the same image multiple times, but that's a different topic...)
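To make that dependency concrete: the effective resolution along one axis is the pixel count divided by the rendered size in inches, with PDF user space using 72 points per inch. Below is a minimal Python sketch, assuming the image is placed without rotation; the function name `effective_ppi` is hypothetical. It shows one embedded image of 400 x 300 pixels referenced twice at different sizes:

```python
POINTS_PER_INCH = 72.0  # PDF user-space unit: 1 pt = 1/72 inch


def effective_ppi(pixels: int, rendered_points: float) -> float:
    """Pixels per inch along one axis when `pixels` span `rendered_points`."""
    if rendered_points <= 0:
        raise ValueError("rendered size must be positive")
    # pixels / (points / 72) rewritten to avoid an extra division
    return pixels * POINTS_PER_INCH / rendered_points


# One embedded image (400 x 300 px), drawn at two different sizes.
width_px, height_px = 400, 300
for w_pt, h_pt in [(400.0, 300.0), (200.0, 150.0)]:
    print(effective_ppi(width_px, w_pt), effective_ppi(height_px, h_pt))
# prints 72.0 72.0, then 144.0 144.0
```

Same pixels, same object in the file, yet two different resolutions on the page.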

Now let's clear up the questions which may arise from looking at the respective column headings. What do they mean?

page

  • The page number in the PDF containing the image.

num

  • The image number of the current listing.

type

  • The image type. Possible values are: image (an opaque image), mask (a monochrome image mask), smask (a soft-mask image) and stencil (a monochrome mask image used for painting a color or a pattern). Note: in PDF, transparency for an image is created by using two separate PDF objects: one for the image and one for the mask or smask. The mask/smask belonging to a transparent image always directly follows the image in the listing.

width

  • The image width in pixels.

height

  • The image height in pixels.

color

  • The image color space. Possible values are gray, rgb, cmyk, lab (L*a*b), icc (ICC based), index (indexed colors), sep (separation) and devn (DeviceN).

comp

  • The number of color components used by the image.

bpc

  • The bits per color component used by the image.

enc

  • The encoding (compression) used by the image. Possible values are: image (a raster image -- may internally use the generic /Flate or /LZW compression, but not a special image encoding), jpeg (JPEG compression), jpx (JPEG2000 compression), jbig2 (JBIG2 compression) and ccitt (Fax compression).

interp

  • yes if interpolation was requested when scaling up the image, otherwise no.

object ID

  • The image's PDF object number followed by its generation number (e.g. 39 0) inside the file.
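Since a mask or smask row always directly follows the image it belongs to, the rows can be grouped mechanically. A small sketch, assuming the listing has already been reduced to (num, type) pairs; the function `pair_masks` and the sample rows are hypothetical (the listing above happens to contain no masks):

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (num, type) as printed by `pdfimages -list`


def pair_masks(rows: List[Row]) -> List[Tuple[Row, Optional[Row]]]:
    """Attach each mask/smask row to the image row directly before it."""
    pairs: List[Tuple[Row, Optional[Row]]] = []
    for row in rows:
        _, kind = row
        prev = pairs[-1] if pairs else None
        # Only an opaque "image" row that has no mask yet can take one.
        if (kind in ("mask", "smask") and prev is not None
                and prev[0][1] == "image" and prev[1] is None):
            pairs[-1] = (prev[0], row)
        else:
            pairs.append((row, None))
    return pairs


print(pair_masks([(0, "image"), (1, "smask"), (2, "image")]))
# [((0, 'image'), (1, 'smask')), ((2, 'image'), None)]
```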

Update (March 2016)

As of Poppler v0.25.0 (released December 11, 2013), the command pdfimages -list includes new columns which show the automatically calculated x-ppi (horizontal) and y-ppi (vertical) resolution of each embedded image as displayed on the PDF page by the PDF renderer.

In addition, the size (in bytes/kilobytes) that each image occupies in the PDF file (i.e. as embedded, after compression) is shown, along with its compression ratio relative to the uncompressed pixel data.

Here is the result (using Poppler v0.42.0) for the same file as above:

page num type width height color comp bpc enc interp objectID x-ppi y-ppi size ratio
------------------------------------------------------------------------------------
   7  0 image   581   838  rgb     3   8 jpeg   no       39 0    73    73 2107B 0.1%
   7  1 image     4     4  rgb     3   8 image  no       40 0   150   150   54B 112%
   7  2 image   314   332  rgb     3   8 jpx    no       44 0   150   150 19.0K 6.2%
   7  3 image   358   430  rgb     3   8 jpx    no       45 0   150   150 15.7K 3.5%
   7  4 image     4     4  rgb     3   8 image  no       46 0   150   150   62B 129%
   7  5 image     4     4  rgb     3   8 image  no       47 0   150   150   51B 106%
   7  6 image     4     6  rgb     3   8 image  no       48 0   150   150   62B  86%
   7  7 image   596   462  rgb     3   8 jpx    no       49 0   150   150 40.7K 5.0%
   7  8 image     4     6  rgb     3   8 image  no       50 0   150   150   86B 119%
   7  9 image     4     4  rgb     3   8 image  no       51 0   150   150   62B 129%
   7 10 image     8    10  rgb     3   8 image  no       41 0   150   150  157B  65%
   7 11 image     6     6  rgb     3   8 image  no       42 0   150   150   82B  76%
   7 12 image   113    27  rgb     3   8 jpx    no       43 0   151   152 1090B  12%
   8 13 image   582   839  gray    1   8 jpeg   no     2080 0    72    72  319B 0.1%
   8 14 image   344   364  gray    1   8 jpx    no     2079 0   150   150 4325B 3.5%
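This listing already answers the original question for this file: one document, several resolutions (73, 150, 151/152 and 72 ppi). To check a file automatically, the output can be parsed. The sketch below assumes the Poppler >= 0.25 layout shown above, where x-ppi and y-ppi are the 4th- and 3rd-last columns and are printed as integers; the function names are hypothetical, and `list_images` simply runs the real `pdfimages -list` command (Poppler must be installed and on PATH):

```python
import subprocess
from typing import List, Set, Tuple


def parse_ppi(listing: str) -> List[Tuple[int, int, int, int]]:
    """Return (page, num, x_ppi, y_ppi) for every data row of `pdfimages -list`."""
    rows = []
    for line in listing.splitlines():
        tokens = line.split()
        # Skip the header line, the dashed separator and blank lines.
        if len(tokens) < 16 or not tokens[0].isdigit():
            continue
        page, num = int(tokens[0]), int(tokens[1])
        # x-ppi and y-ppi precede the trailing size and ratio columns.
        x_ppi, y_ppi = int(tokens[-4]), int(tokens[-3])
        rows.append((page, num, x_ppi, y_ppi))
    return rows


def distinct_resolutions(listing: str) -> Set[Tuple[int, int]]:
    """All (x_ppi, y_ppi) pairs occurring in the listing."""
    return {(x, y) for _, _, x, y in parse_ppi(listing)}


def list_images(pdf_path: str) -> str:
    """Run Poppler's pdfimages and return its -list output as text."""
    result = subprocess.run(["pdfimages", "-list", pdf_path],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Usage (needs Poppler and a real file):
#   resolutions = distinct_resolutions(list_images("ct-magazin-14-2012.pdf"))
#   print("mixed" if len(resolutions) > 1 else "uniform", sorted(resolutions))
```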

x-ppi

  • The horizontal resolution of the image (in pixels per inch) when rendered on the PDF page.

y-ppi

  • The vertical resolution of the image (in pixels per inch) when rendered on the PDF page.

size

  • The size of the embedded image in the PDF file. The following suffixes are used: 'B' bytes, 'K' kilobytes, 'M' megabytes and 'G' gigabytes.

ratio

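  • The compression ratio of the embedded image: its embedded size as a percentage of the uncompressed pixel data. Values above 100% (as for the tiny 4 x 4 images above) mean the encoded data is larger than the raw pixels.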
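The ratio column can be reproduced by hand: the raw pixel data takes width x height x comp x bpc / 8 bytes, and the ratio is the embedded size as a percentage of that. A sketch with hypothetical helper names; it ignores the row padding that applies when bpc is below 8, and it assumes 'K' means 1024 bytes, which is what makes the rounded values in the listing above line up (e.g. 19.0K at 6.2%):

```python
def uncompressed_size(width: int, height: int, comp: int, bpc: int) -> float:
    """Raw size in bytes of the decoded pixel data (no per-row padding)."""
    return width * height * comp * bpc / 8


def ratio_percent(embedded_bytes: float, width: int, height: int,
                  comp: int, bpc: int) -> float:
    """Embedded (compressed) size as a percentage of the raw pixel data."""
    return 100.0 * embedded_bytes / uncompressed_size(width, height, comp, bpc)


# Rows 7/0 and 7/1 from the listing above.
print(round(ratio_percent(2107, 581, 838, 3, 8), 2))  # JPEG photo: 0.14
print(ratio_percent(54, 4, 4, 3, 8))                  # 4x4 RGB image: 112.5
```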
Tasiatasiana answered 28/7, 2012 at 9:14
Comments:
As of pdfimages 0.26.5 in stable, it also prints the fields x-ppi and y-ppi. - Unschooled
@FaheemMitha: Thanks for fixing the typos in my answer. Also thanks for your comment updating the info about more recent versions of pdfimages. (I was aware of these, as can be seen in my more recent answer from 2015 here...) I'll add an update to my answer to this effect... - Tasiatasiana
I wish I could give this answer 1000 upvotes, as the detailed description of what each column means is hugely helpful! - Arteritis
@CodingSamurai: You could compensate for that by upvoting any other of my almost 1000 answers you think deserves an upvote, one per day over the next 2-3 years :-) Naa, just kidding... - Tasiatasiana

The answer is yes. DPI is independent for each embedded image.

It is merely a common technique of some DTP programs to recalculate the DPI of all images down to an upper bound (an image whose DPI was already lower stays as it is). But this is optional. And by the way, you can also embed an (unaltered) PDF into another PDF; at that point, all assumptions you could have made are lost.
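Such a DPI cap can be sketched as follows, assuming a simple rule: an image whose effective resolution exceeds the cap is resampled down to it, while lower-resolution images keep their pixels. The function name and the 300 ppi cap in the example are illustrative assumptions, not the settings or algorithm of any particular DTP program:

```python
import math
from typing import Tuple


def capped_pixel_size(width_px: int, height_px: int,
                      width_pt: float, height_pt: float,
                      max_ppi: float) -> Tuple[int, int]:
    """New pixel dimensions after downsampling to at most `max_ppi`.

    Never upsamples: an axis already at or below the cap is left alone.
    """
    def cap(pixels: int, points: float) -> int:
        inches = points / 72.0
        if pixels / inches <= max_ppi:
            return pixels
        # floor() keeps the result at or below the cap
        return max(1, math.floor(inches * max_ppi))

    return cap(width_px, width_pt), cap(height_px, height_pt)


# A 1200 x 900 px image placed at 2 x 1.5 inch (600 ppi) vs. a 72 ppi one.
print(capped_pixel_size(1200, 900, 144.0, 108.0, 300))  # (600, 450)
print(capped_pixel_size(400, 300, 400.0, 300.0, 300))   # (400, 300)
```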

Chaste answered 27/7, 2012 at 23:20

An image is drawn using the Do operator. The operand passed to this operator is the name of the image, which is looked up in the resources dictionary of the current page. The image resource has a width (number of pixel columns) and a height (number of pixel rows). The physical width and height of the image as it appears on the PDF page are determined by the value of the CTM (current transformation matrix) at the time of the Do operator. If the CTM were the identity matrix, the image would be 1 pt wide and 1 pt high (1 pt equals 1/72 inch). In general, the CTM has a non-identity value that transforms this 1 x 1 pt square into a larger area. The combination of the number of pixel rows and columns and the physical extent of the image determines the resolution of the image as it appears on the PDF page.

Example: the image resource consists of 300 pixel rows, and each row consists of 400 pixels. The CTM equals [400 0 0 300 100 100]. The image would then be 400 pt wide and 300 pt high, i.e. 400/72 inch by 300/72 inch, so the resolution would be 72 dpi in both directions.
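The same calculation can be written against the CTM directly. For a CTM [a b c d e f], the image's unit square has its x edge mapped to the vector (a, b) and its y edge to (c, d), so the rendered width is sqrt(a² + b²) pt and the height is sqrt(c² + d²) pt; taking vector lengths also covers rotated placements. A minimal sketch, assuming the CTM is given as those six numbers; the function name is hypothetical:

```python
import math
from typing import Sequence, Tuple


def image_ppi(width_px: int, height_px: int,
              ctm: Sequence[float]) -> Tuple[float, float]:
    """Effective (x, y) resolution of an image drawn with Do under `ctm`."""
    a, b, c, d, _e, _f = ctm  # e, f only translate, so they don't affect size
    width_pt = math.hypot(a, b)   # length of the transformed x edge
    height_pt = math.hypot(c, d)  # length of the transformed y edge
    return width_px * 72.0 / width_pt, height_px * 72.0 / height_pt


# The example above, then the same image rotated by 90 degrees at half size.
print(image_ppi(400, 300, [400, 0, 0, 300, 100, 100]))  # (72.0, 72.0)
print(image_ppi(400, 300, [0, 200, -150, 0, 100, 100]))  # (144.0, 144.0)
```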

In short: The PDF spec allows a PDF to contain images of various resolutions.

Plaided answered 28/7, 2012 at 16:56
