Using GhostScript to get page size

Is it possible to get the page size (e.g. of a PDF document page) using GhostScript? I have seen the "bbox" device, but it returns the bounding box of the page content (which differs per page), not the TrimBox (or CropBox) of the PDF pages. (See http://www.prepressure.com/pdf/basics/page_boxes for information about page boxes.) Are there any other possibilities?

Bulla asked 31/5, 2010 at 11:47

Meanwhile, I found a different method. This one uses only Ghostscript (as you required), with no need for third-party utilities.

It relies on a small helper program written in PostScript that ships with the Ghostscript source code: look for the file pdf_info.ps in the toolbin subdirectory.

According to the comments included in that file, you run it like this to list the fonts used and/or the media sizes used (options in square brackets are optional):

gswin32c -dNODISPLAY ^
   -q ^
   -sFile=____.pdf ^
   [-dDumpMediaSizes] ^
   [-dDumpFontsUsed [-dShowEmbeddedFonts]] ^
   toolbin/pdf_info.ps
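
If you want to call pdf_info.ps from a script instead of typing the command by hand, the same command line can be assembled in Python. This is only a sketch: `pdf_info_command` and `run_pdf_info` are hypothetical helper names, the executable is assumed to be `gs` (on Windows it would be `gswin32c`, as above), `script_path` must point to wherever pdf_info.ps lives on your system, and `-dNOSAFER` is an assumption added because newer Ghostscript releases run in SAFER mode by default and may refuse to open the file otherwise.

```python
import subprocess
from typing import List


def pdf_info_command(pdf_path: str,
                     script_path: str = "toolbin/pdf_info.ps",
                     gs_executable: str = "gs",
                     dump_media_sizes: bool = True,
                     dump_fonts_used: bool = False,
                     show_embedded_fonts: bool = False) -> List[str]:
    """Build the argument list for Ghostscript's pdf_info.ps helper.

    The -d/-s flags mirror the usage comment shipped in pdf_info.ps.
    -dNOSAFER is an assumption: newer Ghostscript versions default to SAFER.
    """
    args = [gs_executable, "-dNODISPLAY", "-dNOSAFER", "-q", f"-sFile={pdf_path}"]
    if dump_media_sizes:
        args.append("-dDumpMediaSizes")
    if dump_fonts_used:
        args.append("-dDumpFontsUsed")
        # -dShowEmbeddedFonts is only meaningful together with -dDumpFontsUsed
        if show_embedded_fonts:
            args.append("-dShowEmbeddedFonts")
    args.append(script_path)  # the PostScript program to run comes last
    return args


def run_pdf_info(pdf_path: str, **kwargs) -> str:
    """Run pdf_info.ps and return its text output (requires Ghostscript)."""
    result = subprocess.run(pdf_info_command(pdf_path, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing a list to subprocess.run() avoids the cmd.exe `^` continuations and any quoting of the file path; assuming Ghostscript is installed, `run_pdf_info("file.pdf")` should return text like the output shown below.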

I ran it on a local example file, with command-line parameters that request only the media sizes (not the fonts used). Here is the result:

C:\> gswin32c ^
      -dNODISPLAY ^
      -q ^
      -sFile=c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf ^
      -dDumpMediaSizes ^
      C:/gs8.71/lib/pdf_info.ps


  c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf has 146 pages.
  Creator: FrameMaker 6.0
  Producer: Acrobat Distiller 5.0.5 (Windows)
  CreationDate: D:20060817164306Z
  ModDate: D:20060822122024+02'00'

  Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
  Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  Page 3 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  Page 4 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  [....]
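
The DumpMediaSizes lines list width/height pairs rather than full rectangles. If you need those numbers in a program, a minimal Python sketch could parse them as follows. It assumes exactly the "Page N MediaBox: [ w h ] CropBox: [ w h ]" layout shown above (other versions of pdf_info.ps may print additional or differently formatted fields), and `parse_media_sizes` is a hypothetical name.

```python
import re
from typing import Dict, Tuple

# "Page <n> <rest of line>"; header lines and "[....]" don't match
PAGE_RE = re.compile(r"^\s*Page (\d+) (.*)$")
# "<Name>Box: [ <w> <h> ]" as printed by pdf_info.ps -dDumpMediaSizes
BOX_RE = re.compile(r"(\w+Box): \[\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\]")


def parse_media_sizes(output: str) -> Dict[int, Dict[str, Tuple[float, float]]]:
    """Map page number -> {box name: (width, height)} from pdf_info.ps output."""
    pages: Dict[int, Dict[str, Tuple[float, float]]] = {}
    for line in output.splitlines():
        m = PAGE_RE.match(line)
        if not m:
            continue
        boxes = {name: (float(w), float(h))
                 for name, w, h in BOX_RE.findall(m.group(2))}
        if boxes:
            pages[int(m.group(1))] = boxes
    return pages
```

In the sample, page 1's CropBox (419.535 x 297.644) is landscape while pages 2 to 4 are portrait, so it pays to read the sizes per page rather than assume they are all equal.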
Reagan answered 28/6, 2010 at 13:16
Does Ghostscript still ship with pdf_info.ps? If not, where would be a good place to get a copy? – Stop
You can look for it in Ghostscript's Git repository: http://git.ghostscript.com/?p=ghostpdl.git;a=summary. Or try this direct link. – Reagan
Thanks! I'd found a copy somewhere, but I don't think it was as up to date. – Stop
I can no longer find it in the Git repo, at least not via Google. And on Ubuntu, the /usr/share/ghostscript/9.18/lib directory does not contain it. Is there an alternative (an alternative location or program)? – Deicer
@Deicer It looks like the file is still available in the Git repository. If you switch to the "tree" view of the repo and navigate to the "toolbin" folder, you will find it there. – Baun
Ah yes, there it is. Thank you. – Deicer
The file pdf_info.ps has apparently been moved into the [ghostpdl.git]/lib subfolder. – Grisby
Note that you also need the -dNOSAFER option to open a file given on the command line. – Grisby

Unfortunately, it doesn't seem easy to get the (possibly different) page sizes of a PDF (or its *Box values, for that matter) with the help of Ghostscript.

But since you also asked for other possibilities: a rather reliable way to determine the media size of each page (and even each of the embedded {Trim,Media,Crop,Bleed}Boxes) is the command-line tool pdfinfo.exe. This utility is part of the XPDF tools from http://www.foolabs.com/xpdf/download.html. Run it with the "-box" parameter to print the boxes; "-f 3" tells it to start at page 3 and "-l 8" to stop processing at page 8.

Example output:

C:\downloads>pdfinfo -box -f 1 -l 3 _IXUS_850IS_ADVCUG_EN.pdf
Creator:        FrameMaker 6.0
Producer:       Acrobat Distiller 5.0.5 (Windows)
CreationDate:   08/17/06 16:43:06
ModDate:        08/22/06 12:20:24
Tagged:         no
Pages:          146
Encrypted:      no
Page    1 size: 419.535 x 297.644 pts
Page    2 size: 297.646 x 419.524 pts
Page    3 size: 297.646 x 419.524 pts
Page    1 MediaBox:     0.00     0.00   595.00   842.00
Page    1 CropBox:     87.25   430.36   506.79   728.00
Page    1 BleedBox:    87.25   430.36   506.79   728.00
Page    1 TrimBox:     87.25   430.36   506.79   728.00
Page    1 ArtBox:      87.25   430.36   506.79   728.00
Page    2 MediaBox:     0.00     0.00   595.00   842.00
Page    2 CropBox:    148.17   210.76   445.81   630.28
Page    2 BleedBox:   148.17   210.76   445.81   630.28
Page    2 TrimBox:    148.17   210.76   445.81   630.28
Page    2 ArtBox:     148.17   210.76   445.81   630.28
Page    3 MediaBox:     0.00     0.00   595.00   842.00
Page    3 CropBox:    148.17   210.76   445.81   630.28
Page    3 BleedBox:   148.17   210.76   445.81   630.28
Page    3 TrimBox:    148.17   210.76   445.81   630.28
Page    3 ArtBox:     148.17   210.76   445.81   630.28
File size:      6888764 bytes
Optimized:      yes
PDF version:    1.4
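
`pdfinfo -box` prints every box as a full rectangle (lower-left x, lower-left y, upper-right x, upper-right y), so the page size for a given box is the difference of its corner coordinates. A small Python sketch that parses the lines above; the function names are hypothetical, the regular expression assumes the exact layout shown, and `run_pdfinfo` assumes pdfinfo is on the PATH and that `-l -1` means "up to the last page", as suggested in the comment below.

```python
import re
import subprocess
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # llx, lly, urx, ury in PDF points

# e.g. "Page    1 TrimBox:     87.25   430.36   506.79   728.00"
BOX_LINE_RE = re.compile(
    r"^Page\s+(\d+)\s+(\w+Box):\s+"
    r"(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s*$"
)


def parse_pdfinfo_boxes(output: str) -> Dict[int, Dict[str, Box]]:
    """Map page number -> {box name: (llx, lly, urx, ury)} from `pdfinfo -box` output."""
    pages: Dict[int, Dict[str, Box]] = {}
    for line in output.splitlines():
        m = BOX_LINE_RE.match(line.strip())
        if m:  # "Page N size: ..." and header lines don't match
            llx, lly, urx, ury = (float(v) for v in m.group(3, 4, 5, 6))
            pages.setdefault(int(m.group(1)), {})[m.group(2)] = (llx, lly, urx, ury)
    return pages


def box_size(box: Box) -> Tuple[float, float]:
    """Width and height of a box; abs() guards against reversed corners."""
    llx, lly, urx, ury = box
    return abs(urx - llx), abs(ury - lly)


def run_pdfinfo(pdf_path: str, first: int = 1, last: int = -1) -> str:
    """Run pdfinfo with -box (assumption: last=-1 means up to the last page)."""
    args = ["pdfinfo", "-box", "-f", str(first), "-l", str(last), pdf_path]
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout
```

For page 1, the TrimBox gives 506.79 - 87.25 = 419.54 by 728.00 - 430.36 = 297.64 points, which agrees with the "Page 1 size" line up to the two-decimal rounding of the -box output.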
Reagan answered 5/6, 2010 at 19:2
To get all the pages, don't specify a first page (omit -f) and set the last page to -1 (-l -1). – Enterprising

A solution in pure Ghostscript PostScript, with no additional scripts necessary:

gs -dQUIET -dNODISPLAY -dNOSAFER -sFileName=path/to/file.pdf -c "FileName (r) file runpdfbegin 1 1 pdfpagecount {pdfgetpage /MediaBox get {=print ( ) print} forall (\n) print} for quit"

The command prints the MediaBox of each page in the PDF as four numbers per line (lower-left x, lower-left y, upper-right x, upper-right y). An example from a 3-page PDF:

0 0 595 841
0 0 595 841
0 0 595 841
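
Each line is a PDF rectangle [llx lly urx ury] in default user-space units, which are points of 1/72 inch (unless the page sets /UserUnit, a PDF 1.6 feature). As a hedged Python sketch, the lines can be turned into width and height in points and millimetres. The helper names are hypothetical, and the `rotate` parameter is an assumption: the command above does not print a page's /Rotate value, so you would have to obtain it separately, but when it is 90 or 270 the displayed width and height are swapped.

```python
from typing import List, Tuple

POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def parse_box_line(line: str) -> Tuple[float, float, float, float]:
    """Parse one 'llx lly urx ury' line as printed by the gs command above."""
    values = [float(v) for v in line.split()]  # split() also drops the trailing space
    if len(values) != 4:
        raise ValueError(f"expected 4 numbers, got {len(values)}: {line!r}")
    return values[0], values[1], values[2], values[3]


def page_size(box: Tuple[float, float, float, float], rotate: int = 0) -> Tuple[float, float]:
    """Width and height in points; a /Rotate of 90 or 270 (assumed input) swaps them."""
    llx, lly, urx, ury = box
    width, height = abs(urx - llx), abs(ury - lly)
    if rotate % 180 == 90:  # also covers -90, since -90 % 180 == 90 in Python
        width, height = height, width
    return width, height


def points_to_mm(points: float) -> float:
    """Convert PDF points (1/72 inch) to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH


def sizes_from_output(output: str) -> List[Tuple[float, float]]:
    """One (width, height) pair in points per non-empty output line."""
    return [page_size(parse_box_line(l)) for l in output.splitlines() if l.strip()]
```

With the sample output, this yields 595 x 841 points per page, i.e. about 209.9 x 296.7 mm, which is close to A4.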

Here's a breakdown of the command:

FileName (r) file  % open file given by -sFileName
runpdfbegin        % open file as pdf
1 1 pdfpagecount { % for each page index
  pdfgetpage       % get pdf page properties (pushes a dict)
  /MediaBox get    % get MediaBox value from dict (pushes an array of numbers)
  {                % for every array element
    =print         % print element value
    ( ) print      % print single space
  } forall
  (\n) print       % print new line
} for
quit               % quit interpreter. Not necessary if you pass -dBATCH to gs

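Replace /MediaBox with /CropBox to get the crop box. Note that CropBox is optional in PDF, so the lookup may fail with an /undefined error on pages whose dictionary has no /CropBox entry.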
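
The same approach can print several boxes of one page in a single call. The Python sketch below only assembles the gs arguments; the helper names are hypothetical, it assumes the runpdfbegin/pdfgetpage procedures behave as in the command above (with the same caveats), and each requested key has to be present in the page dictionary (TrimBox and BleedBox are optional in PDF), otherwise Ghostscript stops with an /undefined error. -dNODISPLAY and -dNOSAFER are included for the reasons discussed in the comments.

```python
from typing import List, Sequence


def box_query_program(keys: Sequence[str], page: int = 1) -> str:
    """PostScript for `gs -c` that prints the given box entries of one page.

    Each key produces a line like "MediaBox: 0 0 595 841 ".
    """
    parts = ["FileName (r) file runpdfbegin", f"{int(page)} pdfgetpage"]
    for key in keys:
        name = key.lstrip("/")
        if not name.isalnum():  # keep arbitrary text out of the PostScript
            raise ValueError(f"unexpected key name: {key!r}")
        # dup keeps the page dict on the stack for the next key
        parts.append(f"dup /{name} get ({name}: ) print "
                     "{=print ( ) print} forall (\\n) print")
    parts.append("pop quit")  # drop the page dict, then exit the interpreter
    return " ".join(parts)


def gs_box_command(pdf_path: str, keys: Sequence[str], page: int = 1,
                   gs_executable: str = "gs") -> List[str]:
    """Argument list for subprocess.run(); no shell quoting needed."""
    return [gs_executable, "-dQUIET", "-dNODISPLAY", "-dNOSAFER",
            f"-sFileName={pdf_path}", "-c", box_query_program(keys, page)]
```

Passed to subprocess.run(), `gs_box_command("file.pdf", ["MediaBox", "TrimBox", "BleedBox"])` should print one line per box for page 1, each prefixed with the box name. Passing the arguments as a list also sidesteps shell quoting of the `(\n)` string.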

Vocoid answered 4/10, 2018 at 10:8
Nice answer, and a nice breakthrough! I was getting an error here, though (cannot open X display :0.0), which can be fixed by starting an X server or by adding -dNODISPLAY to the call (better, since we don't need X anyway). – Mcmillan
This command gives wrong results if your pages are rotated: it reports the width as the height and vice versa. – Ettieettinger
It is giving me this error: Error: /invalidfileaccess in --file-- Operand stack: (Chapter10.pdf) (r) Execution stack: %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- Dictionary stack: --dict:729/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)-- Current allocation mode is local Last OS error: Permission denied – Batty
What unit of measurement are the numbers in? – Grisby
@AtifAli You need the -dNOSAFER option. – Grisby
Is there a way to modify the command so that it fetches MediaBox, TrimBox and BleedBox in one go, from the first page only? – Mcclendon

© 2022 - 2024 — McMap. All rights reserved.