Is it possible to get the page size (from e.g. a PDF document page) using GhostScript? I have seen the "bbox" device, but it returns the bounding box (it differs per page), not the TrimBox (or CropBox) of the PDF pages. (See http://www.prepressure.com/pdf/basics/page_boxes for info about page boxes.) Any other possibility?
Meanwhile I found a different method. This one uses Ghostscript only (just as you required). No need for additional third party utilities.
This method uses a little helper program, written in PostScript, that ships with the source code of Ghostscript. Look in the toolbin subdir for the pdf_info.ps file.
The included comments say you should run it like this in order to list the fonts used and the media sizes used:
gswin32c -dNODISPLAY ^
-q ^
-sFile=____.pdf ^
[-dDumpMediaSizes] ^
[-dDumpFontsUsed [-dShowEmbeddedFonts]] ^
toolbin/pdf_info.ps
I ran it on a local example file, with command-line parameters that ask for the media sizes only (not the fonts used). Here is the result:
C:\> gswin32c ^
-dNODISPLAY ^
-q ^
-sFile=c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf ^
-dDumpMediaSizes ^
C:/gs8.71/lib/pdf_info.ps
c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf has 146 pages.
Creator: FrameMaker 6.0
Producer: Acrobat Distiller 5.0.5 (Windows)
CreationDate: D:20060817164306Z
ModDate: D:20060822122024+02'00'
Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
Page 3 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
Page 4 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
[....]
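If the sizes are needed in a script rather than on screen, the report above can be parsed line by line. The following is only a minimal Python sketch: the function name `parse_media_sizes` is hypothetical, the regular expression is modelled on the sample output shown above, and treating the CropBox part as optional is an assumption about how pdf_info.ps reports pages without one.

```python
import re

# Modelled on lines such as:
#   Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
# The CropBox group is optional (assumption: it may be absent for some pages).
PAGE_LINE = re.compile(
    r"Page\s+(\d+)\s+MediaBox:\s*\[\s*([\d.]+)\s+([\d.]+)\s*\]"
    r"(?:\s+CropBox:\s*\[\s*([\d.]+)\s+([\d.]+)\s*\])?"
)

def parse_media_sizes(text):
    """Return {page: {"MediaBox": (w, h), "CropBox": (w, h) or None}}."""
    pages = {}
    for m in PAGE_LINE.finditer(text):
        page = int(m.group(1))
        media = (float(m.group(2)), float(m.group(3)))
        crop = None
        if m.group(4) is not None:
            crop = (float(m.group(4)), float(m.group(5)))
        pages[page] = {"MediaBox": media, "CropBox": crop}
    return pages

# Two lines copied from the output above.
sample = """\
Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
"""
sizes = parse_media_sizes(sample)
print(sizes[1]["CropBox"])
```

Note that pdf_info.ps prints widths and heights only, not box origins, so this is enough for page sizes but not for locating the CropBox on the media.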
pdf_info.ps has apparently been moved into the [ghostpdl.git]/lib subfolder. – Grisby
[…] -dNOSAFER to open a file from command line too – Grisby

Unfortunately, it doesn't seem easy to get the (possibly different) page sizes (or *Boxes for that matter) inside a PDF with the help of Ghostscript.
But since you asked for other possibilities as well: a rather reliable way to determine the media sizes for each page (and even each one of the embedded {Trim,Media,Crop,Bleed}Boxes) is the commandline tool pdfinfo.exe. This utility is part of the XPDF tools from http://www.foolabs.com/xpdf/download.html . You can run the tool with the "-box" parameter and tell it with "-f 3" to start at page 3 and with "-l 8" to stop processing at page 8.
Example output:
C:\downloads>pdfinfo -box -f 1 -l 3 _IXUS_850IS_ADVCUG_EN.pdf
Creator: FrameMaker 6.0
Producer: Acrobat Distiller 5.0.5 (Windows)
CreationDate: 08/17/06 16:43:06
ModDate: 08/22/06 12:20:24
Tagged: no
Pages: 146
Encrypted: no
Page 1 size: 419.535 x 297.644 pts
Page 2 size: 297.646 x 419.524 pts
Page 3 size: 297.646 x 419.524 pts
Page 1 MediaBox: 0.00 0.00 595.00 842.00
Page 1 CropBox: 87.25 430.36 506.79 728.00
Page 1 BleedBox: 87.25 430.36 506.79 728.00
Page 1 TrimBox: 87.25 430.36 506.79 728.00
Page 1 ArtBox: 87.25 430.36 506.79 728.00
Page 2 MediaBox: 0.00 0.00 595.00 842.00
Page 2 CropBox: 148.17 210.76 445.81 630.28
Page 2 BleedBox: 148.17 210.76 445.81 630.28
Page 2 TrimBox: 148.17 210.76 445.81 630.28
Page 2 ArtBox: 148.17 210.76 445.81 630.28
Page 3 MediaBox: 0.00 0.00 595.00 842.00
Page 3 CropBox: 148.17 210.76 445.81 630.28
Page 3 BleedBox: 148.17 210.76 445.81 630.28
Page 3 TrimBox: 148.17 210.76 445.81 630.28
Page 3 ArtBox: 148.17 210.76 445.81 630.28
File size: 6888764 bytes
Optimized: yes
PDF version: 1.4
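The box lines in this output give lower-left and upper-right corners, so width and height have to be computed. Here is a rough Python sketch for doing that; the names `parse_boxes` and `box_size` are hypothetical, and the pattern is based only on the sample output above, not on any documented pdfinfo format guarantee.

```python
import re

# Modelled on lines such as:
#   Page 1 CropBox: 87.25 430.36 506.79 728.00
BOX_LINE = re.compile(
    r"Page\s+(\d+)\s+(Media|Crop|Bleed|Trim|Art)Box:\s+"
    r"(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)"
)

def parse_boxes(text):
    """Return {page: {"CropBox": (x0, y0, x1, y1), ...}}."""
    boxes = {}
    for m in BOX_LINE.finditer(text):
        page = int(m.group(1))
        name = m.group(2) + "Box"
        coords = tuple(float(m.group(i)) for i in range(3, 7))
        boxes.setdefault(page, {})[name] = coords
    return boxes

def box_size(box):
    """Width and height in points, rounded to pdfinfo's two decimals."""
    x0, y0, x1, y1 = box
    return (round(x1 - x0, 2), round(y1 - y0, 2))

# Two entries copied from the output above.
sample = ("Page 1 MediaBox: 0.00 0.00 595.00 842.00 "
          "Page 1 CropBox: 87.25 430.36 506.79 728.00")
boxes = parse_boxes(sample)
print(box_size(boxes[1]["CropBox"]))
```

For page 1 the CropBox works out to roughly 419.54 x 297.64 pt, which agrees with the "Page 1 size" line, so the reported page size here is the CropBox size rather than the MediaBox size.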
[…] -f) and set last page to -1 (so -l -1) – Enterprising

A solution in pure GhostScript PostScript, no additional scripts necessary:
gs -dQUIET -sFileName=path/to/file.pdf -c "FileName (r) file runpdfbegin 1 1 pdfpagecount {pdfgetpage /MediaBox get {=print ( ) print} forall (\n) print} for quit"
The command prints the MediaBox of each page in the PDF as four numbers per line. An example from a 3-page PDF:
0 0 595 841
0 0 595 841
0 0 595 841
Here's a breakdown of the command:
FileName (r) file % open file given by -sFileName
runpdfbegin % open file as pdf
1 1 pdfpagecount { % for each page index
pdfgetpage % get pdf page properties (pushes a dict)
/MediaBox get % get MediaBox value from dict (pushes an array of numbers)
{ % for every array element
=print % print element value
( ) print % print single space
} forall
(\n) print % print new line
} for
quit % quit interpreter. Not necessary if you pass -dBATCH to gs
Replace /MediaBox with /CropBox to get the crop box.
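To collect these numbers from a script, the same one-liner can be driven from Python. This is a sketch under several assumptions: `gs` is on the PATH, the helper names (`build_command`, `parse_boxes`, `page_boxes`) are hypothetical, and -dNODISPLAY and -dNOSAFER are added because the comments on this page suggest them. -dNOSAFER turns off Ghostscript's file access restrictions, so use it only on trusted files; the PostScript procedures themselves may behave differently across Ghostscript versions.

```python
import subprocess

# The PostScript program from the answer above, with the box name
# as a placeholder. Doubled braces are literal braces for str.format.
PS_TEMPLATE = (
    "FileName (r) file runpdfbegin "
    "1 1 pdfpagecount {{pdfgetpage /{box} get "
    "{{=print ( ) print}} forall (\\n) print}} for quit"
)

def build_command(pdf_path, box="MediaBox", gs="gs"):
    """Build the argument list; no shell is involved, so no quoting is needed."""
    return [
        gs, "-dQUIET", "-dNODISPLAY", "-dNOSAFER",
        "-sFileName=" + pdf_path,
        "-c", PS_TEMPLATE.format(box=box),
    ]

def parse_boxes(output):
    """Turn 'x0 y0 x1 y1' lines into tuples, skipping anything else."""
    boxes = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        try:
            boxes.append(tuple(float(f) for f in fields))
        except ValueError:
            continue  # not a line of four numbers, e.g. an error message
    return boxes

def page_boxes(pdf_path, box="MediaBox"):
    """Run Ghostscript and return one box tuple per page."""
    result = subprocess.run(build_command(pdf_path, box),
                            capture_output=True, text=True, check=True)
    return parse_boxes(result.stdout)

# Parsing the example output shown above (no Ghostscript call needed).
print(parse_boxes("0 0 595 841 \n0 0 595 841 \n"))
```

Calling `page_boxes("path/to/file.pdf", "CropBox")` would then request the crop boxes. A page dictionary that has no explicit /CropBox entry will most likely make the PostScript `get` fail, in which case `check=True` raises `CalledProcessError`.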
[…] :0.0), which can be fixed by opening an X server, or by adding -dNODISPLAY to the call (better, since we don't need X anyway). – Mcmillan
[…] -dNOSAFER – Grisby
[…] pdf_info.ps? If not, where would be a good place to get a copy? – Stop