I studied the other answers and compared them with the output of my scanner, which can scan to PDF or to an image (typically JPEG). I found the following:
- Scanners typically work at 300 DPI, sometimes higher at 600 DPI or lower at 150 DPI
- Scanned images are usually full 24-bit RGB color (8 bits per channel) and are usually compressed as JPEG
- PNGs, by contrast, are more often used for digital drawings, which usually fit in an 8-bit palette, although PNG also supports 24-bit color
- PDFs are often multi-page, whereas a PNG or JPEG holds a single page
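To get a feel for what these resolutions mean in pixels, here is a minimal sketch. It relies on the fact that PDF page sizes are measured in points (1/72 inch); the helper name points_to_pixels is my own invention, and the renderer's exact rounding may differ by a pixel.

```bash
#!/bin/bash
# Hypothetical helper: convert a length in PDF points (1/72 inch) to pixels
# at a given DPI, rounded to the nearest pixel using integer arithmetic.
points_to_pixels() {
    local points=$1 dpi=$2
    echo $(( (points * dpi + 36) / 72 ))
}

# A4 is 595 x 842 points; US Letter is 612 x 792 points.
echo "A4 at 300 DPI: $(points_to_pixels 595 300) x $(points_to_pixels 842 300)"
echo "Letter at 150 DPI: $(points_to_pixels 612 150) x $(points_to_pixels 792 150)"
```

Doubling the DPI doubles both dimensions, so the pixel count (and roughly the uncompressed size) goes up by a factor of four.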
When converting from PDF to PNG, we have to decide whether to render at the typical 300 DPI or to use a higher resolution (600 DPI) or a lower one (150 DPI). We also have to ask whether the PDF contains photos, in which case we need 24-bit images, i.e. png16m. If the PDF contains only digital documents, predominantly black-and-white text with perhaps a limited number of colors, then an 8-bit image format is sufficient, i.e. png256. Finally, we need to decide whether we want every page or are content with a single page.
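The choice of format can be written down as a small lookup. This is only a sketch: pick_device and its content labels (photo, color, gray, mono) are names I made up for illustration, while the values it prints are Ghostscript PNG devices.

```bash
#!/bin/bash
# Hypothetical helper: map a rough description of the PDF's content
# to a Ghostscript PNG output device.
pick_device() {
    case "$1" in
        photo) echo png16m ;;   # 24-bit RGB
        color) echo png256 ;;   # 8-bit palette
        gray)  echo pnggray ;;  # 8-bit grayscale
        mono)  echo pngmono ;;  # 1-bit black and white
        *)     echo "unknown content type: $1" >&2; return 1 ;;
    esac
}

FORMAT=$(pick_device color)
echo "-sDEVICE=${FORMAT}"
```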
For the rest of the answer, I will assume 300 DPI and digital documents that do not contain photos, i.e. 8-bit format (or 256-color format).
For a single-page extraction, the parameters we need are PDF, PNG, and PAGENO:
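#!/bin/bash
# Usage: pdf2png.sh input.pdf output.png pageno
PDF=$1
PNG=$2
PAGENO=$3
FORMAT=png256 # alternatives: png16m png16 pngmono
DPI=300 # alternatives: 600 150
# Build the Ghostscript command as an array so that paths with spaces survive.
cmd=(gs)
cmd+=(-dNOPAUSE)
cmd+=(-q)
cmd+=("-sDEVICE=${FORMAT}")
cmd+=("-r${DPI}")
cmd+=(-dBATCH)
cmd+=("-dFirstPage=${PAGENO}")
cmd+=("-dLastPage=${PAGENO}")
cmd+=("-sOutputFile=${PNG}")
cmd+=("${PDF}")
# Quote the expansion so each array element is passed as exactly one argument.
"${cmd[@]}"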
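Building the command in an array only protects filenames with spaces if the array is also expanded inside double quotes. The sketch below shows the difference using a stand-in function, count_args, in place of gs; both the function and the filename are hypothetical.

```bash
#!/bin/bash
# Stand-in for gs: report how many arguments the command received.
count_args() { echo $#; }

PDF="my scan.pdf"   # hypothetical filename containing a space
cmd=(count_args)
cmd+=(-q)
cmd+=("${PDF}")

"${cmd[@]}"   # quoted expansion: 2 arguments, the filename stays whole
${cmd[@]}     # unquoted expansion: 3 arguments, the filename is split
```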
For a multiple-page extraction, the parameters we need are PDF and DIR:
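#!/bin/bash
# Usage: pdf2pngs.sh input.pdf outputdir
PDF=$1
DIR=$2
FORMAT=png256 # alternatives: png16m png16 pngmono
DPI=300 # alternatives: 600 150
mkdir -p "${DIR}"
cmd=(gs)
cmd+=(-dNOPAUSE)
cmd+=(-q)
cmd+=("-sDEVICE=${FORMAT}")
cmd+=("-r${DPI}")
cmd+=(-dBATCH)
# %03d zero-pads the page number (p001.png, p002.png, ...) so the files sort in page order.
cmd+=("-sOutputFile=${DIR}/p%03d.png")
cmd+=("${PDF}")
# Quote the expansion so each array element is passed as exactly one argument.
"${cmd[@]}"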
To join the pages back together into a PDF, we can use ImageMagick's convert as follows:
#!/bin/bash
# Usage: pngs2pdf.sh dir output.pdf
DIR=$1
PDF=$2
# The glob expands in lexicographic order; that order becomes the page order.
convert "${DIR}"/*.png "${PDF}"
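One caveat: if the page files are named without zero padding (p1.png, p2.png, ... p10.png), the glob sorts p10.png ahead of p2.png and the pages come out shuffled. A possible workaround, sketched here with a hypothetical helper sort_pages, is to put the names in natural order first. It assumes a sort that supports -V (GNU coreutils does) and filenames without newlines.

```bash
#!/bin/bash
# Hypothetical helper: print the given filenames one per line in natural
# (version) order, so that p2.png sorts before p10.png.
sort_pages() {
    printf '%s\n' "$@" | sort -V
}

sort_pages p10.png p2.png p1.png

# Possible use with bash 4+ (mapfile), shown commented out:
# mapfile -t pages < <(sort_pages "${DIR}"/*.png)
# convert "${pages[@]}" "${PDF}"
```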