How to convert a PDF to PNG with ImageMagick "convert" or Ghostscript?
O

12

93

I'm trying to convert a PDF to a PNG image (at least the cover of one). I'm successfully extracting the first page of the PDF with pdftk. I'm using imagemagick to do the conversion:

convert cover.pdf cover.png

This works, but unfortunately cover.png comes out incorrectly rendered (some of the alpha objects in the PDF aren't rendered properly). I know ImageMagick uses Ghostscript to do the conversion, and if I call gs directly I get the desired result, but I'd rather use convert, as it has other tools I'd like to leverage.

This command in GhostScript accomplishes the desired image:

gs -sDEVICE=pngalpha -sOutputFile=cover.png -r144 cover.pdf

Is there any way to pass arguments through convert to Ghostscript, or am I stuck with calling Ghostscript directly?

Osrock answered 17/3, 2009 at 8:18 Comment(5)
Why is calling GhostScript directly a problem?Cazzie
It really isn't that big of a deal. I'd like to run some other params through convert at the same time and it'd be nice if I could keep it all in one command. Keeps my code cleaner and more consistent. It also means one less temporary file.Osrock
See also PDFBox: Problem with converting pdf page into image and Use Apache PDFBox convert PDF to image.Balbriggan
What's the difference between how you call gs and how ImageMagick calls it? Might be worth reporting something upstream to ImageMagick (note to followers, updating ghostscript can help as well...)Bait
I had the best luck with pdftoppm: askubuntu.com/a/50180/951756Shortlived
T
75

You can use a single command line with two commands (gs, convert) connected through a pipe, provided the first command can write its output to stdout and the second can read its input from stdin.

  1. Luckily, gs can write to stdout (... -o %stdout ...).
  2. Luckily, convert can read from stdin (convert -background transparent - output.png).

Problem solved:

  • gs used for handling the alpha channel of the special image,
  • convert used for creating transparent background,
  • pipe used to avoid writing out a temp file on disk.

Complete solution:

gs -sDEVICE=pngalpha       \
   -o %stdout              \
   -r144 cover.pdf         \
   |                       \
convert                    \
   -background transparent \
   -                       \
    cover.png
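
The comments on this answer suggest -dSAFER for input you don't control. A minimal sketch that folds this into the same pipe might look like the following. The function name cover_to_png and its defaults are hypothetical, and -q is added on the assumption that Ghostscript's own messages should be kept out of the image data travelling through the pipe.

```shell
# Hypothetical helper: render the first page of a PDF to a transparent PNG.
# Usage: cover_to_png input.pdf output.png [dpi]
cover_to_png() (
  pdf=$1
  png=$2
  dpi=${3:-144}   # 144 dpi matches the command above; adjust as needed

  # -q       keeps Ghostscript's messages out of the image data on stdout
  # -dSAFER  restricts file access, useful for PDFs you did not create
  gs -q -dSAFER -sDEVICE=pngalpha \
     -dFirstPage=1 -dLastPage=1 \
     -r"$dpi" -o %stdout "$pdf" |
    convert -background transparent - "$png"
)
```

It would be called as cover_to_png cover.pdf cover.png, and further convert options can be added before the final file name.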

Update

If you want to have a separate PNG per PDF page, you can use the %d syntax:

gs -sDEVICE=pngalpha -o file-%03d.png -r144 cover.pdf

This will create PNG files named file-001.png, file-002.png, ... (Note that Ghostscript's %d counting starts at 1, so file-001.png corresponds to page 1 of the PDF, file-002.png to page 2, and so on.)

Or, if you want to keep your transparent background, for a 100-page PDF, do

for i in {1..100}; do        \
                             \
  gs -sDEVICE=pngalpha       \
     -dFirstPage="${i}"      \
     -dLastPage="${i}"       \
     -o %stdout              \
     -r144 input.pdf         \
     |                       \
  convert                    \
     -background transparent \
     -                       \
      page-${i}.png ;        \
                             \
done
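
The loop above hard-codes 100 pages. A minimal sketch that reads the page count first, assuming pdfinfo from poppler-utils is installed (both function names are hypothetical, and -q is my addition to keep Ghostscript's messages out of the pipe):

```shell
# Hypothetical helper: print the number of pages reported by pdfinfo.
pdf_page_count() {
  pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Hypothetical helper: one transparent PNG per page, named PREFIX-N.png.
# Usage: pdf_to_pngs input.pdf [prefix]
pdf_to_pngs() (
  pdf=$1
  prefix=${2:-page}
  pages=$(pdf_page_count "$pdf")
  [ -n "$pages" ] || exit 1   # stop if the page count could not be read

  i=1
  while [ "$i" -le "$pages" ]; do
    gs -q -sDEVICE=pngalpha \
       -dFirstPage="$i" -dLastPage="$i" \
       -r144 -o %stdout "$pdf" |
      convert -background transparent - "$prefix-$i.png"
    i=$((i + 1))
  done
)
```

Calling pdf_to_pngs input.pdf would then write page-1.png, page-2.png, and so on, for however many pages the PDF has.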
Timmi answered 31/7, 2010 at 20:14 Comment(9)
This only works for me if I add -dBATCH -dNOPAUSE -dQUIET to the gs options.Perfervid
@ford: That means you have an old version of Ghostscript. Recent versions can do -o output.file and this automatically and silently also sets -dBATCH -dNOPAUSE -dQUIET at the same time.Timmi
@ford: However, I had a serious typo elsewhere in the above answer. I wonder why it got 22 upvotes despite that :-)Timmi
Works fine for me, but I'd like to automatically convert a multipage PDF to image_1.png, image_2.png ... Is that easy to do in one command, or should I extract each page from the PDF file first?Variole
OK, I have separate images. But I want "-transparent white" as a 'convert' parameter during the conversion. I was able to do it with the pipe, but how can I do it without?Variole
Yes, thanks. To avoid the page-count issue, I prefer using your first update and then this: "for i in file-*.png; do convert "$i" -transparent white "$i"; done" in the same script.Variole
If you don't control the input documents, by god, set -dSAFER.Cowcatcher
@mlissner: I appreciate your comment. Admittedly, I may be a bit negligent when it comes to using -dSAFER routinely on my Ghostscript command lines. I should take your advice to heart. Thank you.Timmi
You may also want to add to gs command these flags: -dTextAlphaBits=4 -dGraphicsAlphaBits=4 , for anti-aliasing of text-fonts and drawn graphics.Lockyer
H
34

Out of all the available alternatives, I found Inkscape to produce the most accurate results when converting PDFs to PNG. Especially when the source file had transparent layers, Inkscape succeeded where ImageMagick and other tools failed.

This is the command I use:

inkscape "$pdf" -z --export-dpi=600 --export-area-drawing --export-png="$pngfile"

And here it is implemented in a script:

#!/bin/bash
# Convert every PDF given on the command line to a PNG next to it.

while [ $# -gt 0 ]; do

pdf=$1
echo "Converting $pdf ..."
# Replace the file extension with .png (e.g. cover.pdf -> cover.png)
pngfile="${pdf%.*}.png"
inkscape "$pdf" -z --export-dpi=600 --export-area-drawing --export-png="$pngfile"
echo "Converted to $pngfile"
shift

done

echo "All jobs done. Exiting."
Heart answered 18/3, 2013 at 18:59 Comment(1)
Note that --export-png is now deprecated. Simply use --export-filename="$pngfile" if the export type is to be inferred from the filename, or --export-filename="$pngfile" --export-type="png" to be explicit.Rhizocarpous
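
For a script that has to run against both older and newer Inkscape releases, one option is to pick the export option from the version string. This is only a sketch: the function name is hypothetical, and it assumes that `inkscape --version` prints a line beginning with "Inkscape 0." on the older series and that everything else accepts --export-filename.

```shell
# Hypothetical helper: print the PNG export option for a given
# `inkscape --version` string and output file name.
inkscape_png_option() {
  case $1 in
    "Inkscape 0."*) printf '%s\n' "--export-png=$2" ;;       # 0.x series
    *)              printf '%s\n' "--export-filename=$2" ;;  # 1.0 and later
  esac
}
```

The printed option can stand in for --export-png="$pngfile" in the command above, for example as "$(inkscape_png_option "$(inkscape --version)" "$pngfile")".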
J
26

To convert a PDF to image files, use the following commands:

For PNG:

gs -sDEVICE=png16m -dTextAlphaBits=4 -r300 -o a.png a.pdf

For JPG:

gs -sDEVICE=jpeg -dTextAlphaBits=4 -r300 -o a.jpg a.pdf

If you have multiple pages, add %03d to the output name:

gs -sDEVICE=jpeg -o a%03d.jpg a.pdf

What each option means:

  • -sDEVICE={jpeg,pngalpha,png16m...} - output device, which determines the file type
  • -o - output file (%stdout to stdout)
  • -dTextAlphaBits=4 - font antialiasing.
  • -r300 - 300 dpi
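
The options above can be wrapped so that the device follows from the output file name. A minimal sketch, with hypothetical function names and only the two devices used in this answer:

```shell
# Hypothetical helper: map an output file name to a Ghostscript device.
gs_device_for() {
  case $1 in
    *.png)         echo png16m ;;
    *.jpg|*.jpeg)  echo jpeg ;;
    *)             echo "unsupported output name: $1" >&2; return 1 ;;
  esac
}

# Usage: pdf_to_image input.pdf output.png   (or output.jpg, or out%03d.png)
pdf_to_image() (
  device=$(gs_device_for "$2") || exit 1
  gs -sDEVICE="$device" -dTextAlphaBits=4 -r300 -o "$2" "$1"
)
```

For example, pdf_to_image a.pdf a%03d.jpg would select the jpeg device and write one file per page.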
Janeejaneen answered 4/11, 2015 at 17:52 Comment(1)
Useful answer, but not to this question...Shugart
H
13

One can also use the command line utilities included in poppler-utils package:

sudo apt-get install poppler-utils
pdftoppm --help
pdftocairo --help

Example:

pdftocairo -png mypage.pdf mypage.png
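
As far as I can tell from the pdftocairo man page, the second argument is treated as an output root to which a page number and extension are appended, so the command above may not produce a file named exactly mypage.png. A minimal sketch for the cover-page case in the question, using the -singlefile, -transp, -f, -l and -r options (the function name and the 144 dpi default are my assumptions):

```shell
# Hypothetical helper: write the first page of a PDF as a transparent PNG.
# Usage: pdf_cover_png input.pdf output-root [dpi]   -> writes output-root.png
pdf_cover_png() (
  pdf=$1
  root=${2%.png}   # drop a trailing .png; pdftocairo appends its own
  dpi=${3:-144}
  pdftocairo -png -singlefile -transp -f 1 -l 1 -r "$dpi" "$pdf" "$root"
)
```

With this, pdf_cover_png mypage.pdf mypage.png should leave a single mypage.png behind.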
Homer answered 23/9, 2017 at 13:24 Comment(2)
It's very good. If the PDF is multi-page there will be multiple PNG files.Cavalryman
For macos use brew install popplerAllocution
C
9

I couldn't get the accepted answer to work. I then found out that the solution is actually much simpler, as Ghostscript natively supports not just PNG but several different PNG "encodings":

  • png256
  • png16
  • pnggray
  • pngmono
  • ...

The shell command that works for me is:

gs -dNOPAUSE -q -sDEVICE=pnggray -r500 -dBATCH -dFirstPage=2 -dLastPage=2 -sOutputFile=test.png test.pdf

It will save page 2 of test.pdf to test.png using the pnggray encoding and 500 DPI.

Clarino answered 17/3, 2015 at 19:50 Comment(1)
This works quite well. As a small addition, I would like to add that appending a "%d" to the output creates a new file per page. This makes the command look like this: gs -dNOPAUSE -q -sDEVICE=pnggray -r500 -dBATCH -dFirstPage=2 -dLastPage=5 -sOutputFile=output%d.png input.pdfDogeared
S
3

As this page also lists alternative tools, I'll mention xpdf, which has precompiled command-line tools for Linux/Windows/Mac. It supports transparency and is free for commercial use, as opposed to Ghostscript, which has truly outrageous pricing.

In a test on a huge PDF file it was 7.5% faster than Ghostscript.

(It also has PDF to text and HTML converters)

Symphysis answered 28/6, 2019 at 16:9 Comment(1)
I have now used this for a little while and it works just fine. In general it is a bit slower than Ghostscript at higher resolutions, though. But images look much nicer (though a bit darker), and anti-aliasing, which I couldn't get to work in Ghostscript, works great in xpdf!Symphysis
H
1

I'll add my solution, even though this thread is old. Maybe it will help someone anyway.

First, I need to generate the PDF. I use XeLaTeX for that:

xelatex test.tex

Now, ImageMagick and GraphicsMagick both parse parameters from left to right, so the leftmost parameter is executed first. I ended up using this sequence for optimal processing:

gm convert -trim -transparent white -background transparent -density 1200x1200 -resize 25% test.pdf test.png

It gives nice graphics on a transparent background, trimmed to what is actually on the page. The -density and -resize parameters give finer granularity and increase the overall resolution.

I suggest checking whether you can decrease the density; it will cut down the conversion time.

Homestead answered 11/7, 2012 at 12:4 Comment(0)
M
1

For a PDF where ImageMagick was giving inaccurate colors, I found that GraphicsMagick did a better job:

$ gm convert -quality 100 -thumbnail x300 -flatten journal.pdf\[0\] cover.jpg
Mammillate answered 2/9, 2015 at 7:0 Comment(1)
Not enough info to be sure, but this could be because the colourspaces were not defined correctly. Check out the -colorspace IM option.Eckel
L
1

Try to extract a single page.

page=4

gs -sDEVICE=pngalpha -dFirstPage="$page" -dLastPage="$page" -o thumb.png -r144 input.pdf
Lowerclassman answered 5/8, 2019 at 1:37 Comment(0)
M
0

My solution is much simpler and more direct. At least it works that way on my PC (with the following specs):

me@home: my.folder$ uname -a
Linux home 3.2.0-54-generic-pae #82-Ubuntu SMP Tue Sep 10 20:29:22 UTC 2013 i686 i686 i386 GNU/Linux

with

me@home: my.folder$ convert --version
Version: ImageMagick 6.6.9-7 2012-08-17 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2011 ImageMagick Studio LLC
Features: OpenMP

So, here's what I run on my file.pdf:

me@home: my.folder$ convert -density 300 -quality 100 file.pdf file.png
Meredithmeredithe answered 16/11, 2013 at 13:41 Comment(1)
Yeah, this is what the OP tried initially but couldn't get something or other to work underneath when ImageMagick calls through to Ghostscript... but if it works, go for it :)Bait
B
0

You can use ImageMagick without separating the first page of the PDF with other tools. Just do

convert -density 288 cover.pdf[0] -resize 25% cover.png


Here I increase the nominal density to 400% (72*4=288) and then resize by 1/4 (25%). This gives much better quality for the resulting PNG.
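
The same arithmetic can be parameterised by the supersampling factor. A minimal sketch, assuming the nominal density of 72 mentioned above (the function name is hypothetical, and the percentage uses integer division, so factors that do not divide 100 are rounded down):

```shell
# Hypothetical helper: render page 1 at FACTOR times the nominal density,
# then shrink back by the same factor.
# Usage: supersampled_png input.pdf output.png [factor]
supersampled_png() (
  pdf=$1
  png=$2
  factor=${3:-4}
  density=$((72 * factor))    # e.g. 72 * 4 = 288
  percent=$((100 / factor))   # e.g. 100 / 4 = 25
  convert -density "$density" "${pdf}[0]" -resize "${percent}%" "$png"
)
```

Calling supersampled_png cover.pdf cover.png runs the same command as above; a factor of 2 would give -density 144 and -resize 50%.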

However, if the PDF is CMYK, PNG does not support that. It would need to be converted to sRGB, especially if it has transparency, since Ghostscript cannot handle CMYK with alpha.

convert -density 288 -colorspace sRGB -resize 25% cover.pdf[0] cover.png
Bosley answered 28/6, 2019 at 16:16 Comment(0)
A
0

I studied the answers presented and then compared them to my scanner which can scan to PDF or can scan to image (typically JPEG). I found the following:

  • Typically scanners work at 300 DPI, sometimes higher at 600 DPI or lower at 150 DPI
  • The scanned images are typically full 24 bit RGB color (8 bits per band) and are typically compressed as JPEG
  • PNGs, however, are typically used for digital drawings and are typically 8-bit with a palette
  • PDFs are typically multi-page, whereas PNG and JPEG are typically a single page

When converting from PDF to PNG we have to ask ourselves whether we want to use the default 300 DPI or whether we want to resample the image at a higher resolution 600 DPI or lower resolution 150 DPI. We also have to ask ourselves if the PDF contains photos and therefore we need 24-bit images, i.e. png16m. Alternatively, if the PDF just contains digital documents, predominantly black and white text but may contain a limited number of colors, then, an 8-bit image format is sufficient, i.e. png256. We also need to ask ourselves whether we want to have access to multiple pages or are content with one page.

For the rest of the answer, I will assume 300 DPI and digital documents that do not contain photos, i.e. 8-bit format (or 256-color format).

For a single page extraction, I determine the parameters need to be PDF, PNG, and PAGENO:

#!/bin/bash
# Usage: pdf2png.sh input.pdf output.png pageno
PDF=$1
PNG=$2
PAGENO=$3
FORMAT=png256 # png16m png16 pngmono
DPI=300 # 600 150
# Build the Ghostscript command as an array so arguments with spaces survive
cmd=(gs)
cmd+=(-dNOPAUSE)
cmd+=(-q)
cmd+=("-sDEVICE=${FORMAT}")
cmd+=("-r${DPI}")
cmd+=(-dBATCH)
cmd+=("-dFirstPage=${PAGENO}")
cmd+=("-dLastPage=${PAGENO}")
cmd+=("-sOutputFile=${PNG}")
cmd+=("${PDF}")
"${cmd[@]}"

For a multiple-page extraction, I determine the parameters need to be PDF and DIR:

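#!/bin/bash
# Usage: pdf2pngs.sh input.pdf outputdir
PDF=$1
DIR=$2
FORMAT=png256 # png16m png16 pngmono
DPI=300 # 600 150
mkdir -p "${DIR}"
# Build the Ghostscript command as an array so arguments with spaces survive
cmd=(gs)
cmd+=(-dNOPAUSE)
cmd+=(-q)
cmd+=("-sDEVICE=${FORMAT}")
cmd+=("-r${DPI}")
cmd+=(-dBATCH)
# %d is replaced by the page number, giving p1.png, p2.png, ...
cmd+=("-sOutputFile=${DIR}/p%d.png")
cmd+=("${PDF}")
"${cmd[@]}"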

To join the pages back together into a PDF, we can make use of ImageMagick convert as follows:

#!/bin/bash
# pngs2pdf.sh dir output.pdf
DIR=$1
PDF=$2
convert "${DIR}"/*.png "${PDF}"
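
One caveat with joining by glob: files named p1.png, p2.png, ..., p10.png sort lexically, so p10.png comes before p2.png and pages can end up out of order for documents of ten pages or more. A minimal sketch that walks the page numbers instead, assuming the p%d.png naming used by the multiple-page script and no gaps in the sequence (the function name is hypothetical):

```shell
# Hypothetical helper: join DIR/p1.png, DIR/p2.png, ... into one PDF,
# in numeric page order.
# Usage: pngs_to_pdf dir output.pdf
pngs_to_pdf() (
  dir=$1
  pdf=$2
  set --                      # reuse the positional parameters as a file list
  i=1
  while [ -e "$dir/p$i.png" ]; do
    set -- "$@" "$dir/p$i.png"
    i=$((i + 1))
  done
  [ "$#" -gt 0 ] || exit 1    # nothing to join
  convert "$@" "$pdf"
)
```

An alternative that keeps the glob is to zero-pad the page number when extracting, for example with p%03d.png as the output pattern.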
Ariannaarianne answered 27/10, 2022 at 5:13 Comment(0)
