I have around 1000 PDF files and I need to convert them to 300 dpi TIFF files. What is the best way to do this? An SDK, or a tool that can be scripted, would be ideal.
Use Imagemagick, or better yet, Ghostscript.
http://www.ibm.com/developerworks/library/l-graf2/#N101C2 has an example for imagemagick:
convert foo.pdf pages-%03d.tiff
http://www.asmail.be/msg0055376363.html has an example for ghostscript:
gs -q -dNOPAUSE -sDEVICE=tiffg4 -sOutputFile=a.tif foo.pdf -c quit
I would install ghostscript and read the man page for gs to see what exact options are needed and experiment.
imagemagick worked well without configuration. I could not configure ghostscript properly to get a high-resolution colour image. – Reinwald
convert foo.pdf pages-%03d.tiff produces horribly low-quality images. How do we increase the resolution to be what is already in the PDF, so no resolution is lost? – Outrun
pdftoppm: askubuntu.com/questions/150100/…. Also, in case the goal of making TIFFs here is to use tesseract to convert the PDF to a searchable PDF via OCR, I've done that now too and written an interface to do it in one step: pdf2searchablepdf input.pdf -- see here: askubuntu.com/questions/473843/…. – Outrun
Using GhostScript from the command line, I've used the following in the past:
on Windows:
gswin32c -dNOPAUSE -q -g300x300 -sDEVICE=tiffg4 -dBATCH -sOutputFile=output_file_name.tif input_file_name.pdf
on *nix:
gs -dNOPAUSE -q -g300x300 -sDEVICE=tiffg4 -dBATCH -sOutputFile=output_file_name.tif input_file_name.pdf
For a large number of files, a simple batch/shell script could be used to convert an arbitrary number of files...
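As a rough illustration of such a script, here is a minimal Python sketch. It assumes Ghostscript is installed and reachable as gs (gswin32c on Windows); the function names build_gs_command and convert_directory are made up for this example. It uses -r to set the resolution, because -g sets the page size in pixels rather than the dpi, and it defaults to the tiff24nc device for colour output.

```python
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, tiff_path, dpi=300, device="tiff24nc", gs_binary="gs"):
    """Return the Ghostscript argument list for one PDF-to-TIFF conversion."""
    return [
        gs_binary,
        "-q", "-dNOPAUSE", "-dBATCH",
        f"-r{dpi}",                   # resolution in dpi; -g would set pixel dimensions instead
        f"-sDEVICE={device}",         # e.g. tiffg4 for black-and-white fax compression
        f"-sOutputFile={tiff_path}",
        str(pdf_path),
    ]


def convert_directory(root, dpi=300, device="tiff24nc", gs_binary="gs"):
    """Convert every PDF below root, skipping files that already have a TIFF."""
    for pdf_path in sorted(Path(root).rglob("*.pdf")):
        tiff_path = pdf_path.with_suffix(".tiff")
        if tiff_path.exists():
            continue
        # Passing a list avoids shell quoting problems with spaces in paths
        subprocess.run(build_gs_command(pdf_path, tiff_path, dpi, device, gs_binary), check=True)
```

Calling convert_directory(".") would walk the current directory; pass gs_binary="gswin32c" on Windows if that is how Ghostscript is installed there.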
-sDEVICE=tiffg4 is a black and white fax compression model. See: pages.cs.wisc.edu/~ghost/doc/AFPL/8.00/Devices.htm#TIFF – Haematoma
Replace the -g switch with -r: gswin32c -dNOPAUSE -q -r300x300 ... – Stoller
Use -sDEVICE=tiff24nc for 24-bit RGB, or -sDEVICE=tiff12nc for 12-bit (8/4 bits per channel, respectively). – Talca
I wrote a little PowerShell script to go through a directory structure and convert all PDF files to TIFF files using ghostscript. Here is my script:
$tool = 'C:\Program Files\gs\gs8.63\bin\gswin32c.exe'
# Find every PDF below the current directory
$pdfs = Get-ChildItem . -Recurse | Where-Object { $_.Extension -eq '.pdf' }
foreach ($pdf in $pdfs)
{
    # Swap the extension without breaking on paths that contain other dots
    $tiff = [System.IO.Path]::ChangeExtension($pdf.FullName, '.tiff')
    if (Test-Path $tiff)
    {
        "tiff file already exists " + $tiff
    }
    else
    {
        'Processing ' + $pdf.Name
        $param = "-sOutputFile=$tiff"
        & $tool -q -dNOPAUSE -sDEVICE=tiffg4 $param -r300 $pdf.FullName -c quit
    }
}
1) Install GhostScript
2) Install ImageMagick
3) Create "Convert-to-TIFF.bat" (Windows XP, Vista, 7) and use the following line:
for %%f in (%*) DO "C:\Program Files\ImageMagick-6.6.4-Q16\convert.exe" -density 300 -compress lzw %%f %%f.tiff
Dragging any number of single-page PDF files onto this file will convert them to compressed TIFFs, at 300 DPI.
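For anyone scripting this rather than dragging files, here is a small Python sketch of the same ImageMagick call. It assumes ImageMagick 6's convert binary is on the PATH (on ImageMagick 7 the binary is usually magick); build_convert_command and convert_pdf are names invented for this example. The -density option is placed before the input file because it controls the resolution at which the PDF is rasterized, which is also what fixes the low-quality output of a plain convert foo.pdf call.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, dpi=300, convert_binary="convert"):
    """Return the ImageMagick argument list that turns pdf_path into an LZW TIFF."""
    pdf_path = Path(pdf_path)
    tiff_path = pdf_path.with_suffix(".tiff")   # foo.pdf -> foo.tiff
    return [
        convert_binary,
        "-density", str(dpi),    # set before the input so the PDF is rendered at this dpi
        "-compress", "lzw",
        str(pdf_path),
        str(tiff_path),
    ]


def convert_pdf(pdf_path, dpi=300, convert_binary="convert"):
    """Run the conversion and raise if ImageMagick reports a failure."""
    subprocess.run(build_convert_command(pdf_path, dpi, convert_binary), check=True)
```

Unlike the batch file, this names the output foo.tiff rather than foo.pdf.tiff.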
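Using Python, this is what I ended up with:
import os
import subprocess

def pdf_to_tiff(ghostscript_path, pdf_source, tif_dest):
    # Pass the arguments as a list so paths containing spaces need no quoting
    subprocess.check_call([
        os.path.join(ghostscript_path, 'gswin32c.exe'),
        '-q',
        '-dNOPAUSE',
        '-dBATCH',
        '-r300',
        '-sDEVICE=tiff12nc',
        '-sPAPERSIZE=a4',
        '-sOutputFile=%s' % tif_dest,
        pdf_source,
    ])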
How about pdf2tiff? http://python.net/~gherman/pdf2tiff.html
ABCPDF can do so as well -- check out http://www.websupergoo.com/helppdf6net/default.html
PDF Focus .Net can do it this way:
1. PDF to TIFF
SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
string pdfPath = @"c:\My.pdf";
string imageFolder = @"c:\images\";
f.OpenPdf(pdfPath);
if (f.PageCount > 0)
{
//Save all PDF pages to image folder as tiff images, 200 dpi
int result = f.ToImage(imageFolder, "page",System.Drawing.Imaging.ImageFormat.Tiff, 200);
}
2. PDF to Multipage-TIFF
//Convert PDF file to Multipage TIFF file
SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
string pdfPath = @"c:\Document.pdf";
string tiffPath = @"c:\Result.tiff";
f.OpenPdf(pdfPath);
if (f.PageCount > 0)
{
if (f.ToMultipageTiff(tiffPath, 120) == 0)
{
System.Diagnostics.Process.Start(tiffPath);
}
}
https://pypi.org/project/pdf2tiff/
You could also use pdf2ps, ps2image, and then convert the resulting image to TIFF with other utilities (I remember 'paul' [paul - Yet another image viewer (displays PNG, TIFF, GIF, JPG, etc.)])
Disclaimer: I work for the company whose product I am recommending.
Atalasoft has a .NET library that can convert PDF to TIFF -- we are a partner of FOXIT, so the PDF rendering is very good.
Requires ghostscript and tiffcp. Tested on Ubuntu.
import os

def pdf2tiff(source, destination):
    # Strip the extension so the per-page files can be named <base>__NNN.tiff
    idx = destination.rindex('.')
    destination = destination[:idx]
    args = [
        '-q', '-dNOPAUSE', '-dBATCH',
        '-sDEVICE=tiffg4',
        '-r600', '-sPAPERSIZE=a4',
        '-sOutputFile=' + destination + '__%03d.tiff'
    ]
    # Render one Group 4 TIFF per page
    gs_cmd = 'gs ' + ' '.join(args) + ' ' + source
    os.system(gs_cmd)
    # Merge the per-page files into a single multi-page TIFF
    args = [destination + '__*.tiff', destination + '.tiff']
    tiffcp_cmd = 'tiffcp ' + ' '.join(args)
    os.system(tiffcp_cmd)
    # Remove the per-page files
    args = [destination + '__*.tiff']
    rm_cmd = 'rm ' + ' '.join(args)
    os.system(rm_cmd)

pdf2tiff('abc.pdf', 'abc.tiff')
I like PDFTIFF.com for converting PDF to TIFF; it can handle unlimited pages.
Maybe also try this? PDF Focus
This .Net library allows you to solve the problem :)
This code will help (Convert 1000 PDF files to 300-dpi TIFF files in C#):
SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
string[] pdfFiles = Directory.GetFiles(@"d:\Folder with 1000 pdfs\", "*.pdf");
string folderWithTiffs = @"d:\Folder with TIFFs\";
foreach (string pdffile in pdfFiles)
{
f.OpenPdf(pdffile);
if (f.PageCount > 0)
{
//save all pages to tiff files with 300 dpi
f.ToImage(folderWithTiffs, Path.GetFileNameWithoutExtension(pdffile), System.Drawing.Imaging.ImageFormat.Tiff, 300);
}
f.ClosePdf();
}
gs -q -dNOPAUSE -r300x300 -sDEVICE=tiff24nc -sOutputFile=output.tif input.pdf -c quit (on Windows the command is gswin32c) to produce a 300x300 dpi, 24-bit color image – Tarim
pdftoppm, as follows: mkdir images && pdftoppm -tiff -r 300 mypdf.pdf images/pg. See here for details, usage, & more info: askubuntu.com/questions/150100/…. – Outrun
.tif files, and it worked perfectly for my needs. Its output settings can also be adjusted. – Anaximander
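The pdftoppm command from the comment above could be wrapped for scripting roughly like this. It is only a sketch, assuming pdftoppm (from poppler-utils) is on the PATH; build_pdftoppm_command and pdf_to_tiff_pages are hypothetical helper names, and only the -tiff and -r flags quoted in the comment are used.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, out_dir, dpi=300, prefix="pg"):
    """Return the pdftoppm argument list; pages are written as <out_dir>/<prefix>-N.tif."""
    return [
        "pdftoppm",
        "-tiff",                        # write TIFF instead of the default PPM
        "-r", str(dpi),                 # resolution in dpi
        str(pdf_path),
        str(Path(out_dir) / prefix),    # output file name prefix
    ]


def pdf_to_tiff_pages(pdf_path, out_dir="images", dpi=300):
    """Create out_dir if needed (the `mkdir images` step) and render one TIFF per page."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdftoppm_command(pdf_path, out_dir, dpi), check=True)
```

pdf_to_tiff_pages("mypdf.pdf") would then mirror the one-liner quoted above.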