PyMuPDF - How to convert a PDF to images at 300 dpi, using the original document settings for the image size?

I'm currently looking at using the Python package PyMuPDF for a workflow that converts PDFs to images (in my case, TIFF files).

I am trying to mimic the behaviour of another program that I currently use for PDF -> image conversion. That program lets you configure the imaging settings as follows:

Image Output Quality (DPI): (Defaults to 300dpi)

Basic Image Size: Original setting - renders the image with the original document settings.

My question is: is this possible within PyMuPDF? How can I set the output DPI for my images to 300 and set the image size to the original document size? I am quite new to this sort of processing for PDFs and images, so any help would be much appreciated.

Thanks in advance,

Talbott asked 2/10, 2021 at 8:13

PyMuPDF is a Python wrapper around MuPDF.

It has many powerful PDF manipulation options, including the ability to set the page scale and the resolution of page image outputs.

However, MuPDF supports TIFF as an input format but cannot natively export single- or multi-page TIFF, so you would need an additional conversion step, for example from multiple PNG files, which it can write natively.

The range of current inputs and outputs:

Input   Output  Description
JPEG    -       Joint Photographic Experts Group
BMP     -       Windows Bitmap
JXR     -       JPEG Extended Range
JPX     -       JPEG 2000
GIF     -       Graphics Interchange Format
TIFF    -       Tagged Image File Format
PNG     PNG     Portable Network Graphics
PNM     PNM     Portable Anymap
PGM     PGM     Portable Graymap
PBM     PBM     Portable Bitmap
PPM     PPM     Portable Pixmap
PAM     PAM     Portable Arbitrary Map
-       PSD     Adobe Photoshop Document
-       PS      Adobe PostScript

To export to TIFF you would need an additional package, say PIL/Pillow, along the lines of:

from PIL import Image
import fitz

pix = fitz.Pixmap(...)
# "RGB" assumes the pixmap is RGB without an alpha channel
img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
img.save("output.tif", "TIFF")

However, for combining single pages into a multi-page TIFF you will need to experiment with the Pillow settings.
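As a starting point for that experiment, here is a minimal sketch of writing several pages into one TIFF with Pillow's `save_all` and `append_images` options. The helper name `save_multipage_tiff` is made up for illustration, and the blank images are only stand-ins for page images that would normally be built from pixmaps as shown above.

```python
import io

from PIL import Image


def save_multipage_tiff(pages, target, dpi=300):
    """Write a list of PIL images as one multi-page TIFF (hypothetical helper)."""
    if not pages:
        raise ValueError("need at least one page image")
    first, rest = pages[0], pages[1:]
    first.save(
        target,
        format="TIFF",
        save_all=True,        # write every frame, not only the first one
        append_images=rest,   # the remaining pages become extra frames
        dpi=(dpi, dpi),       # resolution stored in the file
    )


# Stand-in pages; in the real workflow each one would come from a pixmap.
demo_pages = [Image.new("RGB", (200, 100), color) for color in ("white", "gray", "black")]

# An in-memory buffer keeps the demo free of files; a filename works as well.
buffer = io.BytesIO()
save_multipage_tiff(demo_pages, buffer)

# Read the result back to confirm that all pages were stored.
buffer.seek(0)
with Image.open(buffer) as tiff:
    frame_count = tiff.n_frames
```

Pillow's TIFF writer accepts further options such as a compression setting, so it is worth checking its documentation before settling on the final parameters.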

[Update]

I see you also asked this question of the PyMuPDF project; for the benefit of others, the answer there was:

Sounds like you will create a so-called "pixmap" for each page and save that as an image. PyMuPDF itself only supports a handful of image output formats, the most popular being PNG; the others are the PNM-type images. If you want to use other formats, you must use an additional package, presumably PIL/Pillow. PyMuPDF supports Pillow directly via its pixmap output methods. So a code snippet may look like this:

import fitz
mat = fitz.Matrix(300 / 72, 300 / 72)  # sets zoom factor for 300 dpi
doc = fitz.open("yourfile.pdf")
for page in doc:
    pix = page.get_pixmap(matrix=mat)
    img_filename = "page-%04i.tiff" % page.number
    pix.pil_save(img_filename, format="TIFF", dpi=(300, 300))  # ... more PIL parameters

For more sophistication on PIL output, please consult their documentation. For example, TIFF supports multiple images in one file.
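On the "original document size" half of the question: PDF page sizes are expressed in points, with 72 points to the inch, so scaling by 300 / 72 keeps the page's physical size and only changes how many pixels cover it. The sketch below works through that arithmetic in plain Python. The function names are made up for illustration, and the pixel counts PyMuPDF actually produces may differ by a pixel because of rounding. In PyMuPDF the page size in points is available as `page.rect`.

```python
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points, 72 per inch


def zoom_for_dpi(dpi):
    """Scale factor that turns points into pixels at the requested DPI."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt, height_pt, dpi):
    """Approximate pixel dimensions of a page rendered at the given DPI."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# US Letter is 612 x 792 points, i.e. 8.5 x 11 inches.
letter_px = pixel_size(612, 792, 300)
print(letter_px)  # (2550, 3300)
```

So an 8.5 x 11 inch page comes out at 2550 x 3300 pixels, which is 8.5 x 11 inches again once the file is tagged as 300 dpi. More recent PyMuPDF releases also appear to accept a `dpi` argument to `get_pixmap` directly; check the documentation for the version you have installed.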

Hassiehassin answered 2/10, 2021 at 12:37
