How can I replace an image in a PDF programmatically (ideally from the command line)? [closed]

I'm looking for a method to replace an image in an existing PDF file (set up as a template).

In a perfect world I could just run a command at the command line (Linux or Windows will work; I'm not picky). However, if need be, I could also implement something in a scripting language or even a full-blown program at this point (I have Visual Studio).

I honestly can't believe how hard it has been to find an example of this being done anywhere.

We're running Windows with Cygwin and have Acrobat Pro XI as well as Illustrator CS6, from which the PDF file is originally created.

Essentially, what we're looking for is an efficient way to replace images in PDF files that we're sending to the printing house.

Douglasdouglashome asked 22/1, 2016 at 2:12. Comments (8):
Suzettesuzi: Are all of the images the same size? And do they have to be put in the same location? Does your template PDF also need to contain an image, or could it just have empty space? Are the size and location of the image always the same?
Douglasdouglashome: We have a number of pieces that we produce. The simplest is just a postcard with a single image to replace; the most complicated is a brochure with about 15. Right now the process is to open a piece in Illustrator, change the link of each placed image, and save as PDF. The size and location of a given image are always the same. The finished piece has to have an image, but the template may or may not have one.
Suzettesuzi: Wouldn't it be simpler then to create templates without images in them and simply insert the images once you have them? The problem with replacing images is that you have to find them and remove them from the PDF stream. If all you need to do is insert new images, that task may be much easier (certainly so if those images are always at the very top or the very bottom of the Z-order of objects on the page). You could create a PDF file with the images and simply superimpose it in front of or behind your template PDF and you'd be done, no?
Douglasdouglashome: OK, I hadn't considered that that would be easier. Do you have any suggestions for the easiest way to add those images?
Suzettesuzi: I'm pretty sure you could accomplish that by scripting either Acrobat or Illustrator, but there are also lots of command-line utilities with similar functionality; one example is the "stamp" operation in PDFtk (pdflabs.com/docs/pdftk-man-page/#dest-op-stamp).
Douglasdouglashome: Thanks David, that looks like exactly the type of thing I'm looking for. I'm going to play around a bit with this and see if I can make it work :)
Suzettesuzi: Glad I could point you in the right direction (hopefully :)). If it works, let me know and I'll write this comment stream up as a real answer you can accept.
Daisydaitzman: Duplicate of "Replace an image in a PDF using command line".
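For the "superimpose" idea from this thread, pdftk's `stamp` operation puts an overlay PDF on top of each template page, and `background` puts it behind. Below is a hedged sketch of a Python wrapper; the function names are hypothetical, and the command shape follows the pdftk man page linked above. Per that man page, a background only shows through where the template page is transparent, and the `multi*` variants apply overlay pages to the corresponding template pages instead of reusing the overlay's first page everywhere.

```python
import shutil
import subprocess


def pdftk_overlay_cmd(template, overlay, output, behind=False, per_page=False):
    """Build a pdftk command that merges an overlay PDF with a template PDF.

    stamp/multistamp put the overlay in front of the template;
    background/multibackground put it behind. The multi* variants apply
    each overlay page to the corresponding template page.
    """
    op = "background" if behind else "stamp"
    if per_page:
        op = "multi" + op
    return ["pdftk", template, op, overlay, "output", output]


def run_pdftk_overlay(template, overlay, output, **kwargs):
    """Run the command if pdftk is on PATH; raise otherwise."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk is not installed or not on PATH")
    subprocess.run(pdftk_overlay_cmd(template, overlay, output, **kwargs), check=True)
```

For example, `run_pdftk_overlay("postcard-template.pdf", "postcard-images.pdf", "postcard.pdf")` (hypothetical file names) would stamp the image PDF onto the template.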

I successfully used PyMuPDF to do something similar. Details (and other approaches) are in this thread: https://github.com/pymupdf/PyMuPDF/discussions/924#discussioncomment-7249686

TL;DR (copied from my comment there):

Load the document and page (taken from JorjMcKie's comment):

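import fitz  # PyMuPDF
doc = fitz.open("input.pdf")
pno = 0  # page number to inspect (0-based)
page = doc[pno]  # load the page at page number pno
img_list = page.get_images(full=True)  # all images on that page; the first item of each entry is the image xref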

Then extract the image by its xref and convert it to RGB:

p = fitz.Pixmap(doc, 6)  # 6 = xref of the image (taken from img_list)
q = fitz.Pixmap(fitz.Colorspace(fitz.CS_RGB), p)  # can only save JPEG in RGB; this image was DeviceCMYK
q.save("6-rgb.jpg")

Now make whatever modifications you need in GIMP, then load the modified image back:

r = fitz.Pixmap("6-rgb-mod.jpg")  # the file edited in GIMP
s = fitz.Pixmap(fitz.Colorspace(fitz.CS_CMYK), r)  # convert back to CMYK, like the original

And now, supposedly, it would be as simple as:

page.replace_image(6, pixmap=s)

but maybe I have an older PyMuPDF: it threw an exception about a missing doc.is_image (newer sources use doc.xref_is_image, so this is probably fixed). So I followed the implementation of replace_image by hand:

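new_xref = page.insert_image(page.rect, pixmap=s)  # embed the new image (this also appends a content stream that draws it)
doc.xref_copy(new_xref, 6)  # overwrite image object 6 with a copy of the new image
last_contents_xref = page.get_contents()[-1]  # the content stream insert_image just appended
doc.update_stream(last_contents_xref, b" ")  # blank it so the new image is not drawn over the whole page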

And finally, save:

doc.save("output.pdf", garbage=3, deflate=True)  # garbage=3: remove unused objects and merge duplicates; deflate=True: compress streams

Inspecting the result with mutool, the old image is still in the file, just no longer used. So if you want to save space, this is probably not the proper or complete way. But if you just want to replace an image quickly and leave the other visuals as they are (say, for printing), it can be fine.

Doc reference: https://pymupdf.readthedocs.io/en/latest/

Grady answered 22/1, 2016 at 2:12. Comments (1):
Grady: Note that if nothing unusual is going on with the image (like a clipping mask or overlapping elements), the other approach in the GitHub comments, a bounding-box redaction, would be cleaner and would also save space, because the old image would be removed from the stream. It might be possible to remove the old image with the approach above too, but I didn't need to find out.
