PyMuPDF's redaction facility is probably the right tool for this.
The approach:
- Identify the location of the text to replace
- Erase the text and replace it using redactions
Care must be taken to match the original font, and to account for whether the new text is longer or shorter than the original.
import fitz  # import PyMuPDF
doc = fitz.open("myfile.pdf")
number = 0  # page number, 0-based (adjust as needed)
page = doc[number]
# suppose you want to replace all occurrences of some text
disliked = "delete this"
better = "better text"
hits = page.search_for(disliked)  # list of rectangles where to replace
for rect in hits:
    # the redaction erases the old text and writes the new text into the same rectangle
    page.add_redact_annot(rect, better, fontname="helv", fontsize=11,
                          align=fitz.TEXT_ALIGN_CENTER)  # more parameters exist, e.g. fill, text_color
page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)  # don't touch images
doc.save("replaced.pdf", garbage=3, deflate=True)
This works well with short text and medium quality expectations.
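Longer replacements are where this gets tricky, because the new text has to fit into the rectangle of the old one. A minimal sketch of one way to handle that, assuming you are willing to shrink the font: text width grows linearly with font size, so a single width measurement is enough to compute a size that fits. The helper name fitting_fontsize and the min_fontsize default of 4 are my own assumptions, not part of PyMuPDF.

```python
def fitting_fontsize(text_width, box_width, fontsize, min_fontsize=4.0):
    """Return a font size at which the text fits into box_width.

    text_width is the width of the replacement text measured at `fontsize`.
    Width scales linearly with font size, so one measurement is enough.
    Returns None if the text would overflow even at min_fontsize
    (min_fontsize is an assumed lower bound for readable text).
    """
    if text_width <= box_width:
        return fontsize  # already fits, keep the requested size
    scaled = fontsize * box_width / text_width
    return scaled if scaled >= min_fontsize else None
```

With PyMuPDF, text_width could come from fitz.get_text_length(better, fontname="helv", fontsize=11) and box_width from rect.width of each hit. If the helper returns None, the replacement is probably too long for a redaction in place and needs a different layout.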
With some more effort, the original font properties, color, font size, etc. can be identified to produce a close-to-perfect result.
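As a rough sketch of that extra effort: page.get_text("dict") returns blocks, lines and spans, and each span carries the font name, the font size and the color as an sRGB integer. The helper names span_styles and srgb_to_rgb below are hypothetical, and page is assumed to be a PyMuPDF page.

```python
def srgb_to_rgb(color):
    """Convert an sRGB integer (0xRRGGBB) to three floats in the range 0..1."""
    return (
        ((color >> 16) & 255) / 255,
        ((color >> 8) & 255) / 255,
        (color & 255) / 255,
    )


def span_styles(page, rect):
    """Collect (font name, font size, RGB color) for the text spans inside rect.

    `page` is assumed to be a PyMuPDF page; `rect` is one of the rectangles
    returned by page.search_for().
    """
    styles = []
    for block in page.get_text("dict", clip=rect)["blocks"]:
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line["spans"]:
                styles.append(
                    (span["font"], span["size"], srgb_to_rgb(span["color"]))
                )
    return styles
```

The size and color of the first entry can then be passed as fontsize and text_color to add_redact_annot. The reported font name is the name stored in the PDF (for example "Helvetica"), so it usually has to be mapped by hand to a font that the redaction accepts, such as "helv"; that mapping is only an approximation.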
apply_annots should be replaced by apply_redactions. – Cruiserweight