Combining PDFs with Ghostscript: Using original bookmarks with corrected page numbers
Z

3

8

I am using

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf  -f front-matter.pdf fulltext-0.pdf fulltext-1.pdf back-matter.pdf

to create a single PDF document from a series of PDF documents. I was going to write a new table of contents and add it using the pdfmark mechanism. Then I noticed that the original files already contain bookmarks. However, they refer to the original page numbers, not to the page numbers in the combined document.

I am looking for one of two possible solutions: remove the original bookmarks, or make use of the original bookmarks but somehow update their page references...

Zany answered 9/11, 2011 at 19:50 Comment(0)
Z
5

As is so often the case, someone has walked the same path before you...

unfolding disasters has worked out a solution to this very problem, based on https://ubuntuforums.org/showthread.php?t=1545064. His Python script pdf-merge.py first invokes pdftk with its dump_data operation to retrieve the bookmark (pdfmark) information from each input file. It then keeps track of the page count of each document and offsets every bookmark's page number by the total page count of all the PDF documents that come before the current one. So it is close to, but not the same as, the 2-pass approach of KenS: it first discovers the bookmarks using pdftk and then creates a new bookmark file with correct page numbers. It also manages to turn the original pdfmark instructions (which gs would normally preserve) into no-ops. I won't pretend I understand how that last part works ...

However, the script does all I need including the option of tweaking the bookmark file before the final writing. Very neat and hat tip to Trevor King.
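The page-offset arithmetic described above can be sketched in a few lines of Python. This is not Trevor King's script, only a minimal illustration of the idea. It assumes the text output of `pdftk file.pdf dump_data` contains `NumberOfPages`, `BookmarkTitle`, `BookmarkLevel` and `BookmarkPageNumber` lines in that order per bookmark; the function names and the sample strings are made up for the example.

```python
def parse_dump_data(text):
    """Return (page_count, bookmarks) parsed from pdftk dump_data text."""
    page_count = 0
    bookmarks = []
    current = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without a colon (e.g. "BookmarkBegin") are skipped
        key, value = key.strip(), value.strip()
        if key == "NumberOfPages":
            page_count = int(value)
        elif key == "BookmarkTitle":
            current = {"title": value}
        elif key == "BookmarkLevel":
            current["level"] = int(value)
        elif key == "BookmarkPageNumber":
            # Assumption: the page number is the last field of each bookmark.
            current["page"] = int(value)
            bookmarks.append(current)
            current = {}
    return page_count, bookmarks


def merge_bookmarks(dumps):
    """Shift each document's bookmark pages by the pages that precede it."""
    merged = []
    offset = 0
    for text in dumps:  # dumps must be in the same order as the merged files
        page_count, bookmarks = parse_dump_data(text)
        for b in bookmarks:
            merged.append({**b, "page": b["page"] + offset})
        offset += page_count
    return merged


# Hypothetical dump_data excerpts for two input files.
front = ("NumberOfPages: 4\nBookmarkBegin\nBookmarkTitle: Preface\n"
         "BookmarkLevel: 1\nBookmarkPageNumber: 2\n")
body = ("NumberOfPages: 10\nBookmarkBegin\nBookmarkTitle: Chapter 1: Intro\n"
        "BookmarkLevel: 1\nBookmarkPageNumber: 1\n")
merged = merge_bookmarks([front, body])
```

With these two samples, the bookmark from the second file moves from page 1 to page 5, because the first file contributes 4 pages.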

[Edit by K J] I have updated the dead links above to web-archived sources. The code was later expanded for use in "r-XMPDF", so for those interested, see that method here: https://github.com/trevorld/r-xmpdf?tab=readme-ov-file#add-xmpdocinfo-metadata-and-bookmarks-to-a-pdf

{xmpdf} provides functions for getting and setting Extensible Metadata Platform (XMP) metadata in a variety of media file formats as well as getting and setting PDF documentation info entries and bookmarks (aka outline aka table of contents).

Zany answered 12/11, 2011 at 22:0 Comment(5)
By the way, I commented out lines 380-382 of the file, as it tripped on version='%(prog)s {}'.format(__version__)), but in hindsight this seems not smart.Forbidden
Error message: ValueError: zero length field name in formatForbidden
@Bernhard. This message (OSError: [Errno 13] Permission denied) sounds to me like you don't have write permissions to the directory where you're creating the file or read permissions for the file you are reading. Can you check that?Zany
That is the thing that is confusing me most: I am the owner of these files, thus having read/write access.Forbidden
The program falls over where it issues an OS call to pdftk yourfile.pdf dump_data. Run that on the command line and decide where to go from there.Zany
G
4

In general pdfwrite doesn't know you are appending files, so it preserves bookmark and other 'metadata' information on the assumption that you will want it in the output.

However, when you are combining PDF files, preserving the information won't work, as the page numbers for the second and subsequent files will be incorrect.

So you need a 2-pass approach: first merge all the files, discarding the bookmarks, then 'convert' the merged file and add pdfmarks to set the correct bookmarks.
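As a rough sketch of the second pass only, the Python below writes pdfmark lines for a list of (title, page) pairs and builds the gs argument list that feeds the merged file plus the pdfmark file to pdfwrite. The helper names and file names are hypothetical, titles are assumed to be plain ASCII, and the first pass (merging while discarding the old bookmarks) is not covered.

```python
def ps_string(text):
    """Escape a title so it is a valid PostScript (string) literal."""
    # Backslash must be escaped first, then the parentheses.
    for ch in ("\\", "(", ")"):
        text = text.replace(ch, "\\" + ch)
    return "(" + text + ")"


def pdfmark_lines(bookmarks):
    """bookmarks: (title, page) pairs, pages numbered in the merged file."""
    return ["[ /Title {} /Page {} /OUT pdfmark".format(ps_string(title), page)
            for title, page in bookmarks]


def second_pass_command(merged_pdf, pdfmark_file, output_pdf):
    """Argument list for the 'convert and add pdfmarks' pass."""
    return ["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
            "-sOutputFile=" + output_pdf, merged_pdf, pdfmark_file]


# Hypothetical bookmarks and file names.
lines = pdfmark_lines([("Preface", 2), ("Intro (draft)", 5)])
command = second_pass_command("merged.pdf", "bookmarks.ps", "book.pdf")
```

The lines can be written to `bookmarks.ps` and the command passed to `subprocess.run(command, check=True)` on a machine where gs is installed.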

There is currently no option (with pdfwrite) to not preserve bookmarks. You will need to modify the Ghostscript PDF interpreter PostScript files to achieve this I think. You might try setting -dDOPDFMARKS=false, but I doubt that will work.

Gutty answered 10/11, 2011 at 12:18 Comment(1)
I tried your -dDOPDFMARKS=false but as you suspected, it didn't do a thing.Zany
C
0

To remove pdfmarks from a file, the best way is to convert the PDF to PostScript and then convert the result back to PDF.

To remove the pdfmarks:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pswrite -sOutputFile=result.ps pdffilewithpdfmark.pdf

After that, convert the result back to PDF:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=pdffilewithoutpdfmark.pdf result.ps

These two steps completely remove the pdfmarks from the file.

Extracting pdfmarks is another question. I analysed the PDF that results after adding pdfmarks to a PDF file with Ghostscript. The problem is the difference between the pdfmarks we write in the pdfmark.ps file and how they are represented inside the PDF. For example:

Put the following line inside a pdfmark.ps file:

[ /Title (chapter 01) /Page 1 /OUT pdfmark

This line adds a bookmark with the title "chapter 01" that points to page 1. We join the PDF file with this pdfmark.ps file using the command

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=pdffilewithpdfmark.pdf pdffilewithoutpdfmark.pdf pdfmark.ps

Inside pdffilewithpdfmark.pdf, this simple line from pdfmark.ps becomes:

697 0 obj
<< /Title(chapter 01)
/Dest [1 0 R /XYZ null null null]
>>
endobj

In this case you can open the PDF with Notepad or another text editor, extract this part of the file, and edit it into a new pdfmark.ps file. You need two pieces of information to build your pdfmark (title and page), but only the title is easy to identify.

/Title(chapter 01) "Title of bookmark"
/Dest [1           "Object pointed to by the bookmark, not the page!!"

You can get the bookmark titles with this simple cmd command:

findstr /s /i /o /c:"Title" pdfwithpdfmarks.pdf 1>>bookmarks_title.ps

This command prints every line where "Title" is found and appends it to bookmarks_title.ps.

Try the command without redirecting to a file to see the output:

findstr /s /i /o /c:"Title" pdfwithpdfmarks.pdf

A lot of other information will be mixed in with these strings, so you need to filter out what you want in order to build the pdfmark statements for pdfmark.ps. You will need to fill in the page numbers manually. After that, join the PDF file without pdfmarks (first tip above) with the edited bookmarks_title.ps to produce the new PDF file.
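As a hedged alternative to filtering the findstr output by hand, a small Python sketch can pull the literal /Title strings out of the raw file bytes. It only sees objects stored uncompressed as in the example above, it also picks up the document's own /Title entry from the info dictionary if that is stored the same way, it does not handle titles with nested unescaped parentheses, and it cannot recover page numbers. The function name and the sample bytes are made up for illustration.

```python
import re

# Matches /Title followed by a (literal string), allowing backslash escapes.
TITLE_RE = re.compile(rb"/Title\s*\(((?:\\.|[^\\()])*)\)")


def find_titles(pdf_bytes):
    """Return the literal /Title strings found in the raw PDF bytes."""
    titles = []
    for match in TITLE_RE.finditer(pdf_bytes):
        raw = match.group(1)
        # Undo only the \( \) and \\ escapes; other escapes are left as-is.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        titles.append(raw.decode("latin-1"))
    return titles


# Hypothetical excerpt shaped like the object shown earlier.
sample = (b"697 0 obj\n<< /Title(chapter 01)\n"
          b"/Dest [1 0 R /XYZ null null null]\n>>\nendobj\n"
          b"698 0 obj\n<< /Title (A \\(b\\))\n>>\nendobj\n")
titles = find_titles(sample)
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes to `find_titles`; the page numbers still have to be added by hand when writing the pdfmark lines.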

Good Luck!

Cupboard answered 13/9, 2024 at 12:15 Comment(0)
