Split a PDF file into two other PDF files using qpdf

Is it possible to split a PDF file into two parts, or n parts, using the qpdf tool?

The docs say so but I couldn't find the exact command to do it.

I'm using qpdf version 10.0.1.

Arola answered 20/6, 2020 at 13:10 Comment(1)
I think you need to invoke qpdf separately for each output file. For each output file, you provide the page range you'd like to extract, such as qpdf infile.pdf --pages infile.pdf 2-3 -- outfile.pdf to extract pages 2 and 3 from infile.pdf to outfile.pdf. With a shell loop you can then build one such command line per output file. - Phonetist
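The loop described in the comment above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the helper names part_ranges and qpdf_commands and the part-N.pdf output names are made up for illustration, and the page count is passed in by hand rather than read from the file. It builds one command per part using the --pages form quoted in the comment and spreads the pages as evenly as possible, so exactly the requested number of parts comes out.

```python
def part_ranges(pagecount, parts):
    """Return `parts` contiguous (first, last) page ranges covering 1..pagecount.

    Pages are spread as evenly as possible; the earlier parts get the extras.
    """
    if not 1 <= parts <= pagecount:
        raise ValueError("need 1 <= parts <= pagecount")
    base, extra = divmod(pagecount, parts)
    ranges = []
    first = 1
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((first, first + size - 1))
        first += size
    return ranges


def qpdf_commands(infile, pagecount, parts, out_pattern="part-{}.pdf"):
    """Build one qpdf argument list per part (out_pattern is a hypothetical naming scheme)."""
    commands = []
    for i, (first, last) in enumerate(part_ranges(pagecount, parts), start=1):
        commands.append(
            ["qpdf", infile, "--pages", infile, f"{first}-{last}", "--", out_pattern.format(i)]
        )
    return commands


if __name__ == "__main__":
    for cmd in qpdf_commands("infile.pdf", 12, 2):
        print(" ".join(cmd))
        # To actually run it (requires qpdf on the PATH):
        # import subprocess; subprocess.run(cmd, check=True)
```

Each argument list can be handed to subprocess.run as shown in the commented line; printing first makes it easy to check the ranges before anything is written.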

Yes, it is very easy:

Assume infile.pdf has 12 pages (pagecount):

qpdf.exe --split-pages=n infile.pdf output-%d.pdf

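Pseudo code for choosing n:

n = Ceiling(pagecount / number_of_parts)

Round up rather than down: if pagecount is not divisible by number_of_parts, rounding down leaves the leftover pages in one extra output file.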
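To check a choice of n before running qpdf, the arithmetic can be tried out in a few lines of Python. This is a small sketch with made-up helper names (group_size, files_produced); it assumes only what the option description states, namely that --split-pages=n writes one file per group of n pages, so the number of files is the page count divided by n, rounded up.

```python
def group_size(pagecount, number_of_parts, round_up=True):
    """Pages per output file, rounding the division up or down."""
    if round_up:
        return -(-pagecount // number_of_parts)  # integer ceiling division
    return pagecount // number_of_parts


def files_produced(pagecount, n):
    """Number of files written when every group holds n pages."""
    return -(-pagecount // n)


# A 519-page document that should end up in 2 parts:
for round_up in (False, True):
    n = group_size(519, 2, round_up)
    print(f"round_up={round_up}: n={n}, files={files_produced(519, n)}")
```

Rounding down gives n=259 and three files, while rounding up gives n=260 and two. Note that rounding up guarantees at most number_of_parts files, not exactly that many: 9 pages in 4 parts gives n=3 and only three files.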

More details from the original documentation:

--split-pages=[n]

Write each group of n pages to a separate output file. If n is not specified, create single pages. Output file names are generated as follows:

If the string %d appears in the output file name, it is replaced with a range of zero-padded page numbers starting from 1.

Otherwise, if the output file name ends in .pdf (case insensitive), a zero-padded page range, preceded by a dash, is inserted before the file extension.

Otherwise, the file name is appended with a zero-padded page range preceded by a dash.

Page ranges are a single number in the case of single-page groups or two numbers separated by a dash otherwise. For example, if infile.pdf has 12 pages:

qpdf --split-pages infile.pdf %d-out would generate files 01-out through 12-out

qpdf --split-pages=2 infile.pdf outfile.pdf would generate files outfile-01-02.pdf through outfile-11-12.pdf

qpdf --split-pages infile.pdf something.else would generate files something.else-01 through something.else-12

Reference: https://qpdf.readthedocs.io/en/stable/cli.html#option-split-pages
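The naming rules quoted above can be mirrored in a short Python sketch to preview which files a given command would create. This is an illustration, not qpdf's own code: split_names is a made-up helper, and it assumes that the padding width equals the number of digits in the page count and that the two-number range form is used whenever n is greater than 1, even for a final group of one page (which matches the int_out_519-519.pdf name reported in the comments below).

```python
def split_names(pagecount, n, outname):
    """Predict the output names of: qpdf --split-pages=n infile.pdf outname"""
    width = len(str(pagecount))  # assumed zero-padding width
    names = []
    for first in range(1, pagecount + 1, n):
        last = min(first + n - 1, pagecount)
        page_range = f"{first:0{width}d}"
        if n > 1:
            page_range += f"-{last:0{width}d}"
        if "%d" in outname:
            # %d is replaced by the page range
            name = outname.replace("%d", page_range, 1)
        elif outname.lower().endswith(".pdf"):
            # range goes before the extension, preceded by a dash
            name = f"{outname[:-4]}-{page_range}{outname[-4:]}"
        else:
            # range is appended, preceded by a dash
            name = f"{outname}-{page_range}"
        names.append(name)
    return names


if __name__ == "__main__":
    for args in [(12, 1, "%d-out"), (12, 2, "outfile.pdf"), (12, 1, "something.else")]:
        names = split_names(*args)
        print(args, "->", names[0], "...", names[-1])
```

For the three documented examples this reproduces 01-out through 12-out, outfile-01-02.pdf through outfile-11-12.pdf, and something.else-01 through something.else-12.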

Turmoil answered 19/11, 2020 at 16:17 Comment(4)
How can I put the page number without zero-padding in the filename? E.g. I want 1.pdf instead of 0001.pdf. - Residuum
You can't. Quote from qpdf.readthedocs.io/en/stable/cli.html: "Zero padding is added to all page numbers in file names so that all the numbers are the same length, which causes the output filenames to sort lexically in numerical order." You need to run another script or external tool to rename the files. - Turmoil
Thanks. Not ideal that qpdf decides this. For anyone else wondering, something like rename -n 's/^(PREFIX)0+(\d+)/$1$2/' *.pdf removes the zero padding from all PDF filenames in the CWD that consist of PREFIX followed directly by the padded number (PREFIX must include any separator, e.g. PREFIX_ for PREFIX_001.pdf). The -n flag makes it a dry run. - Residuum
I find it best to use the pseudo-code n = Ceiling(pagecount / number_of_parts). Here's an example why. I have a PDF with 519 pages, and I'd like to split it in 2. In Python, int(519/2) gives 259. The command qpdf --split-pages=259 input.pdf int_out_%d.pdf gives me 3 files: int_out_001-259.pdf, int_out_260-518.pdf and int_out_519-519.pdf. That's 1 more than I would like. However, import math; math.ceil(519/2) gives 260, and qpdf --split-pages=260 input.pdf ceil_out_%d.pdf gives me 2 files, as desired: ceil_out_001-260.pdf and ceil_out_261-519.pdf. - Goof
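For the renaming step discussed in the comments above, a Python alternative to the rename one-liner might look like the sketch below. The function names strip_padding and rename_all are made up for illustration. The pattern removes leading zeros from every run of digits in the name, so it also touches digits in a prefix (v02_001.pdf would become v2_1.pdf); treat it as a starting point and keep the dry run on until the printed names look right.

```python
import re
from pathlib import Path

# One or more zeros that start a run of digits and are followed by another digit.
_PADDED = re.compile(r"(?<!\d)0+(\d)")


def strip_padding(filename):
    """Remove zero padding from every number in a file name."""
    return _PADDED.sub(r"\1", filename)


def rename_all(directory=".", dry_run=True):
    """Print, and optionally perform, the renames for all PDFs in a directory."""
    for path in sorted(Path(directory).glob("*.pdf")):
        target = path.with_name(strip_padding(path.name))
        if target == path:
            continue
        print(f"{path.name} -> {target.name}")
        # Never overwrite an existing file.
        if not dry_run and not target.exists():
            path.rename(target)


if __name__ == "__main__":
    rename_all(".", dry_run=True)
```

Calling rename_all(".", dry_run=False) performs the renames. Without the padding the files no longer sort in page order lexically, which is the reason qpdf pads them in the first place.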

A somewhat brute-force way of handling this in R with the qpdf package. To split a different file, change only bigfile and the output paths, then re-run it all. I will update this once I have written a proper function for it.

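# pacman::p_load installs (if needed) and attaches the listed packages
pacman::p_load(pdftools, qpdf)

# some prep: path to the input file and its page count
bigfile <- "Some/File/Path.pdf"
biginfo <- pdf_length(bigfile)
# last page of part 1; integer division keeps the page numbers whole
# (the original (biginfo+1)/2 gave fractional pages for even page counts)
half <- biginfo %/% 2

# now we subset twice, being sure to define unique names for output,
# otherwise the second file will overwrite the first one we create here.
# for part 1
pdf_subset(bigfile,
           pages = 1:half,
           output = "Some/File/Path_part_1.pdf")
# for part 2
pdf_subset(bigfile,
           pages = (half + 1):biginfo,
           output = "Some/File/Path_part_2.pdf")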
Suburban answered 28/8, 2020 at 20:3 Comment(0)
