Split PDF by multiple pages using PDFTK?

I am finding it hard to phrase this question and could not find an online solution for what I'm trying to do.

I know how to split a large PDF into single pages with PDFTK using the following command:

pdftk your_file.pdf burst output your_directory/page_%02d.pdf

But now I want to split the PDF every two pages, so that each new PDF has TWO (2) pages (e.g. pages 1 + 2 together, pages 3 + 4 together, 5 + 6, etc.).

I know that Acrobat does this like a champ; however, I need something I can execute from PowerShell.

I am open to alternatives/workarounds, like bursting into single pages first and then combining them in twos.
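
The burst-then-combine workaround could look roughly like the bash sketch below. It assumes the burst files are named page_01.pdf, page_02.pdf, ... as produced by the burst command above. The function name pair_commands and the pair_NN.pdf output names are made up for illustration. It only prints the pdftk commands, so they can be checked before anything is run.

```shell
#!/bin/bash
# Print one pdftk merge command per pair of burst pages in a directory.
# Assumes zero-padded names (page_01.pdf, page_02.pdf, ...) so the glob
# sorts in page order; with %02d that only holds up to 99 pages.
pair_commands() {
    local dir="$1" n=1 first="" f
    for f in "$dir"/page_*.pdf; do
        [ -e "$f" ] || continue          # no matches: the glob stays literal
        if [ -z "$first" ]; then
            first="$f"                   # remember the first page of the pair
        else
            printf 'pdftk %s %s cat output %s/pair_%02d.pdf\n' "$first" "$f" "$dir" "$n"
            first=""
            n=$((n + 1))
        fi
    done
    # A leftover page (odd page count) gets a file of its own.
    if [ -n "$first" ]; then
        printf 'pdftk %s cat output %s/pair_%02d.pdf\n' "$first" "$dir" "$n"
    fi
}
```

Once the printed commands look right, they could be piped to sh (pair_commands your_directory | sh), provided the paths contain no spaces.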

Delaney answered 5/5, 2017 at 13:4 Comment(0)

This PowerShell script will

  1. use pdftk to get the number of pages
  2. loop in steps building a range string
  3. use the range to extract the pages into a new pdf with the range appended to the base name (stored in the same folder).

Change the first two vars to fit your environment.

## Q:\Test\2017\05\06\Split-Pdf.ps1
$pdfPath = 'Q:\Test\2017\05\06\'
$pdfFile = Join-Path $pdfPath "test.pdf"
$SetsOfPages = 3   # pages per output file; set to 2 for pairs
$Match = 'NumberOfPages: (\d+)'
# cast to [int] so the loop comparison and [math]::Min work on numbers, not strings
$NumberOfPages = [int][regex]::Match((pdftk $pdfFile dump_data),$Match).Groups[1].Value
"{0,2} pages in {1}" -f $NumberOfPages, $pdfFile

$File = Get-Item $pdfFile
for ($Page=1;$Page -le $NumberOfPages;$Page+=$SetsOfPages){
  # the last range is clamped to the final page
  $Range = "{0}-{1}" -f $Page,[math]::Min($Page+$SetsOfPages-1,$NumberOfPages)
  $OutFile = Join-Path $pdfPath ($File.BaseName+"_$Range.pdf")
  "processing: {0}" -f $OutFile
  pdftk $pdfFile cat $Range output $OutFile
}

Edited to work with variable sets of pages and to properly handle the overhang.
Edited again: found a much easier way to shorten the last set of pages.

Sample output

> .\Split-Pdf.ps1
10 pages in Q:\Test\2017\05\06\test.pdf
processing: Q:\Test\2017\05\06\test_1-3.pdf
processing: Q:\Test\2017\05\06\test_4-6.pdf
processing: Q:\Test\2017\05\06\test_7-9.pdf
processing: Q:\Test\2017\05\06\test_10-10.pdf
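
For comparison, the range arithmetic behind that output can be sketched in bash without touching pdftk at all. The function name page_ranges is hypothetical; it mirrors the loop above, with the end of each range clamped to the total page count.

```shell
#!/bin/bash
# Print the page ranges for splitting TOTAL pages into sets of SIZE pages.
# The final range is clamped to TOTAL, which is the "overhang" handling.
page_ranges() {
    local total="$1" size="$2" start end
    [ "$size" -ge 1 ] || return 1        # a size of 0 would loop forever
    for ((start = 1; start <= total; start += size)); do
        end=$((start + size - 1))
        if ((end > total)); then end=$total; fi
        echo "$start-$end"
    done
}

page_ranges 10 3
# prints:
# 1-3
# 4-6
# 7-9
# 10-10
```

Each printed range could then be passed to pdftk's cat operation, as the script above does with $Range.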
Annapolis answered 6/5, 2017 at 17:33 Comment(5)
Thank you! It split the document every 2 pages. Out of curiosity, can this be modified to dynamically define the page numbers to split by? - Delaney
Should be no problem, I will edit the answer in a minute. Edit the var $SetsOfPages to the desired size. - Annapolis
Also, a minor issue: I am noticing that when the original pdf has an odd number of pages, it looks like it discards the last page. - Delaney
Edited the answer to properly handle the overhang and to work with variable sets of pages. - Annapolis
Thank you so much! This is perfect. You are awesome! - Delaney

You can use sejda-console, it's open source under AGPLv3 and can be downloaded from the project GitHub page.

You can use the splitbyevery command which

Splits a given PDF document every 'n' pages creating documents of 'n' pages each.

In your case the command line will be something like:

sejda-console splitbyevery -n 2 -f /tmp/input_file.pdf -o /out_dir

Graeae answered 7/5, 2017 at 9:33 Comment(1)
Thank you for the different option. I will contact my IT department and look into that. - Delaney

You can use the cat keyword to generate files from the desired pages.

pdftk in.pdf cat 1-2 output out1.pdf
pdftk in.pdf cat 3-4 output out2.pdf

A bash script can be added in order to be easier to use:

 #!/bin/bash 
 COUNTER=0
 while [  $COUNTER -lt $NUMBEROFPAGES ]; do
     pdftk in.pdf cat $COUNTER-$COUNTER+1 output out1.pdf
     let COUNTER=COUNTER+2 
 done
Thebaid answered 5/5, 2017 at 14:22 Comment(0)

I found Szakacs Peter's solution to be wonderful, but the bash script needed three tweaks: starting $COUNTER at 1 so that it refers to the first page of the PDF; wrapping the end page on line four in arithmetic expansion, $(($COUNTER+1)), so that it is evaluated; and adding $COUNTER to the output file name so that each output file is unique.
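
The second tweak is easy to miss: inside a plain word, bash only substitutes the variable and leaves the +1 as literal text, whereas $(( )) evaluates the expression. A small illustration (the variable names plain and computed exist only for this example):

```shell
#!/bin/bash
COUNTER=3
plain="$COUNTER-$COUNTER+1"           # variable substitution only
computed="$COUNTER-$((COUNTER + 1))"  # arithmetic expansion evaluates the sum
echo "$plain"     # 3-3+1
echo "$computed"  # 3-4
```

Only the second form yields the page range 3-4 that was intended for pdftk's cat operation.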

The final bash script that solved this for me was:

#!/bin/bash 
 COUNTER=1
 while [  $COUNTER -lt $NUMBEROFPAGES ]; do
     pdftk in.pdf cat $COUNTER-$(($COUNTER+1)) output out$COUNTER.pdf
     let COUNTER=COUNTER+2 
 done

Then just save this as something like burst2page.sh, do a chmod u+x burst2page.sh to make it executable, then run it with ./burst2page.sh

Blimp answered 7/3, 2020 at 5:9 Comment(0)

Brad Smith's script is good; however, it won't work as written. When $NUMBEROFPAGES is not defined, the script fails with the error script.sh: line 3: [: 1: unary operator expected. I suggest changing it to:

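#!/bin/bash
FILE='in.pdf'
COUNTER=1
NUMBEROFPAGES=$(pdftk "$FILE" dump_data | grep NumberOfPages | awk '{print $2}')
NUMBEROFPAGES="${NUMBEROFPAGES//[$'\t\r\n ']}" # strip any stray whitespace
# -le (not -lt) so the final page of an odd-length PDF is not skipped
while [ "$COUNTER" -le "$NUMBEROFPAGES" ]; do
    END=$((COUNTER+1))
    # clamp the range so the last chunk never runs past the final page
    if [ "$END" -gt "$NUMBEROFPAGES" ]; then END=$NUMBEROFPAGES; fi
    pdftk "$FILE" cat "$COUNTER-$END" output "out$COUNTER.pdf"
    let COUNTER=COUNTER+2
done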
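
The page-count lookup can also be isolated into a small helper that fails loudly instead of leaving the variable empty, which is what triggers the unary operator error. This is only a sketch: the name parse_page_count is hypothetical, and the sample input below is made up apart from the NumberOfPages line.

```shell
#!/bin/bash
# Read pdftk dump_data text on stdin and print the page count.
# Prints nothing and returns non-zero if no numeric NumberOfPages line is found.
parse_page_count() {
    awk '/^NumberOfPages:/ { gsub(/[^0-9]/, "", $2); n = $2 }
         END { if (n == "") exit 1; print n }'
}

# Made-up sample input; the trailing \r shows that stray carriage returns are dropped.
printf 'Other: x\nNumberOfPages: 10\r\n' | parse_page_count   # prints 10

# Intended use (requires pdftk, so it is not executed here):
#   NUMBEROFPAGES=$(pdftk "$FILE" dump_data | parse_page_count) || {
#       echo "could not read page count from $FILE" >&2; exit 1; }
```

Because awk strips everything except digits from the value, the separate whitespace-stripping line would no longer be needed with this helper.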
Murrah answered 20/3, 2020 at 14:23 Comment(0)

Split by an arbitrary number of pages (as second argument): e.g. <script_filename>.sh <filename.pdf> <num_pages_per_output_file>

#!/bin/bash
FILE="${1}"
SPAN=${2:-2}   # pages per output file, defaults to 2
SPAN_LESS_1=$((SPAN - 1))
COUNTER=1
NUMBEROFPAGES=$(pdftk "$FILE" dump_data | grep NumberOfPages | awk '{print $2}')
# -le (not -lt) so a final chunk that starts on the last page is still written
while [ "$COUNTER" -le "$NUMBEROFPAGES" ]; do
    CANDIDATE_END=$((COUNTER+SPAN_LESS_1))
    END=$((CANDIDATE_END<NUMBEROFPAGES ? CANDIDATE_END : NUMBEROFPAGES))
    OUT_NAME="${FILE%.*}__${COUNTER}-${END}.pdf"
    pdftk "$FILE" cat "$COUNTER-$END" output "${OUT_NAME}"
    let COUNTER=COUNTER+SPAN
done

Also, output filenames will have both start and end page numbers appended to the input filename, e.g.

<input_filename>__1-15.pdf
<input_filename>__16-30.pdf
...
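
The naming relies on the ${FILE%.*} parameter expansion, which removes the shortest trailing match of ".*", that is, the last extension. A standalone sketch of just that step (the function name out_name and the sample paths are hypothetical):

```shell
#!/bin/bash
# Build the output name for one chunk: strip the last extension, append the range.
out_name() {
    local file="$1" start="$2" end="$3"
    echo "${file%.*}__${start}-${end}.pdf"
}

out_name report.pdf 1 15            # report__1-15.pdf
out_name scans/my.file.pdf 16 30    # scans/my.file__16-30.pdf
```

One caveat: for an input without an extension but with a dot in a directory name, the expansion would cut at that dot instead, so the script is best given paths that end in .pdf.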
Scarf answered 2/4, 2022 at 19:43 Comment(0)