Split PDF by multiple pages using PDFTK?

Asked 5/5, 2017 at 13:4 Answered 2/4, 2022 at 19:43

I am finding it hard to phrase this question and could not find an online solution for what I'm trying to do.

I know how to split a large PDF into single pages with PDFTK using the following command:

pdftk your_file.pdf burst output your_directory/page_%02d.pdf

But now I want to split the PDF every two pages, so that each new PDF has TWO (2) pages (e.g. pages 1 + 2 together, pages 3 + 4 together, 5 + 6, etc.).

I know that Acrobat does this like a champ, but I need something I can execute from PowerShell.

I am open to alternatives/workarounds, like bursting into single pages and then combining them in pairs.

Delaney answered 5/5, 2017 at 13:4 Comment(0)

This PowerShell script will:

- use pdftk to get the number of pages,
- loop in steps, building a range string for each step,
- use the range to extract those pages into a new PDF, with the range appended to the base name (stored in the same folder).

Change the first two vars to fit your environment.

## Q:\Test\2017\05\06\Split-Pdf.ps1
$pdfPath = 'Q:\Test\2017\05\06\'
$pdfFile = Join-Path $pdfPath "test.pdf"
$SetsOfPages = 3   # pages per output file
$Match = 'NumberOfPages: (\d+)'
# Read the page count from pdftk's metadata dump and convert it to an integer
$NumberOfPages = [int][regex]::match((pdftk $pdfFile dump_data),$Match).Groups[1].Value
"{0,2} pages in {1}" -f $NumberOfPages, $pdfFile
$File = Get-Item $pdfFile

for ($Page=1;$Page -le $NumberOfPages;$Page+=$SetsOfPages){
  # [math]::min shortens the last range so it stops at the final page
  $Range = "{0}-{1}" -f $Page,[math]::min($Page+$SetsOfPages-1,$NumberOfPages)
  $OutFile = Join-Path $pdfPath ($File.BaseName+"_$Range.pdf")
  "processing: {0}" -f $OutFile
  pdftk $pdfFile cat $Range output $OutFile
}

Edited to work with variable sets of pages and to properly handle the overhang.
Edited again: found a much easier way to shorten the last set of pages.

Sample output

> .\Split-Pdf.ps1
10 pages in Q:\Test\2017\05\06\test.pdf
processing: Q:\Test\2017\05\06\test_1-3.pdf
processing: Q:\Test\2017\05\06\test_4-6.pdf
processing: Q:\Test\2017\05\06\test_7-9.pdf
processing: Q:\Test\2017\05\06\test_10-10.pdf
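
The range arithmetic in the loop can be checked on its own, without a PDF or pdftk. What follows is only a minimal sketch, written in shell rather than PowerShell; page_ranges is a hypothetical helper name, not part of pdftk. It prints the ranges that the loop would pass to cat:

```shell
#!/bin/bash
# Hypothetical helper: print the page ranges for a document of
# $1 pages split into sets of $2 pages. It does not call pdftk.
page_ranges() {
    total=$1
    span=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + span - 1))
        # Clamp the last range so it never runs past the final page.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "$start-$end"
        start=$((start + span))
    done
}

page_ranges 10 3
```

For 10 pages in sets of 3 this prints 1-3, 4-6, 7-9 and 10-10, one per line, which matches the file names in the sample output.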

Annapolis answered 6/5, 2017 at 17:33 Comment(5)

Thank you! It split the document every 2 pages. Out of curiosity can this be modified to dynamically define the page numbers to split by? – Delaney 8/5, 2017 at 14:19

Should be no problem, I will edit the answer in a minute. Edit the var $SetsOfPages to the desired size. – Annapolis 8/5, 2017 at 14:25

Also, a minor issue, I am noticing that in the event that the original pdf is an odd number of pages, it looks like it discards the last page. – Delaney 8/5, 2017 at 14:27

Edited the answer to properly handle the overhang / and with variable sets of pages. – Annapolis 8/5, 2017 at 15:31

Thank you so much! This is perfect. You are awesome! – Delaney 9/5, 2017 at 12:12

You can use sejda-console, it's open source under AGPLv3 and can be downloaded from the project GitHub page.

You can use the splitbyevery command which

Splits a given PDF document every 'n' pages creating documents of 'n' pages each.

In your case the command line will be something like:

sejda-console splitbyevery -n 2 -f /tmp/input_file.pdf -o /out_dir

Graeae answered 7/5, 2017 at 9:33 Comment(1)

Thank you for the different option. I will contact my IT department and look into that. – Delaney 8/5, 2017 at 14:22

You can use the cat keyword to generate files from the desired pages.

pdftk in.pdf cat 1-2 output out1.pdf
pdftk in.pdf cat 3-4 output out2.pdf

A bash script makes this easier to use:

 #!/bin/bash 
 COUNTER=0
 while [  $COUNTER -lt $NUMBEROFPAGES ]; do
     pdftk in.pdf cat $COUNTER-$COUNTER+1 output out1.pdf
     let COUNTER=COUNTER+2 
 done

Thebaid answered 5/5, 2017 at 14:22 Comment(0)

I found Szakacs Peter's solution to be wonderful, but the bash script needed three tweaks: starting $COUNTER at 1 so that it refers to the first page of the pdf; wrapping the end page on line four in arithmetic expansion, $(($COUNTER+1)), so that it evaluates; and adding $COUNTER to the output file name so that each file name is unique.

The final bash script that solved this for me was:

#!/bin/bash 
 COUNTER=1
 while [  $COUNTER -lt $NUMBEROFPAGES ]; do
     pdftk in.pdf cat $COUNTER-$(($COUNTER+1)) output out$COUNTER.pdf
     let COUNTER=COUNTER+2 
 done

Then just save this as something like burst2page.sh, do a chmod u+x burst2page.sh to make it executable, then run it with ./burst2page.sh

Blimp answered 7/3, 2020 at 5:9 Comment(0)

Brad Smith's script is good, but it won't work as written. If $NUMBEROFPAGES is not defined, the script fails with the error script.sh: line 3: [: 1: unary operator expected. I suggest changing it to:

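#!/bin/bash
FILE='in.pdf'
COUNTER=1
# Read the page count from pdftk's metadata dump
NUMBEROFPAGES=$(pdftk "$FILE" dump_data | grep NumberOfPages | awk '{print $2}')
NUMBEROFPAGES="${NUMBEROFPAGES//[$'\t\r\n ']}" # strip any stray whitespace
# -le (not -lt) so that a trailing odd page is not dropped
while [ "$COUNTER" -le "$NUMBEROFPAGES" ]; do
    END=$((COUNTER + 1))
    # Clamp the last range to the final page
    if [ "$END" -gt "$NUMBEROFPAGES" ]; then END=$NUMBEROFPAGES; fi
    pdftk "$FILE" cat "$COUNTER-$END" output "out$COUNTER.pdf"
    COUNTER=$((COUNTER + 2))
done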
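
To see what the NumberOfPages line of the script extracts, the same kind of pipeline can be run against a sample of dump_data output. This is only a sketch: the sample text is illustrative and abbreviated, not captured from a real file, and page_count is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read pdftk dump_data text on stdin and print
# the page count with any stray whitespace (such as a CR) removed.
page_count() {
    awk '/^NumberOfPages:/ {print $2}' | tr -d '[:space:]'
}

# Illustrative, abbreviated sample of dump_data output (an assumption,
# not taken from a real document).
sample='InfoKey: Creator
InfoValue: Example
NumberOfPages: 10'

printf '%s\n' "$sample" | page_count
echo
```

With a real file the helper would be fed directly, as in pdftk in.pdf dump_data | page_count.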

Murrah answered 20/3, 2020 at 14:23 Comment(0)

Split by an arbitrary number of pages (as second argument): e.g. <script_filename>.sh <filename.pdf> <num_pages_per_output_file>

#!/bin/bash
FILE="${1}"
SPAN=${2:-2}   # pages per output file, defaults to 2
SPAN_LESS_1=$((SPAN - 1))
COUNTER=1
NUMBEROFPAGES=$(pdftk "$FILE" dump_data | grep NumberOfPages | awk '{print $2}')
# -le (not -lt) so that a final single-page chunk is not skipped
while [ "$COUNTER" -le "$NUMBEROFPAGES" ]; do
    CANDIDATE_END=$((COUNTER + SPAN_LESS_1))
    # Clamp the end of the last chunk to the final page
    END=$((CANDIDATE_END < NUMBEROFPAGES ? CANDIDATE_END : NUMBEROFPAGES))
    OUT_NAME="${FILE%.*}__${COUNTER}-${END}.pdf"
    pdftk "$FILE" cat "$COUNTER-$END" output "${OUT_NAME}"
    let COUNTER=COUNTER+SPAN
done

Also, output filenames will have both start and end page numbers appended to the input filename, e.g.

<input_filename>__1-15.pdf
<input_filename>__16-30.pdf
...
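
The output names come from the ${FILE%.*} parameter expansion, which removes the shortest trailing .suffix from the input name. A small sketch of just that step follows; out_name is a hypothetical helper and the file name is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: build the output name for one page range.
# $1 = input file name, $2 = first page, $3 = last page.
out_name() {
    # ${1%.*} strips the shortest match of ".*" from the end of $1,
    # i.e. the extension, and keeps any directory part.
    echo "${1%.*}__${2}-${3}.pdf"
}

out_name "reports/annual.pdf" 1 15
out_name "reports/annual.pdf" 16 30
```

This prints reports/annual__1-15.pdf and reports/annual__16-30.pdf, so the output files land next to the input file.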

Scarf answered 2/4, 2022 at 19:43 Comment(0)
