I have a PDF file with many pages, but I only need a subset of them. For example, my original PDF has 30 pages and I want only pages 10 to 16.
I tried the function split_pdf() from the tabulizer package, which can only split a PDF page by page (resulting in 200 files, one for each page), followed by merge_pdfs() to merge the pages I need back into one file. It works, but it takes ages, and I have around 2000 PDF files to process.
This is the code I am using:
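    library(tabulizer)
    # split_pdf() writes one PDF per page and returns the paths of those files
    pages <- split_pdf('file_path')
    start <- 10
    end <- 16
    # keep pages 10 to 16 (inclusive) and merge them into a single PDF
    merge_pdfs(pages[start:end], 'saving_path')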
I couldn't find a better way to do this. Any help would be appreciated.
First, take a look at the pdftools package if you haven't already. I haven't used it myself, but it is a common recommendation. Second, if this is not eating up too much memory, you might try running your split/merge combo in parallel. See the packages parallel or foreach. You may be able to process a number of these files at the same time. – Reisfield
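A minimal sketch combining both suggestions, assuming the qpdf package is installed (pdftools depends on qpdf and re-exports its pdf_subset()). Instead of writing every page to its own file and merging them again, pdf_subset() copies only the requested pages into a new PDF in one step. The helper names subset_pages() and subset_all() are hypothetical, and parallel::mclapply() stands in for the parallel/foreach approach; forking is not available on Windows, so the sketch falls back to a single core there.

```r
# Minimal sketch: keep pages `from`..`to` of many PDFs without splitting every page.
# Assumes the qpdf package is installed (pdftools re-exports pdf_subset() from it).
library(qpdf)
library(parallel)

# Hypothetical helper: copy pages `from`..`to` of one PDF into `out_dir`,
# keeping the original file name. Returns the output path, or NA if the
# document has fewer than `from` pages.
subset_pages <- function(input, out_dir, from = 10, to = 16) {
  n <- pdf_length(input)
  if (n < from) return(NA_character_)
  pages <- seq(from, min(to, n))  # clip the range for shorter documents
  pdf_subset(input, pages = pages,
             output = file.path(out_dir, basename(input)))
}

# Hypothetical helper: apply subset_pages() to every PDF in `in_dir`.
# mclapply() forks worker processes, which Windows does not support,
# so the default falls back to a single core there.
subset_all <- function(in_dir, out_dir, from = 10, to = 16, cores = NULL) {
  if (is.null(cores)) {
    cores <- if (.Platform$OS.type == "windows") 1L else
      max(1L, detectCores() - 1L, na.rm = TRUE)
  }
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir, pattern = "\\.pdf$", full.names = TRUE,
                      ignore.case = TRUE)
  unlist(mclapply(files, subset_pages, out_dir = out_dir,
                  from = from, to = to, mc.cores = cores))
}
```

For example, subset_all("pdfs", "pdfs_10to16") would write pages 10 to 16 of every PDF in the (hypothetical) folder pdfs into pdfs_10to16, returning NA for files with fewer than 10 pages. Whether parallelism helps depends on disk speed and memory, so it is worth timing a small batch first.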