How do you combine PDFs in ruby?

This was asked in 2008. Hopefully there's a better answer now.

How can you combine PDFs in ruby?

I'm using the pdf-stamper gem to fill out a form in a PDF. I'd like to take n PDFs, fill out a form in each of them, and save the result as an n-page document.

Can you do this with a native Ruby library like Prawn? Can you do it with rjb and iText? (pdf-stamper is a wrapper around iText.)

I'd like to avoid using two libraries (i.e. pdftk and iText), if possible.

Pollux asked 17/8, 2010 at 5:28
Does this answer your question? Is it possible to combine a series of PDFs into one using Ruby? – Ironmonger

I wrote a Ruby gem to do this: PDF::Merger. It uses iText. Here's how you use it:

pdf = PDF::Merger.new
pdf.add_file "foo.pdf"
pdf.add_file "bar.pdf"
pdf.save_as "combined.pdf"
Pollux answered 20/10, 2010 at 17:31
I'm curious about the iText license. If you have a Rails application, do you have to buy a license, or can you use it for free without open-sourcing the entire application? – Quadrillion
iText <= 4.2 is MPL/LGPL. iText >= 5.0 is Affero GPL. pdf-merger uses 4.2. – Pollux
Can I grab a remote PDF from an Amazon S3 bucket and merge it with your gem? – Illuminometer
The gem only works on files in the local filesystem. If you have the S3 bucket mounted (say, with S3FS), then sure. Otherwise, no, you'd need to download it first. – Pollux
Check the solution by Evan Closson if you want to avoid installing a JVM just for this gem. – Beckerman
Can we also check for empty pages while merging? For example, if the first PDF has a lot of empty space on its last page, could the merge start adding the next file's content there? Is this possible? – Jayson

As of 2013, you can use Prawn to merge PDFs. Gist: https://gist.github.com/4512859

require "prawn" # :template support only exists up to Prawn 0.14.0 (see comments)

class PdfMerger

  def merge(pdf_paths, destination)

    # Split off the first file without mutating the caller's array
    # (pdf_paths.delete_at(0) would remove it from the caller's list).
    first_pdf_path, *other_pdf_paths = pdf_paths

    Prawn::Document.generate(destination, :template => first_pdf_path) do |pdf|

      other_pdf_paths.each do |pdf_path|
        pdf.go_to_page(pdf.page_count)

        # Template pages are numbered from 1.
        template_page_count = count_pdf_pages(pdf_path)
        (1..template_page_count).each do |template_page_number|
          pdf.start_new_page(:template => pdf_path, :template_page => template_page_number)
        end
      end

    end

  end

  private

  def count_pdf_pages(pdf_file_path)
    pdf = Prawn::Document.new(:template => pdf_file_path)
    pdf.page_count
  end

end
Orren answered 11/1, 2013 at 18:59
Thanks. Huge timesaver. Could replace the previous pdf-merger gem, which made use of Java. Yuck. This should be the accepted answer. – Beckerman
I have merged thousands of PDFs into one with this script. Thanks! – Alainealair
Note that Prawn templates don't work with all PDFs. It's a known issue and they've considered dropping support for it altogether. So far, though, it's still the best Ruby solution. – Legalism
Just a note for everyone finding this answer: they have officially dropped templates now. You'll have to go back to version 0.14.0 to get them back. – Russ
It will not work, as Prawn dropped template support. See more about that here: github.com/prawnpdf/prawn/issues/376 – Stadtholder
Doesn't this approach use tons of memory? – Federica

After a long search for a pure Ruby solution, I ended up writing code from scratch to parse and combine/merge PDF files.

(I feel the current tools are a mess. I wanted something native, but they all seem to have different issues and dependencies... even Prawn dropped the template support it used to have.)

I posted the gem online and you can find it at GitHub as well.

You can install it with:

gem install combine_pdf

It's very easy to use (with or without saving the PDF data to a file).

For example, here is a "one-liner":

(CombinePDF.load("file1.pdf") << CombinePDF.load("file2.pdf") << CombinePDF.load("file3.pdf")).save("out.pdf")
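The one-liner hard-codes three files. A minimal sketch that generalizes it to any number of paths follows; the helper name `combine_all` is hypothetical, and it assumes (as the one-liner does) that `<<` on a loaded CombinePDF object appends the other file's pages. The loader is passed in as a block so the folding logic does not depend on the gem.

```ruby
# Hypothetical helper generalizing the one-liner to any number of files.
# `loader` turns a path into a PDF object (with combine_pdf: CombinePDF.load).
# Each loaded object must support <<, which is assumed to append pages.
def combine_all(paths, &loader)
  raise ArgumentError, "need at least one path" if paths.empty?

  # Use the first file as the accumulator, then append the rest in order.
  combined = loader.call(paths.first)
  paths.drop(1).each { |path| combined << loader.call(path) }
  combined
end
```

With combine_pdf installed, usage would mirror the one-liner: `combine_all(%w[file1.pdf file2.pdf file3.pdf]) { |path| CombinePDF.load(path) }.save("out.pdf")`.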

If you find any issues, please let me know and I will work on a fix.

Schoolgirl answered 10/9, 2014 at 2:50
Can I use combine_pdf to merge multiple differently sized PDFs into one with multiple pages, so for example merge 8 PDFs into a new PDF with 2 pages? – Ithyphallic
I tried it with different page sizes and it merges the PDF files without an issue; the original page sizes are preserved. I'm not sure what you mean by merging 8 files and getting 2 pages. I assume you meant 2 page sizes? – Schoolgirl
I mean merging 2 A5-sized PDFs into 1 A4-sized PDF, for example. – Ithyphallic
Hi Tim, CombinePDF doesn't support that level of editing. It's only meant to answer the need for simple operations. If you have an idea how to go about implementing such a feature using CombinePDF's codebase, feel free to open a pull request/issue on GitHub and we'll work something out. – Schoolgirl
I see. I need it for an upcoming project, but I guess working with images and Prawn, for example, would be easier. But I depend on a third party for the content, so if PDFs are the only possibility, then that is definitely an option. Thanks for your reply. – Ithyphallic
Before moving ahead with this, check the list of known limitations in the README. The loss of form data was a deal-breaker for me. – Diane

Use ghostscript to combine PDFs:

 options = "-q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite"
 system "gs #{options} -sOutputFile=result.pdf file1.pdf file2.pdf"
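The same Ghostscript call can also be built as an argument array instead of one shell string. When `Kernel#system` receives several arguments it runs the program directly, without a shell, so file names containing spaces or shell metacharacters need no quoting. A minimal sketch, using the options from this answer; the helper name `gs_merge_command` is hypothetical:

```ruby
# Hypothetical helper: builds the Ghostscript merge command as an argument
# array. Output file first (via -sOutputFile), then the inputs in order.
def gs_merge_command(output, inputs)
  ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
   "-sOutputFile=#{output}", *inputs]
end

cmd = gs_merge_command("result.pdf", ["file1.pdf", "file 2.pdf"])
# Uncomment to run (requires Ghostscript on the PATH):
# system(*cmd) or raise "gs failed with #{$?.inspect}"
```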
Baltoslavic answered 19/1, 2012 at 15:22

I haven't seen great options in Ruby; I got the best results shelling out to pdftk:

system "pdftk #{file_1} multistamp #{file_2} output #{file_combined}"
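Note that pdftk's `multistamp` overlays each page of the second file onto the corresponding page of the first, so the output has the first file's page count. To append files one after another (the n-page document asked about in the question), pdftk's `cat` operation is the usual choice; with no page ranges it concatenates all inputs in the order given. A minimal sketch that builds that command as an argument array; the helper name `pdftk_cat_command` is hypothetical:

```ruby
# Hypothetical helper: builds "pdftk in1.pdf in2.pdf ... cat output out.pdf",
# which concatenates every page of every input, in order.
def pdftk_cat_command(inputs, output)
  ["pdftk", *inputs, "cat", "output", output]
end

cmd = pdftk_cat_command(["file_1.pdf", "file_2.pdf"], "combined.pdf")
# Uncomment to run (requires pdftk on the PATH); the array form bypasses the shell:
# system(*cmd) or raise "pdftk failed with #{$?.inspect}"
```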
Legalism answered 11/1, 2014 at 17:2

If you want to apply a template (for example, one created with macOS Pages or Google Docs) using the combine_pdf gem, you can try this:

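require "combine_pdf"

final_pdf = CombinePDF.new
content = CombinePDF.load("content_file.pdf")
content.pages.each do |page|
  # Page#<< modifies the page it is called on, so load a fresh template page
  # for each content page instead of stamping everything onto one object.
  company_template = CombinePDF.load("template_file.pdf").pages[0]
  final_pdf << (company_template << page)
end
final_pdf.save "final_document.pdf"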
Caritta answered 17/8, 2010 at 5:28

We're closer than we were in 2008, but not quite there yet.

The latest dev version of Prawn lets you use an existing PDF as a template, but not use a template over and over as you add more pages.

Viafore answered 17/8, 2010 at 8:28

This will work via iText, though you should flatten the forms before merging them to avoid field name conflicts, or else rename the fields one page at a time.

Within PDF, fields with the same name share a value. This is usually not the desired behavior, though it comes in handy from time to time.
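Why renaming (or flattening) matters can be illustrated with plain Ruby hashes standing in for form data: keying values by field name alone collapses same-named fields so only the last value survives, while giving each copy's fields a distinct name keeps every value. This is only an illustration; the `_1`, `_2` suffix scheme and the helper name `suffix_fields` are assumptions, and actually renaming fields inside a PDF would go through the PDF library (the comments on this answer note that PdfStamper can change field names).

```ruby
# Hypothetical illustration: each hash holds the field values for one filled
# copy of the same form. Combining them by name keeps only the last value,
# mirroring how same-named fields in one PDF share a single value.
forms = [{ "name" => "Alice" }, { "name" => "Bob" }]
collapsed = forms.reduce({}) { |acc, fields| acc.merge(fields) }
# collapsed holds only "Bob" for "name"

# Renaming with a per-copy suffix (an assumed scheme) keeps every value.
def suffix_fields(forms)
  forms.each_with_index.each_with_object({}) do |(fields, i), renamed|
    fields.each { |name, value| renamed["#{name}_#{i + 1}"] = value }
  end
end

renamed = suffix_fields(forms)
```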

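Something along the lines of (in Java; paths, outPath, name and value are placeholders):

import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import com.lowagie.text.Document;   // iText 2.x/4.2 packages; iText 5 uses com.itextpdf.text
import com.lowagie.text.pdf.*;

Document document = new Document();
PdfCopy mergedForms = new PdfCopy( document, new FileOutputStream( outPath ) );
document.open(); // PdfCopy needs an open Document before pages are added

for (String path : paths ) {
  PdfReader reader = new PdfReader( path );
  ByteArrayOutputStream curFormOut = new ByteArrayOutputStream();
  PdfStamper stamper = new PdfStamper( reader, curFormOut );

  stamper.getAcroFields().setField( name, value ); // ad nauseam

  stamper.setFormFlattening(true); // flattening setting only takes effect during close()
  stamper.close();

  byte[] curFormBytes = curFormOut.toByteArray();
  PdfReader combineMe = new PdfReader( curFormBytes );

  int pages = combineMe.getNumberOfPages();
  for (int i = 1; i <= pages; ++i) { // "1" is the first page
    mergedForms.addPage( mergedForms.getImportedPage( combineMe, i ) );
  }
}

document.close(); // closing the Document also closes the PdfCopy writer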
Haroldharolda answered 18/10, 2010 at 22:57
There's a much simpler way to do this: you can use PdfCopyFields and addDocument. See the gem I made. – Pollux
Granted, but PdfCopyFields won't rename fields... and given the "same name == same value" thing, I thought flattening was the best route. I'd think field renaming would be right up PdfCopyFields' alley, but I don't see anything in the API reference: api.itextpdf.com. PdfStamper can change field names, but won't handle the importing for you. Sadly, iText has this sort of "can't walk and chew gum" problem fairly often, requiring that you create, 'save', and read the same PDF to apply it to some other thing. Not terribly efficient, but it works, and it's hard to beat the price. – Haroldharolda
