Statically compile pdftk for Heroku. Need to split PDF into single page files

Asked 20/8, 2011 at 1:2 Answered 26/5, 2013 at 8:25

So we're using Heroku to host our Rails application. We've moved to the Cedar stack. This stack does not have the pdftk library installed. I contacted support and was told to statically compile it for amd64 Ubuntu and include it in my application.

This has proved more difficult than I thought. Initially I downloaded the package for Ubuntu (http://packages.ubuntu.com/natty/pdftk), extracted it, and included the binary file as well as the shared libraries. I'm getting strange errors like:

Unhandled Java Exception:
java.lang.NullPointerException
   at com.lowagie.text.pdf.PdfCopy.copyIndirect(pdftk)
   at com.lowagie.text.pdf.PdfCopy.copyObject(pdftk)
   at com.lowagie.text.pdf.PdfCopy.copyDictionary(pdftk)

I'm assuming this is because some of the dependencies aren't installed?

So here are my questions:

1. Is there an easier way to statically compile a library? Or do I need to move over its binary file as well as all of its libraries and dependencies?
2. I'm just trying to split a multi-page PDF into single-page files in Ruby. Is there a way to do this without pdftk? Or am I stuck with trying to statically compile pdftk?

Thanks for the help, I know this isn't an easy problem, but would really appreciate help with this one. I've wasted close to 6 hours trying to get this damn thing to work.

Noshow asked 20/8, 2011 at 1:2 Comment(1)

Have you tried building it using the Heroku vulcan build server? github.com/heroku/vulcan – Holography 23/12, 2011 at 12:2

Unfortunately Heroku keeps stripping out magic to add flexibility. As a result it feels more and more like the days when I used to manage and maintain my own servers. There is no easy solution. My "monkey patch" is to send the file to a server where I can install pdftk, process the file there, and send it back. Not great, but it works. Having to deal with this defeats the purpose of using Heroku.

Noshow answered 21/8, 2011 at 19:39 Comment(0)

The easy solution is to add the one dependency for pdftk that is not found on Heroku.

$ ldd pdftk
    linux-vdso.so.1 =>  (0x00007ffff43ca000)
    libgcj.so.10 => not found
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1d26d48000)
    libm.so.6 => /lib/libm.so.6 (0x00007f1d26ac4000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f1d268ad000)
    libc.so.6 => /lib/libc.so.6 (0x00007f1d2652a000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00007f1d2630c000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1d27064000)
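
If you want to run the same check from Ruby (say, in a deploy or smoke-test script), here is a minimal sketch that picks the "not found" entries out of ldd output. The helper name missing_libs is my own, not part of any library, and the sample text is a trimmed copy of the listing above.

```ruby
# Return the names of shared libraries that ldd reports as "not found".
# missing_libs is a hypothetical helper name, used only for illustration.
def missing_libs(ldd_output)
  ldd_output.each_line.map { |line| line[/^\s*(\S+)\s*=>\s*not found/, 1] }.compact
end

# Trimmed copy of the ldd listing shown above.
sample = <<~LDD
  linux-vdso.so.1 =>  (0x00007ffff43ca000)
  libgcj.so.10 => not found
  libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1d26d48000)
  libc.so.6 => /lib/libc.so.6 (0x00007f1d2652a000)
LDD

puts missing_libs(sample).inspect  # prints ["libgcj.so.10"]
```

On a real dyno you could feed it the output of ldd run against your own copy of the binary (for example `ldd bin/pdftk`, path assumed) instead of the sample string.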

I put pdftk and libgcj.so.10 into the bin directory of my app (/app/bin on the dyno). You then just need to tell Heroku to look in that directory when loading libs.

To see what your LD_LIBRARY_PATH is currently set to, type

$ heroku config
LD_LIBRARY_PATH:             /app/.heroku/vendor/lib
LIBRARY_PATH:                /app/.heroku/vendor/lib

Then append /app/bin (or whatever dir you chose to store libgcj.so.10) to the existing value:

$ heroku config:set LD_LIBRARY_PATH=/app/.heroku/vendor/lib:/app/bin

The downside is that my slug size went from 15.9 MB to 27.5 MB.
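
As an alternative to changing the config var for the whole app, the library path can also be set just for the pdftk child process. Here is a rough sketch for the original goal (one file per page), assuming the binary and libgcj.so.10 both sit in /app/bin as described above. The constant and method names are made up for illustration; `burst` is pdftk's own operation for splitting a PDF into single pages.

```ruby
require "open3"
require "fileutils"

# Assumed locations; adjust to wherever pdftk and libgcj.so.10 live in your slug.
PDFTK_BIN     = "/app/bin/pdftk"
EXTRA_LIB_DIR = "/app/bin"

# Append dir to a colon-separated library path without dropping existing
# entries or adding a duplicate.
def lib_path_with(dir, current = ENV["LD_LIBRARY_PATH"])
  parts = current.to_s.split(":").reject(&:empty?)
  parts << dir unless parts.include?(dir)
  parts.join(":")
end

# Argument list for "pdftk <input> burst output <pattern>".
def burst_command(input_pdf, output_dir)
  [PDFTK_BIN, input_pdf, "burst", "output", File.join(output_dir, "page_%03d.pdf")]
end

# Split input_pdf into one file per page and return the sorted page paths.
def burst(input_pdf, output_dir)
  FileUtils.mkdir_p(output_dir)
  env = { "LD_LIBRARY_PATH" => lib_path_with(EXTRA_LIB_DIR) }
  _out, err, status = Open3.capture3(env, *burst_command(input_pdf, output_dir))
  raise "pdftk failed: #{err}" unless status.success?
  Dir.glob(File.join(output_dir, "page_*.pdf")).sort
end

# Usage (paths are placeholders):
#   pages = burst("tmp/upload.pdf", "tmp/pages")
```

Passing the arguments as an array avoids going through a shell, so file names with spaces need no quoting. Note that pdftk's burst also writes a doc_data.txt report, so the process needs a writable working directory.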

Foretaste answered 14/9, 2012 at 23:59 Comment(0)

We've encountered the same problem. The solution we came up with was to use Stapler instead (https://github.com/hellerbarde/stapler). It's a Python utility and only requires one extra module (pyPdf) to be installed on Heroku.

I was pointed to this blog entry: http://theprogrammingbutler.com/blog/archives/2011/07/28/running-pdftotext-on-heroku/

Here are the steps I followed to install pyPdf:

Accessing the heroku bash console

heroku run bash

Installing the latest version of pyPdf

cd tmp
curl http://pybrary.net/pyPdf/pyPdf-1.13.tar.gz -o pyPdf-1.13.tar.gz
tar zxvf pyPdf-1.13.tar.gz
cd pyPdf-1.13
python setup.py install --user

This puts all the necessary files under a .local directory at the root of the app. I just downloaded it and added it to our git repo, along with the stapler utility. Finally I updated my code to use stapler instead of pdftk, et voilà! Splitting PDFs from Heroku again.

Another way, probably cleaner, would be to encapsulate it in a gem ( http://news.ycombinator.com/item?id=2816783 )

Holography answered 22/12, 2011 at 15:24 Comment(0)

I read a similar question on SO, and found this approach by Ryan Daigle that worked for me as well: instead of building local binaries that are hard to match to Heroku's servers, use the remote environment to compile and build the required dependencies. This is accomplished using the Vulcan gem, which is provided by Heroku.

Ryan's article "Building Dependency Binaries for Heroku Applications"

Another approach by Jon Magic (untested by me) is to download and compile the dependency through Heroku's bash, i.e. directly on the server: "Compiling Executables on Heroku".

On a side note, both approaches are going to result in binaries that are going to break if Heroku's underlying environment changes enough.

Kowal answered 26/5, 2013 at 8:25 Comment(0)

Try prawn.

Effy answered 20/8, 2011 at 1:15 Comment(2)

I don't think prawn can split an existing PDF file and merge it back together. I think it's more for PDF generation. – Noshow 20/8, 2011 at 1:20

@Binary Logic Actually, I think prawn can. Check out the start_new_page method. You can pass in the path to another PDF to use as a "template" and even specify the page number to use. Like so: start_new_page(:template => filename, :template_page => 2) – Hog 31/7, 2012 at 20:41
