Straightforward Build Instructions for Poppler on Lambda using Docker
To put Poppler on Lambda, we will build a zip file containing the Poppler binaries and the shared libraries they need, and add it to the function as a layer. Follow these steps on an EC2 instance running Amazon Linux 2 (a t2.micro is plenty).
- Set up the machine
Install Docker on the EC2 instance (instructions here), then create a folder to hold the binaries:
mkdir -p poppler_binaries
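If you would rather not follow the linked instructions, the commands below are one common way to install and start Docker on Amazon Linux 2. This is a minimal sketch: it assumes the default ec2-user account, and the function name install_docker_al2 is hypothetical, used only to group the commands. Log out and back in afterwards so the docker group membership takes effect and docker can run without sudo.

```shell
#!/usr/bin/env bash
# Sketch: install and start Docker on Amazon Linux 2.
# Assumption: the login account is "ec2-user" unless another name is passed in.
install_docker_al2() {
  local user="${1:-ec2-user}"
  sudo yum update -y
  sudo amazon-linux-extras install -y docker
  sudo service docker start
  # Allow this user to run docker without sudo (applies after re-login)
  sudo usermod -a -G docker "$user"
}
```

Run it once with `install_docker_al2`, then reconnect to the instance.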
- Create a Dockerfile
Use this link, or copy and paste the contents below into a file named Dockerfile in the same folder.
FROM ubuntu:18.04
# Install dependencies (a single apt-get update, in the same RUN as the install)
RUN apt-get update && apt-get install -y \
    locate \
    libopenjp2-7 \
    poppler-utils
# Start from an empty output folder
RUN rm -rf /poppler_binaries && mkdir /poppler_binaries
# Build the file index that locate searches
RUN updatedb
# Copy the Poppler tools and the shared libraries they load
# (cp follows symlinks, so the real library files are copied)
RUN cp $(locate libpoppler.so) /poppler_binaries/
RUN cp $(which pdftoppm) /poppler_binaries/
RUN cp $(which pdfinfo) /poppler_binaries/
RUN cp $(which pdftocairo) /poppler_binaries/
RUN cp $(locate libjpeg.so.8) /poppler_binaries/
RUN cp $(locate libopenjp2.so.7) /poppler_binaries/
RUN cp $(locate libpng16.so.16) /poppler_binaries/
RUN cp $(locate libz.so.1) /poppler_binaries/
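- Build the Docker image and create a zip file
Run the commands below from the folder that contains the Dockerfile and the poppler_binaries folder (for example, your home directory). They produce poppler.zip in that folder.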
docker build -t poppler-build .
# Run the container
docker run -d --name poppler-build-cont poppler-build sleep 20
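# Copy the contents of /poppler_binaries into the existing poppler_binaries folder
# (the trailing /. stops docker cp from creating a nested poppler_binaries folder)
docker cp poppler-build-cont:/poppler_binaries/. poppler_binaries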
# Cleaning up
docker kill poppler-build-cont
docker rm poppler-build-cont
docker image rm poppler-build
cd poppler_binaries
zip -r9 ../poppler.zip .
cd ..
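Before uploading, it can help to confirm that every file the Dockerfile copies sits at the top level of poppler_binaries. docker cp copies a source directory into an existing destination as a subfolder unless the source path ends in /., which would leave an extra poppler_binaries folder inside the zip. The sketch below checks for both problems; the function name check_layer_dir is hypothetical, and the file list simply mirrors the cp lines in the Dockerfile (libpoppler is matched with a glob because its version suffix varies between Ubuntu releases).

```shell
#!/usr/bin/env bash
# Sketch: verify that the expected Poppler files are at the top level of a folder.
# Prints one line per problem and returns non-zero if anything is wrong.
check_layer_dir() {
  local dir="$1" missing=0 f
  for f in pdftoppm pdfinfo pdftocairo libjpeg.so.8 libopenjp2.so.7 libpng16.so.16 libz.so.1; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # compgen -G succeeds only if the glob matches at least one path
  if ! compgen -G "$dir/libpoppler.so*" > /dev/null; then
    echo "missing: libpoppler.so*"
    missing=1
  fi
  # A nested folder means docker cp copied the directory instead of its contents
  if [ -d "$dir/poppler_binaries" ]; then
    echo "nested folder: $dir/poppler_binaries"
    missing=1
  fi
  return "$missing"
}
```

For example, `check_layer_dir poppler_binaries && echo "layout looks flat"`. Running `unzip -l poppler.zip` afterwards lists the archive entries, which should have no folder prefix.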
- Create the Lambda layer and add it to your function
Download poppler.zip to your computer or upload it to S3. Then go to the Lambda console, create a layer from the zip file, and add that layer to your function. Information about layers here.
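The same step can be scripted with the AWS CLI instead of the console. This is a sketch under a few assumptions: the CLI is installed and configured with permission to manage Lambda, and the layer name poppler, the runtime python3.8 and the function name you pass in are placeholders. The function name publish_and_attach_layer is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: publish poppler.zip as a layer version, then attach it to a function.
# Placeholders: layer name (default "poppler") and runtime (default "python3.8").
publish_and_attach_layer() {
  local function_name="$1" layer_name="${2:-poppler}" runtime="${3:-python3.8}"
  local layer_arn
  # Direct upload of the zip; large archives can be put in S3 first and passed
  # with --content S3Bucket=...,S3Key=... instead of --zip-file
  layer_arn=$(aws lambda publish-layer-version \
    --layer-name "$layer_name" \
    --zip-file fileb://poppler.zip \
    --compatible-runtimes "$runtime" \
    --query LayerVersionArn --output text) || return 1
  # Caution: --layers replaces the function's entire layer list
  aws lambda update-function-configuration \
    --function-name "$function_name" \
    --layers "$layer_arn"
}
```

For example, `publish_and_attach_layer my-function`. If the function already uses other layers, list their ARNs after the new one in --layers so they are kept.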
- Add an environment variable to the Lambda function
To avoid adding the unnecessary folder structure described here to the zip, we add an environment variable that points to our dependency:
PYTHONPATH: /opt/
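If you prefer the command line to the console, the variable can be set with the AWS CLI. A minimal sketch, assuming a configured CLI and a placeholder function name; set_pythonpath is a hypothetical helper. Be aware that --environment replaces all of the function's environment variables, so any existing ones must be added to the Variables={...} list as well.

```shell
#!/usr/bin/env bash
# Sketch: set PYTHONPATH=/opt/ on a Lambda function.
# Caution: this overwrites every other environment variable on the function.
set_pythonpath() {
  local function_name="$1"
  aws lambda update-function-configuration \
    --function-name "$function_name" \
    --environment "Variables={PYTHONPATH=/opt/}"
}
```

For example, `set_pythonpath my-function`.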
And voilà! You now have a working Lambda function with Poppler!
Note: Credit to these two articles, which helped me piece this together.
Warning: do not add pdf2image to the same layer. I am not sure why, but when they are in the same layer, pdf2image cannot find Poppler.