If you have Scrapyd installed on an Ubuntu server, I'd put this command at the end of the /etc/rc.local file:

<path_to_scrapyd_binary>/scrapyd > /dev/null 2>&1 &

where <path_to_scrapyd_binary> is probably going to be something like /usr/local/bin. /etc/rc.local is best suited for cases like this, where you don't want to build your own service file or startup script. Putting the command in the cron table with @reboot has also been suggested, but that sometimes didn't work for me for some reason (though I didn't examine those reasons in depth).
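For reference, the cron variant would look something like this (the path to scrapyd is an example; check `which scrapyd` on your system):

```shell
# Edit the crontab of the user that should run Scrapyd:
#   crontab -e
# then add a line that runs Scrapyd once at boot:
@reboot /usr/local/bin/scrapyd > /dev/null 2>&1
```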
Still, my preferred option now is to deploy Scrapyd in Docker. You can get a Scrapyd image from Docker Hub, or you can build the image yourself if you have specific needs. I chose the second option. First I deployed my own Docker registry for that purpose. Once that was done, I built my own Scrapyd image using this Dockerfile:
FROM ubuntu:16.04
RUN apt-get update -q \
    && apt-get install -y --no-install-recommends \
        build-essential \
        ca-certificates \
        curl \
        libffi-dev \
        libjpeg-turbo8 \
        liblcms2-2 \
        libssl-dev \
        libtiff5 \
        libtool \
        libwebp5 \
        python \
        python-dev \
        zlib1g \
    && curl -sSL https://bootstrap.pypa.io/get-pip.py | python \
    && pip install --no-cache-dir \
        docker \
        future \
        geocoder \
        influxdb \
        Pillow \
        pymongo \
        scrapy-fake-useragent \
        scrapy_splash \
        scrapyd \
        selenium \
        unicode-slugify \
    && apt-get purge -y --auto-remove \
        build-essential \
        curl \
        libffi-dev \
        libssl-dev \
        libtool \
        python-dev \
    && rm -rf /var/lib/apt/lists/*
COPY ./scrapyd.conf /etc/scrapyd/
VOLUME /etc/scrapyd /var/lib/scrapyd
EXPOSE 6800
CMD ["scrapyd", "--logfile=/var/log/scrapyd.log", "--pidfile="]
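The COPY step assumes a scrapyd.conf sitting next to the Dockerfile. A minimal example, with paths matching the VOLUME declared above (the exact values here are my own choices; see the Scrapyd docs for the full option list):

```ini
[scrapyd]
eggs_dir         = /var/lib/scrapyd/eggs
dbs_dir          = /var/lib/scrapyd/dbs
logs_dir         = /var/lib/scrapyd/logs
jobs_to_keep     = 5
max_proc         = 0
max_proc_per_cpu = 4
http_port        = 6800
bind_address     = 0.0.0.0
```

Note that bind_address = 0.0.0.0 matters inside Docker: with the default 127.0.0.1, the port published by -p 6800:6800 would be unreachable from outside the container.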
After building the image and pushing it to the registry, I can deploy it to as many worker servers as I need (or, of course, locally). Once you have the image pulled (either the one from Docker Hub or your own), you can start it with:
sudo docker run --name=scrapyd -d -p 6800:6800 --restart=always -v /var/lib/scrapyd:/var/lib/scrapyd --add-host="dockerhost:"`ip addr show docker0 | grep -Po 'inet \K[\d.]+'` <location>/scrapyd
where <location> is either your Docker Hub account or a reference to your own registry. This rather complicated command starts the Scrapyd image in the background (the -d option), listening on port 6800, every time the Docker service is (re)started (the --restart=always option). It also publishes your host's IP address as dockerhost to the container for cases where you need to access other (probably Dockerized) services on the host.
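As an example of using that alias: since the image installs scrapy_splash, your Scrapy project settings could point at a Splash container published on the host. The port 8050 is Splash's default and an assumption about your setup:

```python
# settings.py (fragment) — "dockerhost" resolves to the host's docker0 address
# inside the container, thanks to the --add-host option above.
SPLASH_URL = 'http://dockerhost:8050'  # assumes Splash is published on host port 8050
```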