How can I use Scrapy-Splash without Docker?

Is there a way to use Scrapy-Splash without Docker? I have a server running Python 3 without Docker installed, and if possible I don't want to install Docker on it.

Also, what exactly does SPLASH_URL do? Can I use just the IP of my server?

I already tried this:

    def start_requests(self):
        # SplashRequest comes from: from scrapy_splash import SplashRequest
        url = "https://europages.fr/entreprises/France/pg-20/resultats.html?ih=01510;01505;01515;01525;01530;01570;01565;01750;01590;01595;01575;01900;01920;01520;01905;01585;01685;01526;01607;01532;01580;01915;02731;01700;01600;01597;01910;01906"
        print(url)
        yield SplashRequest(
            url=url,
            callback=self.parse_all_links,  # TODO: change the callback
            args={
                # optional; parameters passed to Splash HTTP API
                'wait': 0.5,

                # 'url' is prefilled from request url
                # 'http_method' is set to 'POST' for POST requests
                # 'body' is set to request body for POST requests
            },
            # 'endpoint' is optional; default is render.html
        )

with settings.py:

    SPIDER_MIDDLEWARES = {
        'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    }

    DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
    HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

    # Enable or disable downloader middlewares
    # See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
    DOWNLOADER_MIDDLEWARES = {
        #'Europages.middlewares.EuropagesDownloaderMiddleware': 543,
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    }

and

    SPLASH_URL = "url_of_my_server"

I hope my post is clear.

Thanks and regards,

Kensell answered 26/7, 2019 at 8:57 Comment(4)
Are you using Crawlera? - Horrified
Hmm, no, why? @Horrified - Kensell
Do you get any error? - Schnapps
Any update on this question? I am in the exact same situation, i.e. I don't want to install Docker on the server I use. - Snuggle

It seems that this used to be possible in previous versions of Splash, but not anymore (https://splash.readthedocs.io/en/3.3.1/install.html).
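
On the SPLASH_URL part of the question: as far as I understand, it is the base address of a running Splash HTTP API, i.e. scheme, host and port (Splash listens on port 8050 by default), and scrapy-splash appends the endpoint such as render.html to it. So the IP of your server only works if a Splash instance is actually running there, and a bare IP without `http://` and a port is not enough. A minimal sketch of that idea follows; `splash_render_url` is a hypothetical helper for illustration, not part of scrapy-splash, and 192.0.2.10 is a placeholder address.

```python
from urllib.parse import urlencode, urlsplit


def splash_render_url(splash_url, target_url, endpoint="render.html", **args):
    """Build the kind of URL a client could call on a Splash server.

    Hypothetical helper for illustration only; scrapy-splash does the
    equivalent internally from the SPLASH_URL setting.
    """
    parts = urlsplit(splash_url)
    # A bare IP such as "192.0.2.10" has no scheme and no network location.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(
            "SPLASH_URL needs a scheme and host, e.g. http://192.0.2.10:8050"
        )
    # 'url' and 'wait' are parameters of the Splash HTTP API.
    query = urlencode({"url": target_url, **args})
    return f"{splash_url.rstrip('/')}/{endpoint}?{query}"


if __name__ == "__main__":
    # Placeholder server address; replace with wherever Splash really runs.
    print(splash_render_url("http://192.0.2.10:8050", "https://example.com", wait=0.5))
```

If opening such a URL in a browser or with curl does not return rendered HTML, nothing is listening at that address, and the spider will fail in the same way no matter how the middlewares are configured.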

Snuggle answered 13/1, 2023 at 13:34 Comment(1)
splash.readthedocs.io/en/3.3.1/…? - Huddersfield
