How to set splash timeout in scrapy-splash?

I use scrapy-splash to crawl web pages, and I run the Splash service in Docker.

Command:

docker run -p 8050:8050 scrapinghub/splash --max-timeout 3600

But I still get a 504 error:

"error": {"info": {"timeout": 30}, "description": "Timeout exceeded rendering page", "error": 504, "type": "GlobalTimeoutError"}

I have tried setting splash.resource_timeout, calling request:set_timeout, and using SPLASH_URL = 'http://localhost:8050?timeout=1800.0', but nothing changed. The error still reports a timeout of 30.

Thanks for any help.

Stephaniestephannie answered 19/6, 2017 at 10:8 Comment(0)

I use the scrapy-splash package and set the timeout in the args parameter of SplashRequest, like this:

yield scrapy_splash.SplashRequest(
    url, self.parse, endpoint='execute',
    args={'lua_source': script, 'timeout': 3600})

It works for me.

Hexapla answered 19/6, 2017 at 10:55 Comment(3)
I got error 400 from this setting. I don't know why. – Absorbefacient
This does not really explain why it is timing out, though. Setting the timeout to 3600 just hides the errors... Something is still going seriously wrong if it takes 3600 seconds to execute. – Pisci
Ahh, I got error 400 because I set the timeout higher than the max timeout that was set when starting the Splash server. The default max timeout is 90. – Absorbefacient
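
Putting the answer and the comments together: the per-request timeout goes in args, and it has to stay at or below the --max-timeout the Splash server was started with, otherwise the server answers with a 400. A minimal sketch of that check follows. The helper name build_splash_args and the constant DEFAULT_MAX_TIMEOUT are hypothetical and not part of scrapy-splash, and the value 90 is taken from the comment above rather than read from the server.

```python
# Hypothetical helper, not part of scrapy-splash: it only builds the args dict.
# Assumption: 90 is the server's max timeout when --max-timeout is not given,
# as reported in the comment above.
DEFAULT_MAX_TIMEOUT = 90


def build_splash_args(lua_source, timeout, max_timeout=DEFAULT_MAX_TIMEOUT):
    """Return an args dict for SplashRequest, failing early on a bad timeout."""
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    if timeout > max_timeout:
        # Splash would reject this request with a 400, so fail before sending it.
        raise ValueError(
            "timeout %s exceeds the server max timeout %s" % (timeout, max_timeout)
        )
    return {"lua_source": lua_source, "timeout": timeout}
```

With the server started as in the question (--max-timeout 3600), the result can be passed straight to the request from the answer: scrapy_splash.SplashRequest(url, self.parse, endpoint='execute', args=build_splash_args(script, 3600, max_timeout=3600)).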
