scrapy Questions

2

Solved

I recently started using Scrapy on a regular basis to analyze sites that demand the latest browser (user agent) before their content shows up. Now, this may seem like an old-time problem, yet...
Daubery asked 21/6, 2021 at 10:22

4

Solved

I'm trying to collect a few pieces of information about a bunch of different web sites. I want to produce one Item per site that summarizes the information I found across that site, regardless of w...
Beverlybevers asked 6/4, 2013 at 22:42

10

Solved

I installed Scrapy in my Python 2.7 environment on Windows 7, but when I try to start a new Scrapy project using scrapy startproject newProject, the command prompt shows this message: 'scrapy' is n...
Jojo asked 14/9, 2016 at 9:28

5

Solved

In my previous question, I wasn't very specific about my problem (scraping with an authenticated session with Scrapy), in the hope of being able to deduce the solution from a more general answer. I...
Wilbourn asked 1/5, 2011 at 20:34

6

After running the scrapy shell with the defined URL, I get the following error: AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv3_METHOD' scrapy shell ...
Ineffectual asked 26/9, 2022 at 19:49

5

Is there a way to stop crawling when a specific condition is true (like scrap_item_id == predefine_value)? My problem is similar to Scrapy - how to identify already scraped urls but I want to ...
Dyestuff asked 15/12, 2010 at 10:05

4

Is it possible to delay the retry of a particular Scrapy Request? I have a middleware which needs to defer the request of a page until a later time. I know how to do the basic deferral (end of queue...
Milquetoast asked 2/10, 2013 at 11:29

4

Solved

What are the steps to upload the crawled data from Scrapy to Amazon S3 as a csv/jsonl/json file? All I could find on the internet was how to upload scraped images to an S3 bucket. I'm currently...
Hettiehetty asked 5/8, 2016 at 11:24

5

Solved

At dawn my code was working perfectly, but today when I woke up it is no longer working, and I didn't change any line of code, I also checked if Firefox updated, and no, it didn't, and I have no id...
Impact asked 15/4, 2022 at 15:26

8

Solved

I have Visual Studio Code on a Windows Machine, on which I am making a new Scrapy Crawler. The crawler is working fine but I want to debug the code, for which I am adding this in my launch.json fil...

6

Solved

I am scraping a soccer site and the spider (a single spider) gets several kinds of items from the site's pages: Team, Match, Club etc. I am trying to use the CSVItemExporter to store these items in...
Bramante asked 1/9, 2012 at 18:34

11

I get a twisted.internet.error.ReactorNotRestartable error when I execute the following code: from time import sleep from scrapy import signals from scrapy.crawler import CrawlerProcess from scrapy.util...
Hickory asked 9/10, 2016 at 17:47

27

Solved

I'm practicing the code from 'Web Scraping with Python', and I keep having this certificate problem: from urllib.request import urlopen from bs4 import BeautifulSoup import re pages = set...
Popular asked 8/5, 2018 at 14:32

2

Solved

I need the values in a dict, but Item uses some abstraction on top of it. How do I get the fields of an item as a dict? I know Scrapy now allows a dict to be returned in place of an item. But I alread...
Upbraid asked 6/8, 2015 at 12:20

4

I am a newbie to Python. I am running the 32-bit version of Python 2.7.3 on a 64-bit OS. (I tried 64-bit but it didn't work out.) I followed the tutorial and installed Scrapy on my machine. I have created o...
Salado asked 12/4, 2012 at 11:58

7

I just started programming in Python. I want to use Scrapy to create a bot, and it showed TypeError: Object of type 'bytes' is not JSON serializable when I ran the project. import json import codecs...
Spam asked 21/6, 2017 at 16:54
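
The asker's code is cut off above, so the actual cause is not visible. As a rough sketch of one common way around this TypeError, bytes values can be decoded to str before the data reaches json.dumps. The helper name to_jsonable, the sample item, and the UTF-8 encoding are assumptions made for illustration; none of them come from the question.

```python
import json


def to_jsonable(value):
    """Recursively decode bytes to str so json.dumps can serialize the result.

    Hypothetical helper: assumes every bytes value is UTF-8 encoded text.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, dict):
        return {to_jsonable(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    return value


# Made-up item that mixes bytes and str values.
item = {"title": b"caf\xc3\xa9", "tags": [b"a", "b"]}

# json.dumps(item) would raise TypeError here; the converted copy does not.
line = json.dumps(to_jsonable(item), ensure_ascii=False)
print(line)
```

If the bytes come from calling .encode() on extracted strings inside the spider or pipeline, dropping that call is usually the simpler fix.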

24

Solved

This is Windows 7 with Python 2.7. I have a Scrapy project in a directory called caps (this is where scrapy.cfg is). My spider is located in caps\caps\spiders\campSpider.py. I cd into the scrapy pr...
Charette asked 26/3, 2012 at 17:27

25

Solved

I want to install lxml so I can then install Scrapy. When I updated my Mac today, it wouldn't let me reinstall lxml. I get the following error: In file included from src/lxml/lxml.etree.c:314: /priv...
Smelt asked 23/10, 2013 at 17:07

2

I am trying to crawl data from a list of URLs. The code below ran yesterday without any error. But today, when I came back and ran the code again, there was an er...
Befriend asked 28/8, 2023 at 19:32

1

Solved

I'm learning Python scraping with Scrapy. I did exactly what the tutorial teaches, but I got an error. Please help! My Python code: import scrapy class BookSpider(scrapy.Spider): nam...
Sibbie asked 29/8, 2023 at 18:38

5

Solved

This is not working anymore; Scrapy's API has changed. Now the documentation features a way to "Run Scrapy from a script", but I get the ReactorNotRestartable error. My task: from celery import Ta...
Counteraccusation asked 1/3, 2014 at 15:46

4

Solved

I'm building a data extract using scrapy and want to normalize a raw string pulled out of an HTML document. Here's an example string: Sapphire RX460 OC 2/4GB Notice two groups of two whitespace...
Deuteron asked 30/9, 2017 at 9:04
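
The listing has flattened the doubled spaces in the example string, so their original positions are unknown. As a minimal sketch of one way to normalize such a string, every run of whitespace can be collapsed into a single space. The function name normalize_whitespace and the positions of the extra spaces in the sample are assumptions made for illustration.

```python
import re


def normalize_whitespace(raw: str) -> str:
    """Collapse each run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", raw).strip()


# Hypothetical input: the doubled spaces are placed arbitrarily.
raw = "Sapphire  RX460 OC  2/4GB"
print(normalize_whitespace(raw))  # prints "Sapphire RX460 OC 2/4GB"
```

In Python 3, \s on a str pattern also matches Unicode whitespace such as the non-breaking space that HTML often carries, so this handles &nbsp; characters too.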

3

Solved

I'm trying to create a custom Scrapy Item Exporter based off JsonLinesItemExporter so I can slightly alter the structure it produces. I have read the documentation here http://doc.scrapy.org/en/la...
Higbee asked 22/10, 2015 at 21:15

5

I have disabled the default Scrapy cookie option, so I have to set cookies manually. COOKIES_ENABLED = False COOKIES_DEBUG = True Now, I need to set a cookie with the value which is received as th...
Paleo asked 6/4, 2016 at 6:32

7

I'm new to Scrapy and recently started using it on an M1 MacBook Air. I've encountered an issue. For example, when I try to run something like this: scrapy shell bbc.com It returns: Memor...
Bazemore asked 16/5, 2021 at 12:46
