scrapy - McMap

2

Solved

Getting latest Chrome user agent for Scrapy in Python or otherwise

Recently I have started to use Scrapy on a regular basis to analyze sites which demand the latest browser (user agent) for their content to show up. Now, this may seem like an old time problem, yet...

scrapy user-agent

Daubery asked 21/6, 2021 at 10:22

4

Solved

How can I group data scraped from multiple pages, using Scrapy, into one Item?

I'm trying to collect a few pieces of information about a bunch of different web sites. I want to produce one Item per site that summarizes the information I found across that site, regardless of w...

python scrapy

Beverlybevers asked 6/4, 2013 at 22:42

10

Solved

Scrapy installed, but not recognized in the command line

I installed Scrapy in my Python 2.7 environment on Windows 7, but when I try to start a new Scrapy project using scrapy startproject newProject, the command prompt shows this message: 'scrapy' is n...

python python-2.7 scrapy

Jojo asked 14/9, 2016 at 9:28

5

Solved

Crawling with an authenticated session in Scrapy

In my previous question, I wasn't very specific over my problem (scraping with an authenticated session with Scrapy), in the hopes of being able to deduce the solution from a more general answer. I...

python scrapy

Wilbourn asked 1/5, 2011 at 20:34

6

AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv3_METHOD'

After running the scrapy shell with the defined url, I am getting the attribute error showing the following error: AttributeError: module 'OpenSSL.SSL' has no attribute 'SSLv3_METHOD' scrapy shell ...

python python-3.x web-scraping scrapy

Ineffectual asked 26/9, 2022 at 19:49

5

Force my scrapy spider to stop crawling

Is there a way to stop crawling when a specific condition is true (like scrap_item_id == predefine_value)? My problem is similar to Scrapy - how to identify already scraped urls but I want to ...

python scrapy

Dyestuff asked 15/12, 2010 at 10:5

4

Add a delay to a specific scrapy Request

Is it possible to delay the retry of a particular Scrapy Request? I have a middleware which needs to defer the request of a page until a later time. I know how to do the basic deferral (end of queue...

python scrapy

Milquetoast asked 2/10, 2013 at 11:29

4

Solved

How to upload crawled data from Scrapy to Amazon S3 as csv or json?

What are the steps to upload the crawled data from Scrapy to Amazon S3 as a csv/jsonl/json file? All I could find on the internet was how to upload scraped images to the S3 bucket. I'm currently...

python json amazon-s3 web-scraping scrapy

Hettiehetty asked 5/8, 2016 at 11:24

5

Solved

urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=59587): Max retries exceeded with url using Selenium GeckoDriver Firefox

At dawn my code was working perfectly, but today when I woke up it is no longer working, and I didn't change any line of code, I also checked if Firefox updated, and no, it didn't, and I have no id...

python selenium scrapy geckodriver firefox-headless

Impact asked 15/4, 2022 at 15:26

8

Solved

Debugging Scrapy Project in Visual Studio Code

I have Visual Studio Code on a Windows Machine, on which I am making a new Scrapy Crawler. The crawler is working fine but I want to debug the code, for which I am adding this in my launch.json fil...

python python-3.x visual-studio scrapy visual-studio-code

Nieman asked 9/3, 2018 at 20:47

6

Solved

How can scrapy export items to separate csv files per item

I am scraping a soccer site and the spider (a single spider) gets several kinds of items from the site's pages: Team, Match, Club etc. I am trying to use the CSVItemExporter to store these items in...

csv scrapy exporter

Bramante asked 1/9, 2012 at 18:34

11

ReactorNotRestartable error in while loop with scrapy

I get a twisted.internet.error.ReactorNotRestartable error when I execute the following code: from time import sleep from scrapy import signals from scrapy.crawler import CrawlerProcess from scrapy.util...

python python-2.7 scrapy twisted

Hickory asked 9/10, 2016 at 17:47

27

Solved

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org [duplicate]

I'm practicing the code from 'Web Scraping with Python', and I keep having this certificate problem: from urllib.request import urlopen from bs4 import BeautifulSoup import re pages = set...

python web-scraping beautifulsoup scrapy ssl-certificate

Popular asked 8/5, 2018 at 14:32

2

Solved

How to get a dict from scrapy item?

I need the values in a dict, but Item uses some abstraction on top of it. How do I get the fields in a dict from an item? I know Scrapy allows a dict to be returned in place of an item now. But I alread...

python dictionary scrapy

Upbraid asked 6/8, 2015 at 12:20

4

unknown command: crawl error

I am a newbie to Python. I am running the 32-bit version of Python 2.7.3 on a 64-bit OS. (I tried 64-bit but it didn't work out.) I followed the tutorial and installed Scrapy on my machine. I have created o...

python scrapy web-crawler

Salado asked 12/4, 2012 at 11:58

7

TypeError: Object of type 'bytes' is not JSON serializable

I just started programming in Python. I want to use Scrapy to create a bot, and it showed TypeError: Object of type 'bytes' is not JSON serializable when I ran the project. import json import codecs...

python json serialization scrapy

Spam asked 21/6, 2017 at 16:54

24

Solved

Scrapy spider not found error

This is Windows 7 with Python 2.7. I have a Scrapy project in a directory called caps (this is where scrapy.cfg is). My spider is located in caps\caps\spiders\campSpider.py. I cd into the scrapy pr...

python scrapy

Charette asked 26/3, 2012 at 17:27

25

Solved

Cannot install Lxml on Mac OS X 10.9

I want to install Lxml so I can then install Scrapy. When I updated my Mac today it wouldn't let me reinstall lxml, I get the following error: In file included from src/lxml/lxml.etree.c:314: /priv...

python xcode macos scrapy lxml

Smelt asked 23/10, 2013 at 17:7

2

Error when crawl data: 'EPollReactor' object has no attribute '_handleSignals'

I am trying to crawl data from a list of URLs. I have already done with the code below and succeeded yesterday without any error. But today, when I came back and ran the code again, there was an er...

scrapy web-crawler

Befriend asked 28/8, 2023 at 19:32

1

Solved

I'm learning Python web scraping. It shows AttributeError when I scrapy crawl a spider

I'm learning Python scraping with Scrapy. I did exactly the same thing as the tutorial teaches, but I got an error. Please help! My Python code: import scrapy class BookSpider(scrapy.Spider): nam...

python scrapy python-asyncio twisted

Sibbie asked 29/8, 2023 at 18:38

5

Solved

Run a Scrapy spider in a Celery Task

This is not working anymore; Scrapy's API has changed. Now the documentation features a way to "Run Scrapy from a script", but I get the ReactorNotRestartable error. My task: from celery import Ta...

scrapy twisted celery

Counteraccusation asked 1/3, 2014 at 15:46

4

Solved

Normalize whitespace with Python

I'm building a data extract using scrapy and want to normalize a raw string pulled out of an HTML document. Here's an example string: Sapphire RX460 OC 2/4GB Notice two groups of two whitespace...

python string xpath scrapy

Deuteron asked 30/9, 2017 at 9:4

3

Solved

How to create custom Scrapy Item Exporter?

I'm trying to create a custom Scrapy Item Exporter based off JsonLinesItemExporter so I can slightly alter the structure it produces. I have read the documentation here http://doc.scrapy.org/en/la...

python json scrapy

Higbee asked 22/10, 2015 at 21:15

5

How to get cookie from scrapy response and set the cookie to the next request?

I have disabled the default Scrapy cookie option, so I have to set cookies manually. COOKIES_ENABLED = False COOKIES_DEBUG = True Now, I need to set a cookie with the value which is received as th...

python http-redirect cookies scrapy

Paleo asked 6/4, 2016 at 6:32

7

Scrapy on M1 Mac: MemoryError: Cannot allocate write+execute memory for ffi.callback()

I'm new to scrapy, and recently started using it on the M1 MacBook Air. I've encountered an issue. For example, when I try to do something like this: scrapy shell bbc.com It would return me: Memor...

scrapy

Bazemore asked 16/5, 2021 at 12:46
