web-crawler Questions
6
Solved
From the HTTP server's perspective.
Tartu asked 22/7, 2010 at 12:6
4
Every hour and a half I'm getting a flood of requests from http://www.facebook.com/externalhit_uatext.php.
I know what these requests should mean, but this behavior is very odd.
On a regular bas...
Grouch asked 19/3, 2012 at 16:27
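A quick way to confirm the flood really comes from Facebook's share crawler is to count its hits per minute in the access log. A minimal sketch, assuming an nginx-style combined log at a hypothetical path:

import re
from collections import Counter

ua = re.compile(r"facebookexternalhit")
per_minute = Counter()
# Log path and combined-log format are assumptions; adjust for your server
with open("/var/log/nginx/access.log") as log:
    for line in log:
        if ua.search(line) and "[" in line:
            # Timestamps look like [19/Mar/2012:16:27:00 ...]; keep up to the minute
            per_minute[line.split("[", 1)[1][:17]] += 1
print(per_minute.most_common(10))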
3
Solved
There's a way of excluding complete page(s) from Google's indexing. But is there a way to specifically exclude certain part(s) of a web page from Google's crawling? For example, exclude the side-ba...
Ailanthus asked 5/1, 2010 at 7:39
2
I have discovered through Google's webmaster tools that Google is crawling paths that look like links embedded in JSON in a <script type="application/json"> tag. This JSON is later parsed and...
Asti asked 9/11, 2017 at 20:3
10
Solved
The Facebook Crawler is hitting my servers multiple times every second and it seems to be ignoring both the Expires header and the og:ttl property.
In some cases, it is accessing the same og:image...
Tsarevna asked 30/3, 2018 at 16:2
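One mitigation is throttling just that user-agent at the application layer. A minimal stdlib sketch; the one-second window and two-hit limit are arbitrary assumptions, not Facebook-recommended values:

import time
from http.server import BaseHTTPRequestHandler, HTTPServer

WINDOW, LIMIT = 1.0, 2   # assumed throttle: at most 2 crawler hits per second
hits = []

class Throttle(BaseHTTPRequestHandler):
    def do_GET(self):
        if "facebookexternalhit" in self.headers.get("User-Agent", ""):
            now = time.time()
            hits[:] = [t for t in hits if now - t < WINDOW]
            if len(hits) >= LIMIT:
                self.send_response(429)  # Too Many Requests
                self.end_headers()
                return
            hits.append(now)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

HTTPServer(("", 8000), Throttle).serve_forever()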
4
I've been searching for npm packages, but they all seem unmaintained and rely on outdated user-agent databases. Is there a reliable and up-to-date package out there that helps me detect crawlers...
Deficiency asked 7/1, 2016 at 4:57
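Under the hood such packages mostly just match the User-Agent header against a maintained pattern list. A language-agnostic sketch of that technique in Python; the pattern list here is a tiny illustration, not a real database:

import re

# A few illustrative patterns; real packages ship hundreds, kept up to date
BOT_PATTERNS = re.compile(r"bot|crawler|spider|slurp|facebookexternalhit", re.IGNORECASE)

def is_crawler(user_agent):
    return bool(user_agent) and bool(BOT_PATTERNS.search(user_agent))

print(is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))   # True
print(is_crawler("Mozilla/5.0 (Windows NT 10.0) Chrome/120"))  # False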
3
Solved
On these sites (https://coinalyze.net/ethereum-classic/liquidations/, BTC/USDT), I am able to add the following indicators into the graph [Liquidations, Long Liquidations, Short Liquidations, Aggregated L...
Trait asked 12/5, 2021 at 19:27
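Charts like these are usually fed by a JSON endpoint visible in the browser's network tab, which is easier to scrape than the rendered graph. A sketch of that approach; the endpoint URL and parameters below are purely hypothetical:

import requests  # third-party: pip install requests

# Hypothetical endpoint; find the real one in the devtools network tab
resp = requests.get(
    "https://coinalyze.net/api/liquidations",        # assumption
    params={"symbol": "BTCUSDT", "interval": "1h"},  # assumption
    headers={"User-Agent": "Mozilla/5.0"},
)
resp.raise_for_status()
print(resp.json())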
6
Solved
So my brother wanted me to write a web crawler in Python (self-taught) and I know C++, Java, and a bit of HTML. I'm using version 2.7 and reading the Python library reference, but I have a few problems
1. ht...
Rhaetian asked 20/8, 2010 at 17:54
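For a first crawler the standard library alone is enough: fetch a page, collect its links, and visit them breadth-first. A minimal Python 3 sketch; the seed URL and the 20-page cap are placeholders:

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)

seen, queue = set(), deque(["https://example.com/"])  # placeholder seed
while queue and len(seen) < 20:
    url = queue.popleft()
    if url in seen:
        continue
    seen.add(url)
    try:
        html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
    except (OSError, ValueError):
        continue  # skip unreachable pages and non-http links
    parser = LinkParser()
    parser.feed(html)
    queue.extend(urljoin(url, link) for link in parser.links)
print(len(seen), "pages visited")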
3
import matplotlib.pyplot as plt
import numpy as np
labels=['Siege', 'Initiation', 'Crowd_control', 'Wave_clear', 'Objective_damage']
markers = [0, 1, 2, 3, 4, 5]
str_markers = ["0", "...
Lifeless asked 20/10, 2018 at 21:17
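Labels like these usually end up on a radar (spider) chart. A sketch of one way to finish the snippet; the stat values are made-up placeholders:

import matplotlib.pyplot as plt
import numpy as np

labels = ['Siege', 'Initiation', 'Crowd_control', 'Wave_clear', 'Objective_damage']
stats = [3, 4, 2, 5, 3]  # placeholder values

angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
stats, angles = stats + stats[:1], angles + angles[:1]  # close the polygon

fig, ax = plt.subplots(subplot_kw=dict(polar=True))
ax.plot(angles, stats)
ax.fill(angles, stats, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
plt.show()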
3
Solved
I'm interested in automating reverse image search. Yandex in particular is great for busting catfishes, even better than Google Images. So, consider this Python code:
import requests
import webb...
Purulence asked 23/5, 2020 at 20:16
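One way to script this is to build the reverse-image-search URL yourself and hand it to the default browser. A sketch; the rpt=imageview query format is my assumption about Yandex's interface and may change:

import webbrowser
from urllib.parse import urlencode

image_url = "https://example.com/photo.jpg"  # placeholder image to look up
query = urlencode({"rpt": "imageview", "url": image_url})  # assumed format
webbrowser.open("https://yandex.com/images/search?" + query)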
4
Solved
I want to send a value for "User-agent" while requesting a webpage using Python Requests. I am not sure if it is okay to send this as part of the headers, as in the code below:
debug = {'verbo...
Garget asked 15/5, 2012 at 17:48
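Passing it through the headers dict is the supported way in Requests; the library merges it into the outgoing request:

import requests  # third-party: pip install requests

headers = {"User-Agent": "my-crawler/1.0 (+https://example.com/bot)"}  # placeholder UA
resp = requests.get("https://example.com", headers=headers)
print(resp.request.headers["User-Agent"])  # shows what was actually sent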
2
Solved
In every paper I have read about crawler proposals, I see that one important component is the DNS Resolver.
My question is:
Why is it necessary? Can't we just make a request to http://www.some-do...
Hannibal asked 28/10, 2012 at 5:12
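The short answer is scale: every fetch triggers an implicit DNS lookup, and a crawler visiting millions of URLs repeats lookups for the same hosts unless it caches and batches them itself, which is why papers treat the resolver as its own component. A toy sketch of the caching half:

import socket
from functools import lru_cache

@lru_cache(maxsize=65536)
def resolve(host):
    # One blocking lookup per distinct host; repeats hit the cache
    return socket.gethostbyname(host)

print(resolve("example.com"))
print(resolve("example.com"))  # cache hit, no second DNS round trip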
7
I am writing Python to crawl the Twitter space using Twitter-py. I have set the crawler to sleep for a while (2 seconds) between each request to api.twitter.com. However, after some time of running (a...
Academic asked 11/1, 2012 at 5:54
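A fixed 2-second sleep can still trip rate limits once the quota window fills; the usual remedy is exponential backoff on failure. A generic sketch, independent of Twitter-py:

import time

def fetch_with_backoff(call, max_tries=5, base_delay=2.0):
    # Retry `call`, doubling the pause after each failure
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            return call()
        except Exception:  # in practice, catch the client's rate-limit error
            if attempt == max_tries:
                raise
            time.sleep(delay)
            delay *= 2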
4
Solved
I'm trying to program a simple web-crawler using the Requests module, and I would like to know how to disable its default keep-alive feature.
I tried using:
s = requests.session()
s.config['ke...
Karl asked 8/1, 2014 at 23:42
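The s.config dict belongs to the pre-1.0 Requests API and was removed; in current versions the usual workaround is to request connection close on each response:

import requests  # third-party: pip install requests

s = requests.Session()
# 'Connection: close' tells the server to tear the socket down after
# each response, which effectively disables keep-alive
s.headers["Connection"] = "close"
resp = s.get("https://example.com")
print(resp.headers.get("Connection"))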
4
I am a newbie to Python. I am running Python 2.7.3, 32-bit, on a 64-bit OS. (I tried 64-bit but it didn't work out.)
I followed the tutorial and installed scrapy on my machine. I have created o...
Salado asked 12/4, 2012 at 11:58
6
Solved
I am trying to scrape a website but I don't get some of the elements, because these elements are dynamically created.
I use cheerio in node.js and my code is below.
var request = require('req...
Crave asked 26/2, 2015 at 9:49
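cheerio only parses the HTML string it is given and never executes the page's scripts, so dynamically created elements are invisible to it. The fix is to render the page in a real browser first (Puppeteer is the usual choice in node; the same idea in Python with selenium, assuming the package and a browser driver are installed):

from selenium import webdriver  # third-party: pip install selenium

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL
html = driver.page_source          # the DOM *after* scripts have run
driver.quit()
print(len(html))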
4
I'm trying to crawl all links of a sitemap.xml to re-cache a website, but the recursive option of wget does not work; I only get this response:
Remote file exists but does not contain any link -- not re...
Tachylyte asked 27/6, 2013 at 3:37
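wget's recursive mode only follows links found in HTML (and CSS), and a sitemap is plain XML, so it sees nothing to recurse into. One workaround is to extract the <loc> entries yourself and fetch each URL; a stdlib sketch with a placeholder sitemap URL:

import urllib.request
import xml.etree.ElementTree as ET

SITEMAP = "https://example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}  # standard namespace

tree = ET.parse(urllib.request.urlopen(SITEMAP))
for loc in tree.findall(".//sm:loc", NS):
    url = loc.text.strip()
    urllib.request.urlopen(url).read()  # hit each page to re-warm the cache
    print("re-cached", url)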
6
Solved
How do you prevent email addresses from being gathered from web pages by email spiders? Does linking them with mailto: increase the likelihood of them being picked up? Is URL-encoding useful?
Obviously the best coun...
Palace asked 8/9, 2010 at 1:17
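One classic countermeasure is rendering the address as HTML character entities: browsers display it normally, while naive regex scrapers never see an '@'. It only defeats harvesters that skip entity decoding, but it costs nothing. A small sketch:

def obfuscate(address):
    # Encode every character as a decimal HTML entity
    return "".join(f"&#{ord(ch)};" for ch in address)

link = f'<a href="mailto:{obfuscate("user@example.com")}">email me</a>'
print(link)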
2
Solved
Until recently there were several ways to retrieve Instagram user media without the need for API authentication. But apparently the website has shut them all down.
Some of the old methods:
https:/...
Cantoris asked 16/4, 2018 at 7:49
2
I am trying to crawl data from a list of URLs. I had already done this with the code below and it succeeded yesterday without any error.
But today, when I came back and ran the code again, there was an er...
Befriend asked 28/8, 2023 at 19:32
4
I spent a lot of time searching for this.
In the end I combined a number of answers and it works. I'm sharing my answer and I'd appreciate it if anyone edits it or provides us with an eas...
Peculiar asked 21/1, 2015 at 15:1
5
Solved
I put a package on PyPI for the first time ~2 months ago, and have made some version updates since then. This week I noticed the download count tracking, and was surprised to see it had been downl...
Thynne asked 10/3, 2012 at 16:23
4
I would like to be able to tell if a site lets you upload files. I can think of two main ways sites do it and ideally I'd like to be able to detect both:
Button
Drag & Drop
PhantomJS document...
Lanellelanette asked 16/12, 2021 at 12:10
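For the button case the static HTML is often enough: look for an input element with type="file". A stdlib sketch; drag-and-drop handlers are wired up in JavaScript, so detecting those still needs a scripted browser like the PhantomJS route in the question:

import urllib.request
from html.parser import HTMLParser

class UploadDetector(HTMLParser):
    found = False
    def handle_starttag(self, tag, attrs):
        if tag == "input" and dict(attrs).get("type") == "file":
            self.found = True

html = urllib.request.urlopen("https://example.com").read().decode("utf-8", "replace")
detector = UploadDetector()
detector.feed(html)
print("file upload input found:", detector.found)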
3
Solved
I use Tor to crawl web pages.
I started the tor and polipo services and added:
class ProxyMiddleware(object):
    # overwrite process_request
    def process_request(self, request, spider):
        # Set the locatio...
Ticon asked 8/12, 2014 at 18:38
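The truncated middleware usually ends by pointing each request at the local HTTP proxy. A hedged completion, assuming polipo is listening on its default port 8123 and forwarding to Tor:

class ProxyMiddleware(object):
    # Route every request through the local polipo proxy,
    # which in turn forwards traffic to Tor
    def process_request(self, request, spider):
        request.meta['proxy'] = 'http://127.0.0.1:8123'  # polipo default port (assumption)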
4
Solved
I would like to get the same result as this command line:
scrapy crawl linkedin_anonymous -a first=James -a last=Bond -o output.json
My script is as follows:
import scrapy
from linkedin_anonymo...
Massage asked 20/12, 2015 at 15:6
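Scrapy exposes this through CrawlerProcess, which accepts the same spider arguments the -a flags carry; in recent Scrapy the -o flag maps to the FEEDS setting. A sketch, assuming the script runs inside the project so the spider is discoverable by name:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Equivalent of: scrapy crawl linkedin_anonymous -a first=James -a last=Bond -o output.json
settings = get_project_settings()
settings.set("FEEDS", {"output.json": {"format": "json"}})  # the -o flag
process = CrawlerProcess(settings)
process.crawl("linkedin_anonymous", first="James", last="Bond")
process.start()  # blocks until the crawl finishes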