scraper Questions
3
I am developing a Node.js app that uses Selenium WebDriver for scraping. However, when I deploy to Heroku, Selenium doesn't work. How can I make Selenium work on Heroku?
6
Solved
Can somebody explain the difference between a crawler and a scraper in terms of scope and functionality?
Fungous asked 8/7, 2010 at 19:56
2
Solved
Following Scrapy's tutorial, I made a simple image crawler (it scrapes images of Bugattis), which is illustrated below in EXAMPLE.
However, following the guide has left me with a non-functioning craw...
5
Solved
I'm trying to 'defrontpagify' the HTML of an MS FrontPage-generated website, and I'm writing a BeautifulSoup script to do it.
However, I've gotten stuck on the part where I try to strip a particul...
Continent asked 28/1, 2012 at 9:3
2
Solved
I am just getting started with JS and Node.js. I am trying to build a simple scraper as a first project, using Node.js and some modules such as request and cheerio.
I would like to add a 5-second delay...
3
When crawling, we often run into pages where the rendered content is generated with JavaScript (e.g. AJAX requests, jQuery), so Scrapy is unable to crawl it.
Exaggerate asked 17/4, 2016 at 7:4
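To illustrate the problem described in the question above, here is a minimal sketch using only Python's standard-library `html.parser` (not Scrapy itself). The page markup, the `ListItemCounter` class, and the `count_static_items` helper are all hypothetical names made up for this example. It shows that a parser that only reads the downloaded HTML never sees elements a script would add later, because the script is never executed.

```python
from html.parser import HTMLParser

# Hypothetical page: the list is empty in the markup and would only be
# filled in by JavaScript running in a browser.
PAGE = """
<html><body>
<ul id="items"></ul>
<script>
  document.getElementById("items").innerHTML = "<li>Widget</li>";
</script>
</body></html>
"""


class ListItemCounter(HTMLParser):
    """Counts <li> start tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.li_count = 0

    def handle_starttag(self, tag, attrs):
        # Script bodies are handed over as plain data, so the "<li>" inside
        # the JavaScript string never reaches this handler.
        if tag == "li":
            self.li_count += 1


def count_static_items(html: str) -> int:
    """Return how many <li> elements exist without running any scripts."""
    parser = ListItemCounter()
    parser.feed(html)
    parser.close()
    return parser.li_count


if __name__ == "__main__":
    print(count_static_items(PAGE))
```

A crawler that works on the raw response is in the same position as this parser, which is why such pages usually need either the underlying AJAX endpoint or a tool that actually runs the page's JavaScript.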
3
Solved
I have the following HTML structure. I am trying to build a robust method to extract the second color digest element, since there will be many of these tags within the DOM.
<table>
<tbody>
<...
2
Solved
My problem is that it doesn't just require a basic cookie, but rather asks for a session cookie, and for randomly generated IDs. I think this means I need to use a web browser emulator with a cooki...
1
Solved
How can I keep looking for elements in a #document:
<div>
<iframe>
#document
<html>
<body>
<div>
Element I want to find
</div>
</body>
</html...
3
Solved
First of all, I think it's worth saying that, I know there are a bunch of similar questions but NONE of them works for me...
I'm a newbie to Python, HTML, and web scraping. I'm trying to scrape user...
Silent asked 18/11, 2013 at 3:35
3
Solved
I have written many scrapers, but I am not really sure how to handle infinite scrollers. These days most websites (e.g. Facebook, Pinterest) have infinite scrollers.
Wardmote asked 20/9, 2012 at 18:56
1
I recently moved one of my sites (gezondbenjij.nl) to a new hosting account. This resulted in a new IP address.
Unfortunately, since the move, the Facebook scraper cannot find my site on the new I...
Berga asked 18/4, 2014 at 15:7
1
Solved
I have a big HTML page, but I want to select only certain nodes using XPath:
<html>
........
<!-- begin content -->
<div>some text</div>
<div><p>Some more element...
2
Solved
I'm using Scrapy to extract data from a website, but I have a problem with the XPath selector. Assuming I have this HTML code:
<div id="_parent">
Hi!
<p>I am a child!</p>
<...
5
My website is multi-language and I have a FB like button. I'd like to have the like posts in different languages.
According to Facebook documentation, if I use the meta tag og:locale and og:locale...
Accoutre asked 30/9, 2011 at 18:34
2
Solved
I am creating the HTML meta-tags dynamically using the function below (GWT). It takes 1 second to have this on the DOM. It is working fine except for Facebook. When I share a link from my web, the ...
Peppard asked 15/2, 2013 at 16:8
5
I need to know how to create a scraper (in Java) to gather data from HTML pages and output it to a database. I do not have a clue where to start, so any information you can give me on this would b...
7
I have a scraper, written in Python, which scrapes one site. While scraping the site, it prints the lines that are about to be written to a CSV file. Now I want to execute i...
2
Solved
I am new to XPath and it seems a bit tricky to me; sometimes I find it does not work the way I think it should.
When I scrape data from a website using XPath and Nokogiri, I find...
1
I am trying to use Scrapy to log in to a website in the init; then, after confirming the login, I want to initialize and start the standard crawl through start_urls. I'm not sure what is going wrong but I g...
Peipeiffer asked 30/6, 2012 at 6:13
1
So I've read through "Crawling with an authenticated session in Scrapy" and I am getting hung up. I am 99% sure that my parse code is correct; I just don't believe the login is redirecting and be...
Minesweeper asked 8/6, 2012 at 18:16
2
Solved
We want to set up a little honeypot image in our HTML bodies to detect scrapers / bad bots.
Has anyone set something like this up before?
We were thinking the best way to go at it would be to:
a)...
2
Solved
I am trying to scrape http://www.nscb.gov.ph/ggi/database.asp, specifically all the tables you get from selecting the municipalities/provinces. I am using Python with lxml.html and mechanize. My sc...
2
Solved
I have a spider that I have written using the Scrapy framework. I am having some trouble getting any pipelines to work. I have the following code in my pipelines.py:
class FilePipeline(object):
de...
Rayerayfield asked 3/11, 2010 at 19:21
1