scraper - McMap

3

How to use Selenium Webdriver on Heroku?

I am developing a Node.js app, and I use Selenium Webdriver on it for scraping purposes. However, when I deploy on Heroku, Selenium doesn't work. How can I make Selenium work on Heroku?

node.js selenium heroku webdriver scraper

Dictograph asked 17/3, 2017 at 14:56

6

Solved

crawler vs scraper [closed]

Can somebody explain the difference between a crawler and a scraper in terms of scope and functionality?

web-crawler terminology scraper

Fungous asked 8/7, 2010 at 19:56
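
As a rough illustration of the distinction (a minimal sketch, not a definitive answer): a crawler is usually concerned with discovering pages by following links, while a scraper is concerned with pulling specific data out of a page. The class names `LinkCollector` and `TitleExtractor` and the sample HTML are hypothetical and use only Python's standard library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Crawler-style step: discover which URLs to visit next."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag that has one.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class TitleExtractor(HTMLParser):
    """Scraper-style step: extract one specific piece of data."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside <title>.
        if self._in_title:
            self.title += data


# Hypothetical page used only for this illustration.
SAMPLE_HTML = (
    "<html><head><title>Example page</title></head>"
    '<body><a href="/a">A</a> <a href="/b">B</a></body></html>'
)

collector = LinkCollector()
collector.feed(SAMPLE_HTML)

extractor = TitleExtractor()
extractor.feed(SAMPLE_HTML)

print(collector.links)
print(extractor.title)
```

In practice the two are often combined: a crawler decides which pages to fetch, and a scraper runs on each fetched page.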

2

Solved

Scrapy: Images Pipeline, download images

Following Scrapy's tutorial, I made a simple image crawler (it scrapes images of Bugattis), which is illustrated below in EXAMPLE. However, following the guide has left me with a non-functioning craw...

python scrapy scraper

Anfractuosity asked 26/7, 2016 at 11:53

5

Solved

BeautifulSoup: Strip specified attributes, but preserve the tag and its contents

I'm trying to 'defrontpagify' the html of a MS FrontPage generated website, and I'm writing a BeautifulSoup script to do it. However, I've gotten stuck on the part where I try to strip a particul...

python web-scraping beautifulsoup scraper frontpage

Continent asked 28/1, 2012 at 9:3

2

Solved

delay in a for loop for http request

I am just getting started with JS and Node.js. I am trying to build a simple scraper as a first project, using Node.js and some modules such as request and cheerio. I would like to add a 5-second delay...

node.js loops url scraper

Cordate asked 10/3, 2017 at 11:34

3

How to crawl with PHP Goutte and Guzzle if data is loaded by JavaScript?

Many times when crawling we run into problems where content that is rendered on the page is generated with JavaScript, and therefore Scrapy is unable to crawl it (e.g. AJAX requests, jQuery).

php web-crawler guzzle scraper goutte

Exaggerate asked 17/4, 2016 at 7:4

3

Solved

XPath: Get following sibling

I have the following HTML structure. I am trying to build a robust method to extract the second color digest element, since there will be many of these tags within the DOM. <table> <tbody> ...

html xpath siblings scraper

Sir asked 25/7, 2012 at 19:33

2

Solved

How can I scrape website content in PHP from a website that requires a cookie login?

My problem is that it doesn't just require a basic cookie, but rather asks for a session cookie, and for randomly generated IDs. I think this means I need to use a web browser emulator with a cooki...

php cookies scraper snoopy goutte

Authors asked 3/11, 2012 at 14:38

1

Solved

Python selenium get inside a #document

How can I keep looking for elements in a #document: <div> <iframe> #document <html> <body> <div> Element I want to find </div> </body> </html...

python selenium iframe scraper

Leinster asked 14/7, 2016 at 0:15

3

Solved

How to scrape a website that requires login first with Python

First of all, I think it's worth saying that I know there are a bunch of similar questions, but NONE of them works for me... I'm new to Python, HTML and web scraping. I'm trying to scrape user...

python http cookies authorization scraper

Silent asked 18/11, 2013 at 3:35

3

Solved

scrape websites with infinite scrolling

I have written many scrapers, but I am not really sure how to handle infinite scrolling. These days many websites (Facebook, Pinterest, etc.) use infinite scrolling.

python screen-scraping scraper

Wardmote asked 20/9, 2012 at 18:56

1

Facebook scraper uses incorrect DNS data > my site is not getting scraped

I recently moved one of my sites (gezondbenjij.nl) to a new hosting account. This resulted in a new IP address. Unfortunately, since the move, the Facebook scraper cannot find my site on the new I...

facebook facebook-graph-api dns ip scraper

Berga asked 18/4, 2014 at 15:7

1

Solved

XPath to select between two HTML comments?

I have a big HTML page, but I want to select only certain nodes using XPath: <html> ........  <div>some text</div> <div><p>Some more element...

html ruby xpath nokogiri scraper

Silda asked 18/9, 2013 at 11:56

2

Solved

XPath recursive children selection

I'm using Scrapy to extract data from a web site, but I have a problem with the XPath selector. Assuming I have this HTML code: <div id="_parent"> Hi! <p>I am a child!</p> ...

html xpath scrapy scraper

Recalcitrate asked 17/9, 2013 at 21:24
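
For readers unfamiliar with the difference between direct-child and recursive selection, here is a minimal sketch. It assumes hypothetical markup loosely based on the visible part of the excerpt, and it uses the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) as a stand-in for Scrapy selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the nested <div> and the <root> wrapper are
# assumptions added for illustration, not part of the original question.
SAMPLE = (
    '<root><div id="_parent">Hi! <p>I am a child!</p>'
    "<div><p>I am a grandchild!</p></div></div></root>"
)

root = ET.fromstring(SAMPLE)
parent = root.find(".//div[@id='_parent']")

# "./p" matches only <p> elements that are direct children of the parent.
direct_children = [p.text for p in parent.findall("./p")]

# ".//p" matches <p> elements at any depth below the parent.
all_descendants = [p.text for p in parent.findall(".//p")]

# itertext() walks every text node under the parent, in document order.
all_text = " ".join(t.strip() for t in parent.itertext() if t.strip())

print(direct_children)
print(all_descendants)
print(all_text)
```

In full XPath, as exposed by Scrapy selectors, the same distinction is written as `p` versus `.//p` relative to the context node, and `.//text()` collects text nodes at any depth.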

5

Facebook meta tags scraped with locale not working

My website is multi-language and I have a FB like button. I'd like to have the like posts in different languages. According to Facebook documentation, if I use the meta tag og:locale and og:locale...

facebook facebook-like locale scraper

Accoutre asked 30/9, 2011 at 18:34

2

Solved

Facebook scraper doesn't load dynamic meta-tags

I am creating the HTML meta-tags dynamically using the function below (GWT). It takes 1 second for them to appear in the DOM. It is working fine except for Facebook. When I share a link from my website, the ...

html facebook web-scraping meta-tags scraper

Peppard asked 15/2, 2013 at 16:8

5

Scrape data from HTML pages using Java, output to database [closed]

I need to know how to create a scraper (in Java) to gather data from HTML pages and output it to a database. I do not have a clue where to start, so any information you can give me on this would b...

java scraper

Melar asked 18/3, 2010 at 15:29

7

Print Python output by PHP Code

I have a scraper (written in Python) that scrapes one site. While scraping the site, it prints the lines that are about to be written to a CSV file. The scraper is written in Python, and now I want to execute i...

php python scraper

Falsity asked 9/12, 2012 at 11:19

2

Solved

XPath along with nokogiri; tutorials/examples? [closed]

I am new to XPath and it seems a bit tricky to me; sometimes I find it does not work the way I think it should. When I scrape data from a website using XPath and Nokogiri, I find...

xpath nokogiri scraper

Ezzell asked 25/10, 2012 at 14:5

1

Scrapy InIt self.initialized() -- not initializing

I am trying to use Scrapy to log in to a website in the init; then, after confirming the login, I want to initialize and start the standard crawl through start_urls. I'm not sure what is going wrong but I g...

python selenium scrapy web-crawler scraper

Peipeiffer asked 30/6, 2012 at 6:13

1

Crawling LinkedIn while authenticated with Scrapy

So I've read through "Crawling with an authenticated session in Scrapy" and I am getting hung up. I am 99% sure that my parse code is correct; I just don't believe the login is redirecting and be...

python linkedin-api scrapy scraper

Minesweeper asked 8/6, 2012 at 18:16

2

Solved

Advice for use of honeypot img tag to detect scrapers / bad bots

We want to set up a little honeypot image in our HTML bodies to detect scrapers / bad bots. Has anyone set something like this up before? We were thinking the best way to go at it would be to: a)...

html image detect scraper honeypot

Robynroc asked 7/9, 2011 at 20:24

2

Solved

mechanize submit form character encoding problem

I am trying to scrape http://www.nscb.gov.ph/ggi/database.asp, specifically all the tables you get from selecting the municipalities/provinces. I am using Python with lxml.html and mechanize. My sc...

python encoding mechanize scraper

Minda asked 7/7, 2011 at 11:57

2

Solved

Can't get Scrapy pipeline to work

I have a spider that I wrote using the Scrapy framework. I am having some trouble getting any pipelines to work. I have the following code in my pipelines.py: class FilePipeline(object): de...

python web-crawler pipeline scrapy scraper

Rayerayfield asked 3/11, 2010 at 19:21
