screen-scraping Questions

6

Solved

I'm trying to use jsoup to log in to a site and then scrape information, but I am running into a problem: I can log in successfully and create a Document from index.php, but I cannot get other pages on...
Tetrachord asked 21/6, 2011 at 22:56

4

Solved

I have a partner who has created some content for me to scrape. I can access the page with my browser, but when trying to use file_get_contents, I get a 403 Forbidden. I've tried using stream_co...
Despiteful asked 27/7, 2012 at 2:36

3

We have a tool which checks if a given URL is a live URL. If a given URL is live, another part of our software can screen-scrape the content from it. This is my code for checking if a URL is live ...

2

Solved

I am looking to handle a DNS error when scraping domains with Scrapy. Here's the error that I am seeing: ERROR: Error downloading <GET http://domain.com>: DNS lookup failed: address 'domain.com'...
Tara asked 15/9, 2014 at 2:27

5

Solved

I know this has been asked before, but I can't find a good answer for node.js. I need to extract, server-side, the plain text (no tags, script, etc.) from an HTML page that is fetched. I know how to do...
Barrens asked 14/11, 2013 at 18:39

3

Solved

I'm a newbie programmer trying to jump into Python by building a script that scrapes http://en.wikipedia.org/wiki/2000s_in_film and extracts a list of "Movie Title (Year)". My HTML sourc...
Strongwilled asked 6/12, 2010 at 3:39

1

Solved

What is the most efficient way to capture the screen in Python using modules such as PIL or cv2? I ask because it takes up a lot of RAM. I wanted to teach an AI to play Chrome's dino game through screen scraping an...
Pentylenetetrazol asked 3/9, 2020 at 11:13

1

I'm currently working on creating an Ambilight for my computer monitor with C#, an Arduino, and an Ikea Dioder. Currently the hardware portion runs flawlessly; however, I'm having a problem with de...
Fascista asked 23/10, 2013 at 18:57

2

I'm trying to use the Ruby version of Mechanize to extract my employer's tickets from a ticket management system that we're moving away from that does not supply an API. Problem is, it seems Mecha...
Daze asked 12/8, 2011 at 21:31

6

I have around 10 odd sites that I wish to scrape. A couple of them are WordPress blogs and they follow the same HTML structure, albeit with different classes. The others are either forums or b...
Lawhorn asked 31/3, 2011 at 8:44

2

I have a website on which I would like to click a button and then scrape the page using Python. The HTML code for the button is: <span id="exchange-testing" class="exchange-input nav-link" data tr...
Jaggy asked 9/11, 2014 at 0:23

4

Solved

I am trying to use Selenium from Python to scrape some dynamic pages with JavaScript. However, I cannot launch Firefox after following the Selenium instructions on the PyPI page (http://pypi.pytho...

3

Solved

I am trying to build a proxy scraper for a specific site, but I'm failing to move to the next page. This is the code that I'm using. If you answer my question, please explain to me a bit about what you...
Zygophyte asked 28/12, 2018 at 9:4

2

I'm looking to build an app using property data. Nestoria has a free API and rules of use, and Zoopla has an API you register for. OnTheMarket and Rightmove have the same terms of use to the letter (bizarre...
Worn asked 16/4, 2016 at 9:39

8

Solved

I'm using Nokogiri and open-uri to grab the contents of the title tag on a webpage, but am having trouble with accented characters. What's the best way to deal with these? Here's what I'm doing: r...
Dorton asked 3/4, 2010 at 19:28

2

Solved

I have been trying to scrape a website protected by Distil Networks, where using Selenium (with Python) always fails. I did a few searches, and my conclusion is that the site can dete...

7

I want to download some Yahoo Groups (files, photos, messages, memberlist) and I've found these scripts: http://freshmeat.net/projects/grabyahoogroup/ http://sourceforge.net/project/showfiles.ph...
Menashem asked 18/3, 2009 at 17:58

4

I need to detect scraping of info on my website. I tried detection based on behavior patterns, and it seems promising, although relatively computationally heavy. The basic idea is to collect request tim...
Triolein asked 20/3, 2011 at 22:53

1

Solved

I am trying to get a list of articles using a combo of the googlesearch and newspaper3k python packages. When using article.parse, I end up getting an error: newspaper.article.ArticleException: Art...
Nole asked 20/6, 2019 at 3:52

0

Given an example URL endpoint like this, how would you go about scraping the final price for each combination of conditions and outputting it to Excel in PHP? I found the JSON data in the HTML and dec...
Taxiway asked 22/6, 2019 at 17:7

2

Solved

I want to obtain the links to the ATMs listed on this page: https://coinatmradar.com/city/345/bitcoin-atm-birmingham-uk/ Would I need to do something about the 'load more' button at the bottom of ...
Olecranon asked 13/5, 2019 at 19:49

4

Solved

I was wondering if it is possible to "automate" the task of typing in entries to search forms and extracting matches from the results. For instance, I have a list of journal articles for which I wo...
Otti asked 23/7, 2009 at 7:11

0

I want to log in to a website and navigate to a specific page to scrape the data. I am planning on using scraping (not an API at the moment), and for learning purposes I plan on doing it on my stackoverf...

5

I have a website that I'm scraping that has a structure similar to the following. I'd like to be able to grab the info out of the CDATA block. I'm using BeautifulSoup to pull other info off the page...
Taima asked 9/1, 2010 at 2:53

3

Solved

In the code snippet below, you can see that I am trying to scrape some data from the NCAA Men's Basketball website. import requests url = "https://www.ncaa.com/scoreboard/basketball-men/d1/" res...
