screen-scraping - 2

6

Solved

I'm trying to use jsoup to log in to a site and then scrape information, but I am running into a problem: I can log in successfully and create a Document from index.php, but I cannot get other pages on...

java screen-scraping jsoup

Tetrachord asked 21/6, 2011 at 22:56

4

Solved

file_get_contents() gives me 403 Forbidden

I have a partner that has created some content for me to scrape. I can access the page with my browser, but when trying to use file_get_contents, I get a 403 Forbidden. I've tried using stream_co...

php html http-headers screen-scraping

Despiteful asked 27/7, 2012 at 2:36

3

Getting Error "The remote server returned an error: (403) Forbidden" when screen scraping using HttpWebRequest.GetResponse()

We have a tool which checks if a given URL is live. If it is, another part of our software can screen scrape the content from it. This is my code for checking if a URL is live ...

c# httpwebrequest screen-scraping httpwebresponse http-status-code-403

Rubellite asked 13/1, 2011 at 10:35

2

Solved

How to handle a failed DNS lookup in Scrapy

I am looking to handle a DNS error when scraping domains with Scrapy. Here's the error that I am seeing: ERROR: Error downloading <GET http://domain.com>: DNS lookup failed: address 'domain.com'...

python dns scrapy screen-scraping

Tara asked 15/9, 2014 at 2:27

5

Solved

How to convert HTML page to plain text in node.js?

I know this has been asked before, but I can't find a good answer for node.js. I need to extract the plain text (no tags, script, etc.) server-side from an HTML page that is fetched. I know how to do...

javascript node.js screen-scraping

Barrens asked 14/11, 2013 at 18:39

3

Solved

BeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?

I'm a newbie programmer trying to jump into Python by building a script that scrapes http://en.wikipedia.org/wiki/2000s_in_film and extracts a list of "Movie Title (Year)". My HTML sourc...

python html beautifulsoup screen-scraping

Strongwilled asked 6/12, 2010 at 3:39

1

Solved

What is the most efficient way to capture the screen in Python using modules such as PIL or cv2? It takes up a lot of RAM

What is the most efficient way to capture the screen in Python using modules such as PIL or cv2? It takes up a lot of RAM. I wanted to teach an AI to play Chrome's dino game through screen scraping an...

python performance opencv screen screen-scraping

Pentylenetetrazol asked 3/9, 2020 at 11:13

1

C# Quickest Way to Get Average Colors of Screen

I'm currently working on creating an Ambilight for my computer monitor with C#, an Arduino, and an Ikea Dioder. Currently the hardware portion runs flawlessly; however, I'm having a problem with de...

c# directx screen-scraping gdi+ directx-11

Fascista asked 23/10, 2013 at 18:57

2

Maintaining cookies between Mechanize requests

I'm trying to use the Ruby version of Mechanize to extract my employer's tickets from a ticket management system that we're moving away from that does not supply an API. Problem is, it seems Mecha...

ruby screen-scraping mechanize

Daze asked 12/8, 2011 at 21:31

6

What is the best way to scrape multiple domains with Scrapy?

I have around 10-odd sites that I wish to scrape from. A couple of them are WordPress blogs and they follow the same HTML structure, albeit with different classes. The others are either forums or b...

python screen-scraping scrapy

Lawhorn asked 31/3, 2011 at 8:44

2

Click button on website then scrape web page

I have a website on which I would like to click a button and then scrape the page using Python. The HTML code for the button is: <span id="exchange-testing" class="exchange-input nav-link" data tr...

python onclick click web-scraping screen-scraping

Jaggy asked 9/11, 2014 at 0:23

4

Solved

Unable to call Firefox from Selenium in Python on an AWS machine

I am trying to use Selenium from Python to scrape some dynamic pages with JavaScript. However, I cannot call Firefox after following the instructions for Selenium on the PyPI page (http://pypi.pytho...

python selenium amazon-web-services screen-scraping web-scraping

Wardieu asked 23/10, 2012 at 21:26

3

Solved

How to move to the next page on Python Selenium?

I am trying to build a proxy scraper for a specific site, but I'm failing to move to the next page. This is the code that I'm using. If you answer my question, please explain to me a bit about what you...

python python-3.x selenium screen-scraping webdriverwait

Zygophyte asked 28/12, 2018 at 9:4

2

Rightmove API and scraping: technical and legal questions

I'm looking to build an app using property data. Nestoria has a free API with rules of use, and Zoopla has an API you register for. OnTheMarket and Rightmove have the same terms of use to the letter (bizarre...

api screen-scraping

Worn asked 16/4, 2016 at 9:39

8

Solved

Nokogiri, open-uri, and Unicode Characters

I'm using Nokogiri and open-uri to grab the contents of the title tag on a webpage, but am having trouble with accented characters. What's the best way to deal with these? Here's what I'm doing: r...

ruby unicode screen-scraping nokogiri open-uri

Dorton asked 3/4, 2010 at 19:28

2

Solved

Node.js scraping with chrome-remote-interface

I have been trying to scrape a website protected by Distil Networks, where using Selenium (with Python) would always fail. I did a few searches, and my conclusion is that the site can dete...

python node.js google-chrome selenium screen-scraping

Dock asked 4/5, 2017 at 16:29

7

How can I download Yahoo Groups?

I want to download some Yahoo Groups (files, photos, messages, memberlist) and I've found these scripts: http://freshmeat.net/projects/grabyahoogroup/ http://sourceforge.net/project/showfiles.ph...

perl scripting download screen-scraping

Menashem asked 18/3, 2009 at 17:58

4

The way to detect web scraping

I need to detect scraping of info on my website. I tried detection based on behavior patterns, and it seems promising, although relatively computationally heavy. The basis is to collect request tim...

algorithm security screen-scraping detection

Triolein asked 20/3, 2011 at 22:53

1

Solved

How to fix Newspaper3k 403 Client Error for certain URLs?

I am trying to get a list of articles using a combo of the googlesearch and newspaper3k python packages. When using article.parse, I end up getting an error: newspaper.article.ArticleException: Art...

python web url screen-scraping python-newspaper

Nole asked 20/6, 2019 at 3:52

0

How do I scrape AngularJS site data with a conditional decision tree for each final answer?

Given an example URL endpoint like this, how would you go about scraping the final price for each combination of conditions and outputting it to Excel in PHP? I found the JSON data in the HTML and dec...

php json conditional-statements screen-scraping

Taxiway asked 22/6, 2019 at 17:7

2

Solved

Issue scraping page with "Load more" button with rvest

I want to obtain the links to the ATMs listed on this page: https://coinatmradar.com/city/345/bitcoin-atm-birmingham-uk/ Would I need to do something about the 'load more' button at the bottom of ...

r web-scraping screen-scraping rvest

Olecranon asked 13/5, 2019 at 19:49

4

Solved

Web scraping to fill out (and retrieve) search forms?

I was wondering if it is possible to "automate" the task of typing in entries to search forms and extracting matches from the results. For instance, I have a list of journal articles for which I wo...

forms search screen-scraping doi

Otti asked 23/7, 2009 at 7:11

0

Log in to a website using Google Apps Script and click through to scrape data

I want to log in to a website and navigate to a specific page to scrape the data. I am planning on using scraping (not an API at the moment), and for learning purposes I plan on doing it on my stackoverf...

google-apps-script web-scraping screen-scraping login-control http-response-codes

Mycobacterium asked 15/3, 2019 at 3:38

5

How can I grab CData out of BeautifulSoup

I have a website that I'm scraping that has a structure similar to the following. I'd like to be able to grab the info out of the CData block. I'm using BeautifulSoup to pull other info off the page...

python screen-scraping beautifulsoup cdata

Taima asked 9/1, 2010 at 2:53

3

Solved

Python Requests Module Not Getting Latest Data from Web Server

In the code snippet below, you can see that I am trying to scrape some data from the NCAA Men's Basketball website. import requests url = "https://www.ncaa.com/scoreboard/basketball-men/d1/" res...

python web-scraping beautifulsoup python-requests screen-scraping

Matchless asked 26/1, 2019 at 18:40
