screen-scraping Questions

3

Solved

In Minecraft, I was hoping to find a way to read the chat automatically (as pictured below) in order to record transactions made in the virtual shop into a PostgreSQL database. Preferably using P...
Crestfallen asked 1/12, 2012 at 0:24

2

I am trying to open specific web pages and then take a full-page screenshot of each one. I know this can be done using the dev tools in Chrome, but I have not been able to find a method to do thi...

9

What's a good way to scrape website content using Node.js? I'd like to build something very, very fast that can execute searches in the style of kayak.com, where one query is dispatched to several ...
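The fan-out the question describes (one query sent to several sites at once, with the results collected together) can be sketched with Python's asyncio; this illustrates the pattern in Python, not Node.js. The `fan_out` function and the two stand-in sites are hypothetical, and the stand-ins only sleep where a real scraper would make HTTP requests.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Hypothetical per-site search function: takes a query, returns result strings.
Searcher = Callable[[str], Awaitable[List[str]]]


async def fan_out(query: str, searchers: Dict[str, Searcher],
                  timeout: float = 5.0) -> Dict[str, Optional[List[str]]]:
    """Run every site's search concurrently. A site that raises or takes longer
    than `timeout` seconds maps to None instead of failing the whole query."""

    async def run_one(search: Searcher) -> Optional[List[str]]:
        try:
            return await asyncio.wait_for(search(query), timeout)
        except Exception:
            # Timeouts and fetch errors are treated the same way here.
            return None

    names = list(searchers)
    results = await asyncio.gather(*(run_one(searchers[n]) for n in names))
    return dict(zip(names, results))


# Hypothetical stand-in sites for demonstration only.
async def fast_site(q: str) -> List[str]:
    await asyncio.sleep(0.01)
    return [f"fast:{q}"]


async def slow_site(q: str) -> List[str]:
    await asyncio.sleep(1.0)
    return [f"slow:{q}"]


if __name__ == "__main__":
    print(asyncio.run(fan_out("nyc-lax", {"fast": fast_site, "slow": slow_site},
                              timeout=0.1)))
```

The per-site timeout keeps one slow site from holding up the whole search. A Node.js version would have the same overall shape, with `Promise.allSettled` collecting the per-site results.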

3

Solved

I seek a tool that can be run on the command line like so: tablescrape 'http://someURL.foo.com' [n] If n is not specified and there's more than one HTML table on the page, it should summarize th...
Resultant asked 9/4, 2010 at 22:40
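The parsing core of such a tool can be sketched with Python's standard `html.parser`: collect every `<table>` as a list of rows of cell text, after which picking table number n is a list lookup. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, fetching the URL and the command-line wrapper are left out, and nested tables are not supported.

```python
from html.parser import HTMLParser
from typing import List, Optional

Table = List[List[str]]  # one table = rows of cell texts


class TableCollector(HTMLParser):
    """Collects every <table> on a page as a list of rows of cell text.
    Tolerates omitted </td> and </tr> tags; nested tables are not supported."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[Table] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _end_cell(self) -> None:
        # Finish the open cell (if any), collapsing internal whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self) -> None:
        # Finish the open row (if any) and attach it to the current table.
        self._end_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._end_row()
            self.tables.append([])
        elif tag == "tr":
            self._end_row()
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self) -> None:
        super().close()
        self._end_row()  # flush a row left open at the end of the input


def extract_tables(html: str) -> List[Table]:
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Counting the tables (`len(extract_tables(html))`) is enough to decide whether the user must be asked which one they want.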

6

Solved

I am brand new to Tor, and I think running multiple Tors is worth considering. By multiple Tors I mean not only multiple instances, but also a different proxy port for each, like wha...
Spandau asked 14/1, 2013 at 15:18
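One way to run several Tor instances side by side, each on its own proxy port, is to give every instance its own config file. The Python sketch below only generates that file's text; the function name, the port-numbering scheme and the data-directory layout are assumptions for illustration. `SocksPort`, `ControlPort` and `DataDirectory` are standard torrc options, and each instance needs its own `DataDirectory` because Tor will not share one between running processes.

```python
def torrc_for_instance(index: int, data_root: str, first_socks_port: int = 9050) -> str:
    """Return torrc text for Tor instance `index` (0-based).

    Illustrative port scheme: ports are allocated in pairs, so instance 0 gets
    SOCKS 9050 / control 9051, instance 1 gets 9052 / 9053, and so on."""
    socks_port = first_socks_port + 2 * index
    control_port = socks_port + 1
    data_dir = f"{data_root}/tor-data-{index}"  # must be unique per running instance
    return (
        f"SocksPort {socks_port}\n"
        f"ControlPort {control_port}\n"
        f"DataDirectory {data_dir}\n"
    )


if __name__ == "__main__":
    for i in range(3):
        print(f"# torrc.{i}")
        print(torrc_for_instance(i, "/tmp/tor-instances"))
```

Each generated text would be saved to its own file and passed to a separate process, e.g. `tor -f torrc.1`; clients then pick an instance by connecting to its SOCKS port.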

8

Solved

I have HTML webpages that I am crawling using xpath. The etree.tostring of a certain node gives me this string: <script> <!-- function escramble_758(){ var a,b,c a='+1 ' b='84-' a+='4...
Toratorah asked 13/4, 2012 at 6:39

6

Solved

Without the use of any external library, what is the simplest way to fetch a website's HTML content into a String?
Smithers asked 28/8, 2008 at 1:20
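The question does not name a language; in Python, for example, the standard library's `urllib.request` is enough, so no external package is needed. This is a minimal sketch: `fetch_html` is a hypothetical helper name, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download `url` and return the body as a str, using only the standard library."""
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read()
        # Use the charset from the Content-Type header when one is sent;
        # falling back to UTF-8 is an assumption, not something HTTP guarantees.
        charset = resp.headers.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")
```

`fetch_html("https://example.com/")` returns the page as a `str`; the `timeout` (in seconds) keeps a hung server from blocking forever.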

6

Solved

I need to code a bot that does the following: Go to a JSP page and search for something by: 1: writing something in a search box 2: clicking the search button (submit button) 3: clicking o...
Aguayo asked 16/3, 2011 at 9:04

4

The following did not work: wget -r -A .pdf home_page_url. It stops with the following message: .... Removing site.com/index.html.tmp since it should be rejected. FINISHED. I don't know why it...
Advowson asked 16/8, 2013 at 13:33

10

Solved

I have recently been learning Python and am trying my hand at building a web scraper. It's nothing fancy at all; its only purpose is to get the data off of a betting website and have this data p...
Broderickbrodeur asked 18/12, 2011 at 6:03

5

I can successfully highlight the section of a web page, but .send_keys(Keys.CONTROL, "c") does not place the intended text in the clipboard; only the last thing I manually copied is in ...
Fiche asked 11/6, 2016 at 11:18

0

I have been working with pytrends, a package to retrieve Google Trends data, for a long while now and realised that the results I get in the browser and using pytrends differ quite a bit. After checkin...
Closestool asked 7/10, 2022 at 13:51

11

Solved

I have Puppeteer controlling a website with a lookup form that can either return a result or a "No records found" message. How can I tell which was returned? waitForSelector seems to wait for only...
Dillondillow asked 20/4, 2018 at 17:15

8

Solved

I am writing a scraper that downloads all the image files from an HTML page and saves them to a specific folder. All the images are part of the HTML page.
Charactery asked 2/11, 2008 at 21:31
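Assuming the images are referenced by `<img src=...>` tags, a standard-library Python sketch has two steps: collect the `src` values and resolve relative ones against the page URL with `urljoin`, then download each one. The helper names are hypothetical, and naming each file after the last segment of its URL path is a simplification (two images with the same file name would overwrite each other).

```python
import os
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags also arrive here via handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def image_urls(html: str, page_url: str) -> List[str]:
    """Return absolute image URLs, resolving relative src values against page_url."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(page_url, src) for src in parser.srcs]


def download_images(html: str, page_url: str, folder: str) -> List[str]:
    """Save every image into `folder` and return the saved file paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in image_urls(html, page_url):
        name = os.path.basename(urlparse(url).path) or "image"
        path = os.path.join(folder, name)
        with urlopen(url) as resp, open(path, "wb") as out:
            out.write(resp.read())
        saved.append(path)
    return saved
```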

7

I've tried using the Sanitize gem to clean a string which contains the HTML of a website. It only removed the <script> tags, not the JavaScript inside the script tags. What can I use to rem...
Duplicate asked 28/11, 2011 at 5:18
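The underlying fix is to drop the contents of `<script>` (and usually `<style>`) elements, not just the tags. Sanitize is a Ruby gem, so this is only an illustration of the idea in Python using the standard `html.parser`, which already treats script and style bodies as raw text; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List

SKIP_TAGS = {"script", "style"}  # elements whose contents are dropped, not just the tags


class VisibleTextExtractor(HTMLParser):
    """Accumulates text nodes, ignoring everything inside SKIP_TAGS elements."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_scripts_to_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Text chunks are joined with spaces, so a word split across inline tags (`<b>wor</b>ld`) comes out as `wor ld`; joining without spaces would instead glue adjacent block elements together.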

2

Solved

I'm doing some scraping, but as I'm parsing approximately 4000 URLs, the website eventually detects my IP and blocks me every 20 iterations. I've written a bunch of Sys.sleep(5) and a tryCatch so ...
Broadbent asked 7/4, 2021 at 12:24
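A common complement to fixed `Sys.sleep(5)` pauses is exponential backoff: wait longer after each consecutive failure, with some random jitter so requests don't arrive at a fixed rhythm. The question uses R; the sketch below shows the technique in Python. `with_backoff` and its defaults are hypothetical, and `sleep` is a parameter only so the delays can be observed without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(fetch: Callable[[], T], max_attempts: int = 5, base_delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fetch` until it succeeds. After failure number k (starting at 0),
    wait base_delay * 2**k seconds plus up to 50% random jitter; re-raise the
    last error once max_attempts calls have failed."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

With the defaults, the waits after successive failures are roughly 5, 10, 20 and 40 seconds, each plus up to 50% jitter.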

5

Solved

How can I get the content of a web page using ASP.NET? I need to write a program that gets the HTML of a webpage and stores it in a string variable.
Renshaw asked 22/12, 2010 at 14:32

5

Sometimes, when trying to scrape Instagram media by adding ?__a=1 to the end of the URL (e.g. https://www.instagram.com/p/CP-Kws6FoRS/?__a=1), the response returned is { "__ar": 1, "err...
Stilbite asked 1/6, 2022 at 20:15

8

Solved

I want to screen-scrape a website that uses JavaScript. There is mechanize, the programmatic web browser for Python. However, it (understandably) doesn't interpret JavaScript. Is there any progr...
Replete asked 16/12, 2009 at 18:37

0

I have a scheduled job that runs every day to fetch the following and followers lists of my profile. I append ?__a=1 to the end of the URL to fetch the data. Since yesterday, I am getting...

17

Solved

Assuming I have an Amazon product URL like so http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&...
Lillalillard asked 19/11, 2009 at 16:28
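The excerpt stops before the actual question. If the goal is to pull the product identifier (the ASIN, `B0015T963C` in the URL above) out of such a link, a regular expression over the URL path is enough. The pattern is an assumption: it only recognizes the `/dp/` and `/gp/product/` path forms followed by a 10-character uppercase alphanumeric ID, and `asin_from_url` is a hypothetical name.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: the ASIN is the 10-character uppercase alphanumeric path segment
# that directly follows /dp/ or /gp/product/.
_ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?=/|$)")


def asin_from_url(url: str) -> Optional[str]:
    """Return the ASIN found in the URL path, or None if no known form matches.
    Only the path is searched, so query parameters cannot produce false hits."""
    match = _ASIN_RE.search(urlparse(url).path)
    return match.group(1) if match else None
```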

13

Solved

I'm using the following code, based on the loadspeed.js example, to open an https:// site which also requires HTTP server authentication. var page = require('webpage').create(), system = require('s...
Henrion asked 18/8, 2012 at 19:21

1

I have written the expression //*[contains(text(), "Brand:" )] for the HTML below. <div class="info-product mt-3"> <h3>Informazioni prodotto</h3> Brand: &l...
Otilia asked 24/2, 2022 at 14:32

9

Solved

I asked a question about implementing a general approach to crawling and saving webpages. Part of the original question is: how to crawl and save a lot of "About" pages from the Internet. With some further resea...
Thaxter asked 12/10, 2011 at 21:28

6

Solved

Is there a simple way in R to extract only the text elements of an HTML page? I think this is known as 'screen scraping', but I have no experience of it; I just need a simple way of extracting the ...
Candicecandid asked 7/7, 2010 at 14:04

© 2022 - 2024 — McMap. All rights reserved.