Getting webpage content with Ruby: I'm having trouble

Asked 6/12, 2009 at 3:2 Answered 6/12, 2009 at 3:17

I want to get the content of this* page. Everything I've looked up suggests parsing the page's HTML elements with CSS selectors, but that page has none.

Here's the only code I found that looked like it should work:

file = File.open('http://hiscore.runescape.com/index_lite.ws?player=zezima', "r")
contents = file.read
puts contents

Error:

tracker.rb:1:in 'initialize': Invalid argument - http://hiscore.runescape.com/index_lite.ws?player=zezima (Errno::EINVAL)
  from tracker.rb:1:in 'open'
  from tracker.rb:1

*http://hiscore.runescape.com/index_lite.ws?player=zezima

If I format this as a link in the post, the underscore (_) in the URL isn't recognized for some reason.
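Because the page is plain text rather than HTML, there are no elements for CSS selectors to find; once the body has been fetched it can be split into lines and fields directly. A minimal sketch, assuming each non-blank line holds comma-separated integers (something like rank,level,experience). That layout and the helper name parse_hiscore_lines are assumptions, not something the question confirms.

```ruby
# Hypothetical helper: turn a plain-text, comma-separated response into
# an array of integer rows. Assumes every non-blank line looks like "1,2,3".
def parse_hiscore_lines(text)
  lines = text.each_line.map(&:strip).reject(&:empty?)
  # Integer(field, 10) parses base 10 and raises on anything non-numeric.
  lines.map { |line| line.split(',').map { |field| Integer(field, 10) } }
end

# Made-up sample input for illustration only, not real hiscore data.
sample = "1,99,13034431\n-1,-1\n"
p parse_hiscore_lines(sample) # => [[1, 99, 13034431], [-1, -1]]
```

The string passed in would be whatever one of the fetching approaches in the answers returns.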

Appeasement asked 6/12, 2009 at 3:2

You want the open() method that OpenURI adds to the Kernel module, which can read from URIs as well as files. You just need to require the OpenURI library first (on Ruby 3.0 and later, call it as URI.open instead of a bare open):

require 'open-uri'

Used like so:

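require 'open-uri'
# On Ruby 3.0+, open-uri no longer patches Kernel#open, so call URI.open (available since Ruby 2.5).
# The block form returns the read result and closes the IO afterwards.
contents = URI.open('http://hiscore.runescape.com/index_lite.ws?player=zezima') { |f| f.read }
puts contents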
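A slightly more defensive version of the same call, as a sketch: open-uri raises OpenURI::HTTPError for error statuses such as 404, and the IO it yields also responds to status and content_type (from OpenURI::Meta). The helper name fetch_page is hypothetical.

```ruby
require 'open-uri'

# Hypothetical helper: return the page body, or nil if the server answers
# with an error status (open-uri raises OpenURI::HTTPError for those).
def fetch_page(url)
  URI.open(url) do |f|
    # OpenURI::Meta adds metadata to the IO, e.g. f.status => ["200", "OK"].
    warn "#{f.status.join(' ')} #{f.content_type}"
    f.read
  end
rescue OpenURI::HTTPError => e
  warn "request failed: #{e.message}" # e.g. "404 Not Found"
  nil
end
```

Calling fetch_page('http://hiscore.runescape.com/index_lite.ws?player=zezima') returns the body as a string, or nil if the lookup fails with an HTTP error.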

This related SO thread covers the same question:

Open an IO stream from a local file or url

Marche answered 6/12, 2009 at 3:16

I see, I didn't know that. Still, depending on what he wants to do with that content, he might be better off with net/http. – Chromatism 6/12, 2009 at 3:23

Oo, that's even better. Thanks. – Appeasement 6/12, 2009 at 4:32

@Chromatism: I totally agree that net/http is better in general. I don't rely on this method for anything non-trivial or in production. net/http has its shortcomings, and I generally prefer the libcurl bindings (the curb gem). This post has good info on HTTP client performance: bit.ly/lvriR. Curb is great because you get much finer-grained control over timeouts, which is critical in high-volume production use. – Marche 6/12, 2009 at 23:48
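For comparison, net/http does expose the two timeouts that usually matter most: open_timeout (establishing the connection) and read_timeout (waiting on each read), raising Net::OpenTimeout or Net::ReadTimeout when they expire. A minimal sketch; the helper names build_http and get_with_timeouts and the default values are assumptions for illustration, not recommendations from the thread.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: a Net::HTTP client for the URL's host with explicit
# timeouts in seconds. Building the object does not open a connection yet.
def build_http(url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = open_timeout # limit for establishing the connection
  http.read_timeout = read_timeout # limit for each read of the response
  http
end

# Hypothetical helper: GET the full request URI (path plus query string).
# Raises Net::OpenTimeout or Net::ReadTimeout if a limit is exceeded.
def get_with_timeouts(url, **timeouts)
  uri = URI.parse(url)
  build_http(url, **timeouts).start do |http|
    http.request(Net::HTTP::Get.new(uri))
  end
end
```

For example, get_with_timeouts('http://hiscore.runescape.com/index_lite.ws?player=zezima', read_timeout: 5).body would return the page text or raise once five seconds pass without data.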

Do we need to use the syntax "source = open('google.com', &:read)" if we want the file closed? Someone elsewhere on SO said file.read alone won't close the file. Please weigh in on our question if you don't mind: #21270739. – Perron 23/1, 2014 at 10:37

You don't have to use that syntax, but you can; it saves the second line that does the read. The &:read argument passes a block to the open() call: the block runs after the open succeeds, its result (the read) is returned, and the file is closed when the block exits. Without a block you get the IO object back and are responsible for closing it yourself. – Marche 24/1, 2014 at 17:13
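The closing behaviour can be checked offline with File.open, which follows the same block convention as open-uri's open (open-uri also closes the IO after the block returns). A small sketch; a temporary file stands in for the downloaded page.

```ruby
require 'tempfile'

# A temporary local file stands in for the downloaded page, so this runs offline.
tmp = Tempfile.new('page')
tmp.write("line one\nline two\n")
tmp.flush

# Block form: the block's value is returned and the file is closed when the block exits.
block_io = nil
contents = File.open(tmp.path) { |f| block_io = f; f.read }

# &:read is shorthand for { |f| f.read }, so this form closes the file too.
same_contents = File.open(tmp.path, &:read)

# Non-block form: you get the IO back and must close it yourself.
io = File.open(tmp.path)
io.read
left_open = !io.closed?
io.close

puts contents == same_contents # true
puts block_io.closed?          # true
puts left_open                 # true
tmp.close!
```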

@CodyCaughlan And how would you go about updating asset paths in the pulled HTML so that it displays as it would if I navigated directly to that URL in a browser? – Namaqualand 1/6, 2014 at 20:57
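One common approach to the asset-path question is to resolve each relative src/href value against the page's own URL with URI.join from Ruby's standard library. A rough sketch; the helper name absolutize_asset_paths is hypothetical, and the regex will miss unquoted attributes and CSS url(...) references, which an HTML parser would handle better.

```ruby
require 'uri'

# Hypothetical helper: rewrite src/href values in fetched HTML to absolute
# URLs, resolved against the URL the HTML was fetched from.
def absolutize_asset_paths(html, page_url)
  html.gsub(/\b(src|href)=(["'])(.*?)\2/) do
    attr, quote, value = Regexp.last_match.captures
    absolute =
      begin
        URI.join(page_url, value).to_s
      rescue URI::Error
        value # leave values URI.join cannot parse untouched
      end
    "#{attr}=#{quote}#{absolute}#{quote}"
  end
end

html = '<img src="img/logo.png"><a href="/about">About</a>'
puts absolutize_asset_paths(html, 'http://example.com/dir/page.html')
# prints: <img src="http://example.com/dir/img/logo.png"><a href="http://example.com/about">About</a>
```

URLs that are already absolute come back unchanged, because URI.join returns an absolute second argument as-is.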

The appropriate way to fetch the content of a website in Ruby is with the Net::HTTP class:

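require 'uri'
require 'net/http'
url = "http://hiscore.runescape.com/index_lite.ws?player=zezima"
# Pass the whole URI so the query string (?player=zezima) is kept;
# passing host and path separately would drop it.
r = Net::HTTP.get_response(URI.parse(url))
puts r.body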

File.open() does not support URIs.
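Net::HTTP.get_response does not follow redirects: if the server answers with a 3xx status, you get that response instead of the page. A sketch in the style of the redirect-following example in the Net::HTTP documentation; the name fetch_following_redirects and the limit of 5 hops are assumptions.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follow up to `limit` redirects by hand and return
# the first successful response.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit <= 0

  uri = URI.parse(url)
  response = Net::HTTP.get_response(uri) # keeps the query string
  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URL.
    fetch_following_redirects(URI.join(uri.to_s, response['location']).to_s, limit - 1)
  else
    response.value # raises an error for any non-2xx status
  end
end
```

With it, fetch_following_redirects('http://hiscore.runescape.com/index_lite.ws?player=zezima').body returns the final page text.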

Best wishes,
Fabian

Chromatism answered 6/12, 2009 at 3:8

Use open-uri; it supports both URIs and local files:

require 'open-uri'
# URI.open is required on Ruby 3.0+; a bare open() only handled URLs on older Rubies.
contents = URI.open('http://www.google.com') { |f| f.read }
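A sketch of the "both URIs and local files" point, assuming Ruby 2.5 or later for URI.open: strings that start with a scheme such as http:// are fetched, and anything else is handed to Kernel#open, so a local path works too. Because Kernel#open can treat a string starting with "|" as a shell command, avoid passing it untrusted input. The helper name read_location is hypothetical.

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical helper: read a URL or a local path with the same call.
# URI.open fetches "scheme://..." strings and passes anything else to Kernel#open.
def read_location(location)
  URI.open(location, &:read)
end

# A temporary local file shows the non-URL path; a URL string would be fetched instead.
tmp = Tempfile.new('local-page')
tmp.write("local contents\n")
tmp.flush
local = read_location(tmp.path)
puts local
tmp.close!
```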

Orchid answered 6/12, 2009 at 3:17
