Python read page from URL? Better documentation?

Asked 1/10, 2011 at 1:56 Answered 1/10, 2011 at 2:16

I'm having quite a bit of trouble with Python's documentation. Is there anything like the Mozilla Developer Network for it?

I'm doing a Python puzzle website and I need to be able to read the content of the page. I saw the following posted on a site:

import urllib2

urlStr = 'http://www.python.org/'
try:
  fileHandle = urllib2.urlopen(urlStr)
  str1 = fileHandle.read()
  fileHandle.close()
  print ('-'*50)
  print ('HTML code of URL =', urlStr)
  print ('-'*50)
except IOError:
  print ('Cannot open URL %s for reading' % urlStr)
  str1 = 'error!'

print (str1)

It keeps saying that there is no urllib2 module.

The Python documentation says

The urllib module has been split into parts and renamed in Python 3.0 to urllib.request, urllib.parse, and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to 3.0. Also note that the urllib.urlopen() function has been removed in Python 3.0 in favor of urllib2.urlopen().

I tried importing urllib.request too, but it says urllib2 is not defined... WTF is going on here?

Version 3.2.2

Declinometer answered 1/10, 2011 at 1:56 Comment(6)

Your Python version would be useful at this juncture. – Margaretmargareta 1/10, 2011 at 2:6

Updated. Any other documentation for Python than what's given? – Declinometer 1/10, 2011 at 2:8

@Walkerneo: docs.python.org/py3k/library/urllib.request.html – Solicitous 1/10, 2011 at 2:11

Is this the documentation you're using? docs.python.org/py3k – Charie 1/10, 2011 at 2:12

Yes and yes. I've been using that. @Solicitous thanks, that helps. I have to do urllib.request.urlopen then. Thanks, put it as an answer and I'll accept it. – Declinometer 1/10, 2011 at 2:14

Hey, you actually deleted your answer. I was just kidding though, that's why I didn't accept the other one. Sorry for attempting a sense of humor, I shall not again. – Declinometer 1/10, 2011 at 2:19

The documentation you were probably referencing was the Python 2 documentation for urllib2. The documentation you should probably be using is the Python 3 documentation for urllib.request.
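
For reference, a rough Python 3 port of the script from the question might look like the sketch below. It assumes a current Python 3 release. The name read_page is made up for this example, and catching urllib.error.URLError (HTTPError is a subclass of it) stands in for the IOError handler in the original.

```python
import urllib.error
import urllib.request


def read_page(url):
    """Return the raw bytes served at url, or None if it cannot be opened.

    read_page is a hypothetical helper name, not part of urllib.
    """
    try:
        # urllib2.urlopen() from Python 2 lives at urllib.request.urlopen() in Python 3.
        # The response works as a context manager, so it is closed automatically.
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.URLError:
        # Raised for unreachable hosts, missing files, HTTP error statuses, etc.
        return None
```

Calling print(read_page('http://www.python.org/')) should then print the page as a bytes object, or None if the request failed.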

Solicitous answered 1/10, 2011 at 2:16 Comment(0)

Using urllib.request.urlopen(), as recommended in Dive into Python 3...

Python 3.2.1 (default, Jul 24 2011, 22:21:06) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> urlStr = 'http://www.python.org/'
>>> fileHandle = urllib.request.urlopen(urlStr)
>>> print(fileHandle.read()[:100])
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtm'
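
Note the b prefix in the output above: read() returns bytes, not str. If you want text, one possible approach is to decode using the charset the server declares, as in the sketch below. The helper name read_text and the UTF-8 fallback are assumptions for illustration, not something the page or the docs prescribe.

```python
import urllib.request


def read_text(url, default_encoding="utf-8"):
    """Fetch url and return its body decoded to str.

    read_text and default_encoding are hypothetical names for this sketch.
    """
    with urllib.request.urlopen(url) as response:
        # get_content_charset() reads the charset parameter of the Content-Type
        # header and returns None when the server does not declare one.
        encoding = response.headers.get_content_charset() or default_encoding
        # errors="replace" avoids an exception if a few bytes do not match the charset.
        return response.read().decode(encoding, errors="replace")
```

With that, read_text('http://www.python.org/')[:100] should give the first 100 characters of the page as an ordinary string.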

Margaretmargareta answered 1/10, 2011 at 2:15 Comment(2)

Now file fileHandle contains the complete source. How can I use the xpath on fileHandle data to get particular value? – Thundering 4/4, 2014 at 6:45

@DixitSingla: You could use lxml, like in this answer: https://mcmap.net/q/88752/-can-we-use-xpath-with-beautifulsoup – Margaretmargareta 4/4, 2014 at 9:14
