Python urllib2 automatic form filling and retrieval of results
Asked Answered
T

3

9

I'm looking to be able to query a site for warranty information on a machine that this script would be running on. It should be able to fill out a form if needed ( like in the case of say HP's service site) and would then be able to retrieve the resulting web page.

I already have the bits in place to parse the resulting html that is reported back I'm just having trouble with what needs to be done in order to do a POST of data that needs to be put in the fields and then being able to retrieve the resulting page.

Ticktacktoe answered 14/4, 2011 at 18:17 Comment(0)
B
16

If you absolutely need to use urllib2, the basic gist is this:

import urllib
import urllib2
url = 'http://whatever.foo/form.html'
form_data = {'field1': 'value1', 'field2': 'value2'}
params = urllib.urlencode(form_data)
response = urllib2.urlopen(url, params)
data = response.read()

If you send along POST data (the 2nd argument to urlopen()), the request method is automatically set to POST.

I suggest you do yourself a favor and use mechanize, a full-blown urllib2 replacement that acts exactly like a real browser. A lot of sites use hidden fields, cookies, and redirects, none of which urllib2 handles for you by default, where mechanize does.

Check out Emulating a browser in Python with mechanize for a good example.

Birthplace answered 14/4, 2011 at 19:27 Comment(2)
I will also put in a vote for mechanize. I have used it a number of times. Really useful and much easier than urllib and urllib2 for doing complex things.Arsenite
I agree. Mechanize is the standard tool for doing this. Don't use urllib2 unless you absolutely must.Ninanincompoop
M
1

Using urllib and urllib2 together,

data = urllib.urlencode([('field1',val1), ('field2',val2)]) # list of two-element tuples
content = urllib2.urlopen('post-url', data)

content will give you the page source.

Moving answered 14/4, 2011 at 18:53 Comment(0)
H
0

I’ve only done a little bit of this, but:

  1. You’ve got the HTML of the form page. Extract the name attribute for each form field you need to fill in.
  2. Create a dictionary mapping the names of each form field with the values you want submit.
  3. Use urllib.urlencode to turn the dictionary into the body of your post request.
  4. Include this encoded data as the second argument to urllib2.Request(), after the URL that the form should be submitted to.

The server will either return a resulting web page, or return a redirect to a resulting web page. If it does the latter, you’ll need to issue a GET request to the URL specified in the redirect response.

I hope that makes some sort of sense?

Helfand answered 14/4, 2011 at 18:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.