urllib2 - post request

I am trying to perform a simple POST request with urllib2. However, the server's response indicates that it received a plain GET. I checked the method of the outgoing request, and it is set to POST.
To check whether the server behaves as I expect, I tried a GET request with the (former POST) data concatenated to the URL. This gave me the answer I expected.
Does anybody have a clue what I misunderstood?

def connect(self):
    url = 'http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html/'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    header = { 'User-Agent' : user_agent }

    values = {
      'city_from' : 69,
      'radius_from' : 0,
      'city_to' : 263,
      'radius_to' : 0,
      'date' : 'date',
      'day' : 5,
      'month' : 3,
      'year' : 2012,
      'tolerance' : 0
    }

    data = urllib.urlencode(values)
    # req = urllib2.Request(url+data, None, header) # GET works fine
    req = urllib2.Request(url, data, header)  # POST request does not work

    self.response = urllib2.urlopen(req)

This seems to be a problem like the one discussed here: Python URLLib / URLLib2 POST. However, I'm quite sure that in my case the trailing slash is not missing. ;)

I fear this might be a stupid misconception, but I have been puzzling over it for hours!
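
The check mentioned above (confirming that the outgoing request really is a POST) can be sketched as follows. This is a minimal illustration, not the asker's code: the URL and form values are placeholders, and the imports fall back to the Python 3 names so the snippet runs on either version.

```python
try:  # Python 2, as used in the question
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3 equivalents of the same calls
    from urllib.parse import urlencode
    from urllib.request import Request

# Placeholder URL and form fields (assumptions, not the real site).
url = 'http://example.invalid/search/'
values = {'city_from': 69, 'city_to': 263}

# Encoding to bytes is required on Python 3 and harmless on Python 2.
data = urlencode(values).encode('ascii')

post_req = Request(url, data)  # a body is attached, so the method is POST
get_req = Request(url)         # no body, so the method is GET

print(post_req.get_method())   # POST
print(get_req.get_method())    # GET
```

Note that this only tells you what urllib2 intends to send for the first request. It says nothing about what happens after the server answers with a redirect.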



EDIT: A convenience function for printing:

def response_to_str(response):
    return response.read()

def dump_response_to_file(response):
    # the with-block closes the file even if the write fails
    with open('dump.html', 'w') as f:
        f.write(response_to_str(response))



EDIT 2: Resolution:

I found a tool to capture the real interaction with the site, http://fiddler2.com/fiddler2/. Apparently the server takes the data from the input form, redirects a few times and then makes a GET request with this data simply appended to the URL.
Everything is fine with urllib2, and I apologize for wasting your time!
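
A redirect chain like this can also be observed from Python without an external tool. The sketch below is one possible approach, not something from the original thread: `LoggingRedirectHandler` is a hypothetical helper that subclasses the standard `HTTPRedirectHandler` and records each redirect. The offline call at the end uses made-up URLs and simply shows that the standard handler turns a redirected POST into a GET without a body.

```python
try:  # Python 2
    from urllib2 import HTTPRedirectHandler, Request, build_opener
except ImportError:  # Python 3
    from urllib.request import HTTPRedirectHandler, Request, build_opener


class LoggingRedirectHandler(HTTPRedirectHandler):
    """Record every redirect the opener follows (hypothetical helper)."""

    def __init__(self):
        self.log = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the standard handler build the follow-up request.
        new_req = HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)
        if new_req is not None:
            self.log.append(
                (code, req.get_method(), new_req.get_method(), newurl))
        return new_req


handler = LoggingRedirectHandler()
# Against a real server you would call opener.open(req) and then read handler.log.
opener = build_opener(handler)

# Offline demonstration with placeholder URLs: simulate a 302 answer to a POST.
original = Request('http://example.invalid/search/', b'city_from=69')
followed = handler.redirect_request(
    original, None, 302, 'Found', {}, 'http://example.invalid/results/')

print(handler.log)
```

The log entry shows the original method as POST and the follow-up method as GET, which matches the behaviour captured with Fiddler.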

Geotectonic answered 2/3, 2012 at 23:11 Comment(5)
But what is the answer you expected? And how are you sure this isn't a server-side problem? - Creative
You can observe the behavior I expect by removing the comment from line 19 (and commenting out line 20, of course). Since this gets me what I want, I assume the server works fine. To be precise, I want to receive all rides from Dresden to Potsdam on the 5th of March, but instead I get all the rides in the system. - Geotectonic
Can you post the server-side code too? - Tree
Unfortunately not, because I do not have access to it. - Geotectonic
Perhaps the server doesn't accept POST requests to this page then. - Tree
G
1

Just to close the question:
The problem really was that the server did not expect a POST request (although it should, considering the use case). So (once again) the framework was not broken. ;)

Geotectonic answered 8/4, 2012 at 20:4 Comment(0)
T
15

Things you need to check:

  • Are you sure you are posting to the right URL?
  • Are you sure you can retrieve results without being logged in?
  • Show us some example output for different post values.

You can find the correct POST URL using Firefox's Firebug or Google Chrome's DevTools.
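
If you prefer to check from Python, the form's target and method can also be read from the page source. The following is only a sketch: `FormFinder` is a hypothetical helper built on the standard library HTML parser, and the sample markup is invented for illustration, not copied from the real page.

```python
try:  # Python 2
    from HTMLParser import HTMLParser
except ImportError:  # Python 3
    from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect action, method and field names of each <form> (hypothetical helper)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self.forms.append({
                'action': attrs.get('action', ''),
                # HTML forms default to GET when no method is given.
                'method': (attrs.get('method') or 'get').upper(),
                'fields': [],
            })
        elif (tag in ('input', 'select', 'textarea')
              and self.forms and attrs.get('name')):
            # Named controls are attached to the most recently opened form.
            self.forms[-1]['fields'].append(attrs['name'])


# Made-up markup for illustration only.
sample_html = '''
<form action="/search" method="get">
  <select name="city_from"></select>
  <input type="text" name="day">
  <input type="submit" value="Go">
</form>
'''

finder = FormFinder()
finder.feed(sample_html)
print(finder.forms)
```

If the reported method is GET, the form data belongs in the URL rather than in a POST body.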

Below is some code that supports cookies, so that you can log in first and use the cookie to make the subsequent request with your POST parameters.

Finally, if you could show us some example HTML output, that will make life easier.

Here is my code, which has worked quite reliably for me so far for POSTing to most webpages, including pages protected with CSRF/XSRF tokens (as long as you are able to correctly figure out what to post and which URL to post to).

import cookielib
import socket
import urllib
import urllib2

url = 'http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html/'
http_header = {
                "User-Agent" : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.46 Safari/535.11",
                "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,text/png,*/*;q=0.5",
                "Accept-Language" : "en-us,en;q=0.5",
                "Accept-Charset" : "ISO-8859-1",
                "Content-type": "application/x-www-form-urlencoded",
                "Host" : "www.mitfahrgelegenheit.de",
                "Referer" : "http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html/"
                }

params = {
  'city_from' : 169,
  'radius_from' : 0,
  'city_to' : 263,
  'radius_to' : 0,
  'date' : 'date',
  'day' : 5,
  'month' : 3,
  'year' : 2012,
  'tolerance' : 0
}

# setup socket connection timeout
timeout = 15
socket.setdefaulttimeout(timeout)

# setup cookie handler
cookie_jar = cookielib.LWPCookieJar()
cookie = urllib2.HTTPCookieProcessor(cookie_jar)

# setup proxy handler, in case some day you need to use a proxy server
proxy = urllib2.ProxyHandler({}) # example: urllib2.ProxyHandler({"http" : "www.blah.com:8080"})

# create an urllib2 opener()
#opener = urllib2.build_opener(proxy, cookie) # with proxy
opener = urllib2.build_opener(cookie) # we are not going to use a proxy now

# create your HTTP request
req = urllib2.Request(url, urllib.urlencode(params), http_header)

# submit your request
res = opener.open(req)
html = res.read()

# save retrieved HTML to file
with open("tmp.html", "w") as f:
    f.write(html)
print html
Tarsia answered 4/3, 2012 at 1:25 Comment(0)
R
0

Try adding to your headers the pair:

   'Content-type': 'application/x-www-form-urlencoded'
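
For completeness, here is a minimal sketch of how that pair fits into the request. The URL and form values are placeholders, and the imports fall back to the Python 3 names. The Wireshark capture in the comments below shows this content type being sent by the question's code, which does not set the header itself, so adding it explicitly probably changes little in this case.

```python
try:  # Python 2
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request

# Placeholder URL and form fields (assumptions, not the real site).
url = 'http://example.invalid/search/'
data = urlencode({'day': 5, 'month': 3}).encode('ascii')

headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Content-type': 'application/x-www-form-urlencoded',
}

req = Request(url, data, headers)
print(req.get_header('Content-type'))
```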
Reneta answered 2/3, 2012 at 23:21 Comment(3)
I just tried using your exact code here, watched it with Wireshark, and it looks like a POST request to me: 211 23.544957 10.0.0.6 62.146.53.71 HTTP 414 POST /mitfahrzentrale/Dresden/Potsdam.html/ HTTP/1.1 (application/x-www-form-urlencoded) - Reneta
I assume it really is a POST request, but it looks like the server redirects and changes it to a GET... Could you try out the GET request I commented out in line 19 and compare the result to that of the POST request, Dvir? I added a dump-to-HTML function to my question above, so it shouldn't take too much time. I would really appreciate that! It would at least show me that I did not go crazy staring at this thing. ;) - Geotectonic
I ran them both and the results are different. The page with the POST is ~38k, and the page with the GET is ~24k. - Reneta
T
0

Try removing the trailing slash from your URL like this:

url = 'http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html'

It may be the case that the server script your POST request is being sent to doesn't actually support POST requests.

Tree answered 2/3, 2012 at 23:35 Comment(3)
Removing the trailing slash did not help (and doesn't seem to be a good idea according to https://mcmap.net/q/324154/-python-urllib-urllib2-post). Without the User-Agent header the server won't talk to me (responding with a 403) because it apparently does not like the default agent urllib2 submits. - Geotectonic
In your case, removing the trailing slash is correct because you've qualified the absolute path to the resource (assuming that Potsdam.html is a file and not a directory). - Tree
Ahh, thank you for the explanation! To be honest, in my despair I even tried out a trailing question mark, which did not help either. - Geotectonic
