Handling HTTP chunked encoding with Django

I have a problem handling HTTP chunked transfer encoding.

I'm using:

  • Apache
  • the mod_wsgi plugin
  • Django

Django is only capable of handling regular HTTP requests with a Content-Length header field. When it comes to handling a Transfer-Encoding (TE) of chunked or gzip, it returns an empty result.

I'm thinking of 2 approaches:

  1. Make some modification to the django.wsgi Python file.
  2. Add a middleware Python file to Django that intercepts any chunked HTTP request, converts it to a regular HTTP request with a Content-Length header field, and then passes it to Django, which can handle it nicely.

Can anybody help with either of these two options? (Other options are most welcome, of course.)

Thanks!


This is an extension to my question after Graham's first answer:

First of all, thanks for your quick response. The client being used is Axis, which is part of another company's system that communicates with ours. I had WSGIChunkedRequest On set, and I also made some modifications to my WSGI wrapper, like this:

def application(environ, start_response):

    if environ.get("mod_wsgi.input_chunked") == "1":
        stream = environ["wsgi.input"]
        print stream
        print 'type: ', type(stream)
        length = 0
        for byte in stream:
            length+=1
        #print length    
        environ["CONTENT_LENGTH"] = len(stream.read(length))

    django_application = get_wsgi_application()
    return django_application(environ, start_response)

but it gives me these errors (extracted from Apache's error.log file):

[Sat Aug 25 17:26:07 2012] [error] <mod_wsgi.Input object at 0xb6c35390>
[Sat Aug 25 17:26:07 2012] [error] type:  <type 'mod_wsgi.Input'>
[Sat Aug 25 17:26:08 2012] [error] [client xxxxxxxxxxxxx] mod_wsgi (pid=27210): Exception occurred processing WSGI script '/..../wsgi.py'.
[Sat Aug 25 17:26:08 2012] [error] [client xxxxxxxxxxxxx] Traceback (most recent call last):
[Sat Aug 25 17:26:08 2012] [error] [client xxxxxxxxxxxxx]   File "/..../wsgi.py", line 57, in application
[Sat Aug 25 17:26:08 2012] [error] [client xxxxxxxxxxxxx]     for byte in stream:
[Sat Aug 25 17:26:08 2012] [error] [client xxxxxxxxxxxxx] IOError: request data read error

What am I doing wrong?

Sleeve answered 23/8, 2012 at 11:59 Comment(1)
One more thing to add: environ["CONTENT_LENGTH"] = len(stream.read()) didn't work either, and gave the same error, IOError: request data read error. Thanks! - Sleeve

This is not a Django issue. It is a limitation of the WSGI specification itself, in that the specification effectively prohibits chunked request content by requiring a CONTENT_LENGTH value for the request.

When using mod_wsgi, there is a switch for enabling non-standard support for chunked request content, but using it means your application is no longer WSGI compliant. It also requires a custom web application or WSGI wrapper, because it still isn't going to work with Django on its own.

The option in mod_wsgi to allow chunked request content is:

WSGIChunkedRequest On

Your WSGI wrapper should call wsgi.input.read() to get the whole content, create a StringIO instance from it, use that to replace wsgi.input, and then add a new CONTENT_LENGTH value to environ with the actual length before calling the wrapped application.

Do note that this is dangerous, because the whole body is read into memory and you will not know in advance how much data is being sent.
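
One way to limit that risk is to read the body in blocks and give up once it passes a size cap. The following is only a minimal Python 3 sketch of that idea: the names read_capped, BodyTooLarge and MAX_BODY_BYTES are hypothetical, the 10 MB cap is an arbitrary assumption, and it relies on the stream returning an empty bytes object at the end of the input.

```python
import io

# Hypothetical cap, chosen only for illustration; tune it for your service.
MAX_BODY_BYTES = 10 * 1024 * 1024


class BodyTooLarge(Exception):
    """Raised when a request body grows past the configured cap."""


def read_capped(stream, limit=MAX_BODY_BYTES, block_size=64 * 1024):
    """Read a body of unknown length, stopping once it exceeds `limit` bytes."""
    blocks = []
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            # An empty read marks the end of the input.
            break
        total += len(block)
        if total > limit:
            raise BodyTooLarge("request body exceeds %d bytes" % limit)
        blocks.append(block)
    return b"".join(blocks)


if __name__ == "__main__":
    # io.BytesIO stands in for environ["wsgi.input"] here.
    body = read_capped(io.BytesIO(b"hello world"), limit=1024)
    print(len(body))
```

In a wrapper, the returned bytes would then be used to set CONTENT_LENGTH and replace wsgi.input, and a BodyTooLarge error could be turned into a 413 response.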

What client are you using anyway that only supports chunked request content?


UPDATE 1

Your code is broken for numerous reasons. You should be using something like:

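import StringIO

from django.core.wsgi import get_wsgi_application

# Create the Django application once, at import time, not on every request.
django_application = get_wsgi_application()

def application(environ, start_response):

    if environ.get("mod_wsgi.input_chunked") == "1":
        stream = environ["wsgi.input"]
        # Read the whole chunked body so that its real length is known.
        data = stream.read()
        environ["CONTENT_LENGTH"] = str(len(data))
        environ["wsgi.input"] = StringIO.StringIO(data)

    return django_application(environ, start_response)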

Note that this will not help with gzip'd request content. You would need an additional check to detect when the content encoding indicates compressed data, and then do the same as above. This is because when the data is uncompressed by Apache, the content length changes and you need to recalculate it.
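
As a rough illustration of that additional check, here is a Python 3 sketch (so it uses io.BytesIO rather than StringIO). It assumes the server has already uncompressed the body and still exposes the Content-Encoding request header as HTTP_CONTENT_ENCODING in environ; whether your Apache setup does so is something to verify. The function names needs_rebuffering and rebuffer_input are hypothetical.

```python
import io


def needs_rebuffering(environ):
    """Return True when CONTENT_LENGTH cannot be trusted for this request."""
    chunked = environ.get("mod_wsgi.input_chunked") == "1"
    # Assumption: the Content-Encoding header is still visible to the
    # application after the server has uncompressed the body.
    encoding = environ.get("HTTP_CONTENT_ENCODING", "").strip().lower()
    return chunked or encoding == "gzip"


def rebuffer_input(environ):
    """Buffer the body in memory and set CONTENT_LENGTH to its real size."""
    data = environ["wsgi.input"].read()
    environ["wsgi.input"] = io.BytesIO(data)
    environ["CONTENT_LENGTH"] = str(len(data))
    return environ


if __name__ == "__main__":
    env = {
        "wsgi.input": io.BytesIO(b"uncompressed body"),
        "CONTENT_LENGTH": "9",  # stale length of the compressed data
        "HTTP_CONTENT_ENCODING": "gzip",
    }
    if needs_rebuffering(env):
        rebuffer_input(env)
    print(env["CONTENT_LENGTH"])
```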

Stringency answered 23/8, 2012 at 12:21 Comment(11)
Kindly note that the question has been edited according to this answer. - Sleeve
Your code is broken for various reasons. Get rid of your for loop and the broken length calculation, which is actually counting lines rather than characters. All you need is 'data = stream.read()' and go from there. Calling get_wsgi_application() on each web request for Django is also broken. I will amend my answer. - Stringency
Seems like we are very close to the answer. I copied your code and pasted it into my wrapper; it showed no problem except for complaining about the line data = stream.read(), with this error: IOError: request data read error (copied from Apache's error.log). From other code, I know that stream.read() should work normally. What could the problem be? Thanks! - Sleeve
Could this problem require an upgrade to mod_wsgi version 3.4? - Sleeve
Do you mean, do you need to upgrade to mod_wsgi 3.4? If so, then possibly. I have a vague recollection that this will not work if you are using daemon mode in prior versions of mod_wsgi. I just can't remember the details of the issue and whether it was fixed in 3.4 or not. Are you using embedded mode or daemon mode, and what version of mod_wsgi? - Stringency
If you are using daemon mode, then the issue hasn't been fixed. It was deferred to mod_wsgi 4.0, so you can only use embedded mode. groups.google.com/forum/?fromgroups=#!topic/modwsgi/Rk-cXTGSCHQ - Stringency
I had already upgraded from 3.3 to 3.4 before your last two comments, and the problem still stands. Yes, I use mod_wsgi in daemon mode, via the WSGIDaemonProcess directive. I'll try to run it in embedded mode and get back to you. Thanks, Graham, for all of your help! - Sleeve
@GrahamDumpleton I've read in PEP 3333 that "WSGI servers must handle any supported inbound "hop-by-hop" headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable." Are you saying that by requiring Content-Length in all requests, the WSGI spec accidentally contradicts itself, making it impossible to support chunked transfer encoding, even though it says servers should support it? - Tarp
Yes and no. A WSGI server can deal with a chunked request or compressed content, but however it does so, according to the specification it must still pass through a CONTENT_LENGTH. Where the amount of content cannot be determined prior to reading it all in, that is difficult. The result is that the WSGI server would need to read all request content into memory (dangerous) or into a disk file to calculate the length before passing the request through. - Stringency
So the requirement to provide CONTENT_LENGTH makes it more difficult to implement in a good way. It would have been better if a missing CONTENT_LENGTH were allowed, in which case the WSGI application would be required to read all content until the input terminator (an empty string) was returned. - Stringency
With Django 1.11 and mod_wsgi (4.6.5), I need to check for environ.get("HTTP_TRANSFER_ENCODING") == "chunked" - Hershey
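
The two ways of detecting a chunked request mentioned in this thread (the mod_wsgi.input_chunked key from the answer and the HTTP_TRANSFER_ENCODING check from the last comment) could be combined in one helper. This is only a sketch: is_chunked_request is a hypothetical name, and which key is actually present depends on your mod_wsgi version and configuration.

```python
def is_chunked_request(environ):
    """Best-effort check for a chunked request body in a WSGI environ."""
    # Key used in the answer above.
    if environ.get("mod_wsgi.input_chunked") == "1":
        return True
    # Fallback reported for newer mod_wsgi: inspect the request header.
    # When several transfer codings are listed, "chunked" comes last.
    header = environ.get("HTTP_TRANSFER_ENCODING", "")
    codings = [coding.strip().lower() for coding in header.split(",")]
    return codings[-1] == "chunked"


if __name__ == "__main__":
    print(is_chunked_request({"HTTP_TRANSFER_ENCODING": "chunked"}))
```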

Now everything works smoothly. The problem was daemon mode, which doesn't work with chunked HTTP traffic; according to Graham Dumpleton, a fix was deferred to mod_wsgi 4. So, if you have this problem, switch mod_wsgi to embedded mode.

As a modification to Graham's code in the WSGI wrapper, there are two ways to read the stream held in the environ dictionary:

First one:

try:
    while True:
        data+= stream.next()
except:
    print 'Done with reading the stream ...'

Second one:

try:
   data+= stream.read()
except:
   print 'Done with reading the stream ...' 

The first code stub was able to read the buffer in daemon mode but stopped somewhere, and the program did not continue running (which confused me a bit, as I expected it to work), while the second stub crashed with an IOError and only worked in embedded mode.

One more thing to add: upgrading from 3.3 to 3.4 didn't solve the problem, so you have to switch to embedded mode.

Those are my results and observations. If you have any comments, additions, or corrections, please don't hesitate.

Thanks!

Sleeve answered 26/8, 2012 at 17:0 Comment(2)
Side note: Don't confuse your application freezing because of the chunking issue discussed above with the effect of consuming the file-like object environ['wsgi.input']; once it has been read, you should refill it with the data again. For more information, please check this link - Sleeve
Useful links: wsgi, Chunked Transfer encoding on request content, WSGI Content-Length issues, Ephemeral wsgi.input stream, PEP333 - Sleeve
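
To illustrate the side note about consuming wsgi.input: the stream can only be read once, so anything that inspects the body has to put a fresh file-like object back. A minimal Python 3 sketch, where peek_body is a hypothetical helper name and io.BytesIO stands in for the real input stream:

```python
import io


def peek_body(environ):
    """Return the request body and leave a fresh, unread copy in environ."""
    data = environ["wsgi.input"].read()
    # The original stream is now exhausted, so refill it for later readers.
    environ["wsgi.input"] = io.BytesIO(data)
    return data


if __name__ == "__main__":
    env = {"wsgi.input": io.BytesIO(b"payload")}
    first = peek_body(env)
    second = env["wsgi.input"].read()
    print(first == second)
```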
