Python: BaseHTTPRequestHandler - Read raw post
S

4

15

How do I read the raw HTTP POST body as a string? I've found several solutions for reading a parsed version of the POST data; however, the project I'm working on submits a raw XML payload without a header. So I am trying to find a way to read the POST data without it being parsed into a key => value array.

Shanan answered 26/7, 2013 at 18:28 Comment(0)
A
20

I think self.rfile.read(int(self.headers.getheader('content-length'))) should return the raw data as a string. According to the docs directly inside the BaseHTTPRequestHandler class:

- rfile is a file object open for reading positioned at the
start of the optional input data part;
Averett answered 26/7, 2013 at 18:33 Comment(5)
After trying this and doing some quick googling, this operation blocks execution for me as well as for others. - Shanan
Need to supply content length: data = self.rfile.read(int(self.headers.getheader('content-length'))) - Shanan
Yes, sorry. It blocks because the rfile object wraps a socket, and calling read() with no argument basically says 'read until there's nothing left to read'. There is more to read for as long as the socket is open, so it hangs and waits for incoming content. Servers avoid the hang by ALWAYS specifying HOW MUCH content to read. Sorry, I should have put that in the first place. - Averett
With Python 3.5 you need to use "get" instead of "getheader". - Chayachayote
What happens when there is no "content-length" header? Does your server just crash? - Scagliola
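On the last comment: in Python 3, self.headers.get('Content-Length') returns None when the header is absent, and int(None) raises a TypeError inside the handler for that request. A minimal sketch of one way to guard against that is below. The names parse_content_length and SafePostHandler are made up for illustration, and answering with 411 (Length Required) is just one reasonable choice, not something the answers above prescribe.

```python
from http.server import BaseHTTPRequestHandler


def parse_content_length(value):
    """Return the header value as a non-negative int, or None if missing/invalid.

    Hypothetical helper for this sketch; not part of http.server.
    """
    if value is None:
        return None
    try:
        length = int(value)
    except ValueError:
        return None
    return length if length >= 0 else None


class SafePostHandler(BaseHTTPRequestHandler):
    """Illustrative handler that refuses POSTs without a usable Content-Length."""

    def do_POST(self):
        length = parse_content_length(self.headers.get("Content-Length"))
        if length is None:
            # Without a length we cannot know how many bytes to read.
            self.send_error(411, "Content-Length required")
            return
        raw_body = self.rfile.read(length)  # exactly `length` bytes
        # ... process raw_body here ...
        self.send_response(204)
        self.end_headers()
```

Clients that use chunked transfer encoding send no Content-Length at all; this sketch simply rejects them rather than trying to decode chunks.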
W
26

self.rfile.read(int(self.headers.getheader('Content-Length'))) will return the raw HTTP POST data as a string.

Breaking it down:

  1. The header 'Content-Length' specifies how many bytes the HTTP POST data contains.
  2. self.headers.getheader('Content-Length') returns the content length (value of the header) as a string.
  3. This has to be converted to an integer before passing as parameter to self.rfile.read(), so use the int() function.
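Putting the three steps together, here is a minimal sketch for Python 3, where headers.get() replaces getheader() and the body comes back as bytes rather than str. The class name RawPostHandler and the echo behaviour are assumptions made purely for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class RawPostHandler(BaseHTTPRequestHandler):
    """Echoes the raw POST body back to the client (illustrative only)."""

    def do_POST(self):
        # Steps 1 and 2: the header value is a string; fall back to "0" if absent.
        length_text = self.headers.get("Content-Length", "0")
        # Step 3: convert to int before handing it to read().
        length = int(length_text)
        # Read exactly `length` bytes; a bare read() would wait on the open socket.
        raw_body = self.rfile.read(length)

        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(raw_body)))
        self.end_headers()
        self.wfile.write(raw_body)

    def log_message(self, format, *args):
        pass  # keep this sketch quiet; remove to restore request logging
```

To try it, something like HTTPServer(("127.0.0.1", 8000), RawPostHandler).serve_forever() starts a server on a port of your choosing (8000 is an arbitrary pick).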

Also, note that the header name is case sensitive so it has to be specified as 'Content-Length' only.

Edit: Apparently the header field name is not case sensitive (at least in Python 2.7.5), which I believe is the correct behaviour, since https://www.rfc-editor.org/rfc/rfc2616 states:

Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive.
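A quick way to check this in Python 3 without starting a server is to parse a header block with http.client.parse_headers, which returns the same kind of message object that BaseHTTPRequestHandler exposes as self.headers. The header bytes below are made-up sample input.

```python
import io
from http.client import parse_headers

# Sample header block (invented for this sketch), terminated by a blank line.
raw_headers = b"Content-Length: 21\r\nContent-Type: text/xml\r\n\r\n"
headers = parse_headers(io.BytesIO(raw_headers))

# Lookups ignore the case of the field name.
lower = headers.get("content-length")
mixed = headers.get("Content-LENGTH")
subscript = headers["CONTENT-LENGTH"]

print(lower, mixed, subscript)
```

All three lookups return the same string value, so any spelling of the header name works.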

Wideman answered 2/1, 2014 at 9:34 Comment(5)
Please be more verbose, I have no idea what you are suggesting. - Diametral
@jb: I added more details to the answer. Let me know if there is anything specific that still needs to be elaborated. - Wideman
@SindhuriKuppasad, the header name is not case-sensitive. The following statements both return the content length in my tests: self.headers.getheader('content-length') and self.headers.getheader('content-LENGTH') - Walli
@famzah, that's interesting. I cannot recall which version of Python I was using when I wrote this answer, but the case mattered then, and that was the reason I put the answer here in the first place. I checked on 2.7.5 now and you're right, the case doesn't matter. - Wideman
In Python 3 it would be self.headers.get('content-length') - Brasca
V
5

For Python 3.7, the following worked for me:

rawData = (self.rfile.read(int(self.headers['content-length']))).decode('utf-8')

With the help of the other answers in this question and this and this. The last link actually contains the full solution.
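If the client declares a charset in its Content-Type header, the hard-coded 'utf-8' can be replaced by that value. A rough sketch follows, assuming the headers object is the usual message object from self.headers; decode_body is a hypothetical helper name, and the sample headers are built with http.client.parse_headers only so the snippet runs on its own.

```python
import io
from http.client import parse_headers


def decode_body(raw_body, headers, default="utf-8"):
    """Decode raw POST bytes using the declared charset, else `default`.

    Hypothetical helper for this sketch; not part of http.server.
    """
    # get_content_charset() returns None when no charset parameter is present.
    charset = headers.get_content_charset() or default
    return raw_body.decode(charset)


# Made-up sample request headers declaring a Latin-1 body.
sample_headers = parse_headers(
    io.BytesIO(b"Content-Type: text/xml; charset=iso-8859-1\r\n\r\n")
)
text = decode_body("<name>José</name>".encode("latin-1"), sample_headers)
print(text)
```

Inside a handler this would be called as decode_body(self.rfile.read(length), self.headers). An unknown charset name makes decode() raise LookupError, which this sketch does not catch.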

Valor answered 12/9, 2019 at 6:40 Comment(1)
@JulesG.M. That's what I found in the last link I provided. Also, utf-8 worked for the content I was reading as raw data on the server side. If the data is encoded in any other format, that value will need to change for decoding. - Valor
C
2

The read() method on the io.BufferedIOBase object reads until EOF. On a socket, EOF only arrives once the client closes its side of the connection, and not all browsers do that after sending the request (source). Reading Content-Length bytes is a good solution. Using the read1() method also worked for me. It makes at most one read call on the underlying raw stream and returns whatever data that call yields.
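The difference can be seen without a browser by using a local socket pair as a stand-in for the client connection. This is only a sketch: the pair, the variable names and the 65536 size limit are assumptions for illustration, not part of BaseHTTPRequestHandler.

```python
import socket

# A connected pair of sockets standing in for client and server ends.
client, server_side = socket.socketpair()
rfile = server_side.makefile("rb")  # buffered reader, similar to a handler's rfile

# The client sends a body but keeps the connection open (no EOF yet).
client.sendall(b"<xml/>")

# rfile.read() would wait here for EOF. read1() issues at most one read on the
# socket and returns what arrived; it can still block if nothing has arrived yet.
chunk = rfile.read1(65536)
print(chunk)

rfile.close()
client.close()
server_side.close()
```

Note that read1() may return only part of a larger body if the data arrives in several pieces, so reading Content-Length bytes remains the more dependable approach.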

Cognomen answered 2/11, 2020 at 19:10 Comment(0)
