python parse http response (string)

Asked 14/7, 2014 at 0:17 Answered 14/7, 2014 at 3:56

I'm using Python 2.7 and I want to parse the fields of HTTP responses that I have already extracted, as strings, from a text file. What would be the easiest way? I can parse requests using BaseHTTPServer, but I couldn't find an equivalent for responses.

The responses I have are pretty standard and in the following format:

HTTP/1.1 200 OK
Date: Thu, Jul  3 15:27:54 2014
Content-Type: text/xml; charset="utf-8"
Connection: close
Content-Length: 626

Thanks in advance,

Buckskins answered 14/7, 2014 at 0:17 Comment(0)

You might find this useful. Keep in mind that HTTPResponse wasn't designed to be "instantiated directly by user."

Also note that the Content-Length header in your response string may no longer be valid (it depends on how you acquired these responses). This just means that the call to HTTPResponse.read() needs to be passed a value at least as large as the actual content in order to read all of it.

In Python 2 it can be run this way:

from httplib import HTTPResponse
from StringIO import StringIO

http_response_str = """HTTP/1.1 200 OK
Date: Thu, Jul  3 15:27:54 2014
Content-Type: text/xml; charset="utf-8"
Connection: close
Content-Length: 626"""

class FakeSocket():
    def __init__(self, response_str):
        self._file = StringIO(response_str)
    def makefile(self, *args, **kwargs):
        return self._file

source = FakeSocket(http_response_str)
response = HTTPResponse(source)
response.begin()
print "status:", response.status
print "single header:", response.getheader('Content-Type')
print "content:", response.read(len(http_response_str)) # the len here will give a 'big enough' value to read the whole content

In Python 3, HTTPResponse is imported from http.client, and the response to be parsed needs to be bytes rather than str. Depending on where the data comes from, it may already be bytes or it may need to be encoded explicitly:

from http.client import HTTPResponse
from io import BytesIO

http_response_str = """HTTP/1.1 200 OK
Date: Thu, Jul  3 15:27:54 2014
Content-Type: text/xml; charset="utf-8"
Connection: close
Content-Length: 626

teststring"""

http_response_bytes = http_response_str.encode()

class FakeSocket():
    def __init__(self, response_bytes):
        self._file = BytesIO(response_bytes)
    def makefile(self, *args, **kwargs):
        return self._file

source = FakeSocket(http_response_bytes)
response = HTTPResponse(source)
response.begin()
print("status:", response.status)
# status: 200
print("single header:", response.getheader('Content-Type'))
# single header: text/xml; charset="utf-8"
print("content:", response.read(len(http_response_str)))
# content: b'teststring'
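
To illustrate the Content-Length caveat above, here is a minimal Python 3 sketch, assuming the same FakeSocket trick, of what happens when read() is called without a size while the header promises more bytes than are actually present. As far as I can tell, http.client then raises IncompleteRead, and the bytes that were read are available on the exception's partial attribute. The names raw, body and truncated are made up for this example.

```python
from http.client import HTTPResponse, IncompleteRead
from io import BytesIO


class FakeSocket:
    """Minimal stand-in for a socket: HTTPResponse only calls makefile()."""

    def __init__(self, response_bytes):
        self._file = BytesIO(response_bytes)

    def makefile(self, *args, **kwargs):
        return self._file


# Hypothetical response: the header claims 626 bytes, the body has only 10.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 626\r\n"
    b"\r\n"
    b"teststring"
)

response = HTTPResponse(FakeSocket(raw))
response.begin()

truncated = False
try:
    # With no size argument, read() trusts Content-Length.
    body = response.read()
except IncompleteRead as exc:
    # The bytes that were actually available are kept on the exception.
    truncated = True
    body = exc.partial

print(truncated, body)
```

Passing an explicit size to read(), as in the answer's code, avoids the exception; catching IncompleteRead is an alternative when you also want to know that the body was short.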

Polysemy answered 14/7, 2014 at 3:56 Comment(7)

This does really look like the trick I needed. I probably could work my way through by using regexes for my simple purposes but using HTTPResponse feels a lot more correct. Thanks very much. – Buckskins 14/7, 2014 at 8:14

As a follow up, tested and yes, this does what I want. – Buckskins 15/7, 2014 at 8:56

but what if there's a keep-alive connection? can we parse multiple headers/body using this solution? something like the example of this unanswered question: #34787380 – Mendenhall 17/1, 2016 at 13:22
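
One possible approach to the keep-alive case asked about here, as a hedged Python 3 sketch rather than a tested recipe: feed several HTTPResponse objects from the same buffer, one after another. It assumes every response carries a correct Content-Length (no chunked encoding, no "Connection: close") so that read() stops exactly at the end of each body. NonClosingBytesIO and parse_responses are hypothetical helpers introduced for this example; the no-op close() is there because HTTPResponse closes its file object once a body has been read completely.

```python
from http.client import HTTPResponse
from io import BytesIO


class NonClosingBytesIO(BytesIO):
    """BytesIO whose close() does nothing, so the buffer survives each response."""

    def close(self):
        pass


class FakeSocket:
    def __init__(self, stream):
        self._file = stream

    def makefile(self, *args, **kwargs):
        return self._file


def parse_responses(raw):
    """Parse back-to-back responses; relies on correct Content-Length headers."""
    stream = NonClosingBytesIO(raw)
    sock = FakeSocket(stream)
    parsed = []
    while stream.tell() < len(raw):
        response = HTTPResponse(sock)
        response.begin()          # status line and headers
        body = response.read()    # exactly Content-Length bytes
        parsed.append((response.status, response.getheader("Content-Type"), body))
    return parsed


# Hypothetical capture of two responses on one connection.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 4\r\n"
    b"\r\n"
    b"gone"
)

results = parse_responses(raw)
for status, content_type, body in results:
    print(status, content_type, body)
```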

How do I find out whether the response format was valid and parsing succeeded? – Lucubration 13/9, 2016 at 8:38
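
A hedged Python 3 sketch for this question: begin() raises a subclass of http.client.HTTPException (for example BadStatusLine) when the status line cannot be parsed, so catching that exception is one way to detect failure. try_parse is a hypothetical helper name. Note that, as far as I know, header parsing itself is lenient, so this mainly validates the status line rather than every header.

```python
from http.client import HTTPException, HTTPResponse
from io import BytesIO


class FakeSocket:
    def __init__(self, response_bytes):
        self._file = BytesIO(response_bytes)

    def makefile(self, *args, **kwargs):
        return self._file


def try_parse(raw):
    """Return a parsed HTTPResponse, or None if the status line is invalid."""
    response = HTTPResponse(FakeSocket(raw))
    try:
        response.begin()
    except HTTPException:
        # Covers BadStatusLine, RemoteDisconnected (empty input), UnknownProtocol, ...
        return None
    return response


good = try_parse(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
bad = try_parse(b"not an http response\r\n\r\n")
empty = try_parse(b"")

print(good.status if good else None, bad, empty)
```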

for python3 you can use from http.client import HTTPResponse – Tade 29/4, 2019 at 18:27

Has anyone made this work with Python3? I am getting TypeError: decoding str is not supported at File "/usr/lib/python3.6/http/client.py", line 258, in _read_status: line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1") – Parlormaid 25/6, 2019 at 13:56

One of the most useful answers I've seen on Stack! – Jugendstil 30/9, 2022 at 23:7

-8

You might want to consider using python-requests.

Link: http://docs.python-requests.org/en/latest/

Here is an example from http://dancallahan.info/journal/python-requests/

Assuming your responses comply with the HTTP RFCs, does this look like something you want to do?

>>> import requests
>>> url = 'http://example.test/'
>>> response = requests.get(url)
>>> response.status_code
200
>>> response.headers['content-type']
'text/html; charset=utf-8'
>>> response.text
u'Hello, world!'

Confocal answered 14/7, 2014 at 1:19 Comment(3)

How does this answer the question? – Preachy 22/11, 2014 at 9:54

How would you load a already existing response string into it? – Deafanddumb 14/11, 2017 at 23:52

This is an irrelevant answer. Question was about parsing an already existing full response string not making a request itself. – Matchbox 5/11, 2018 at 11:53
