How can I request (get) and read an xml file using python?
Asked Answered
W

1

6

I tried requesting an RSS feed on Treasury Direct using Python. In the past I've used urllib, or requests libraries to serve this purpose and it's worked fine. This time however, I continue to get the 406 status error, which I understand is the page's way of telling me it doesn't accept my header details from the request. I've tried altering it however to no avail.
This is how I've tried

import requests
url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
user_agent = {'User-agent': 'Mozilla/5.0'}
response  = requests.get(url, headers = user_agent)
print response.text

Environments: Python 2.7 and 3.4. I also tried accessing via curl with the same exact error.

I believe this to be page specific, but can't figure out how to appropriately frame a request to read this page.

I found an API on the page which I can read the same data in json so this issue is now more of a curiosity to me than a true problem.

Any answers would be greatly appreciated!

Header Details

{'surrogate-control': 'content="ESI/1.0",no-store', 'content-language': 'en-US', 'x-content-type-options': 'nosniff', 'x-powered-by': 'Servlet/3.0', 'transfer-encoding': 'chunked', 'set-cookie': 'BIGipServerpl_www.treasurydirect.gov_443=3221581322.47873.0000; path=/; Httponly; Secure, TS01598982=016b0e6f4634928e3e7e689fa438848df043a46cb4aa96f235b0190439b1d07550484963354d8ef442c9a3eb647175602535b52f3823e209341b1cba0236e4845955f0cdcf; Path=/', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'keep-alive': 'timeout=10, max=100', 'connection': 'Keep-Alive', 'cache-control': 'no-store', 'date': 'Sun, 23 Apr 2017 04:13:00 GMT', 'x-frame-options': 'SAMEORIGIN', '$wsep': '', 'content-type': 'text/html;charset=ISO-8859-1'}
Whitton answered 23/4, 2017 at 4:25 Comment(0)
S
6

You need to add accept to headers request:

import requests

url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
headers = {'accept': 'application/xml;q=0.9, */*;q=0.8'}
response = requests.get(url, headers=headers)

print response.text
Shearwater answered 23/4, 2017 at 6:30 Comment(3)
Worked perfectly. Thanks!Whitton
I just want to share this link to make further readers understand why you devised the header that wayClite
Saved me a ton of time. Cheers.Tubular

© 2022 - 2024 — McMap. All rights reserved.