I would recommend BeautifulSoup as it has the most functionality. I modified a table parser that I found online so that it can extract all tables from a webpage, as long as there are no nested tables. Some of the code is specific to the problem I was trying to solve, but it should be pretty easy to modify for your usage. Here is the pastebin link:
http://pastebin.com/RPNbtX8Q
You could use it as follows:
import csv
from urllib2 import Request, urlopen, URLError  # Python 2 module
from TableParser import TableParser  # the parser from the pastebin link

url_addr = 'http://foo/bar'
req = Request(url_addr)
url = urlopen(req)

# Feed the raw HTML to the parser so it can collect every table on the page
tp = TableParser()
tp.feed(url.read())

# NOTE: Here you need to know exactly how many tables are on the page and which one
# you want. Let's say it's the first table.
my_table = tp.get_tables()[0]

filename = 'table_as_csv.csv'
# On Python 2 the csv module expects the file to be opened in binary mode
with open(filename, 'wb') as f:
    writer = csv.writer(f)
    for row in my_table:
        writer.writerow(row)
The code above is an outline, but if you use the table parser from the pastebin link you should be able to get to where you want to go.
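In case the pastebin link is unavailable, here is a minimal sketch of what such a parser could look like. It is not the code from the link: the class name `SimpleTableParser` is made up for illustration, it is built on the standard library's `HTMLParser`, and, like the original, it assumes there are no nested tables. It only mirrors the `feed()` / `get_tables()` usage shown above.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class SimpleTableParser(HTMLParser):
    """Hypothetical stand-in for the pastebin parser (no nested tables)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self._tables = []   # finished tables, each a list of rows
        self._table = None  # rows of the table currently being read
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self._table = []
        elif tag == 'tr' and self._table is not None:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self._table.append(self._row)
            self._row = None
        elif tag == 'table' and self._table is not None:
            self._tables.append(self._table)
            # Reset all state so a malformed table cannot leak into the next one
            self._table = self._row = self._cell = None

    def get_tables(self):
        """Return every table seen so far as a list of rows of cell strings."""
        return self._tables


# Small usage example on an inline HTML string (no network access needed)
sample_html = ('<table><tr><th>name</th><th>qty</th></tr>'
               '<tr><td>apple</td><td>3</td></tr></table>')
parser = SimpleTableParser()
parser.feed(sample_html)
first_table = parser.get_tables()[0]
```

Each row in `first_table` is a plain list of strings, so it can be passed straight to `csv.writer(...).writerow(row)` as in the outline above.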
tr_nodes = html.xpath('//table[@id="DataTables_Table_0"]/tr')
(which is a modification based on the webpage's content), I get an empty tr_nodes. Any idea what could be the problem? – Corona