I'm trying to write a Python program that will grab and display any RSS updates published since the last time the program was run. I am using feedparser and trying to use the ETag and Last-Modified values as described here on SO, but my test script does not seem to be working:
import feedparser
rsslist=["http://skottieyoung.tumblr.com/rss","http://mrjakeparker.com/feed/"]
for feed in rsslist:
    print('--------'+feed+'-------')
    d=feedparser.parse(feed)
    print(len(d.entries))
    if (len(d.entries) > 0):
        etag=d.feed.get('etag','')
        modified=d.get('modified',d.get('updated',d.entries[0].get('published','no modified,update or published fields present in rss')))
        d2=feedparser.parse(feed,modified)
        if (len(d2.entries) > 0):
            etag2=d2.feed.get('etag','')
            modified2=d2.get('updated',d.entries[0].get('published',''))
        if (d2==d): #ideally we would never see this because etags/last modified would prevent unnecessarily downloading what we already have.
            print("Arrg these are the same")
I'm honestly not sure whether RSS/XML technology has changed since the references I've been using online were written, or whether there is a problem with my code.
Regardless, I'm looking for the best way to use RSS feeds efficiently. In particular, I want to minimize wasted bandwidth, which is what the Last-Modified and ETag fields are intended for.
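For reference, here is a minimal sketch of how I understand the conditional-request flow is supposed to work. It assumes that feedparser exposes the validators on the parse result itself (`d.etag`, `d.modified`) rather than on `d.feed`, that they are passed back by keyword (the second positional argument of `feedparser.parse` is `etag`, not `modified`), and that an unchanged feed comes back with `status` 304 and no entries. The names `fetch_if_changed`, `cache`, and the `parse` parameter are hypothetical and only there for illustration.

```python
def fetch_if_changed(url, cache, parse=None):
    """Return the feed's entries, or [] if the server reports 304 Not Modified.

    cache is a plain dict mapping url -> {"etag": ..., "modified": ...}
    (hypothetical structure; it must be persisted between runs to be useful).
    """
    if parse is None:
        # Imported lazily so the helper can be exercised with a stub parser.
        import feedparser
        parse = feedparser.parse

    saved = cache.get(url, {})
    # Send the validators from the previous run, by keyword.
    result = parse(url, etag=saved.get("etag"), modified=saved.get("modified"))

    # 304 means nothing changed since the saved validators were issued.
    if result.get("status") == 304:
        return []

    # Remember the new validators (either may be None if the server sends none).
    cache[url] = {"etag": result.get("etag"), "modified": result.get("modified")}
    return result.get("entries", [])
```

Calling `fetch_if_changed(feed, cache)` for each URL in `rsslist` and saving `cache` to disk at exit (for example as JSON) would be one way to carry the validators over to the next run. If a server sends neither header, the full feed is downloaded every time and the entries would have to be compared some other way.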
Thanks in advance.
feed.etag. I don't know if it really matters though. – Crutcher