Python RSS Parser that also handles FeedBurner

Asked 19/4, 2011 at 21:31 Answered 13/2, 2012 at 9:45

Tags: python, rss, feed, xml-parsing, feedparser

I was in the middle of writing a Python script that parses RSS feeds. I'm using feedparser; however, I'm stuck on parsing feeds from FeedBurner. Who needs FeedBurner nowadays? Anyway...

For example, I couldn't find a way to parse these:

http://feeds.wired.com/wired/index

http://feeds2.feedburner.com/ziffdavis/pcmag

When I pass those to the feedparser library, they don't seem to work. I tried appending ?fmt=xml or ?format=xml to the URLs, but I still didn't get the feed in XML format.

Do I need to use an HTML parser such as BeautifulSoup to parse FeedBurner feeds? Better yet, is there a public Python parser or aggregator script that already handles this?

Any tip or help will be greatly appreciated.

Pane asked 19/4, 2011 at 21:31

It's possible you have a version issue or you're using the API incorrectly; it would help to see your error message. For example, the following works with Python 2.7 and feedparser 5.0.1:

>>> import feedparser
>>> url = 'http://feeds2.feedburner.com/ziffdavis/pcmag'
>>> d = feedparser.parse(url)
>>> d.feed.title
u'PCMag.com: New Product Reviews'
>>> d.feed.link
u'http://www.pcmag.com'
>>> d.feed.subtitle
u"First Look At New Products From PCMag.com including Lab Tests, Ratings, Editor's and User's Reviews."
>>> len(d['entries'])
30
>>> d['entries'][0]['title']
u'Canon Color imageClass MF9280cdn'

And with the other URL:

>>> url = 'http://feeds.wired.com/wired/index'
>>> d = feedparser.parse(url)
>>> d.feed.title
u'Wired Top Stories'
>>> d.feed.link
u'http://www.wired.com/rss/index.xml'
>>> d.feed.subtitle
u'Top Stories<img src="http://www.wired.com/rss_views/index.gif" />'
>>> len(d['entries'])
30
>>> d['entries'][0]['title']
u'Heart of Dorkness: LARPing Goes Haywire in <em>Wild Hunt</em>'
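For readers without feedparser at hand, here is a minimal stdlib-only sketch of where those values come from in a plain RSS 2.0 document: `d.feed.title`, `d.feed.link` and `d.feed.subtitle` correspond to the channel's `<title>`, `<link>` and `<description>`, and each entry is an `<item>`. It is an illustration under that assumption, not how feedparser works internally (feedparser also handles Atom, bad markup and encodings); the sample XML and the `summarize_feed` name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the downloaded XML
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://www.example.com</link>
    <description>An example channel</description>
    <item><title>First entry</title><link>http://www.example.com/1</link></item>
    <item><title>Second entry</title><link>http://www.example.com/2</link></item>
  </channel>
</rss>"""


def summarize_feed(xml_text):
    """Return channel metadata plus the entry titles of an RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find('channel')
    return {
        'title': channel.findtext('title'),            # roughly d.feed.title
        'link': channel.findtext('link'),              # roughly d.feed.link
        'subtitle': channel.findtext('description'),   # roughly d.feed.subtitle
        # roughly [e['title'] for e in d['entries']]
        'entries': [item.findtext('title') for item in channel.findall('item')],
    }


summary = summarize_feed(SAMPLE_RSS)
print(summary['title'], len(summary['entries']), summary['entries'][0])
```

If the fetched text is not a feed at all (for example, an HTML error page from a mistyped URL), `find('channel')` returns None here, which is roughly the situation in which feedparser hands back an empty `d['entries']`.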

Cavin answered 19/4, 2011 at 21:48

Following your instructions works. Thank you, ars. I must have kept mistyping something, as I was just getting an empty result for d['entries']. – Pane 19/4, 2011 at 22:12

@DavidL: glad it works now. I'm quite familiar with "mistyping error". :-) – Cavin 24/4, 2011 at 4:48

I know this question is very old, but it might help anyone who finds it while searching for a way to parse FeedBurner RSS feeds if I paste the simple code I use to get the latest entry from the Cracked.com FeedBurner feed. I have tested it on a few other sites, and it works fine.

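import urllib.request
from xml.dom import minidom

def GetRSS(RSSurl):
    # Download the feed and parse it into a DOM tree
    with urllib.request.urlopen(RSSurl) as url_info:
        xmldoc = minidom.parse(url_info)
    # Each RSS entry is an <item>; the first one is normally the latest
    item = xmldoc.getElementsByTagName('item')[0]
    # getElementsByTagName returns a list of nodes, so index it first
    url = item.getElementsByTagName('link')[0].firstChild.data
    title = item.getElementsByTagName('title')[0].firstChild.data
    print(url)
    print(title)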

Just pass the address of the FeedBurner feed as RSSurl. As you can probably see, if you want any other elements, you can add an extra getElementsByTagName line for each tag you would like to get.
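Rather than adding one getElementsByTagName line per element, one possible generalization takes the tag names as a parameter and collects them from every `<item>`. The `item_fields` helper and the sample feed are hypothetical, not part of the answer; the sketch uses `minidom.parseString` so it can be tried without network access, and for a live feed you would pass the downloaded bytes instead.

```python
from xml.dom import minidom


def item_fields(xml_text, tags=('title', 'link')):
    """Return one dict per <item>, mapping each requested tag to its text."""
    xmldoc = minidom.parseString(xml_text)
    results = []
    for item in xmldoc.getElementsByTagName('item'):
        fields = {}
        for tag in tags:
            nodes = item.getElementsByTagName(tag)
            # Missing or empty elements become None instead of raising
            if nodes and nodes[0].firstChild is not None:
                fields[tag] = nodes[0].firstChild.data
            else:
                fields[tag] = None
        results.append(fields)
    return results


# Hypothetical sample feed with an extra <pubDate> element on one entry
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Newest</title><link>http://example.com/2</link>
<pubDate>Mon, 13 Feb 2012 09:45:00 GMT</pubDate></item>
<item><title>Older</title><link>http://example.com/1</link></item>
</channel></rss>"""

for entry in item_fields(SAMPLE, tags=('title', 'link', 'pubDate')):
    print(entry)
```

Returning None for absent tags is a design choice: real feeds often omit optional elements, and indexing `[0]` on an empty node list would otherwise raise an IndexError.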

Edit: also, to my knowledge, it will work with pretty much any RSS feed.
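One caveat: the tag names used above are RSS 2.0's. Feeds in the Atom format put entries in namespaced `<entry>` elements and store the URL in the `href` attribute of `<link>`, so a plain `getElementsByTagName('item')` finds nothing there. The following is a minimal sketch, assuming only these two formats, that checks for either layout; the `first_entry` name and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = '{http://www.w3.org/2005/Atom}'


def first_entry(xml_text):
    """Return (title, link) of the first entry of an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_text)
    item = root.find('channel/item')            # RSS 2.0 layout
    if item is not None:
        return item.findtext('title'), item.findtext('link')
    entry = root.find(ATOM_NS + 'entry')        # Atom layout
    if entry is not None:
        # Takes the first <link>, ignoring its rel attribute for simplicity
        link = entry.find(ATOM_NS + 'link')
        href = link.get('href') if link is not None else None
        return entry.findtext(ATOM_NS + 'title'), href
    return None  # neither layout matched


RSS = ('<rss version="2.0"><channel><item><title>A</title>'
       '<link>http://example.com/a</link></item></channel></rss>')
ATOM = ('<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>B</title>'
        '<link href="http://example.com/b"/></entry></feed>')
print(first_entry(RSS))
print(first_entry(ATOM))
```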

Weak answered 13/2, 2012 at 9:45
