I am using the feedparser library in Python to get various details from an RSS feed. Suppose I have pulled out 25 headline titles from a news channel's RSS feed. An hour later I run feedparser again to get the current list of 25 headline titles. The list might or might not have been updated by the time I run it the second time.
Some of the headlines might be the same and some might be new. I need to check whether any of the headlines have changed compared with the ones pulled an hour earlier. Only the new headlines should be pushed into the database, so that duplicates don't get dumped into it.
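One way to express that comparison, sketched under the assumption that the previous run's titles are still available as a list: treat them as a set and keep only the current titles that are not in it. `find_new_titles` is a hypothetical helper name and the sample headlines are made up.

```python
def find_new_titles(previous_titles, current_titles):
    """Return the titles in current_titles that were not in previous_titles.

    The order of current_titles is preserved, and a title repeated within
    the current batch is reported only once.
    """
    seen = set(previous_titles)  # set membership checks are O(1) on average
    new = []
    for title in current_titles:
        if title not in seen:
            new.append(title)
            seen.add(title)  # avoid reporting the same new title twice
    return new


earlier = ["Storm hits coast", "Markets rally", "Election results"]
now = ["Markets rally", "New vaccine approved", "Election results", "Team wins cup"]
print(find_new_titles(earlier, now))
# ['New vaccine approved', 'Team wins cup']
```

Because this only compares against the immediately previous batch, a headline that drops out of the feed and reappears later would be reported as new again; checking against every title already stored avoids that.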
The code looks like this:
```python
import feedparser

d = feedparser.parse('www.news.example.xml')
for item in d.entries:
    hndlr.write(item.title)  # data being dumped into a database (hndlr is defined elsewhere)
```
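The question doesn't say what database `hndlr` writes to. Assuming SQLite through Python's standard `sqlite3` module, one option is to let the database reject duplicates itself: declare the title column `UNIQUE` and insert with `INSERT OR IGNORE`, so a title already stored by an earlier run is silently skipped. The table name `headlines`, the file name `headlines.db` and both functions are hypothetical.

```python
import sqlite3


def open_db(path="headlines.db"):
    """Open (or create) a hypothetical headlines table with unique titles."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headlines ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL UNIQUE)"
    )
    return conn


def store_new_titles(conn, titles):
    """Insert titles, skipping any already stored; return the ones inserted."""
    inserted = []
    with conn:  # commits on success, rolls back if an exception is raised
        for title in titles:
            before = conn.total_changes
            # A title that violates the UNIQUE constraint is ignored, not an error.
            conn.execute("INSERT OR IGNORE INTO headlines (title) VALUES (?)", (title,))
            if conn.total_changes > before:  # a row was actually added
                inserted.append(title)
    return inserted
```

With this in place, each hourly run can pass every title, e.g. `store_new_titles(conn, [item.title for item in d.entries])`, and only the unseen ones are stored. Since the check lives in the database, it also survives restarts of the script.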
I need to run the above code every hour and check whether there has been any update to the headlines (titles). If anything has changed since the data extracted an hour earlier, only the new headlines should be dumped into the database.
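The feed may not have changed at all between two runs. `feedparser.parse` accepts `etag` and `modified` arguments for sending the `ETag` and `Last-Modified` values from the previous response; if the server supports them and nothing has changed, the result has `status` 304 and no entries. A sketch, assuming those validators are kept in a plain dict between calls (the `state` dict and the `fetch_if_changed` name are hypothetical):

```python
def fetch_if_changed(url, state, parse=None):
    """Fetch a feed, sending the ETag / Last-Modified values saved in `state`.

    Returns the list of entry titles, or [] when the server answers
    304 Not Modified. `state` is updated in place so the next call can
    send the new validators.
    """
    if parse is None:
        import feedparser  # imported here so the function can be tested without it
        parse = feedparser.parse
    d = parse(url, etag=state.get("etag"), modified=state.get("modified"))
    if d.get("status") == 304:
        return []  # feed unchanged since the last run
    # Remember the validators (None if the server did not send them).
    state["etag"] = d.get("etag")
    state["modified"] = d.get("modified")
    return [entry.get("title") for entry in d.get("entries", [])]
```

A 200 response can still repeat most of the previous headlines, so the returned titles still need de-duplicating before they are written. If each hourly run is a separate process (for example started from cron with a schedule like `0 * * * *`), `state` has to be saved to disk between runs for the validators to help.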