How big is too big for an RSS feed XML file?
I'm implementing an RSS feed for a website and I don't understand certain things about the format/size/content of the XML file for the feed.

I'm initializing the site with the past data, which runs back to 1999 (there was no feed at any point before now), and only a couple hundred items will be added per year.

Is there some protocol for archiving, or can I just keep the one file and continue appending to it? I'd think that would be inefficient, as the aggregators have to download the whole thing (I assume).

So, what's the usual custom for this? Limit it to the last month? The current file with over 900 items is 1.5MB, and I'd expect 1 year's worth to be about 1/10th that in size or less.

Any pointers on what principles to use and how to implement them? I'm using PHP, but my data is complicated enough that I rolled my own script to write the file (and it validates just fine), so I can't use a canned solution -- I need to understand what to implement in my own script.

Prospectus answered 15/3, 2011 at 22:30 Comment(5)
What magic did you perform to get it answered? It would have been a lot more helpful to me 3 months ago! – Prospectus
I used to be a syndication geek, and the question was more architectural than technical in nature. The only thing I failed to mention is to be sure to run your final feeds through validator.w3.org/feed; this will save you and your consumers a lot of heartache! – Vergne
@david I edited your grammar slightly so as not to offend the users, and when you edit the question it gets higher ranking and more visibility. – Hotheaded
Well, I don't agree with your tag edits -- my question is not about PHP or scripting. My question is entirely about the RSS output format. But I'll leave it alone, since I got the answer I needed (just 90 days later than I needed it). – Prospectus
@Oppositional: yes, I validated my feed repeatedly. I'd have been completely clueless had I not -- I actually used feedvalidator.org instead of the w3 validator, as it had lots of really specific help for all the things that came up. It functioned as a de facto tutorial on how to get it right! – Prospectus
Most consumers of syndication feeds expect the feed to contain relatively recent content, with previously published content 'falling off' the feed. How much content you maintain in the feed usually depends on the type of content you are publishing, but as the size of your feed grows it can impact a feed client's ability to retrieve and parse your information.

If you truly want to publish a historical feed that is continually added to but never has content items removed, you may want to consider the following options (based on the needs of your consumers):

  1. Implement Feed Paging and Archiving, per RFC 5005 Section 3, as paged feeds can be useful when the number of entries is very large, infinite, or indeterminate. Clients can "page" through the feed, only accessing a subset of the feed's entries as necessary.
  2. Logically segment your content into multiple feeds, and provide auto-discovery to the feeds on your website.
  3. Implement a REST-based service interface that allows consumers to retrieve and filter your content as an Atom- or RSS-formatted feed, with the default representation using some reasonable defaults.
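As a sketch of option 1, a paged feed per RFC 5005 links adjacent pages together with `first`, `previous`, and `next` link relations, so a client can walk back through history only as far as it needs. The Atom skeleton below is illustrative; the URLs and page scheme are hypothetical:

```xml
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Site</title>
  <id>tag:example.org,2011:feed</id>
  <updated>2011-06-07T23:44:00Z</updated>
  <!-- this document is page 2 of the paged feed -->
  <link rel="self"     href="http://example.org/feed?page=2"/>
  <link rel="first"    href="http://example.org/feed"/>
  <link rel="previous" href="http://example.org/feed?page=1"/>
  <link rel="next"     href="http://example.org/feed?page=3"/>
  <!-- only this page's entries appear here -->
</feed>
```

RFC 5005 also defines archived feeds (stable, never-changing archive documents linked with `prev-archive`/`next-archive`), which map well onto a "past data back to 1999" scenario.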

Option 1 is a reasonable approach only if you know the type of feed clients that will be consuming your feed, as not all feed clients support pagination.

Option 2 is the most common one seen on public-facing web sites, as most browsers and clients support auto-discovery, and you can provide both a full historical feed and a smaller, more recent content feed (or segment in whatever ways make sense for your content).
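Auto-discovery for option 2 is just `<link>` elements in your pages' `<head>`, one per feed; the titles and paths below are hypothetical:

```html
<head>
  <!-- small, recent-items feed: the default most clients will pick up -->
  <link rel="alternate" type="application/rss+xml"
        title="Recent items" href="/feeds/recent.xml"/>
  <!-- full historical feed for consumers who want everything -->
  <link rel="alternate" type="application/rss+xml"
        title="Full archive (1999 to present)" href="/feeds/archive.xml"/>
</head>
```

Clients typically treat the first listed feed as the default, so put the small recent feed first.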

Option 3 potentially gives you the benefits of both of the first two options, plus you can provide multiple feed formats and rich filtering of your content. It is a very powerful way to expose feed content, but it is usually only worth the effort if your consumers indicate a desire for tailoring the feed content they wish to consume.

While most rich feed clients will retrieve feed content asynchronously, clients that make synchronous (and potentially frequent) requests for your feed may experience timeout issues as the size of your feed increases.

Regardless of which direction you take, consider implementing Conditional GET on your feeds, and understand the potential consumers of your syndicated content in order to choose the strategy that fits best. See this answer when you consider which syndication feed format(s) you want to provide.
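Conditional GET amounts to emitting `Last-Modified` and/or `ETag` on the feed response, then answering `304 Not Modified` with an empty body when the client's validators still match, so well-behaved aggregators stop re-downloading an unchanged feed. A typical exchange (timestamps and ETag value invented for illustration):

```http
GET /feed.xml HTTP/1.1
Host: example.org
If-Modified-Since: Tue, 07 Jun 2011 20:35:00 GMT
If-None-Match: "a1b2c3"

HTTP/1.1 304 Not Modified
ETag: "a1b2c3"
Last-Modified: Tue, 07 Jun 2011 20:35:00 GMT
```

Since a 304 carries no body, this saves the full feed transfer on every poll where nothing has changed, which for an hourly-polling aggregator is nearly all of them.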

Vergne answered 7/6, 2011 at 23:44 Comment(1)
I actually ended up implementing the feed as a script, so I could provide multiple subfeeds. I also placed a LIMIT on the SQL that retrieves the data. I eventually realized that providing the whole feed only mattered to me at the beginning, but it probably didn't matter to any of the people subscribing to it. Thanks for the excellent answer. I have filed away several of your citations for further investigation, particularly on the question of providing a last-updated header.Prospectus
Aggregators will download the file repeatedly, so limiting its size is important. I would have the feed contain either 10 items, or everything back to a week old, whichever gives more entries, unless overridden with a GET parameter. Of course, this will vary with the actual usage you see from your clients as well as with the activity in the feed itself.
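Since the asker mentioned driving the feed from a database with a `LIMIT` clause, the cut-off above can be approximated in SQL; the table and column names here are hypothetical, and the query shows only the simple "newest N" half of the rule:

```sql
-- Newest 10 items for the feed; to honor the "or everything from the
-- past week, whichever is more" rule, run a second count of rows with
-- pub_date within 7 days and raise the LIMIT when that count exceeds 10.
SELECT id, title, link, pub_date
FROM items
ORDER BY pub_date DESC
LIMIT 10;
```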

Shannanshannen answered 7/6, 2011 at 20:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.