Hello Emil, all,
See inline comments:
I feel that these two points move the focus slightly off the real
issue. The question is not really whether we keep track of one or more
items, but how we keep track of items at all: which attributes do we
use to compare one item to another?
We had actually already exchanged some thoughts on the subject here:
Vincent proposes using:
- the published date.
- the title.
- the link/URI (which is always there ... but is not necessarily unique)
As a matter of fact, I think that on a per-feed basis the link ought
to be unique. What sense would it make to have two different items
pointing to the same address? (OK, that CAN be done, and the standard
doesn't forbid it, but I don't know of any automated feed generator
that would provide the same link for different feed items.)
I'd also add:
- other tags that standards may define or that you may discover while
studying various flows or mechanisms that other RSS clients are using
see further on
So to summarize, we need to
1. make sure that our RSS protocol provider implementation supports
using different keys for the different flows, and
I have some doubts that using different keys for different flows would
be the best approach. From what I understand of how ROME is built,
the main idea is to treat all feeds as uniformly as possible,
regardless of what kind of feed they are (RSS, ATOM, etc.). Using a
per-feed-type identification mechanism would, I feel, defeat the
purpose of using ROME, as it would require hacks to bypass ROME
Differentiating between dated and dateless feeds would be doable, though.
2. make sure that we are able to uniquely identify flows for as many
cases as possible.
Concerning your question as to whether we should be keeping track of one
or more of the last RSS items: I don't really see the point of keeping
track of more than one item. If we assume that we properly implement
unique identification, then keeping track of the last item is enough. In
case our unique identification implementation is not reliable then it
would still be unreliable even if we keep track of more than one item.
As a matter of fact, the issue isn't that straightforward, and I'm
going to explain why.
At the moment, we only handle feeds that have a date associated with
every item. This is very convenient for a few reasons:
* Dates can be compared not only for equality but also for order
(is a date before or after another date). This way, we can be sure that
the items we consider newer (and unread) really are newer.
* Even if the items aren't necessarily ordered by date, by keeping
track of the last item's date, we can make sure we don't display the
same item twice.
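To make the date-based approach concrete, here is a minimal sketch of it.
DatedItem and DateFilter are hypothetical names made up for illustration;
they are not ROME classes and not the plugin's actual code. The sketch only
assumes that each item exposes a java.util.Date (or null when the feed
gives none).

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical stand-in for a feed entry; not a ROME class.
final class DatedItem {
    final String title;
    final Date published; // may be null for dateless feeds

    DatedItem(String title, Date published) {
        this.title = title;
        this.published = published;
    }
}

final class DateFilter {
    /**
     * Returns the items strictly newer than lastSeen, in feed order.
     * A null lastSeen means nothing has been shown yet.
     * Items without a date are skipped: they need another strategy.
     */
    static List<DatedItem> newerThan(List<DatedItem> items, Date lastSeen) {
        List<DatedItem> fresh = new ArrayList<>();
        for (DatedItem item : items) {
            if (item.published == null) {
                continue;
            }
            if (lastSeen == null || item.published.after(lastSeen)) {
                fresh.add(item);
            }
        }
        return fresh;
    }

    /** Returns the latest date found in items, or previous if none is later. */
    static Date latest(List<DatedItem> items, Date previous) {
        Date max = previous;
        for (DatedItem item : items) {
            if (item.published != null
                    && (max == null || item.published.after(max))) {
                max = item.published;
            }
        }
        return max;
    }
}
```

Because the comparison relies on Date.after() and not on the position in
the feed, the items do not have to be sorted. After each refresh the
stored date would be replaced by the result of latest().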
On the other hand, for feeds that don't have a date associated with
each item (or for which the date is in an incorrect format), it's quite
difficult to come up with a way to uniquely identify an item that is
comparable for both equality and order AND that remains consistent
even if items appear in or disappear from the feed (some sort of
"absolute" order, independent of the feed's current contents).
For identifying this kind of feed item (dateless items), I'd go for
the URL/URI solution for the aforementioned reasons.
The main reason for keeping more than one last item is that the very
item we refer to could disappear from the feed, thus leading to the
(wrong) conclusion that all items are new.
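A rough sketch of what keeping more than one item could look like for
dateless feeds, keyed by link. SeenLinks and its capacity parameter are
hypothetical, made up for this example; the sketch assumes links are unique
within a feed and that the capacity is at least as large as the number of
items the feed exposes at once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of ROME or of the existing RSS plugin.
final class SeenLinks {
    private final Map<String, Boolean> seen;

    SeenLinks(final int capacity) {
        // Access-ordered map: re-inserting a known link moves it to the end,
        // so links still present in the feed are the last ones to be evicted.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the links not seen before, in feed order, and remembers them. */
    List<String> filterNew(List<String> links) {
        List<String> fresh = new ArrayList<>();
        for (String link : links) {
            if (link == null) {
                continue; // an item without a link cannot be tracked this way
            }
            if (seen.put(link, Boolean.TRUE) == null) {
                fresh.add(link);
            }
        }
        return fresh;
    }
}
```

With this, an item vanishing from the feed does not affect the others: if
the feed goes from (u1, u2, u3) to (u1, u2, u4), only u4 is reported as new,
whereas tracking u3 alone would have flagged everything.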
What do you think?
Trying to further refine the handling behaviour (per feed type / by
using feed-specific tags) seems a little superfluous to me.
Any other ideas?
Mihai Balan wrote:
> Hello all,
> And sorry for not giving any sign of life for the past 10 days (it
> seems that programmers too can get writer's block :D)
> As the discussion started here (
> ) hasn't come to a solution regarding this very specific issue
> (uniquely identifying items in a RSS feed), I felt it needed a little
> more consideration. One reason for that is the fact that strict
> identification of news items is needed for a more urgent issue:
> presenting only new items.
> In the current implementation of the RSS plugin, only feeds that
> present a date property for each item are handled correctly. Other
> feeds are either not handled at all (see
> http://www.freenews.fr/feeds/rss.php although I suspect some invalid
> RSS there too), or all news items are presented as new even though they
> have been read in the past (this is the case with
> http://www.pheedo.com/f/drdobbs_all_articles but also with almost
> all ATOM feeds I tried).
> For the moment I see a couple of solutions for this matter, each with
> advantages and disadvantages, as follows:
> 1. Keeping track of the last item displayed. When retrieving the feed
> we look for the item we last showed. If it's not in the feed, then all
> items are (presumably) new. If it's in the feed, we only show the
> items after last_show_item.
> + easy to implement
> + little overhead in the contact class
> - we rely on the fact that items are stored sequentially and
> chronologically ordered in the feed, which usually happens but is
> not mandatory
> - if the last_show_item disappears from the feed (e.g. for a blog
> feed the author marks the post as private, or draft), then the feed is
> considered to contain only new items, which is obviously false
> - might be a little slow as we first have to identify the
> last_show_item in the feed and only then new items can be shown
> 2. Keeping track of all the items displayed (or at least a fair amount
> of items). This solution will most likely need a HashTable or something
> similar that will contain a per-feed list of shown items. URI/URL-s of
> news items or some other hash (CRC comes to mind, but it's just an idea)
> could be used as key.
> + it should require only one pass through the feed structure, as it
> can be easily decided whether an item has been shown or not
> + by using a proper hash we could treat all feeds equally (right now
> feeds with no dates fail miserably)
> - a little more memory consuming
> - I'm not quite sure about this hashing stuff (I'm not going to
> implement SHA1 for feed items, but you get the point)
> 3. Any other idea?
> So what do you think?
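
Regarding the CRC idea in solution 2 above, here is a minimal sketch of how
such a key could be derived with the JDK's java.util.zip.CRC32. ItemKey is
a hypothetical name, and combining link and title is only an assumption
about which fields to hash. Note that a 32-bit checksum can collide, so
this is an illustration of the idea, not a guarantee of uniqueness.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper; not part of ROME or of the existing RSS plugin.
final class ItemKey {
    /** CRC32 over link and title; null fields are treated as empty strings. */
    static long of(String link, String title) {
        CRC32 crc = new CRC32();
        crc.update(bytes(link));
        crc.update(0); // separator byte, so ("ab", "c") and ("a", "bc") differ
        crc.update(bytes(title));
        return crc.getValue();
    }

    private static byte[] bytes(String s) {
        return (s == null ? "" : s).getBytes(StandardCharsets.UTF_8);
    }
}
```

The resulting long could serve as the key in a per-feed table of shown
items, which keeps memory use small compared to storing full URLs.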