On Sat, 2006-10-07 at 08:27 +0200, Lars Lindner wrote:
> Am Freitag, den 06.10.2006, 17:17 -0700 schrieb Michael Bernstein:
> > On Sat, 2006-10-07 at 01:29 +0200, Lars Lindner wrote:
> > > Am Freitag, den 06.10.2006, 15:32 -0700 schrieb Michael Bernstein:
> > > > On Fri, 2006-10-06 at 21:39 +0200, Lars Lindner wrote:
> > > > >
> > > > > With mixed id handling where we have these three types of ids:
> > > > >
> > > > > a) real GUIDs provided by the feed
> > > > > b) arbitrary ids provided by the feed
> > > > > c) link "ids" provided by Liferea
> > > >
> > > > Can you provide an example of 'b'?
> > >
> > > http://del.icio.us/rss/username
> > There don't seem to be any unique item identifiers. The rdf:about always
> > refers to offsite resources.
> Well the RSS 1.0 spec says rdf:about has to be unique within the feed.
> So it is one type of a pretty unique string. Therefore it falls into
> class b)
So, these are guaranteed (assuming developer/publisher competency) to be
unique within the feed, but not necessarily across feeds.
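To make the scoping concrete, here is a rough Python sketch of how a matching key could be built for the three id classes. The names (IdKind, dedup_key) are hypothetical and not taken from Liferea's code, and it assumes that only class (a) ids are safe to compare across feeds.

```python
from enum import Enum


class IdKind(Enum):
    GUID = "a"      # real GUID provided by the feed
    FEED_ID = "b"   # arbitrary id provided by the feed (e.g. rdf:about)
    LINK = "c"      # link used as an "id" by the reader


def dedup_key(feed_url, item_id, kind):
    """Return the key under which an item is compared for duplicates."""
    if kind is IdKind.GUID:
        # Assumption: a real GUID is globally unique, so the feed is
        # left out of the key and matches can span feeds.
        return (None, item_id)
    # Classes (b) and (c) are only unique within one feed, so the feed
    # URL is part of the key and matches never span feeds.
    return (feed_url, item_id)
```

With this key, two del.icio.us feeds carrying the same rdf:about would not be treated as duplicates of each other, which is the conservative choice.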
I guess we need to ask what the desired behavior is for each use case,
and see whether they can be reconciled.
list-o-links: rdf:about might be duplicated across feeds due to any of
the following circumstances:
- More than one person bookmarked the same URL
- The list is ranked by popularity (or any other
non-chronological measure), and the item dropped off the feed,
- The item also appeared in its original feed (which is also
- publisher error
I am not sure that it universally makes sense to detect these as
duplicates (and hence, mark them as already read), except perhaps for
the first one. I'd argue that this is a non-mainstream use-case though.
Ordinary feed: rdf:about might be duplicated across feeds due to:
- you are subscribed to more than one feed from the same publisher
- publisher error
We can ignore the 'publisher error' use cases. We would want duplicate
detection in cases where Liferea is subscribed to more than one feed
from the same publisher, but I don't think that we can reliably detect
when that is the case (as opposed to a list-o-links feed, or something
Also, under the circumstances, I don't think that rdf:about can be
expected to be preserved as a GUID by a search engine feed (unless the
search engine was producing RSS 1.0). Most likely, a search engine would
generate their own GUID.
If the algorithm by which various search engines and aggregators
generate missing GUIDs is deterministic and reliable, then it may make
sense to generate the same UIDs internally in Liferea for matching
purposes, but I am aware of only one well published GUID generation
method (one which I think Sam Ruby's Venus uses):
And I think only the Google blog-search-engine currently uses it, and
only for the Atom feeds of its results (obviously). I think we may see
this functionality added to more Planets, though.
In case you hadn't noticed, there is a pattern here: The various flavors
of RSS are making things a lot harder, while Atom is well defined enough
to support duplicate detection with reasonable reliability. In
particular, note that should an Atom item move (due to a weblog being
moved to a different domain, for example) the id should nevertheless
remain the same, even if the href changes. Several variants of RSS do
not support this use case at all, and the variants that do support it do
so with optional elements.
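As a minimal sketch of why a stable id helps, assuming entries are plain dicts with "id" and "href" keys (a hypothetical shape, not Liferea's data model), matching on the id alone survives a moved weblog:

```python
def merge_entries(seen, entries):
    """Record entries by Atom id and return the ones not seen before."""
    new = []
    for entry in entries:
        known = seen.get(entry["id"])
        if known is not None:
            # Same id: the same item, even if its link has moved.
            known["href"] = entry["href"]
        else:
            seen[entry["id"]] = dict(entry)
            new.append(entry)
    return new
```

An item whose href changes but whose id does not is updated in place instead of showing up as unread again.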
> > My advice is: Refuse to guess, put the blame where it belongs (like
> > 'Click here to notify the author their feed is invalid', and 'This feed
> > cannot support duplicate detection' messages), and concentrate on actual
> > value-added features. Some support for comment feeds would be nice, for
> > example.
> Overall I agree. It's probably a good idea to focus on case 1.
> Support for comment feeds is somewhere on the todo list, since I came to
> like it when I used GreatNews for some time.
Cool. I look forward to that feature.
- Michael Bernstein