From: Kevin <ke...@dr...> - 2006-03-03 14:48:06
|
I found a bug that I'm not sure how to solve and would like some input.

See bug #1442493

The current update logic uses the link or guid tag values as a key to detect if an item has already been added to the database. One of these is expected to exist, and also to be unique for the feed.

If a feed does not contain either of these tags, the behavior is undefined. In v0.2, the feed will add one item only and no more. In 0.3 dev, it will add all items on every update.

Here is a url that demonstrates this behavior.

http://www.intellectualicebergs.org/mainfeed.rss

-Kevin |
From: Andrew T. <ajt...@hi...> - 2006-03-03 14:58:26
|
Hrm, good bug. :) bad feed

Well, it seems like the fallback would be to check for uniqueness on FeedTitle, ItemTitle and ItemPubDate?

Andy

--
Andrew Turner
ajt...@hi... 42.4266N x 83.4931W
http://highearthorbit.com Northville, Michigan, USA |
From: Kevin <ke...@dr...> - 2006-03-03 15:09:10
|
Andrew Turner wrote:
> Hrm, good bug. :) bad feed
>
> Well, it seems like the fallback would be to check for uniqueness on
> FeedTitle, ItemTitle and ItemPubDate?
>
> Andy

In this case, it's a podcast. I could probably get away with using the enclosure url.

More generally, a hash of one or more other fields (title, description, pubdate) would be better. |
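A minimal sketch of the "hash of one or more other fields" idea, written in Python rather than the project's own code. The function name `item_fingerprint`, the dict-shaped item, and the choice of SHA-1 are assumptions for illustration; the thread does not say which hash function or field order would be used.

```python
import hashlib

def item_fingerprint(item, fields=("title", "description", "pubdate")):
    """Return a hex digest built from the chosen item fields.

    `item` is a plain dict (hypothetical shape). Missing fields count as
    empty strings, so an item without a pubdate still gets a stable value.
    """
    h = hashlib.sha1()
    for name in fields:
        encoded = (item.get(name) or "").encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h.update(str(len(encoded)).encode("ascii") + b":" + encoded)
    return h.hexdigest()
```

The fingerprint could stand in for the link or guid value as the database key when neither tag is present. Whether pubdate belongs in the field list is a design choice the thread leaves open.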
From: Andrew T. <ajt...@hi...> - 2006-03-03 15:17:07
|
Yeah, I noticed it was a podcast/iTunes feed. However, our final fallback solution should use items that are required for a valid feed (e.g. ones that Magpie won't throw out, which has happened for me with certain feeds).

Domain-unique UUIDs are a nice idea, though they seem difficult to implement and maintain?

Andy |
From: Katie B. <ka...@ho...> - 2006-05-30 17:28:37
|
I thought I'd revive this discussion of how we determine item uniqueness -- I was reminded of the issue when I looked at this feed: http://members.digitalblasphemy.com/rss/db.xml The feed is technically valid, but its link elements aren't unique, and so -- or so I presume -- FoFRedux isn't distinguishing between items. Maybe we should take another look at the idea of including some other required element(s) in our "unique" item identifiers? -- Katie Bechtold http://hoteldetective.org/ |
From: Katie B. <ka...@ho...> - 2006-05-31 18:49:24
|
I thought I'd revive the discussion of how we determine item uniqueness -- I was reminded of the issue when I looked at this feed: http://members.digitalblasphemy.com/rss/db.xml The feed is technically valid, but its link elements aren't unique, and so -- or so I presume -- FoFRedux isn't distinguishing between items. Maybe we should take another look at the idea of including some other required element(s) in our "unique" item identifiers? -- Katie Bechtold http://hoteldetective.org/ |
From: Kevin <ke...@dr...> - 2006-06-02 14:41:23
|
Katie Bechtold wrote:
> I thought I'd revive the discussion of how we determine item
> uniqueness -- I was reminded of the issue when I looked at this
> feed:
> http://members.digitalblasphemy.com/rss/db.xml
>
> The feed is technically valid, but its link elements aren't unique,
> and so -- or so I presume -- FoFRedux isn't distinguishing between
> items. Maybe we should take another look at the idea of including
> some other required element(s) in our "unique" item identifiers?

I actually put in code to get uniqueness from title + content, but it 'only' does that when the link, guid, and enclosure tags do not exist for the item.

So, we could:
* do nothing. (that feed really should have unique links)
* add a per-feed option to use title + content hash for uniqueness
* change everything to use title + content hash for uniqueness.

-Kevin |
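The fallback described in this message can be sketched roughly as follows, in Python. This is not the FoFRedux code: the names `item_key` and `new_items`, the dict-shaped items, the order in which link, guid, and enclosure are tried, and the use of SHA-1 are all assumptions made for illustration.

```python
import hashlib

def item_key(item):
    """Pick a uniqueness key for a feed item (a plain dict here)."""
    # Assumed order; the message only says these three tags are tried
    # before the title + content hash.
    for name in ("link", "guid", "enclosure"):
        value = item.get(name)
        if value:
            return name + ":" + value
    # Fallback: hash of title + content.
    text = (item.get("title") or "") + "\n" + (item.get("content") or "")
    return "hash:" + hashlib.sha1(text.encode("utf-8")).hexdigest()

def new_items(items, seen_keys):
    """Return items whose key is not yet in seen_keys, recording new keys."""
    fresh = []
    for item in items:
        key = item_key(item)
        if key not in seen_keys:
            seen_keys.add(key)
            fresh.append(item)
    return fresh
```

With this shape, the third option in the list would amount to dropping the loop over tags so that every item goes through the hash branch.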
From: Katie B. <ka...@ho...> - 2006-06-02 14:57:19
|
On Fri, Jun 02, 2006 at 08:41:18AM -0600, Kevin wrote:
> Katie Bechtold wrote:
> > ...
> > http://members.digitalblasphemy.com/rss/db.xml
> > ...
>
> I actually put in code to get uniqueness from title + content, but it
> 'only' does that when the link, guid, and enclosure tags do not exist
> for the item.
>
> So, we could:
> * do nothing. (that feed really should have unique links)

I think this is a perfectly valid option and probably a good one if this kind of brokenness is relatively rare. I really don't know how rare it is; has anyone else here seen a feed with non-unique links?

> * add a per-feed option to use title + content hash for uniqueness

I don't like this option because it relies on the user to recognize broken feeds that aren't very obvious: the software indicates no problem in updating them. The user would have to notice that they haven't seen any new items from that feed in a while and understand the reason for it.

> * change everything to use title + content hash for uniqueness.

That seems like the ideal option, to me.

--
Katie Bechtold
http://hoteldetective.org/ |
From: Kevin <ke...@dr...> - 2006-06-02 17:44:18
|
Katie Bechtold wrote:
> On Fri, Jun 02, 2006 at 08:41:18AM -0600, Kevin wrote:
>>
>> So, we could:
>> * do nothing. (that feed really should have unique links)
>
> I think this is a perfectly valid option and probably a good one if
> this kind of brokenness is relatively rare. I really don't know how
> rare it is; has anyone else here seen a feed with non-unique links?

It happens periodically. I've seen this personally on my slashdot and 'the register' feeds. Evan saw this on a feedburner feed for a blog. The link changes without the content/title changing.

>> * add a per-feed option to use title + content hash for uniqueness
>
> I don't like this option because it relies on the user to recognize
> broken feeds that aren't very obvious: the software indicates no
> problem in updating them. The user would have to notice that they
> haven't seen any new items from that feed in a while and understand
> the reason for it.

I don't like this either for the same reasons. We should try our best to do the "right thing" without the user's involvement.

>> * change everything to use title + content hash for uniqueness.
>
> That seems like the ideal option, to me.

My only concern is: would this make things worse for any feeds?

Can anyone provide an example of a feed where it is common for the title+content to be the same across unique items? (ex: link or guid is different, but title+content is identical)

--
Kevin |
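To make the "link changes without the content/title changing" case concrete, here is a small Python sketch. The two items and their example.org URLs are invented for illustration, and `content_key` is a hypothetical helper, not a function from the project.

```python
import hashlib

def content_key(item):
    """Hash of title + content, ignoring the link entirely."""
    text = item["title"] + "\n" + item["content"]
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# The same story as seen on two successive updates; only the link differs.
first_fetch = {
    "title": "Story",
    "content": "Body text",
    "link": "http://example.org/story?from=rss1",
}
second_fetch = {
    "title": "Story",
    "content": "Body text",
    "link": "http://example.org/story?from=rss2",
}

# Keyed on link, the second fetch looks like a new item.
link_sees_duplicate = first_fetch["link"] == second_fetch["link"]
# Keyed on title + content, it is recognized as already stored.
hash_sees_duplicate = content_key(first_fetch) == content_key(second_fetch)
```

Here `link_sees_duplicate` is False and `hash_sees_duplicate` is True, which is the behavior the title + content option is meant to give for feeds like these.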
From: Evan R. <eva...@gm...> - 2006-06-02 19:01:05
|
---------- Forwarded message ----------
From: Evan Roth <eva...@gm...>
Date: Jun 2, 2006 9:00 PM
Subject: Re: [Fofredux-devel] identifying items
To: Kevin <ke...@dr...>

> Can anyone provide an example of a feed where it is common for the
> title+content to be the same across unique items? (ex: link or guid is
> different, but title+content is identical)

a quick example would be the playlist at last.fm. for example, my playlist, which you can find here:
http://ws.audioscrobbler.com/1.0/user/mudhed/recenttracks.rss

it contains really just artist / song title, with pubDate set to when the song was played. here are snippets from playing the same song twice in a row:

<item>
  <title>AFI - Prelude 12/21</title>
  <link>http://www.last.fm/music/AFI/_/Prelude%2B12%252F21</link>
  <pubDate>Fri, 2 Jun 2006 18:58:54 +0000</pubDate>
  <guid>http://www.last.fm/user/mudhed/#1149274734</guid>
  <description>http://www.last.fm/music/AFI</description>
</item>
<item>
  <title>AFI - Prelude 12/21</title>
  <link>http://www.last.fm/music/AFI/_/Prelude%2B12%252F21</link>
  <pubDate>Fri, 2 Jun 2006 18:57:59 +0000</pubDate>
  <guid>http://www.last.fm/user/mudhed/#1149274679</guid>
  <description>http://www.last.fm/music/AFI</description>
</item>

/evan |
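The two items quoted in this message can be checked directly. The Python sketch below wraps them in an `<items>` root element (an assumption, added only so the fragment parses on its own) and treats `description` as the item content; `digest` is a hypothetical helper and SHA-1 is an arbitrary choice.

```python
import hashlib
import xml.etree.ElementTree as ET

# The two quoted items, wrapped in a root element so the fragment parses.
SAMPLE = """<items>
<item>
<title>AFI - Prelude 12/21</title>
<link>http://www.last.fm/music/AFI/_/Prelude%2B12%252F21</link>
<pubDate>Fri, 2 Jun 2006 18:58:54 +0000</pubDate>
<guid>http://www.last.fm/user/mudhed/#1149274734</guid>
<description>http://www.last.fm/music/AFI</description>
</item>
<item>
<title>AFI - Prelude 12/21</title>
<link>http://www.last.fm/music/AFI/_/Prelude%2B12%252F21</link>
<pubDate>Fri, 2 Jun 2006 18:57:59 +0000</pubDate>
<guid>http://www.last.fm/user/mudhed/#1149274679</guid>
<description>http://www.last.fm/music/AFI</description>
</item>
</items>"""

def digest(item, fields):
    """SHA-1 over the named child elements' text, joined by newlines."""
    text = "\n".join(item.findtext(f, default="") for f in fields)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

items = ET.fromstring(SAMPLE).findall("item")
distinct_guids = len({i.findtext("guid") for i in items})
distinct_title_content = len({digest(i, ("title", "description")) for i in items})
distinct_with_pubdate = len(
    {digest(i, ("title", "description", "pubDate")) for i in items}
)
print(distinct_guids, distinct_title_content, distinct_with_pubdate)  # 2 1 2
```

For this sample, a title + content hash alone collapses the two plays into one item, while the guid keeps them apart. Adding pubDate to the hashed fields also separates them here, though that only holds for feeds whose pubDate stays stable between fetches.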