#289 pubDate fluttering in RSS2.0 feeds

Status: closed-fixed
Priority: 1
Updated: 2014-07-03
Created: 2006-03-30
Creator: John Goerzen
Private: No

I've noticed a good deal of strangeness in the behavior
of some syndicators that are pulling from my site.
I've tracked the behavior down to the pubDate field.

In this report, I am talking about the main RSS 2.0
feed, not the one that lists comments; that is, the
feed you could find at
http://changelog.complete.org/feeds/index.rss2.

Serendipity is taking the pubDate field to mean "date
of last comment post" rather than "date of post of
article itself." This is not really correct.

In fact, I verified this behavior. I saved off a copy
of the feed, posted a comment, and saved off another copy.
The diff is attached.

Now, as for why this is happening... I don't know. The
article publishing date on the website for the blog
itself remains correct all the time. Only the feed
seems to be impacted.

Just in case, I tried reverting my earlier regex patch
and going to a more supported URL scheme, but that made
no difference.

Discussion

  • John Goerzen
    2006-03-30

     
    Attachments
  • John Goerzen
    2006-03-31

    Logged In: YES
    user_id=491567

    One thing I missed when I looked at that diff was that the
    pubDate field that is fluttering is the channel one and not
    the story one.

    But that makes it stranger, because in the plugin
    configuration for the Syndication plugin, I had already set
    the field "pubDate" to "No". So it shouldn't even be
    including a pubDate at the channel level -- only on the
    story level.

    Funny thing is -- if I go into that same config screen and
    set the pubDate to Yes, then that per-channel pubDate is
    omitted. So I think there is just a logic error someplace
    that is swapping around this one.

    So that's bug #1.

    Bug #2 is that this item is coming from the last_modified
    field instead of timestamp (as all the other pubDates for
    items are coming from), on line 1000 of plugin_internal.inc.php.

     
  • Garvin Hicking
    2006-03-31

    • assigned_to: nobody --> garvinhicking
    • priority: 5 --> 1
    • status: open --> pending
     
  • Garvin Hicking
    2006-03-31

    Logged In: YES
    user_id=473563

    This is intentional. It's called "Conditional Get" and is
    required to update, so that comments to an entry show up in
    the wfwComment Feed in RSS Readers.

    You can disable that updating behavior via a max_time config
    option in the serendipity_config.inc.php file, but I do not
    suggest this, because it will cause problems for users with RSS
    readers that want to get the latest comments to your latest
    entries.

    Thanks for the note about that plugin reversion thing, I will look at that!

    Regards,
    Garvin

     
  • John Goerzen
    2006-03-31

    Logged In: YES
    user_id=491567

    Hi Garvin,

    Thanks again for your attention.

    I freely admit I have never even heard of WFW feeds
    before... so take this with a grain of salt.

    I'm talking about a plain story feed, not a comment one.
    Besides that overall channel pubDate field, the only other
    one that is being adjusted is slash:comment, which I
    wouldn't think is related to WFW or merits a pubDate change.

    The logic you describe makes good sense when we're talking
    about a comment feed, and it is probably the Right Thing there.
    The RSS 2.0 spec isn't terribly specific on this point, but
    it seems to me that bumping the pubDate due to comments is
    not the Right Thing for all the non-comment feeds.

    I wonder if it would be possible to use the modified_date
    for comment feeds and the timestamp for everything else?

    As far as a conditional get goes, I again don't know this
    for certain, but my understanding always was that HTTP
    headers were used for this. By the time you've emitted
    enough XML to give a pubDate, it's probably too late to save
    any work anyway. But anyway, I can see why you could argue
    for using last_modified for the HTTP header logic. But I
    think you could still use timestamp for the channel pubDate
    and still be correct.
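
As a rough illustration of the point about HTTP headers: a conditional GET is negotiated in the request headers, before any XML is produced. The sketch below builds such a request on the client side. The function name `build_conditional_request` and the stored `last_fetch` time are hypothetical, and nothing here is taken from Planet or any particular reader.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def build_conditional_request(url, last_fetch):
    """Build a GET request that asks for the feed only if it changed.

    `last_fetch` must be a timezone-aware UTC datetime; it is formatted
    as an HTTP date such as "Fri, 31 Mar 2006 00:00:00 GMT".
    """
    req = Request(url)
    req.add_header("If-Modified-Since", format_datetime(last_fetch, usegmt=True))
    return req


# Hypothetical usage: the reader last fetched the feed on 2006-03-31.
request = build_conditional_request(
    "http://changelog.complete.org/feeds/index.rss2",
    datetime(2006, 3, 31, tzinfo=timezone.utc),
)
```

The server can answer such a request from the header alone, so the channel pubDate inside the XML body does not need to take part in that decision.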

     
  • John Goerzen
    2006-03-31

    • status: pending --> open
     
  • Garvin Hicking
    2006-03-31

    • status: open --> pending
     
  • Garvin Hicking
    2006-03-31

    Logged In: YES
    user_id=473563

    Hi!

    The problem actually IS the slash:comment count. If we don't
    push the pubDate element, RSS readers will not fetch the
    latest feed, and thus will not recognize that the comment
    count has increased. RSS Readers like RSS Bandit show
    threaded comments for the plain RSS feed thanks to the
    wfwComment and slash_comment elements. If they were not
    updated, those readers would not see that there is a new
    comment.

    Sadly this handling was done a year ago, when I worked on
    this for about 1-2 weeks to get it to work with all sorts of
    different RSS readers, online readers, Mac readers, Linux
    readers, so I'm reluctant to change it and introduce borkage
    on any of those readers...

    Best regards,
    Garvin

     
  • John Goerzen
    2006-03-31

    Logged In: YES
    user_id=491567

    Garvin, looks like you're right. Disabling the pubDate
    didn't help things.

    I'll send a bug to the Planet folks.

    I still think there is a legitimate bug here on the
    configuration screen though. (And I still think the pubDate
    behavior is incorrect, but your reason for not changing it
    makes good sense, especially since we know this is not
    what's confusing Planet.)

    BTW, it appears that others have seen this problem too:

    http://changelog.complete.org/posts/468-Sorry-for-the-dupes.html#c42099

     
  • John Goerzen
    2006-03-31

    • status: pending --> open
     
  • Garvin Hicking
    2006-03-31

    • status: open --> open-accepted
     
  • Garvin Hicking
    2006-03-31

    Logged In: YES
    user_id=473563

    Hi! Yes, thanks for reminding me about the config issue,
    I'll fix that!

    About the Planet thing: Yes, people like hds (Julian Finn,
    AFAIR) on the IRC have also reported such planet problems,
    but the last time we did track it down to a planet problem...

     
  • Garvin Hicking
    2006-03-31

    Logged In: YES
    user_id=473563

    Config option bug fixed in 1.0-beta and 1.1-alpha.

     
  • Garvin Hicking
    2006-03-31

    • status: open-accepted --> pending-fixed
     
  • John Goerzen
    2006-04-11

    • status: pending-fixed --> open-fixed
     
  • John Goerzen
    2006-04-11

    Logged In: YES
    user_id=491567

    Garvin,

    It looks like there is still a problem here. I'm still
    having trouble putting my finger on it, but ibid over on
    freenode #haskell is looking at planet source.

    It seems that the problem may be related to HTTP headers,
    and Serendipity's processing of them. If you pull up
    http://changelog.complete.org/feeds/index.rss2 with a web
    browser, you will correctly see story 475 first, then story
    474. Story 475 is indeed the most recent, by pubdate and
    modified date.

    But, if I telnet to the server and pass in an
    If-Modified-Since header that would fetch both 475 and 474,
    they appear in the reverse order (see attachment).

    I'm attaching the full dump as fetched by wget (index.rss2)
    as well as the HTTP dump.

     
  • John Goerzen
    2006-04-11

    Full RSS feed

     
    Attachments
  • John Goerzen
    2006-04-11

    Partial dump from HTTP headers

     
    Attachments
  • Garvin Hicking
    2006-04-11

    Logged In: YES
    user_id=473563

    Hi!

    Actually, if you specify an If-Modified-Since, the feed is
    ordered by "last_modified" instead of the "created" timestamp.

    This should not make any trouble, because Planet/RSS
    feedreaders have a unique GUID to score an entry for, and
    size does not matter! Uhm, order does not matter ;-)

    Regards,
    Garvin
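
The claim that order does not matter can be sketched as follows, assuming a reader that keys its store on each item's GUID. The function `merge_by_guid` and the dictionary layout are hypothetical and only illustrate the idea; they do not describe how Planet or any specific reader works.

```python
def merge_by_guid(seen, fetched_items):
    """Add items whose guid is not yet in `seen`; return the new guids.

    `seen` maps guid -> item and is updated in place. Items already
    known by guid are skipped, whatever position they arrive in.
    """
    new_guids = []
    for item in fetched_items:
        guid = item["guid"]
        if guid not in seen:
            seen[guid] = item
            new_guids.append(guid)
    return new_guids


# The same two stories delivered in either order give the same store.
story_474 = {"guid": "story-474", "title": "older story"}
story_475 = {"guid": "story-475", "title": "newer story"}

store_a, store_b = {}, {}
merge_by_guid(store_a, [story_475, story_474])
merge_by_guid(store_b, [story_474, story_475])
```

A reader built this way ends up with the same set of entries for both orderings, and a repeated fetch adds nothing new.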

     
  • John Goerzen
    2006-04-12

    Logged In: YES
    user_id=491567

    You're right, it *shouldn't* be a problem, but I have no
    idea what subtle bugs Planet may have ;-)

    Anyway, I think we have tracked the problem down to s9y's
    handling of the If-Modified-Since HTTP header. From looking
    at the rss.php source, I see this:

    $entries = serendipity_fetchEntries(null, true, 15,
    false, (isset($modified_since) ? $modified_since : false),
    'timestamp DESC', '', false, true);

    Now, if the If-Modified-Since header is present, that will
    cause the generated XML feed to contain only the items that
    were modified since the specified date (even if there is no
    Range header).

    Problem is, that's incorrect behavior. According to RFC2616
    section 14.25:

    "If the variant has been modified since the
    If-Modified-Since date, the response is exactly the same as
    for a normal GET."

    That is, the entire file should be sent in that case.

    Planet is sending this header and expecting the full feed back.

    See
    http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25
    for more details.

    I'm attaching last_modified.patch which should correct this.
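
A minimal sketch of the behaviour that RFC wording implies, written in Python rather than as the actual patch: the header only decides between "304 with no body" and "200 with the complete feed", and never filters the feed's items. The function name `conditional_get` and its arguments are hypothetical, and this is not the content of last_modified.patch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get(feed_last_modified, full_feed, if_modified_since=None):
    """Return (status, body) for a GET with an optional If-Modified-Since.

    `feed_last_modified` is a timezone-aware datetime; `full_feed` is
    whatever a plain GET would return.
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat as a normal GET
        if since is not None:
            if since.tzinfo is None:
                # Assumption: a date without a zone is taken as UTC.
                since = since.replace(tzinfo=timezone.utc)
            if feed_last_modified <= since:
                return 304, None  # not modified: send no body at all
    # Modified (or unconditional): exactly the same body as a normal GET.
    return 200, full_feed


feed = ["story-475", "story-474"]
modified = datetime(2006, 4, 11, 12, 0, tzinfo=timezone.utc)
```

With this shape, a request carrying "Mon, 10 Apr 2006 12:00:00 GMT" gets the whole feed back, while one carrying "Tue, 11 Apr 2006 12:00:00 GMT" gets a 304 and nothing else.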

     
  • John Goerzen
    2006-04-12

    Correction for handling of If-Modified-Since

     
    Attachments
  • Garvin Hicking
    2006-04-12

    Logged In: YES
    user_id=473563

    I might not understand that, but a question:

    If one sends a "If-Modified-Since" with a timestamp, and the
    feed has been updated, he will get the usual 15 Items from
    the RSS feed, right?

    That would be bad, because currently RSS readers would
    receive all Items that have been modified since
    "If-Modified-Since".

    Let's say I am going on holiday for 4 weeks. Then my
    If-Modified-Since would contain a timestamp 4 weeks ago when
    I first query the feed after my holidays. Let's say, 20
    posts have been created since that time in the RSS feed.
    Current behaviour would return all 20 posts from the blog.

    With your patch, I would only get the 15 posts from that
    Feed, right?

    How to accommodate this situation? Because only fixing
    Planet's problematic parsing of the RSS feed is IMHO not
    worth breaking VERY useful behaviour for any usual RSS client. :-)

    A UserAgent based usage would IMHO be favorable? Or maybe
    you understand the RFC better than me and can guide me how
    to make both ways work properly?

    Best regards,
    Garvin
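
The holiday scenario can be worked through with a small model. It assumes, as described in the comment above, that the date-filtered response returns every entry modified since the header date, while an unfiltered feed holds only the 15 newest entries. The function names and the entry layout are hypothetical, and timestamps are plain day numbers for simplicity.

```python
FEED_LIMIT = 15  # number of items in the normal feed, per the thread


def filtered_by_date(entries, since):
    """Date-filtered response: every entry modified after `since`."""
    return [e for e in entries if e["last_modified"] > since]


def full_feed(entries, limit=FEED_LIMIT):
    """Unfiltered response: the `limit` newest entries by timestamp."""
    newest_first = sorted(entries, key=lambda e: e["timestamp"], reverse=True)
    return newest_first[:limit]


# 20 posts published on days 1..20 while the reader was away (day 0).
posts = [{"guid": n, "timestamp": n, "last_modified": n} for n in range(1, 21)]

with_filter = filtered_by_date(posts, since=0)
without_filter = full_feed(posts)
missed = {e["guid"] for e in with_filter} - {e["guid"] for e in without_filter}
```

Under these assumptions the filtered response carries all 20 posts, the unfiltered feed carries 15, and the 5 oldest posts are the ones a returning reader would not see.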

     
  • Garvin Hicking
    2006-04-12

    • status: open-fixed --> open-accepted
     
  • John Goerzen
    2006-04-12

    Logged In: YES
    user_id=491567

    You are correct that, in that scenario of being gone for
    weeks, you could miss some posts.

    However, I don't think this is just a Planet problem -- I've
    also seen some oddity with Google Reader.

    But more importantly, the current S9Y behavior violates the
    HTTP spec. The HTTP RFC2616 says that you either send
    nothing or you send the exact same thing that a person would
    have seen if they had just used GET without an
    If-Modified-Since header. So there is no telling where
    other problems may crop up.

    It is perhaps an unfortunate design flaw of RSS that people
    in this situation could miss some posts on their feeds. Or,
    one could take the opinion that RSS is designed to notify of
    updates, and someone gone that long may not be interested in
    updates from so long ago anyway. In either case, it's not
    S9Y's place to violate the HTTP RFC to address that problem
    with RSS. Not only that, but nobody else is violating the
    HTTP RFC in this manner either.

    According to my take on the HTTP RFC, there is no way to
    accommodate the behavior you seek and still be RFC-compliant.

     
  • Garvin Hicking
    2006-04-13

    Logged In: YES
    user_id=473563

    Hi!

    What do Google Reader and Planet send as HTTP user agents?
    Then I will implement that as a disabling of
    if-modified-since. And I'll add a config option to enforce
    RFC compliance, which is set to Off by default, because I
    really enjoy that If-Modified-Since feature on my behalf. :-))

    Best regards,
    Garvin

     