I agree this is a short term solution and viable for v1 release.

Faking the date data is bad from my point of view (some RSS readers will actually ignore old entries), but so is the short-term solution, so I just have to let go and acknowledge this is the best compromise for release.

The underlying authoritative date issue is too big to tackle now and shouldn't hold up proceedings.

RE: OAI-PMH, we will indeed share, but bear in mind I was mostly talking about the layers, not necessarily using OAI-PMH to get them talking together (although it is one of our options).

On 27 April 2010 23:31, Demian Katz <demian.katz@villanova.edu> wrote:

Just to make sure we're on the same page, how is this for a short-term proposal?  We'll add a <pubDate> to all the items in the RSS feed where available, and we'll leave the user-determined sort order alone.  Since we only have a year in VuFind's publication date index, we'll have to fake out the rest -- i.e. "01 Jan XXXX 00:00:00 GMT."  Is that likely to hurt anything?
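A minimal sketch of the faked date described above, written in Python with the standard library only rather than in VuFind's own code. The function name fake_pub_date is hypothetical; the only input is the year from the publication date index, and everything else in the timestamp is a placeholder.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def fake_pub_date(year: int) -> str:
    """Return an RFC 822 style date for midnight GMT on 1 January of `year`."""
    # Only the year is real; month, day and time are placeholders.
    stamp = datetime(year, 1, 1, tzinfo=timezone.utc)
    # usegmt=True writes "GMT" instead of "+0000" as the zone.
    return format_datetime(stamp, usegmt=True)


print(fake_pub_date(2010))  # Fri, 01 Jan 2010 00:00:00 GMT
```

The formatter also adds the day of the week, which the RSS date format permits but does not require.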


We can also step up the number of results in the feed -- as you say, displaying EVERYTHING is not likely to be a good idea, but going to 50 instead of 20 is not a big deal (as you noted in the original ticket, one version of the code already did this).


If we make these changes, would you (and everyone else) be content moving VUFIND-167 from the 1.0 list to the Wishlist?  I don't really want to hold up the 1.0 release for this issue, and I don't expect to fully solve the problem by then…  but I want to do as much as possible to improve the situation if it can be done in relatively simple ways.


Regarding the OAI-PMH component, please let the list know if/when you reach the point of needing an OAI-PMH harvester for VuFind.  It's definitely a shared need, and I have a feeling we can benefit by making shared plans and breaking up the work.


- Demian


From: Greg Pendlebury [mailto:greg.pendlebury@gmail.com]
Sent: Monday, April 26, 2010 7:12 PM
To: Demian Katz
Cc: Tuan Nguyen; vufind-tech@lists.sourceforge.net

Subject: Re: [VuFind-Tech] VUFIND-167 (RSS Feature is not really RSS)


RE: Point 2

This sounds like the best short term solution to me. I think even in areas like 'author' feeds the date of publication would reasonably simulate date of index.

The longer term reasons to switch still exist though:
1) Date of index (i.e. new items) is also a valid search requirement and sort order we can't currently satisfy.
2) Because we grab only the top 'page' of result sets this approach means new content may not even appear in the RSS feed if the sort order would put it too far down. (Like buying an old book). This point kind of invalidates your examples in 2(a) and 2(b) on further reflection :(
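Point 2 can be illustrated with a toy example. This is only a sketch with made-up records and a hypothetical feed_items helper, not VuFind code: the feed is the first page of a result set sorted by publication year, and a newly acquired old book never reaches that page.

```python
def feed_items(records, page_size):
    """Sort newest publication year first and keep only the first page."""
    ranked = sorted(records, key=lambda r: r["year"], reverse=True)
    return ranked[:page_size]


# Hypothetical catalogue: one record per year, 2000 to 2009.
catalogue = [{"title": f"Book {y}", "year": y} for y in range(2000, 2010)]
before = feed_items(catalogue, page_size=5)

# A newly acquired copy of an old book joins the index.
catalogue.append({"title": "Old book, newly acquired", "year": 1950})
after = feed_items(catalogue, page_size=5)

print(before == after)  # True: the feed is unchanged, so the new record is invisible
```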

Date of index is the only data I can think of that provides genuine syndication through RSS... unless we take the result limits off the query and pass them all the data every time, but I think OOM issues arise then.

The OAI-PMH solution from your other post sounds like a good longer term option. For the group I'm currently working with that's probably the road we'll go down anyway since our software is going to be that authoritative layer in between. However, it's more repository oriented than catalogue.

On 27 April 2010 00:36, Demian Katz <demian.katz@villanova.edu> wrote:

A couple more thoughts and some questions:


1.) We do actually have a "get new item IDs" method on some ILS drivers (possibly only implemented in Voyager).  So theoretically we could use that method, and re-run queries in RSS feeds against recent items.  I see three major problems with that approach, though:

                a.) It's going to be slow.

                b.) It only works for items in the ILS, and VuFind is trying to grow beyond that dependency.

                c.) In many cases, the results will be empty -- if things don't change often, the "new items" list won't go back far enough to match anything.


2.) Another consideration is how sort order fits into the picture here.  There's an assumption that we want to override the user's sort option with a "most-recently-indexed-first" order…  but I'm beginning to question whether that actually helps us.  It makes sense in cases where a user is watching for additions in a fairly narrow area (i.e. a specific author, a rare search term, etc.).  But here are a couple of use cases where it might not help:

                a.) User has "date ascending" sort -- they want to see when old things are added to the collection.

                b.) User has "relevance" sort -- they want to see highly relevant items on a broad topic.  Should we really feed them results that would normally fall on page 100 of their results?


So this leads me to my questions:  has anybody tried to use our existing functionality with an RSS consumer?  If so, how does it actually behave when things change?  Is it possible that smart consumers are already able to detect differences and act correctly?  If not, are there clues we can provide to help them do the job better without having to maintain a whole new sort order in VuFind and possibly undermining user intentions in the process?


Apologies if it looks like I'm just trying to wish the whole problem away -- maybe on some level I am -- but I think we may need to think harder about what we want to do before we actually try to do it.


- Demian


From: Greg Pendlebury [mailto:greg.pendlebury@gmail.com]
Sent: Friday, April 23, 2010 8:20 PM
To: Tuan Nguyen
Cc: Demian Katz; vufind-tech@lists.sourceforge.net

Subject: Re: [VuFind-Tech] VUFIND-167 (RSS Feature is not really RSS)


I have no idea on the technical correctness of the idea, but it's pretty far from the expected behaviour for RSS consumers. People would expect individual entries so that their readers and aggregators will tell them what exactly changed, not just that something changed and they are going to have to sift through the results to find it.

Syndication requires timeliness and I can't see a way around that without VuFind becoming authoritative about times, or finding an external source. I suspect the easiest way out is to add some sort of syndication/new items interface to the ILMS Driver, and disable the feature in VuFind unless we find that interface.

On 24 April 2010 06:45, Tuan Nguyen <tuan@yorku.ca> wrote:

I don't know much about RSS either. The only problem I could see with this is if a record is deleted and another is added, then the number of results will not change and the feed reader won't pick it up.
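A small sketch of that case, using made-up record IDs and a hypothetical description helper that mirrors the proposed "Search with N results." text:

```python
def description(result_ids):
    """Build the proposed description from the number of matching records."""
    return f"Search with {len(result_ids)} results."


before = {"rec1", "rec2", "rec3"}
# One record is deleted and a different one is added.
after = (before - {"rec2"}) | {"rec4"}

print(before == after)                            # False: the result set changed
print(description(before) == description(after))  # True: the description did not
```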


On Apr 23, 2010, at 1:36 PM, Demian Katz wrote:


I've just been doing some reading about RSS, which, in spite of its ubiquity, I have never read about at the spec level before.  It's a whole lot simpler (and more simplistic) than I had imagined.  Two things really surprised me:


1.) There is no defined request format.  I had assumed there was a standard protocol for requesting RSS feeds, but (unless I'm missing something big), there really isn't.  It's up to the developer if they want to parameterize their feeds and exactly what they represent in the list.


2.) No date-related information is required in the feed.  Anything dealing with when a resource was created or changed is an optional field.  I had assumed that some kind of date information was required so that feed aggregators could correctly interpolate feeds and detect changes.


These revelations led me to a new thought on how to solve this issue.


Our current solution is to literally render our search results as an RSS feed.  This is problematic because (due to a lack of "last changed" data) we can't order the feed in a way that puts the newest results at the top of the list.  Without some kind of change-based sorting, it's nearly useless as a real RSS feed because it doesn't highlight changes.


However, what if we take things up a level?  Rather than representing the search results as individual items within the RSS feed, what if we represent the search ITSELF as the item?  i.e.:



    <title>Author Search: charles dickens</title>

    <description>Search with 187 results.</description>



I suspect that the most common scenario that will interest a user in returning to a search is discovering that new items matching the terms have arrived.  When this happens, the description will change.  Would that be enough for an aggregator to treat this as a new item?  If so, perhaps this is a simple starting point for more functional RSS -- it's not as glamorous as showing the individual new search results, but it serves as a notification that something has changed, and it can prompt the user to repeat the search.
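As a rough sketch of the idea (Python standard library only, with a hypothetical search_item helper; not existing VuFind code), the item could be rendered from the search label and the current result count, so that the description text changes whenever the count does. Whether an aggregator treats that change as a new item is still the open question.

```python
from xml.sax.saxutils import escape


def search_item(search_label: str, result_count: int) -> str:
    """Render one RSS <item> that represents the search itself."""
    # Escape &, < and > so user-entered search terms cannot break the XML.
    title = escape(search_label)
    description = escape(f"Search with {result_count} results.")
    return (
        "<item>\n"
        f"    <title>{title}</title>\n"
        f"    <description>{description}</description>\n"
        "</item>"
    )


old = search_item("Author Search: charles dickens", 187)
new = search_item("Author Search: charles dickens", 188)

print(new)
print(old != new)  # True: one extra result changes the item text
```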


As I've said, I don't have a lot of experience with RSS, so maybe this is a silly idea or simply won't work.  However, if anyone with a little more knowledge of the topic would care to comment, I would appreciate it!  This is the closest thing I've come up with to a simple solution to this complex problem, and it would be nice if we can make it work.  Trying to solve it from other angles seems to run up against the wall of insufficient input data, and I really don't know how to get around that in a non-institution-specific way.



Vufind-tech mailing list


