DSpace

Tracker: Patches

5 RSS Add-on - ID: 1160997
Last Update: Settings changed ( rrodgers )

I'm sorry for the double submission, but something went
wrong with the previous one. I couldn't attach the files
any more and therefore had to resubmit the entire patch.

The functionality is explained on
http://wiki.dspace.org/rss

Besides applying the patch:

* The 2 images (rss1.gif and rss2.gif) should be
added to the image folder in jsp.
* The 2 jars (rsslibj.jar and EXML.jar) should be
added to the lib folder. I don't know much about
licenses, but these were open-source projects, so I
guess you might need their official licenses. You can
find them at:
o http://sourceforge.net/projects/rsslibj
(some bugfixes were made to this one, so upgrading
it could be tricky)
o We can't find the location for EXML anymore.
* You can also add a line in dspace.cfg to specify
the directory to store the cached rss files.

# Where to store rss files
rss.dir = /dspace/rss

(Without adding this line, a folder with the
name "rss" will be created within your main dspace
directory.)
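
The fallback described above can be sketched as follows. This is only an illustration and not the code from the patch: the class and method names are hypothetical, and a plain java.util.Properties object stands in for whatever the patch uses to read dspace.cfg.

```java
import java.io.File;
import java.util.Properties;

// Hypothetical helper, not part of the patch.
class RssDirExample {
    // Returns the directory for cached rss files: the value of rss.dir
    // if it is set, otherwise a folder named "rss" inside the main
    // dspace directory.
    static File resolveRssDir(Properties config, String dspaceDir) {
        String configured = config.getProperty("rss.dir");
        if (configured == null || configured.trim().isEmpty()) {
            return new File(dspaceDir, "rss");
        }
        return new File(configured.trim());
    }
}
```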


Problems or remarks are always welcome

DSpace team at Katholieke Universiteit Leuven,
Ben Bosman
Lieven Droogmans
Bob Vrancken


Submitted: Ben Bosman ( benbosman ) - 2005-03-10 23:31
Priority: 5
Status: Closed
Resolution: Accepted
Assigned: Richard Rodgers
Category: None
Group: None
Visibility: Public


Comments ( 2 )

Date: 2005-03-17 21:25
Sender: benbosman

Logged In: YES
user_id=1158626

Thank you for your feedback

Robert:
* The biggest issue seems to be a scaling one. Although
everything works for a small collection, when I try and get
a feed from a large collection (e.g. one with 2,000 items
in) it seems the code tries to iterate through every item in
the collection, which causes an out of memory error. Is
there a reason for this?

Ben:
When a new item has been submitted via the webUI or added
using the itemimporter, the cached version of the rdf file
will be removed.
A new rdf file will be generated when needed. When a user
checks with his rss reader if the feed contains new items,
and there is no cached version of the rdf file, a new rdf
feed will be generated.
Why? Because changing the old rdf file, instead of
generating a new one, would slow down the itemimporter a lot.

The out of memory error is probably caused by the rdf file
becoming too large for collections with 2,000 items.
It would probably be better to limit the number of items
with the Browse class. We did this for the communities where
only the most recent items are displayed.

If we want to use this technique for collections as well as
communities, code changes are required.
We think the best approach would be to allow the dspace
administrator to decide how many items are included in the
rdf (all items or a maximum number).
But this would require a fair number of code changes and
also database changes.
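
A minimal sketch of such a limit, assuming the items arrive newest first through an iterator. The class and method names are hypothetical, and in the real code the limit would more likely be passed to the Browse class so that the extra items are never loaded at all. Here a maximum of 0 or less means "all items".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of the patch.
class FeedLimitExample {
    // Collects at most maxItems entries from the iterator and then
    // stops reading, so the rest of a large collection is never
    // pulled into memory. maxItems <= 0 means "include all items".
    static <T> List<T> takeFirst(Iterator<T> items, int maxItems) {
        List<T> result = new ArrayList<>();
        while (items.hasNext() && (maxItems <= 0 || result.size() < maxItems)) {
            result.add(items.next());
        }
        return result;
    }
}
```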

Robert:
* The <link rel="alternate" type="application/rss+xml" ....
Appears in the wrong place in the HTML - <dspace:layout>
writes the whole <HEAD> and a fair chunk of <BODY> before it
gets to your JSP code. Unfortunately this is a little
tricky to deal with. Perhaps we need to add a new tag to
<dspace:layout>, or perhaps a means to add headers that's
along the lines of the <dspace:sidebar> tag.

Ben:
I added this line simply because it enables Firefox to
recognize the rss links.

Robert:
* I renamed org.dspace.storage.rss as org.dspace.rss, as I
don't think it really belongs in the storage layer of the
architecture. True, it does store stuff, but it's an
index/cache like the browse or Lucene indices (used by
org.dspace.search) rather than persistent storage per se. I
think it belongs in the middle layer of our current
architecture
(http://dspace.org/technology/system-docs/architecture.html)
rather than the bottom one. Does that make sense?

Ben:
This indeed makes sense.

Robert:
* Might it be better to have the RSS links appear in a
consistent location on collection + community home pages?
Underneath 'recent submissions' makes sense.

Ben:
The RSS links should indeed be located underneath 'recent
submissions'.

Robert:
* I don't have a good idea of how this RSS feature works --
is there a reason the feeds aren't created 'on the fly'
using, say, the Browse code? Perhaps some documentation
would help?

Ben:
I will describe the process briefly.
If a more extensive explanation is required, let me know and
I will try to get back to you as soon as possible.

We don't generate each rdf file on the fly because most rss
readers refresh the rss feed frequently, even when the rdf
hasn't changed.
Whenever an item's metadata is changed, or an item is added,
moved or deleted, the rdf files that need to change are
removed. This removal is done by the RSSManager class.

Requests for rss files are handled by the RssServlet class.
This class simply asks the RSSManager for the required
file.
If this file doesn't exist, it will be created, stored and
returned.
If a stored rdf file is present, that file is simply
returned.
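
The request path and the removal step can be sketched together as below. This is not the RSSManager code from the patch, only an illustration of the idea: the class name, the method names, and the way a handle is mapped to a file name are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical stand-in for the caching behaviour described above.
class FeedCacheExample {
    private final Path cacheDir;

    FeedCacheExample(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    // Assumed naming scheme: handles contain '/', so replace it.
    private Path fileFor(String handle) {
        return cacheDir.resolve(handle.replace('/', '_') + ".rdf");
    }

    // Returns the stored rdf if present; otherwise generates it,
    // stores it and returns it.
    String getFeed(String handle, Supplier<String> generator) {
        Path file = fileFor(handle);
        try {
            if (Files.exists(file)) {
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
            String rdf = generator.get();
            Files.createDirectories(cacheDir);
            Files.write(file, rdf.getBytes(StandardCharsets.UTF_8));
            return rdf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called when an item is added, changed, moved or deleted:
    // removing the file forces regeneration on the next request.
    void invalidate(String handle) {
        try {
            Files.deleteIfExists(fileFor(handle));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, the importer only pays for a file delete, and the cost of building the feed is deferred to the first reader that asks for it.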

The RSSManager handles the creation of the rdf files.
To collect the required metadata, it will first create a
BrowseScope instance.
Currently a collection returns all items, and a community
returns the items submitted in the last 7 days.
It is probably better to limit the number of results to
avoid OutOfMemory exceptions, as previously explained.
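
As an illustration of the 7-day window used for communities, the selection amounts to a cutoff on the submission date. The sketch below is not the BrowseScope code; it works on plain timestamps, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the patch.
class RecentWindowExample {
    // Keeps only the submission dates that fall inside the window
    // ending at "now" (for example, the last 7 days).
    static List<Instant> submittedWithin(List<Instant> dates, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        List<Instant> recent = new ArrayList<>();
        for (Instant date : dates) {
            if (!date.isBefore(cutoff)) {
                recent.add(date);
            }
        }
        return recent;
    }
}
```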


DSpace team at Katholieke Universiteit Leuven,
Ben Bosman
Lieven Droogmans
Bob Vrancken


Date: 2005-03-17 16:36
Sender: rtansley (Project Admin)

Logged In: YES
user_id=166234

The patch applied with no problems, thanks for that!!

I did run into some issues though:

* The biggest issue seems to be a scaling one. Although
everything works for a small collection, when I try and get
a feed from a large collection (e.g. one with 2,000 items
in) it seems the code tries to iterate through every item in
the collection, which causes an out of memory error. Is
there a reason for this?

* The <link rel="alternate" type="application/rss+xml" ....
Appears in the wrong place in the HTML - <dspace:layout>
writes the whole <HEAD> and a fair chunk of <BODY> before it
gets to your JSP code. Unfortunately this is a little
tricky to deal with. Perhaps we need to add a new tag to
<dspace:layout>, or perhaps a means to add headers that's
along the lines of the <dspace:sidebar> tag.


Some minor points:

* I renamed org.dspace.storage.rss as org.dspace.rss, as I
don't think it really belongs in the storage layer of the
architecture. True, it does store stuff, but it's an
index/cache like the browse or Lucene indices (used by
org.dspace.search) rather than persistent storage per se. I
think it belongs in the middle layer of our current
architecture
(http://dspace.org/technology/system-docs/architecture.html)
rather than the bottom one. Does that make sense?

* Might it be better to have the RSS links appear in a
consistent location on collection + community home pages?
Underneath 'recent submissions' makes sense.

* I don't have a good idea of how this RSS feature works --
is there a reason the feeds aren't created 'on the fly'
using, say, the Browse code? Perhaps some documentation
would help?


Attached File ( 1 )

Filename     Description
rss.tar.gz   All files

Changes ( 5 )

Field          Old Value            Date               By
status_id      Open                 2005-10-31 19:54   rrodgers
resolution_id  None                 2005-10-31 19:54   rrodgers
assigned_to    nobody               2005-10-31 19:54   rrodgers
close_date     -                    2005-10-31 19:54   rrodgers
File Added     125066: rss.tar.gz   2005-03-10 23:31   benbosman