> I don't like caching as a solution because it suggests the existing
> problems with Browse are either too difficult to be solved in short
> order, or are an acceptable shortcoming of DSpace.
I don't follow this. For the Sitemap case specifically, it's not
intended to replace the browse; it's for search engines. Caching
makes sense because multiple search engines will request the same data
X, where X is much more expensive to calculate than a simple O(1) RAM
or disk lookup.
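As a rough sketch of that idea: recompute the expensive value only when a
time-to-live expires, so every crawler hitting the sitemap within that window
gets the cached copy. This is illustrative only; the class and names below are
my own, not actual DSpace code.

```java
import java.util.function.Supplier;

/** Minimal TTL cache sketch (hypothetical, not DSpace's implementation):
 *  the expensive computation runs at most once per TTL window, and all
 *  concurrent requesters share the same result. */
class TtlCache<V> {
    private final Supplier<V> loader; // the expensive computation, e.g. sitemap generation
    private final long ttlMillis;
    private V value;
    private long loadedAt = Long.MIN_VALUE;

    TtlCache(Supplier<V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    synchronized V get() {
        long now = System.currentTimeMillis();
        if (value == null || now - loadedAt > ttlMillis) {
            value = loader.get(); // recompute only when stale
            loadedAt = now;
        }
        return value;             // everyone else gets the cached copy
    }
}
```

Here the loader would be whatever regenerates the sitemap; each search
engine's request then costs a cheap in-memory read rather than a full
recomputation.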
> From our
> Architectural review we did glean that the database is already being
> considered a "cache" of metadata about communities, collections, and items.
Not right now, but it will be in the future, as I suggested back in 2004-03.
DB access could be made more efficient. The issue is how to balance
having cleanly separate modules and Java objects against allowing
complex joins that make modular separation difficult. In many cases,
I suspect caching will be necessary to maintain performance.
> Is this an artifact of the fact that database access across multiple
> requests is stateless? Could ResultSets be session-level and cursored
> across, allowing paging through the browse-style results to be more
> efficient?
In general, best practice is to re-run the query each time pagination
occurs. Holding a ResultSet open ties up a DB connection, and since
how long someone spends on each browse page (or whether they will
proceed at all) is unpredictable, we'd rapidly run dry of pooled
connections.
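Concretely, the stateless approach rebuilds the SQL for each page request
instead of keeping a cursor (and its connection) alive between pages. A
minimal sketch, with hypothetical table and column names rather than the real
DSpace schema:

```java
/** Stateless pagination sketch: each page view issues a fresh LIMIT/OFFSET
 *  query, so no connection is held between requests. The table and column
 *  names (browse_index, item_id, sort_title) are hypothetical. */
class BrowseQuery {
    static String pagedSql(int page, int pageSize) {
        int offset = page * pageSize; // page is zero-based
        return "SELECT item_id, sort_title FROM browse_index "
             + "ORDER BY sort_title LIMIT " + pageSize + " OFFSET " + offset;
    }
}
```

Each request borrows a pooled connection, runs one such query, and returns the
connection immediately; the cost is re-running the query, which is usually far
cheaper than starving the pool.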
> Other points: YACJ (Yet another cron job) to keep everything in sync.
Any server system of reasonable complexity will need batch/cron jobs:
for monitoring, daily subscriptions, cleanup, reporting, and all manner
of things. This is a definite requirement on our "service framework".