Hi Kevin

Yes, I have also noticed that objects with very large version trails can perform a lot slower. I’m testing Fedora 2.2 at the moment (and finding some undocumented API changes). Fedora 2.2 can turn versioning off per datastream (and also has general performance improvements), so either way, performance with Fez and Fedora 2.2 should improve. If checkExists seems to be your bottleneck, there may be a way to improve it, e.g. by making it use a different (faster) Fedora API call to do the checking.


On 18/4/07 1:06 AM, "Ford, Kevin" <kford@colum.edu> wrote:

Thanks, Lynette, for the reminder. I know that with versioning on, the Fedora object continues to grow with each update, but I completely forgot about that aspect since one doesn’t readily see the growth (I was looking at the Fedora object through the web interface, not at the raw FOXML that can be accessed through the fedora-admin tool). And, indeed, I had been reworking that one record over and over so that Fez would do with it what I want it to do. I’ll have to consider the merits of versioning, at least for some datastreams.
This morning, I removed all the changes I had made to Fez, deleted the Fedora object, recreated it, and then imported it into Fez. The page renders quickly, in less than 4 seconds. I reimplemented my changes to Fez, and the time difference, before any further changes to the Fedora object, is negligible (2-3 tenths of a second). (Nevertheless, for the time being I like the idea of looking only for the DC record in the checkExists function because it still seems more efficient to me, but I would welcome any thoughts on the matter from other Fez users.)

From: fez-users-bounces@lists.sourceforge.net [mailto:fez-users-bounces@lists.sourceforge.net] On Behalf Of Lynette Rayle
Sent: Monday, April 16, 2007 2:07 PM
To: fez-users@lists.sourceforge.net
Subject: Re: [Fez-users] checkExists

In a separate Fedora project, we discovered that using versioning slowed down the retrieval process, especially if the control group for the versioned datastream is X (internally managed XML). If it is taking 16 seconds for checkExists to complete and you have updated the object many times with versioning on, you might want to weigh the advantages of versioning against the performance enhancement of having versioning turned off. You can also get a performance boost by using control group M (managed content) for XML content instead of X (internally managed XML).
For anyone not familiar with how Fedora stores datastreams, the reason this happens is that the object’s FOXML holds the metadata for each datastream version. In the case of internally managed XML, the object’s FOXML also holds the XML content of each datastream version, so it can grow quite large if there have been many updates to an internally managed XML datastream.
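To make the storage difference concrete, here is a toy model in Python (a rough sketch under stated assumptions, not Fedora code). It assumes each datastream version adds a fixed amount of metadata to the object FOXML, and that for control group X the version’s XML content is inlined as well, while for M the content is stored outside the FOXML. The class and function names and the byte sizes used below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DatastreamVersion:
    """One version of a datastream (toy model; field names are hypothetical)."""
    metadata_bytes: int  # per-version metadata kept in the FOXML (ID, date, MIME type, ...)
    content_bytes: int   # size of this version's XML content


def foxml_bytes(versions: Iterable[DatastreamVersion], control_group: str) -> int:
    """Estimate how many bytes one datastream's history adds to the object FOXML.

    X (internally managed XML): metadata AND content of every version are inlined.
    M (managed content): only the metadata is inlined; content lives outside the FOXML.
    """
    if control_group not in ("X", "M"):
        raise ValueError("this toy model only covers control groups X and M")
    total = 0
    for v in versions:
        total += v.metadata_bytes
        if control_group == "X":
            total += v.content_bytes
    return total


if __name__ == "__main__":
    # 200 hypothetical versions of a 20,000-byte XML record
    history = [DatastreamVersion(metadata_bytes=300, content_bytes=20_000)] * 200
    print("X, 200 versions:", foxml_bytes(history, "X"))
    print("M, 200 versions:", foxml_bytes(history, "M"))
    # versioning off: only the latest version is kept
    print("X, versioning off:", foxml_bytes(history[-1:], "X"))
```

With these made-up sizes the model gives 4,060,000 bytes for X against 60,000 bytes for M, and 20,300 bytes for X with versioning off, which mirrors why both turning versioning off and switching XML to control group M shrink what has to be parsed on every retrieval.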

From: fez-users-bounces@lists.sourceforge.net [mailto:fez-users-bounces@lists.sourceforge.net] On Behalf Of Ford, Kevin
Sent: Monday, April 16, 2007 12:52 PM
To: fez-users@lists.sourceforge.net
Subject: [Fez-users] checkExists

Dear All,
I wanted to share some wisdom about something I just completed working through this morning.  This has to do with the time it takes for the page to render when viewing an item in a collection.
Last week I started implementing VRA Core 4, a rather involved and complex XML metadata format, with Fez. As the project progressed, viewing an item in the collection that uses VRA Core 4 became slower and slower, until the page took 52 seconds to render. (I have Fez 1.3 on my workstation, a Pentium D 3.4 GHz with 2 GB of RAM running Windows XP; Fedora 2.1.1 and MySQL 5.x are also on my machine.) An item with a very simple DC record renders in 2 seconds.
Trawling through the code to see precisely where the time sink was (my hunch was that it had to do with processing the VRA Core XSD and XML datastream), I discovered multiple calls to the record->checkExists function from view2.php and class.record.php (within the getXmlDisplayId function). Placing timing checks throughout the code, I noted that every call to checkExists took about 16 seconds to return a result.
So, I added a variable to the RecordGeneral class to store the result of checkExists, which is now called only once, at the top of view2.php; after that, the stored value is used. The first (and only) call to checkExists still takes 16 or so seconds, but the page finishes rendering 3 seconds later, in 19 seconds total.
Clearly, this is not (necessarily) a typical example. Like I said, an object with a simple DC record renders much, much quicker.  Nevertheless, it seems that the multiple calls to checkExists are unnecessary and negatively impact performance (by more than 30 seconds in this example).
Also, provided there is a PID (and the record exists), a RecordObject is created twice in view2.php.
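The change described above (remember the checkExists result on the record, and build the record object only once per page) is a form of per-request memoization. Fez itself is PHP; this is only a Python sketch of the pattern. RecordGeneral echoes the class named above, but check_exists, FakeFedoraClient, object_exists and render_view are hypothetical stand-ins, not Fez or Fedora APIs.

```python
from typing import Optional


class FakeFedoraClient:
    """Hypothetical stand-in for the slow Fedora existence lookup; counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def object_exists(self, pid: str) -> bool:
        self.calls += 1  # in the scenario above, each real call took ~16 seconds
        return pid.startswith("demo:")


class RecordGeneral:
    """Record wrapper that remembers the existence check for its lifetime."""

    def __init__(self, pid: str, client: FakeFedoraClient) -> None:
        self.pid = pid
        self._client = client
        self._exists: Optional[bool] = None  # None means "not checked yet"

    def check_exists(self) -> bool:
        # Only the first call reaches the repository; later calls reuse the stored value.
        if self._exists is None:
            self._exists = self._client.object_exists(self.pid)
        return self._exists


def render_view(pid: str, client: FakeFedoraClient) -> str:
    """Hypothetical page renderer: builds ONE record object and reuses it."""
    record = RecordGeneral(pid, client)
    if not record.check_exists():
        return "not found"
    parts = [f"record {record.pid}"]
    # Later stages (cf. getXmlDisplayId) ask again, but none re-queries the repository.
    for stage in ("metadata", "datastreams", "display"):
        if record.check_exists():
            parts.append(stage)
    return "\n".join(parts)


if __name__ == "__main__":
    client = FakeFedoraClient()
    print(render_view("demo:1", client))
    print("repository calls:", client.calls)
```

Caching only for the lifetime of one page request keeps the risk of a stale answer small; caching across requests would also need invalidation when objects are purged.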
Hope this helps,


Kevin Ford
Digital Services Specialist
Columbia College Chicago Library
624 S. Michigan Avenue
Chicago, IL 60605
Tel: 312 344 8568
Email: kford@colum.edu


Christiaan Kortekaas
Senior Library Systems Programmer
Library Technology Service
The University of Queensland, Australia QLD 4072
Telephone : (+61) (7) 3346 4337