I've been looking for a while for a reference management tool to replace our group's current home-grown script. Refbase fits quite nicely (easily embeddable, few dependencies, etc.), and I have a server up and running and integrated with our web template. Unfortunately there is one major problem: very slow processing of records with long author lists (where long means several hundred authors). As members of a large research collaboration, we have quite a lot of these. The issue is bad enough that I'm getting page timeouts (30 seconds) on some page views showing 5 such records. The system configuration is:
Debian 6 kernel 2.6.32-5-amd64
PHP Version 5.3.3-7+squeeze7
refbase 0.9.5 stable
I've also tried importing some representative records to both the demo.refbase.net and beta.refbase.net databases, and I see the same issue - pages with one or more of these records load slowly, while pages with only 'normal' records (i.e. only a few authors per paper) load very quickly.
Looking at the server load when it's trying to display a long author list, the Apache process is running at 100% CPU while I wait for the page. My suspicion is that there's some indexing going on during the processing of the record(s) to allow the nifty auto search completion stuff to work, and it doesn't scale well to fields with hundreds of entries in them.
Another thing I've noticed that may or may not be related - looking in the database, the maximum value for author_count is 3, even on papers which have many more authors. Is this a bug or by design?
Any advice gratefully received…
Per includes/include.inc.php, the authorCount is by design: sole-author papers and papers by two authors get sorted before papers with three-or-more authors, but there hasn't been a need to sort on author count beyond that. Is there a reason you'd want to change this field, or are the performance concerns you first raised the only thing we need to address?
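To make the capping concrete, here is a minimal Python sketch of the behaviour described above. It is not refbase's actual PHP code: the function name `capped_author_count`, the `;` delimiter between authors and the `cap` parameter are all assumptions for illustration.

```python
def capped_author_count(author_field, delimiter=";", cap=3):
    """Return the number of authors, capped at `cap`.

    Assumption: authors are stored in one string separated by `delimiter`
    (e.g. "Smith, J; Jones, A"). With cap=3, the result only distinguishes
    one author, two authors, and three-or-more authors.
    """
    # Split on the delimiter and ignore empty fragments (e.g. a trailing ";").
    names = [name.strip() for name in author_field.split(delimiter) if name.strip()]
    return min(len(names), cap)
```

Under these assumptions, a record with several hundred authors gets the same value as one with three, which is enough to sort one- and two-author papers ahead of the rest.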
The page load times I have for http://beta.refbase.net/show.php?date=2012-02-15 are fairly reasonable.
Ok, that makes sense on the author_count number, thought that might be a red herring…
One problem is that my setup currently doesn't output the page progressively, so you are staring at a blank screen for up to 30 s before the list appears. I'll play around with the PHP buffer settings and see if I can replicate the behaviour of the demo servers - that would at least improve the perceived speed.
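To check whether a buffer change actually helps, one option is to compare the time until the first byte of the page body arrives with the time for the whole page. A rough Python sketch using only the standard library follows; `measure_page_load` and its `opener` parameter are hypothetical names chosen for this example, not part of refbase.

```python
import time
import urllib.request


def measure_page_load(url, chunk_size=4096, opener=urllib.request.urlopen):
    """Return (seconds_to_first_byte, seconds_total, total_bytes) for `url`.

    `opener` is an assumption made for testability: any callable that
    returns a file-like context manager can stand in for urlopen.
    """
    start = time.perf_counter()
    with opener(url) as response:
        # Read a single byte first so the timing is not delayed by
        # waiting for a full chunk to arrive.
        first = response.read(1)
        seconds_to_first_byte = time.perf_counter() - start
        total_bytes = len(first)
        # Drain the rest of the body in chunks.
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    seconds_total = time.perf_counter() - start
    return seconds_to_first_byte, seconds_total, total_bytes
```

Calling `measure_page_load("http://beta.refbase.net/show.php?date=2012-02-15")` and then the equivalent URL on a local install would show the difference: if the two times are nearly equal, the page is being sent in one go; if the first byte arrives much earlier than the last, output is progressive.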
We are, of course, interested in performance improvements too. What would you say the biggest hurdle is here? Import? If so, from what format(s)?