[Refdb-devel] [ refdb-Feature Requests-2872243 ] Get IDs fast
From: SourceForge.net <no...@so...> - 2009-10-04 21:36:40
Feature Requests item #2872243, was opened at 2009-10-03 16:36
Message generated for change (Comment added) made by mhoenicka
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=385994&aid=2872243&group_id=26091

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: refdbd
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Torsten Bronger (bronger)
Assigned to: Markus Hoenicka (mhoenicka)
Summary: Get IDs fast

Initial Comment:
Currently, it takes about 40ms per reference to get the ID of a found reference:

$ time refdbc -u refdb -w Sonne -d biblio -C getref -s ID -t ris ":ID:>0" > /dev/null
999:96 retrieved:0 failed

real    0m4.026s
user    0m0.000s
sys     0m0.004s

This is problematic for a web frontend because, even with aggressive caching, you have to know at least the IDs of the found references. Therefore, I request that the ID-only query be optimised.

----------------------------------------------------------------------

>Comment By: Markus Hoenicka (mhoenicka)
Date: 2009-10-04 23:36

Message:
I've tried to track down where refdbd spends its time returning the ID list. It looks like a lot of time is wasted on client/server messaging, because refdbd, by default, returns reference data one dataset at a time. ID lists consist of RIS datasets with only 4 lines each, so the per-message overhead is out of proportion to the data actually sent.

Please have a look at refdbdgetref.c as of revision 703. There is a tunable at line 2841 which is set to a default value, depending on the type of query, a few lines further down. The idea is to group references before sending them to the client. This requires more memory, but reduces the client/server messaging overhead. I arrived at values of 100 for ID queries and 10 for other queries empirically, looking only at RIS data. The best values certainly depend on the speed and memory of the machine refdbd runs on.
Feel free to play with these numbers and see if it helps. If it does, I could turn them into configurable parameters. Using the current defaults, I've managed to reduce the time for retrieving 100 IDs from 10.66s to 0.732s, and the time for retrieving 100 RIS datasets from 12.42s to 3.96s.
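The grouping idea amounts to buffering several datasets and handing them to the transport as one message instead of one message per dataset. Below is a minimal sketch in C, assuming a hypothetical `batcher` type and a hypothetical `send` callback standing in for refdbd's real client/server messaging. It only illustrates the technique and is not the code in refdbdgetref.c; `batch_size` plays the role of the tunable (e.g. 100 for ID queries, 10 for others).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transport hook: in a real server this would write one
   message to the client; here it is just a callback. */
typedef void (*send_fn)(const char *buf, size_t len, void *ctx);

typedef struct {
    char *buf;         /* concatenated datasets waiting to be sent */
    size_t len;        /* bytes currently buffered */
    size_t cap;        /* allocated size of buf */
    size_t pending;    /* number of datasets in buf */
    size_t batch_size; /* send after this many datasets (1 = one per message) */
    send_fn send;      /* hypothetical transport */
    void *ctx;         /* passed through to send */
} batcher;

/* Send whatever is buffered as a single message. */
static void batcher_flush(batcher *b) {
    if (b->pending == 0)
        return;
    b->send(b->buf, b->len, b->ctx);
    b->len = 0;
    b->pending = 0;
}

/* Append one dataset; send the batch once batch_size datasets have
   accumulated. Returns 0 on success, -1 if the buffer cannot grow. */
static int batcher_add(batcher *b, const char *dataset) {
    size_t n = strlen(dataset);
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 256;
        while (newcap < b->len + n)
            newcap *= 2; /* more memory in exchange for fewer messages */
        char *p = realloc(b->buf, newcap);
        if (p == NULL)
            return -1;
        b->buf = p;
        b->cap = newcap;
    }
    memcpy(b->buf + b->len, dataset, n);
    b->len += n;
    if (++b->pending >= b->batch_size)
        batcher_flush(b);
    return 0;
}

/* Release the buffer. */
static void batcher_free(batcher *b) {
    free(b->buf);
    b->buf = NULL;
    b->len = b->cap = b->pending = 0;
}

/* Demo transport that only counts how many messages would be sent. */
static void count_send(const char *buf, size_t len, void *ctx) {
    (void)buf;
    (void)len;
    ++*(size_t *)ctx;
}
```

For example, with `batch_size` set to 100, adding 250 small datasets triggers two sends, and a final `batcher_flush()` sends the remaining 50, so three messages replace 250. Setting `batch_size` to 1 reproduces the one-dataset-at-a-time behaviour; larger values trade buffer memory for fewer round trips.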