From: Jochen L. <lie...@ub...> - 2013-04-04 09:44:02
Hi Demian,

I'm using VuFind 1.3 ... (heavily modified ;-) ) but I think I put back everything that relates to AlphaBrowse. Can you tell me where the items come from ... from the main index, from the authority index, or from the SQLite database?

Greetings
Jochen

Demian Katz schrieb:
> I've seen these sorts of problems when using an AlphaBrowse index built by a different version of the software than the currently running version (it relates to some changes in the sorting rules when we introduced ICU). If you're using an unmodified, out-of-the-box installation, that shouldn't be your problem, though.
>
> Which VuFind version are you using? Have you customized anything related to AlphaBrowse?
>
> - Demian
> ________________________________________
> From: Jochen Lienhard [lie...@ub...]
> Sent: Thursday, March 28, 2013 6:34 AM
> To: Tod Olson
> Cc: Mark Triggs; Demian Katz; vufind-tech Tech
> Subject: Re: [VuFind-Tech] AlphaBrowse
>
> Hi,
>
> thank you for your answer and the demo.
>
> I still have the problem that the items part of the JSON answer is empty:
>
> {"responseHeader":{"status":0,"QTime":184},"Browse":{"items":[],"totalCount":0,"startRow":1,"offset":0}}
>
> So ... my browsing starts at the last page and I can browse back.
>
> The SQLite database has entries for the names, for example:
>
> key      heading
> tacite   Tacite
>
> I get no errors ... so do you have any idea why I get no items?
>
> Greetings and happy Easter
>
> Jochen
>
> Tod Olson schrieb:
>> I'll try to answer this from my memory of reading the code, and maybe someone can correct me if I go wrong.
>>
>> The path "/browse" calls the BrowseRequestHandler class contained in browse-handler.jar. BrowseRequestHandler looks in the SQLite database to get the range of the ordered index that will be displayed to the user. For each entry in the returned range, it also queries the authority core for cross references, like "See Also:".
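The ordered-range lookup Tod describes might look roughly like this. The schema below (a `headings` table with the `key`/`heading` columns from Jochen's example, and a naive `lower()` normalization) is purely illustrative; the real handler is Java code inside browse-handler.jar and its actual table layout may differ:

```python
import sqlite3

# Toy browse database in the shape Jochen describes: key is the
# normalized sort form, heading is the display form. Illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE headings (key TEXT, heading TEXT)")
conn.executemany(
    "INSERT INTO headings VALUES (?, ?)",
    [("goethe", "Goethe"), ("tacite", "Tacite"), ("twain", "Twain")],
)

def browse_from(term, rows=20):
    """Return up to `rows` headings at or after the normalized term."""
    cur = conn.execute(
        "SELECT heading FROM headings WHERE key >= ? ORDER BY key LIMIT ?",
        (term.lower(), rows),
    )
    return [row[0] for row in cur]

print(browse_from("tacite"))  # -> ['Tacite', 'Twain']
```

Note that if the incoming term is normalized with different rules than the stored keys (e.g. an index built by a different version, as Demian suggests), the `key >= ?` comparison can land past the last row and return an empty page, which matches Jochen's `"items":[]` response and "browsing starts at the last page" symptom.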
>> So when you see a result like:
>>
>> http://vfdemo.lib.uchicago.edu/vufind/Alphabrowse/Home?source=author&from=twain
>>
>> you will see the main entries coming from the SQLite index, supplemented with information from the authority records. (Assuming the demo is running when you look.)
>>
>> -Tod
>>
>> On Mar 27, 2013, at 6:11 AM, Jochen Lienhard <lie...@ub...> wrote:
>>
>>> Thanks a lot for all your answers.
>>>
>>> I think SQLite will be OK for us. ;-)
>>>
>>> Now I still have some problems understanding how the search in VuFind works.
>>>
>>> If you use the AlphaBrowse ... there is a call to the Solr index (biblio) with some parameters, for example:
>>>
>>> INFO: [katalog] webapp=/solr path=/browse params={from=Goethe&json.nl=arrarr&wt=json&rows=20&source=au&offset=-20} status=0 QTime=25
>>>
>>> (We changed the name of our index to katalog, and the source field of the author search to au.)
>>>
>>> Hmmm ... it finds nothing, so it jumps to the end of the list.
>>>
>>> My question is now ... what does the /browse call do, why do I need the authority index, and where is the SQLite database used?
>>>
>>> Must I look at the code to understand, or can you explain it to me in a few words? ;-)
>>>
>>> Thanks for your help and greetings from Germany
>>>
>>> Jochen
>>>
>>> Mark Triggs schrieb:
>>>> Hi all,
>>>>
>>>> Demian and Tod have neatly summarised why it works the way it does. Just adding a few stray comments:
>>>>
>>>> Why use a database at all?
>>>>
>>>> In principle you could build a browse straight off your Solr indexes by using the Lucene API directly -- by writing a Solr query component that opened the bibliographic index and used a Lucene TermEnum to seek to the right term of the index being browsed.
>>>>
>>>> The sticking point for this is sorting: Lucene will give you a single field in sorted order, but for browse we want to sort by one field (with collation, etc.) but display another.
>>>> There might be ways of getting around this (like prefixing each term with its sort key and stripping it off at runtime), but it's hard to beat the simplicity of SQL for this (particularly once you throw forwards/backwards pagination into the mix).
>>>>
>>>> Why SQLite?
>>>>
>>>> Mainly because it was convenient at the time, but Tod and Demian have mentioned some of the other benefits. One other benefit of totally rebuilding the database for each update is that we can arrange for the headings to be sorted at index time (and stored sorted on disk). That way, most of the real work is done when the indexes are built, and the sorting overhead at query time is negligible (as long as it doesn't have to hit the disk too much ... see below :)
>>>>
>>>> Startup slowness
>>>>
>>>> Definitely an issue. When I was using this code in production, a cheap trick I pulled was to 'cat' the updated browse DB file right as it was made live. This encourages the OS to cache the file and saves hitting the disk too heavily for the first request. Once everything is in memory it's a lot faster. I wrote the code for an environment that only saw updates once every 24 hours, so the one-off slowness wasn't a big problem.
>>>>
>>>> Cheers,
>>>>
>>>> Mark
>>>>
>>>> Tod Olson <to...@uc...> writes:
>>>>
>>>>> My TODO list includes some more alpha browse work, so I'll add a few things that I have discovered:
>>>>>
>>>>> Relational databases are really good for ordered indexes, in a way that text indexing systems like Solr/Lucene are not. Completely different models for different tasks.
>>>>>
>>>>> If you try to switch to Postgres, you may find that you need to re-write some of the SQL. But you would also need to rework how a new index is created and swapped in. With SQLite, there is a separate database file for each index. Do an "ls solr/alphabetical_browse/" to see what I mean.
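Mark's sort-key-prefix workaround (prefix each term with its sort key at index time, strip it off at runtime) can be sketched with a toy key. A real deployment would use an ICU collation key, where e.g. "Ü" sorts alongside "U"; the casefold-based key below is only an illustration:

```python
# Toy stand-in for a collation sort key. Real browse indexes use ICU
# collation keys; casefold() + isalnum() is only an illustration.
def sort_key(heading):
    return "".join(ch for ch in heading.casefold() if ch.isalnum())

# Index time: store "sortkey<sep>display" in a single field, so a plain
# lexicographic sort of that field yields collated order.
headings = ["twain, mark", "Tacite", "Über uns"]
stored = sorted(f"{sort_key(h)}\x1f{h}" for h in headings)

# Query time: strip the sort-key prefix back off for display.
display = [s.split("\x1f", 1)[1] for s in stored]
# With this toy key, "Über uns" sorts last (ü > z in code points);
# an ICU collation key would place it with "U".
```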
>>>>> Every time you run index-alphabetic-browse.sh, a new copy of, for example, title_browse.db-updated is created in a different space. If there are no errors it is moved into solr/alphabetical_browse/ and a file called title_browse.db-ready is created. The next time someone queries the browse, the browse handler notices that there is a new title_browse.db and swaps it in.
>>>>>
>>>>> The downsides to this are that updating the browse indexes is currently tightly coupled to SQLite, and the first read of the new index will be slow as the file must be read. The upside is that there is absolutely no DB administrative overhead. No new DB to set up, tables to administer, or database to optimize, ever.
>>>>>
>>>>> I do know what you mean about slowness, at least on the initial read of a new browse index. Our full-scale test has 10GB of alphabetic browse indexes, and title_browse.db is 4.9GB. So the first query always times out because the initial read takes a while. But we've found that queries after that are fast. So the solution would be to re-create the indexes at a low-use time and automatically send a warming query to the browse handler. Which we will also want to do for Solr indexes.
>>>>>
>>>>> I'm sort of interested in adding MySQL support for AlphaBrowse, but it's not a high priority yet. It's not clear when or if the benefits would be worthwhile.
>>>>>
>>>>> Best,
>>>>>
>>>>> -Tod
>>>>>
>>>>> Tod Olson <to...@uc...>
>>>>> Systems Librarian
>>>>> University of Chicago Library
>>>>>
>>>>> On Mar 25, 2013, at 7:55 AM, Demian Katz <dem...@vi...> wrote:
>>>>>
>>>>>> The AlphaBrowse feature consists of two parts: an indexer that builds a database, and a Solr request handler that looks up results within the database. The database was chosen as an easy way of accessing an arbitrary point within a pageable list, and SQLite was specifically selected as a lightweight, stand-alone option.
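The rebuild-and-swap dance Tod describes might be sketched roughly as follows. File names are taken from the thread; the function names, locking, and error handling are simplified illustrations, not the actual logic of index-alphabetic-browse.sh and the Java handler:

```python
import os
import shutil
import tempfile

def rebuild_and_swap(build_db, live_dir, name="title_browse.db"):
    """Build a fresh browse DB off to the side; publish it only on success.

    Simplified sketch of the -updated/-ready protocol from the thread.
    """
    workdir = tempfile.mkdtemp()
    updated = os.path.join(workdir, name + "-updated")
    build_db(updated)  # may raise; on failure nothing is published
    shutil.move(updated, os.path.join(live_dir, name + "-updated"))
    # Flag file: tells the request handler a new DB is ready to swap in.
    open(os.path.join(live_dir, name + "-ready"), "w").close()

def maybe_swap(live_dir, name="title_browse.db"):
    """Request-handler side: swap the new DB in once the flag is seen."""
    ready = os.path.join(live_dir, name + "-ready")
    if os.path.exists(ready):
        os.replace(os.path.join(live_dir, name + "-updated"),
                   os.path.join(live_dir, name))
        os.remove(ready)
```

Because the live file is replaced in one rename, readers never see a half-built database; the price, as Tod notes, is a cold first read of the new file (hence Mark's 'cat' warming trick).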
>>>>>>
>>>>>> I suspect that the code could be adjusted to use a different database by changing the JDBC calls and recompiling; you can find the source and documentation here:
>>>>>>
>>>>>> https://github.com/marktriggs/nla-browse-handler
>>>>>>
>>>>>> I'm also copying Mark Triggs (the author of the handler) on this email in case he has any additional comments ... though he's busy working on the ArchivesSpace project right now and probably hasn't thought about this thing in at least a few months!
>>>>>>
>>>>>> - Demian

--
Dr. rer. nat. Jochen Lienhard
Dezernat EDV

Albert-Ludwigs-Universität Freiburg
Universitätsbibliothek
Rempartstr. 10-16 | Postfach 1629
79098 Freiburg | 79016 Freiburg

Telefon: +49 761 203-3908
E-Mail: lie...@ub...
Internet: www.ub.uni-freiburg.de