Naomi - Before you get started on any code, make sure to join
both the Jangle and the DLF ILS-DI listservs.
There was a posting on the DLF ILS-DI listserv from Godmar Back
who developed an implementation of the GetRecord method:
[mailto:email@example.com] On Behalf Of Naomi
Sent: Monday, June 30, 2008 12:28 PM
Subject: Re: [VuFind-Tech] web service to move information from ILS to
oh ho! We've hit the inevitable "put up or shut
up" moment! ;-)
My sleeves are rolled up, I'm back from my travels. We
have an early Sept. deadline. I should be contributing soon!
On Jun 30, 2008, at 8:46 AM, Andrew Nagy wrote:
Naomi - I would say that this is for the vufind-unicorn
list. Each ILS works differently in this respect. With Voyager, I
can simply request from the command line all bib records that have been touched
in the past 24 hours - same with holdings, etc.
I say build the REST-based system for Unicorn - and then we can
port it to Jangle so that it will work with all ILSs and we would have a
standard way of getting bibs and holdings no matter which ILS we use.
Here's another set of thorny
questions and ideas from yours truly. As always, I apologize for the
really long posting.
Like everyone else, our bib and
holdings data are changing all the time. We need nightly updates to our
Solr/Lucene index to keep VuFind synced with the latest and greatest information
from our ILS. We have too much data to do a full reload nightly, but a
significant number of new records arrive daily and need to be added.
We also dread, yet expect, having to reload the index from scratch occasionally.
We believe a web service is the
best way to facilitate these loads.
We have 5 million records, so
just doing one giant request/response is NOT going to work. As it
is, we've "chunked" our *static* MARC21 bib load into several
500K sections of MARC21 records. I'm not sure how big the nightly
update loads will be yet - perhaps after an initial load, we'll be able to
manage with a few separate HTTP requests. Lots I don't know yet.
What if there was a REST service
that we could use to pull bib AND relevant holdings information from Unicorn?
What if we could get updates OR a whole load this way?
This would be a way to deal with 1) getting batch loads
of data to create/update documents in the index and 2) getting updates
from an ILS for both bib and holdings rec info.
Such a service would potentially
have utility for all the ILS systems and all the Next Generation Discovery
systems. Certainly for any large collection of bib/holdings data.
The REST requests could ask for all records, or for records
updated since a certain date, or for records last updated between two dates (just
like OAI-PMH requests).
I'm thinking the responses would look something like this:
<record>
  [bib record in MARCXML]
  <item> [item1 holdings info] </item>
  <item> [item2 holdings info] </item>
</record>
If you don't like XML, we could
have the same service functionality with MARC21 records as the responses
(rather than XML). We could even have the holdings info stuck in the
MARC fields, just as it is currently.
Requests would be something like
getRecords (beg date)
getRecords (beg date, end date)
getRecords (end date)
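As a rough illustration, those three request shapes could map onto date-range query parameters on a single URL. Here's a minimal Python sketch; the base URL and the from/until parameter names (borrowed from OAI-PMH) are assumptions, not an existing service:

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical base URL for the ILS record service -- not a real endpoint.
BASE_URL = "http://ils.example.edu/records"

def get_records_url(beg_date=None, end_date=None):
    """Build the request URL for the getRecords variants above.

    No dates   -> ask for every record (a full load).
    beg only   -> records updated since beg_date.
    beg + end  -> records updated between the two dates.
    end only   -> records updated up to end_date.
    """
    params = {}
    if beg_date is not None:
        params["from"] = beg_date.isoformat()
    if end_date is not None:
        params["until"] = end_date.isoformat()
    return BASE_URL + ("?" + urlencode(params) if params else "")

# The three request shapes from the list above:
print(get_records_url(date(2008, 6, 1)))                     # beg date only
print(get_records_url(date(2008, 6, 1), date(2008, 6, 30)))  # beg and end date
print(get_records_url(end_date=date(2008, 6, 30)))           # end date only
```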
The way this could work is that there would be ILS-specific code to:
- select the appropriate bib records
- for each selected bib record, convert it to MARCXML, then get/create XML
for the related holdings/item/call number records, and combine these into a "record"
- "chunk" responses into reasonable lengths and serve out each piece as
requested (OAI-PMH has a resumptionToken mechanism to facilitate this).
Here's something Wayne wrote in
response to a similar post of mine on the vufind-unicorn list:
On Jun 23, 2008, at 7:26 AM, Wayne wrote:
I think the only thing I worry
about is that scaling may be an issue. This could be a possibility though once we
finish the code to actually post a full MARC record to Solr directly (the same
way you post a CSV file). I would think that instead of passing XML back (at
least for these methods), a MARC file that could be indexed would make more sense.
I think we're actually talking
about the same thing. The main difference is that you expose these scripts to a
web service, correct? Instead of your Sirsi server pushing these updates, you
pull them from Solr.
Again, I would argue that the
overhead of doing this with large datasets may be a little prohibitive.
However, it doesn't require setting up rsync or scp routines, which means this
may be very attractive for a lot of folks without big (or cooperative) IT
So, would you like to add
records.get (start_date, end_date,
format), where none of the parameters is required; if no start and end date, it
returns everything, and the formats are marcxml and marc (default is marc)?
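A sketch of those defaults, with the actual record selection stubbed out; none of this is Unicorn or VuFind code, just an illustration of the proposed signature:

```python
# Sketch of the proposed records.get(start_date, end_date, format) call.
# The selection step is a stub; only the argument defaults are the point.

def records_get(start_date=None, end_date=None, fmt="marc"):
    """All parameters optional: no dates means everything; fmt defaults to marc."""
    if fmt not in ("marc", "marcxml"):
        raise ValueError("format must be 'marc' or 'marcxml'")
    # Stand-in for the ILS-specific "select appropriate bib records" step.
    return {"start": start_date, "end": end_date, "format": fmt}

print(records_get())                                       # everything, MARC21 out
print(records_get("2008-06-01", "2008-06-30", "marcxml"))  # date range, MARCXML out
```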
Leaving the XML vs. MARC21 format
issue aside, the key point is to use a web service, with the NGDE requesting the data.
One of the benefits to web
services is that they can be well defined but not be implementation specific
... so it's another way to unify practice.
I'm guessing there are a bunch of
possible performance bottlenecks:
1. getting item recs for each individual bib rec: ameliorate by
requesting items for a list of bib recs instead of one at a time?
2. combining item recs and bib recs into a single "response" or
single "indexing time" unit - this shouldn't be that slow, right?
These are "solved" problems:
3. http responses too big:
break them into chunks, a la OAI-PMH
4. http request/responses slowing down indexing: pull all the
responses first, then index them all as a lump.
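Idea 4 (pull all responses first, then index as a lump) might look like this in outline; fetch_pages and index_batch are hypothetical stand-ins for the record service and the Solr loader:

```python
# Separate the pull phase from the index phase so HTTP latency
# never stalls the indexer.

def pull_then_index(fetch_pages, index_batch):
    """Phase 1: pull every response; phase 2: index them as one lump."""
    batch = []
    for page in fetch_pages():        # network-bound phase
        batch.extend(page)
    index_batch(batch)                # indexing phase, no HTTP in the loop
    return len(batch)

# Simulated service and indexer:
pages = [["bib1", "bib2"], ["bib3"]]
indexed = []
count = pull_then_index(lambda: iter(pages), indexed.extend)
print(count, indexed)
# -> 3 ['bib1', 'bib2', 'bib3']
```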
other issues / ideas / comments ?