Re: [VuFind-Tech] [VuFind-General] vufind requirements

Hello,

I'll have to admit that I don't know a lot about Jangle. I'm willing to
read when need be. 

I'm more concerned that Jangle might be something like a Z39.50
protocol, something that works great and helps share resources, but you
end up watering down the functionality to the lowest common denominator.
I've added a lot of functionality to the Aleph driver, some things that
don't even exist in the other drivers in VuFind, as we have ILL in our
ILS also. 

By using Jangle are we going to water down what can be available? If
someone writes code to interface with their ILS that is not part of
Jangle how are we going to write around it? How large is this thing
going to be? The questions on it are limitless because you need to
include any and everything as far as accessing your ILS.

As an aside, the course reserves interface I wrote deals with a separate
database for each institution. It's not that hard to keep track of where
to go. When I bring in the records I tag the bib ID with a code. When I
get the bib ID in VuFind I use the tag to decide where to go with the
driver.
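
The tagging approach described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual Aleph driver code: the separator, the institution codes, and the function names are all hypothetical.

```python
# Hypothetical sketch: tag a bib ID with an institution code at import time,
# then use the tag later to decide which course reserves database to query.
SEPARATOR = ":"  # assumed separator; any character absent from bib IDs would do


def tag_bib_id(institution_code, bib_id):
    """Prefix the bib ID with the code of the institution it came from."""
    return f"{institution_code}{SEPARATOR}{bib_id}"


def split_tagged_id(tagged_id):
    """Split a tagged ID back into (institution_code, bib_id)."""
    code, sep, bib_id = tagged_id.partition(SEPARATOR)
    if not sep or not code or not bib_id:
        raise ValueError(f"not a tagged bib ID: {tagged_id!r}")
    return code, bib_id


def choose_database(tagged_id, databases):
    """Return (database, bib_id) for the institution named in the tag."""
    code, bib_id = split_tagged_id(tagged_id)
    if code not in databases:
        raise ValueError(f"no database configured for {code!r}")
    return databases[code], bib_id
```

With a mapping such as `{"inst1": "reserves_inst1"}` (made-up names), `choose_database("inst1:000123", mapping)` picks the database for that institution and hands back the untagged bib ID.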

I will admit I'm skeptical of Jangle. It sounds like a solution looking
for a problem. I'm willing to listen though.

al

ps. Please don't get the hair up on the back of your neck, I'm just
looking for more information.

On Tue, 2010-02-09 at 08:55 -0500, Demian Katz wrote:
> Hopefully Ross can join in and give us some thoughts on Jangle's
> applicability to this situation.  If it could help us out here, that's
> certainly another mark in its favor (and possibly extra motivation for
> moving to Jangle even in places where we already have strong ILS
> drivers).  One of the biggest challenges of this problem is the fact
> that it gets more complicated as you add more data sources -- it's not
> such a big deal with a single ILS, but if you start writing record
> drivers and pulling in non-MARC data from local resources, figuring
> out what is newest and what has changed since a different date turns
> into a confederated searching problem.  I'm still not totally clear on
> Jangle's capabilities, but is this a scenario where a single Jangle
> Core with multiple Connectors could make life easier?
> 
>  
> 
> - Demian
> 
>  
> 
> From: Greg Pendlebury [mailto:gre...@gm...] 
> Sent: Monday, February 08, 2010 5:28 PM
> To: Demian Katz
> Cc: vuf...@li...
> Subject: Re: [VuFind-Tech] [VuFind-General] vufind requirements
> 
> 
>  
> 
> Perhaps it's a good tie-in to the Jangle discussion? If Jangle was
> used as a layer between VuFind and an ILS then VuFind could rely on
> features from Jangle. Then Jangle becomes the area we need to concern
> ourselves with.
> 
> In part it's sweeping the problem under the rug, but (unless I'm
> mistaken) Jangle is designed to provide this sort of common interface
> and feature list into any ILS. The Jangle harvest layer is more
> stateful and authoritative than VuFind's index, isn't it?
> 
> I have one of those XC videos stuck in my head, so I could be
> confusing the two. :)
> 
> Greg
> 
> On 9 February 2010 01:57, Demian Katz <dem...@vi...>
> wrote:
> 
> This is definitely a tricky problem, without an obvious easy answer.
> It's only likely to get more complicated with non-MARC record support,
> since this likely means tracking the status of records imported from
> multiple sources using multiple tools.
> 
>  
> 
> It occurs to me that some kind of brute force solution might (and I
> only say might, because I don't think this is one of my better ideas)
> be possible by maintaining a database consisting of record IDs and
> hashes of the fullrecord index.  By analyzing the contents of the Solr
> index and this database, you could detect additions, deletions and
> changes…  BUT the processing involved would probably be prohibitive,
> and the hash idea only works if fullrecord only changes when you want
> it to (i.e. if fullrecord includes an "exported from ILS" date, it
> will appear to change after every full reindex even if it didn't
> really change).
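
The brute-force hash comparison described above could look roughly like this. It is a minimal sketch in Python under stated assumptions: the stored database is modelled as a plain dict of record ID to hash, the current index contents as a dict of record ID to fullrecord text, and the function names are hypothetical. It does not address the processing cost or the "exported from ILS" date caveat.

```python
import hashlib


def record_hash(fullrecord):
    """Hash the fullrecord text so two versions can be compared cheaply."""
    return hashlib.sha1(fullrecord.encode("utf-8")).hexdigest()


def diff_snapshots(stored_hashes, current_records):
    """Compare stored hashes (id -> hash) with the index (id -> fullrecord).

    Returns (added, deleted, changed, new_hashes); new_hashes is what would
    be saved as the stored state for the next comparison.
    """
    new_hashes = {rid: record_hash(text) for rid, text in current_records.items()}
    added = sorted(set(new_hashes) - set(stored_hashes))
    deleted = sorted(set(stored_hashes) - set(new_hashes))
    changed = sorted(
        rid for rid, digest in new_hashes.items()
        if rid in stored_hashes and stored_hashes[rid] != digest
    )
    return added, deleted, changed, new_hashes
```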
> 
>  
> 
> Really, the problem here is that, as recently discussed, Solr is an
> index and isn't intended for much beyond that.  Unfortunately, I have
> no brilliant ideas regarding a complementary package to handle the
> non-index elements of the problem.
> 
>  
> 
> One other thought -- even outside of OAI-PMH/RSS, it would be useful
> to track deleted records.  Right now, I believe that tags and
> favorites entries will stay around forever even if the associated
> record is removed from the index.  I'm not sure if this has any
> serious side effects (haven't done extensive testing), but it's yet
> another detail that might be good to clean up eventually.
> 
>  
> 
> - Demian
> 
>  
> 
> From: Greg Pendlebury [mailto:gre...@gm...] 
> Sent: Sunday, February 07, 2010 6:13 PM
> To: vuf...@li...
> Subject: Re: [VuFind-Tech] [VuFind-General] vufind requirements
> 
> 
>  
> 
> Relating to one of the earlier points in this thread, I'm happy to
> start some work in the OAI-PMH area. My new job pertains to this
> reasonably closely, and it will be a good chance to get back in the
> saddle with VuFind.
> 
> BUT there are some larger underlying issues regarding the Solr index
> that complicate the matter.
> 
> eg: http://vufind.org/jira/browse/VUFIND-167
> 
> A proper OAI-PMH server should be able to reliably serve followup
> queries to say what has changed since the last visit. Full deletion
> support also makes you a good OAI-PMH server.
> 
> To reliably support these functions VuFind would need to start keeping
> audit data (effectively) on changes to its solr index. Since most
> (all?) libraries manage their index directly through the backend and
> (as already pointed out) we aren't treating the index as our
> authoritative datastore, I don't think it's viable right now. Some
> libraries would probably have the data to support new/update queries,
> but not deletes.
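
To make the audit-data idea concrete, here is a minimal sketch in Python using the standard `sqlite3` module. The table layout, status values, and function names are assumptions for illustration only; VuFind has no such table. Timestamps are ISO 8601 UTC strings, which compare correctly as text.

```python
import sqlite3


def open_audit_log(path=":memory:"):
    """Open (or create) a small audit table: one row per record ID."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "record_id TEXT PRIMARY KEY, status TEXT NOT NULL, changed_at TEXT NOT NULL)"
    )
    return conn


def log_change(conn, record_id, status, changed_at):
    """Record the latest state of a record: 'new', 'updated' or 'deleted'."""
    conn.execute(
        "INSERT OR REPLACE INTO audit (record_id, status, changed_at) VALUES (?, ?, ?)",
        (record_id, status, changed_at),
    )
    conn.commit()


def changes_since(conn, since):
    """Answer a follow-up query: everything changed at or after `since`."""
    rows = conn.execute(
        "SELECT record_id, status, changed_at FROM audit "
        "WHERE changed_at >= ? ORDER BY changed_at, record_id",
        (since,),
    )
    return rows.fetchall()
```

Keeping a row for deleted records, rather than removing it, is what would let a harvester learn about deletions on its next visit.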
> 
> Ta,
> Greg
> 
> On 5 February 2010 00:48, Reginald Amade (ext) <r....@vm...> wrote:
> 
> Hi Michael,
> 
>  
> 
> Thanks for this.
> 
> I am playing with two thoughts.
> 
> My first thought, as mentioned in my email, would be to perform CRUD
> operations on the Solr index.
> 
> My second thought is to create a kind of aggregate (like ILS). It is
> fed by partner archives and used to populate VuFind.
> 
>  
> 
> It all depends on the complexity of implementation and flexibility
> towards librarians.
> 
>  
> 
> Regards
> 
> Reginald
> 
>  
> 
>                                    
> ______________________________________________________________________
> From: Michael Beccaria [mailto:mbe...@pa...] 
> Sent: Wednesday, February 03, 2010 4:43 PM
> To: Reginald Amade (ext)
> Cc: vuf...@li...
> Subject: RE: [VuFind-General] vufind requirements
> 
> 
>  
> 
> Reginald,
> 
> Forgive me if you already know this, but while the vufind indexer is
> currently set up to ingest Marc records, it’s really not that
> difficult to write an indexer that matches data on the existing fields
> of a vufind install (if you’re familiar with software development of
> some sort). I’m sure there are examples out there of people using solr
> to ingest xml data of all sorts. The key really is to get the data
> into a format that solr supports.
> 
>  
> 
> See here: http://wiki.apache.org/solr/UpdateXmlMessages
> 
>  
> 
> Here is a sample xml from that page that shows the format that solr
> accepts (solr also accepts csv files):
> 
> <add>
>   <doc>
>     <field name="employeeId">05991</field>
>     <field name="office">Bridgewater</field>
>     <field name="skills">Perl</field>
>     <field name="skills">Java</field>
>   </doc>
>   [<doc> ... </doc>[<doc> ... </doc>]]
> </add>
> 
>  
> 
> Then post it to the solr server via http in your program OR using the
> post.jar file that comes with solr. So if your “indexer” can output a
> solr compliant xml file, you can post it using a command line with
> post.jar. As long as the field names in the xml file match the field
> names in the schema.xml file on the vufind install, your data will flow
> right in and work. I have done this with a couple of projects. Let me
> know if I can help out.
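
The XML update message shown above can also be generated by a small program rather than written by hand. The following is a minimal sketch in Python using only the standard library. The function names are hypothetical, and the update URL passed to `post_to_solr` is an assumption that depends on how the Solr instance is installed. Field names must still match schema.xml.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr <add> message from dicts of field name -> value(s).

    A list value produces one <field> element per entry, which is how
    multi-valued fields such as "skills" are expressed.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def post_to_solr(update_url, xml_message):
    """POST an XML message to the given Solr update URL (URL is install-specific).

    Added documents only become searchable after a separate <commit/>
    message is posted to the same URL.
    """
    request = urllib.request.Request(
        update_url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the message with an XML library instead of string concatenation means characters such as `&` and `<` in the data are escaped automatically.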
> 
>  
> 
> And I agree with Demian’s comments (he’s much better at this than I
> am)…just writing to break down the issue a little bit.
> 
>  
> 
> Hope that helps,
> 
>  
> 
> Mike Beccaria
> 
> Systems Librarian
> 
> Head of Digital Initiatives
> 
> Paul Smith's College
> 
> 518.327.6376
> 
> 
>  
> 
> From: Reginald Amade (ext) [mailto:r....@vm...] 
> Sent: Wednesday, February 03, 2010 10:09 AM
> To: Demian Katz
> Cc: vuf...@li...
> Subject: Re: [VuFind-General] vufind requirements
> 
> 
>  
> 
> You’re absolutely right. I am willing to build something that meets
> our needs. 
> 
> As Analyst/Designer I need to be sure I understand what challenges we
> are facing.
> 
> Let me try to explain our situation.
> 
> We are an organization responsible for environmental issues. One of
> the projects is to set up an environmental portal where public users
> can easily access stored media. A common model is used where partners
> can participate by making their library available. The procedures used
> so far have made synchronisation a tedious and time-consuming task with
> low data quality. We have a business case that aims to improve user
> experience, maintenance, and data quality. Our technologies/products of
> choice are VuFind and the OAI-PMH protocol.
> 
>  
> 
> As such our approach is
> 
> 
> 1.      Setting up the harvesting, from both the service provider
> perspective and the data provider perspective.
> 
> a.      Given that we are dealing with 25+ data providers and
> 8 identified types, we anticipate needing a solid design that
> provides an easily scalable and extensible basis for connecting
> new data providers in the future
> 
> b.      Automatic harvesting of offered metadata
> 
> 2.      Setting up a VuFind instance that ingests the harvested
> metadata
> 
>  
> 
> Obviously I hope to get some more insight on what I need to do on the
> harvesting side in order to enable VuFind to publish the data. So far
> your feedback is telling me that
> 
>  
> 
> 1.      VuFind can easily ingest metadata in MARCXML format using the
> latest solrMARC tool
> 
> 2.      The previous point suggests that the records are either
> 
> a.      exported from an ILS, when one is in use, or
> 
> b.      converted directly from oai-dc (e.g. to MARCXML), when no
> ILS is in use
> 
>  
> 
> What are the benefits and drawbacks of using (or not using) an ILS
> such as Voyager?
> 
> Does VuFind/Solr need an ILS? We are not using one!
> 
> In fact VuFind acts as a presentation layer, while Solr is performing
> the hard work under the hood.
> 
> VuFind does not physically store the Solr created index in its MySQL
> database. 
> 
> The MySQL database is used to realize VuFind’s functionality?
> 
>  
> 
> Regarding management: we think that if a partner has a “small”
> volume of anticipated changes/additions, we could give them some
> management functionality (in)directly. Ideally this would operate on
> the Solr index, but maybe generating MARCXML and feeding that to the
> Solr index is the way to go?
> 
>  
> 
> I hope that you can answer my questions or point to resources!
> 
> Thanks
> 
>  
> 
> Reginald
> 
>  
> 
>  
> 
>                                    
> ______________________________________________________________________
> From: Demian Katz [mailto:dem...@vi...] 
> Sent: Tuesday, February 02, 2010 4:17 PM
> To: Reginald Amade (ext); vuf...@li...
> Subject: RE: vufind requirements
> 
> 
>  
> 
> As Andrew said, the existing OAI module is probably not suitable for
> your needs, but I'll try to answer your questions on the assumption
> that you're willing to build something new to meet your needs.
> 
>  
> 
> First of all, there is no formal ERD at the moment -- there is some
> documentation in the wiki that gives a higher-level view of VuFind's
> index, but it is largely out of date.  Also, since the schema is not
> fixed and can be easily customized, you may find it easier to look at
> the configuration directly to get a sense of how things work.
> 
>  
> 
> There are three important pieces that tell most of the story:
> 
>  
> 
> 1.) solr/biblio/conf/schema.xml defines all of the fields in VuFind's
> index as well as the data types that may be used to define fields.
> 
> 2.) import/marc.properties defines how MARC fields are mapped to the
> various VuFind indexes.  Obviously, if you're not working with MARC,
> you won't have to worry about these mappings directly, but they are a
> useful reference if you are trying to determine the meaning of a
> particular field.
> 
> 3.) web/conf/searchspecs.yaml defines which indexes are searched and
> how they are used in relevance ranking for each of the search types
> available to the user (all fields, title, subject, etc.)
> 
>  
> 
> Obviously, there's a bit of a learning curve here, and not every
> feature of these files is going to be obvious at first glance…  but
> the general idea shouldn't be too hard to figure out.
> 
>  
> 
> Assuming that you're using the new experimental "record driver" code,
> the other important piece is the record driver, which defines how data
> from the index is presented to the user.  The default record driver
> relies on the index fields for display and may need to be changed if
> you drastically change the indexing…  but you also have the option of
> building your own driver and customizing things however you want.
> 
>  
> 
> As for the workflow, it's fairly straightforward in theory (though I'm
> sure there are some complicated details):
> 
>  
> 
> 1.) Retrieve the record(s).
> 
> 2.) Assign unique but reproducible IDs to records so you can apply
> updates in the future and avoid duplication or ID collisions.
> 
> 3.) Map the record to the fields of the index and post it to Solr.
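
Step 2 of the workflow above, assigning unique but reproducible IDs, can be sketched as follows. This is a minimal illustration in Python; the source codes, the ID layout, and the function name are hypothetical and not a VuFind convention. The idea is that the same source and source identifier always yield the same ID, so a re-harvested record replaces its earlier version instead of creating a duplicate.

```python
import hashlib


def make_record_id(source_code, source_identifier):
    """Derive a stable index ID from the data source and its own identifier.

    The source code prefix keeps IDs from different providers apart; the
    hash turns arbitrary identifiers (URLs, OAI identifiers) into a short,
    index-safe string. Both choices are assumptions for illustration.
    """
    digest = hashlib.sha1(source_identifier.encode("utf-8")).hexdigest()[:16]
    return f"{source_code}_{digest}"
```

Calling `make_record_id("partner1", "oai:example.org:123")` twice gives the same ID both times, while the same identifier under `"partner2"` gives a different one.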
> 
>  
> 
> As far as how the Solr component fits in, it's really the heart of
> VuFind, performing two critical tasks: storing all of your metadata,
> and providing all of the indexing and search functionality necessary
> to retrieve it.  VuFind adds a convenient layer on top of Solr to help
> you define useful searches and present the data, but Solr does all the
> heavy lifting in the background.
> 
>  
> 
> Finally, regarding management, what functionality did you have in
> mind?  VuFind's administration module offers some very basic options
> (basic delete/view records), and you can do some more functions
> through Solr itself.  However, it's helpful to have a good
> understanding of Solr's role as an index engine.  If you expect it to
> work like a relational database (which I did at first), you may be
> shocked at its limitations…  however, its strengths are elsewhere.  It
> is completely optimized toward doing fast lookups and giving
> information (like faceting) about indexed data.  It's much better than
> a database for searching through vast amounts of data quickly, but it
> simply isn't designed to do some things that a database can.  Most
> significant: once a record is indexed, you can't change it.  You can
> REPLACE it with a new record with the same ID, but you can't update it
> through Solr -- there is no way of adding data to a field or
> performing a global replace.  Your data management needs to happen
> somewhere else; Solr just needs to be fed the latest fully-formed
> versions of records.  Obviously, understanding and working around this
> limitation is important when figuring out your workflows.
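
The replace-only behaviour described above suggests a workflow in which edits happen in a separate store and the complete record is then re-sent to the index. A minimal sketch in Python, assuming a dict as a stand-in for that external store and a caller-supplied `post` function that sends a full document to Solr (both hypothetical):

```python
def update_and_reindex(store, record_id, changes, post):
    """Apply changes in the external store, then re-post the whole record.

    `store` maps record ID -> full record dict and is the place where data
    management happens; the index only ever receives complete records.
    """
    record = dict(store[record_id])   # copy the current full record
    record.update(changes)            # edit it outside the index
    store[record_id] = record         # the store stays authoritative
    post(record)                      # same ID, so the index entry is replaced
    return record
```

Because the posted document carries the same ID as before, it replaces the earlier version rather than modifying it in place.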
> 
>  
> 
> Apologies for the great length of this reply, but I hope it contains
> some of the information you need.  If you still need more, just let me
> know and I'll be happy to elaborate or clarify.
> 
>  
> 
> - Demian
> 
>  
> 
> From: Reginald Amade (ext) [mailto:r....@vm...] 
> Sent: Monday, February 01, 2010 7:47 AM
> To: Demian Katz; vuf...@li...
> Subject: RE: vufind requirements
> 
> 
>  
> 
> Thanks, Demian, for your clear answer.
> 
> I understand that VuFind
> 
> 1.   most easily ingests data in MARC format: this suggests a
> conversion step OAI-DC to MARC
> 
> 2.   has an OAI module that is incomplete and requires additional
> programming to facilitate data ingestion
> 
>  
> 
> I have some additional questions though…
> 
> What I do not understand is how the Solr component fits in.
> 
> What is the flow once data provider records (OAI-DC) are received up
> to saving them in the (VuFind) repository?
> 
> Where can I find an ERD? Is the schema fixed? 
> 
> We would like to have manual management functionality for VuFind, or
> does that come out-of-the-box?
> 
>  
> 
> Thanks for your reply
> 
>  
> 
> Reginald
> 
>  
> 
>  
> 
>  
> 
>  
> 
>                                    
> ______________________________________________________________________
> From: Demian Katz [mailto:dem...@vi...] 
> Sent: Monday, January 25, 2010 3:31 PM
> To: Reginald Amade (ext); vuf...@li...
> Subject: RE: vufind requirements
> 
> 
>  
> 
> You are correct -- right now, VuFind is designed to most easily ingest
> data in MARC format.  While VuFind has an OAI module, it is currently
> incomplete (as recently discussed on this list).  I am currently in
> the process of changing the way VuFind deals with records to make it
> more easily compatible with other formats.  If you are planning on
> using the stable RC2 release, your best bet is to harvest the
> metadata, convert it to MARC format (which may well be possible with
> existing tools if you chain them together in the right order), and
> then index it in the standard VuFind fashion.  If you would like to
> work with newer experimental code and are willing to do some coding
> yourself, it might make sense to build on my current work in progress
> and use the OAI-DC data more directly.  For this, you would have to
> write an indexer to parse the OAI-DC records and send them to the Solr
> index (probably the hardest part) and possibly also a record driver so
> VuFind could appropriately display the records (if its default
> index-based display was not good enough).  Again, let me know if you
> need more details.
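
The parsing half of such an indexer might start out like the following. This is a minimal sketch in Python using the standard library, not part of VuFind: the function name is hypothetical, and mapping the resulting Dublin Core element names onto the fields of a particular index is left out because it depends on the local schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements carried inside an oai_dc record.
DC_NS = "http://purl.org/dc/elements/1.1/"


def parse_oai_dc(xml_text):
    """Collect Dublin Core values from one oai_dc record.

    Returns a dict of element name (e.g. "title") -> list of values, since
    any Dublin Core element may repeat. Empty elements are skipped.
    """
    root = ET.fromstring(xml_text)
    fields = {}
    prefix = "{" + DC_NS + "}"
    for element in root.iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            text = (element.text or "").strip()
            if text:
                fields.setdefault(name, []).append(text)
    return fields
```

The returned dict can then be mapped to index field names and sent to Solr as a complete document.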
> 
>  
> 
> - Demian
> 
>  
> 
> From: Reginald Amade (ext) [mailto:r....@vm...] 
> Sent: Monday, January 25, 2010 9:14 AM
> To: Demian Katz; vuf...@li...
> Subject: RE: vufind requirements
> 
> 
>  
> 
> Hi Demian,
> 
>  
> 
> I’m having trouble understanding how metadata harvested by the Service
> Provider using OAI-PMH can/must be transformed into a format that is
> VuFind compliant. My understanding is that the Data Provider will
> return metadata in oai-dc format. If this is correct then additional
> conversion/transformation needs to be done before VuFind can work with
> the data?
> 
>  
> 
> Does my understanding make any sense?
> 
>  
> 
> Thnx
> 
> Reginald
> 
>  
> 
>                                    
> ______________________________________________________________________
> From: Demian Katz [mailto:dem...@vi...] 
> Sent: Monday, January 25, 2010 2:59 PM
> To: Reginald Amade (ext); vuf...@li...
> Subject: RE: vufind requirements
> 
> 
>  
> 
> VuFind was written with MySQL in mind as the back-end database, and
> most installations use MySQL.  However, since the database access goes
> through PEAR's DB_DataObject framework, it shouldn't be too difficult
> to adapt the code to work with a different database.  You would have
> to change the install process to set up the appropriate database
> structure in the other database system, and you would have to change
> the connection string in config.ini to use the appropriate format, but
> it might not take much more than that.
> 
>  
> 
> As for the ILS component, it is definitely not mandatory -- several
> existing VuFind installations are running without an ILS.  You have
> two options here: either edit the templates to disable functionality
> that won't work without an ILS (i.e. real-time availability status,
> some of the "My Account" pages), or write an ILS driver (as specified
> at http://vufind.org/wiki/building_an_ils_driver) to simulate the ILS
> functionality you need.
> 
>  
> 
> I'll be happy to provide more details if you still have questions.
> 
>  
> 
> - Demian
> 
>  
> 
> From: Reginald Amade (ext) [mailto:r....@vm...] 
> Sent: Monday, January 25, 2010 7:32 AM
> To: vuf...@li...
> Subject: [VuFind-General] vufind requirements
> 
> 
>  
> 
> Hi,
> 
>  
> 
> Can anyone tell me what type of database VuFind is compatible with and
> if an ILS component is mandatory?
> 
> Any suggestions or suggested reading are welcome,
> 
>  
> 
> Reginald Amade
> 
>  
> 
>  
> 
> Disclaimer: www.vmm.be/disclaimer
>  
>  
> Have you seen our newsletter yet? www.vmm.be/nieuwsbrief
> 
> ------------------------------------------------------------------------------
> The Planet: dedicated and managed hosting, cloud storage, colocation
> Stay online with enterprise data centers and the best network in the
> business
> Choose flexible plans and management services without long-term
> contracts
> Personal 24x7 support from experience hosting pros just a phone call
> away.
> http://p.sf.net/sfu/theplanet-com
> _______________________________________________
> VuFind-General mailing list
> VuF...@li...
> https://lists.sourceforge.net/lists/listinfo/vufind-general
> 
> 
>  
> 
> 
>  
> 
> 
> _______________________________________________
> Vufind-tech mailing list
> Vuf...@li...
> https://lists.sourceforge.net/lists/listinfo/vufind-tech
-- 
Alan Rykhus
PALS, A Program of the Minnesota State Colleges and Universities 
(507)389-1975
ala...@mn...