From: Naomi D. <nd...@st...> - 2008-04-08 00:09:17
Andrew and Jim,

Thanks for your replies. I've been working with Jim to see if we can get the import-solr.php script working for me. Jim can't commit a lot of time, and I'm new to PHP, so we haven't managed to make it work on Stanford's data yet.

A design note on OAI import planning: I wrote a couple of iterations of the Lucene-based search service for the National Science Digital Library (NSDL). The NSDL is an aggregator: it would harvest metadata via OAI and then load it into Lucene (this was in the pre-Solr days).

As you all know, OAI is a metadata transport protocol that runs on top of HTTP. This means the number of metadata records you receive in a single XML file is governed by the size of a "reasonable" HTTP response -- perhaps 0.5 to 2 MB. (See the OAI Best Practices documentation on resumption tokens: http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/ResumptionTokens)

If you want to index large quantities of records, doing it one OAI response at a time may not be the best way. For the NSDL, we "caressed" the OAI files into larger XML documents that were essentially giant OAI responses with thousands of records in them, with the number chosen to optimize performance in the NSDL ingest process.

Also, OAI can mean many metadata formats. Simple Dublin Core (often referred to as oai_dc) is required, but other formats are allowed. (See http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/MultipleMetadataFormats) Stanford is keen on MODS, so I'm guessing (I've only been here 2 weeks) the digital repository can/will serve out MODS data via OAI. When an additional format is served, it's nearly always richer than oai_dc, so it's more desirable to index.

We're still in the "evaluating options for our next generation discovery environment" stage here, so I can't promise to be a contributor to VuFind at this time. I'll keep you posted on VuFind-relevant discoveries here ...
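To make the batching idea concrete, here is a minimal Python sketch of harvesting by resumption token and regrouping the records into larger documents. It is not the NSDL code and not anything in VuFind. The names harvest, batches and fetch_page, the batch size of 1000, and the "mods" metadata prefix in the usage note are all assumptions for illustration; a given repository may name its formats differently.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_page(base_url, params):
    """Fetch one OAI-PMH response over HTTP and return its bytes."""
    with urlopen(base_url + "?" + urlencode(params)) as resp:
        return resp.read()


def harvest(base_url, metadata_prefix="oai_dc", fetch=fetch_page):
    """Yield every <record> element, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url, params))
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:  # error response or nothing to list
            return
        yield from list_records.findall(OAI_NS + "record")
        token = list_records.find(OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # a missing or empty token marks the last page
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def batches(records, batch_size=1000):
    """Group records into <ListRecords> elements of up to batch_size each."""
    batch = ET.Element(OAI_NS + "ListRecords")
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = ET.Element(OAI_NS + "ListRecords")
    if len(batch):  # final, partially filled batch
        yield batch
```

Something like batches(harvest(base_url, "mods")) would then hand the indexer one large document at a time instead of one HTTP-sized response at a time. The right batch size would have to be tuned against the actual ingest process.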
- Naomi

On Apr 7, 2008, at 11:12 AM, Andrew Nagy wrote:

> Naomi - I'm glad to hear that you are playing with VuFind.
> Currently we only have an import tool to load MARC data. You can
> find more information on how to use this tool in the documentation
> section of vufind.org.
>
> The import tool is a java program that loads a marc dump file. If
> you copy your marc dump file to the import directory within vufind
> and then run the import.jar file - it will load all of your marc
> data. This process still has a bit of fine tuning - which will
> hopefully be in the 0.8.1 release coming out soon.
>
> As to loading mods and mets data - that is not yet in VuFind but we
> hope to have that functionally in the not too distant future. The
> 1st step in that direction will be our OAI import tool that will
> allow you to import any data you might have in an IR or digital
> library through OAI. You could also - if you have the expertise -
> build an import tool for mods or mets data which shouldn't be a very
> large task.
>
> I hope this helps and feel free to continue to ask questions.
>
> Andrew
>
>> -----Original Message-----
>> From: vuf...@li... [mailto:vufind-tech-bo...@li...] On Behalf Of Naomi Dushay
>> Sent: Monday, April 07, 2008 1:31 PM
>> To: vuf...@li...
>> Subject: [VuFind-Tech] xml into VuFind
>>
>> Hi folks,
>>
>> I'm trying to load data into VuFind 0.8; I'm new to VuFind.
>>
>> We have our MARC data in marcxml.
>> We would like to be able to load MODS data.
>> We might like to load in other XML data, such as DC or even MEtS.
>>
>> What is the preferred way to do this?
>>
>> - Naomi