From: Wayne G. <ws...@wm...> - 2008-06-13 14:57:49
-------- Original Message --------
Subject: Re: [VuFind-General] "Cleaning" MARC files for use with java importer (was Re: diacritic display -- font problem?)
Date: Fri, 13 Jun 2008 10:50:57 -0400
From: Wayne Graham <ws...@wm...>
To: Chris Delis <ce...@ui...>
References: <483...@dr...> <483...@wm...> <483...@dr...> <294949F5411DFC418D1DEC84BE825AE14AB0BE1D34@VUEX2.vuad.villanova.edu> <4FC...@st...> <200...@ui...> <7d7...@ma...> <200...@ui...> <485...@wm...> <200...@ui...> <200...@ui...>

Doesn't look like the docs are up to date...of course, it was just
committed two days ago ;)

In your vufind-carli.properties, it should read like this:

id = first, 001
collection = constant, Catalog
...

I'll update the vufind.properties file on the project right now.

Wayne

Chris Delis wrote:
> I should have attached the two property files, at least :-)
>
> Chris
>
> On Fri, Jun 13, 2008 at 09:31:21AM -0500, Chris Delis wrote:
>
>> Just thought I'd give you my initial impression with the solrmarc
>> updater.
>>
>> I followed the instructions but it appears that I am missing something
>> fundamental, given the following error messages:
>>
>> vufind-devel ~/solrmarc-read-only% ./index_file.sh
>> /vufind-data/marcdata/ISUtest.mrc import-carli.properties
>> java -Xmx1024M -Dmarc.path=/vufind-data/marcdata/ISUtest.mrc
>> -Dsolr.optimize_at_end=false -Dsolr.hosturl= -jar
>> dist/MarcImporter.jar import-carli.properties
>> Loading properties from import-carli.properties
>> Error: Unable to find file containing specified translation map (all,
>> 650y:651y)Error: Unable to find file containing specified translation
>> map (std)
>> Error: Unable to find file containing specified translation map
>> (custom)
>> Error: Unable to find file containing specified translation map (std)
>> Error: Unable to find file containing specified translation map
>> (custom)
>> Error: Unable to find file containing specified translation map
>> (constant)
>> Error: Specified translation map (constant) not found in properties
>> file
>> Error: Specified translation map (custom) not found in properties file
>> Error: Specified translation map (custom) not found in properties file
>> Error: Specified translation map (all, 650y:651y) not found in
>> properties file
>> Error: Specified translation map (std) not found in properties file
>> Error: Specified translation map (std) not found in properties file
>> Unable to find Custom indexer: VuFindIndexer
>> Using default SolrIndexer with properties file:
>> vufind-carli.properties
>> Error: Unable to find file containing specified translation map (all,
>> 650y:651y)Error: Unable to find file containing specified translation
>> map (std)
>> Error: Unable to find file containing specified translation map
>> (custom)
>> Error: Unable to find file containing specified translation map (std)
>> Error: Unable to find file containing specified translation map
>> (custom)
>> Error: Unable to find file containing specified translation map
>> (constant)
>> Error: Specified translation map (constant) not found in properties
>> file
>> Error: Specified translation map (custom) not found in properties file
>> Error: Specified translation map (custom) not found in properties file
>> Error: Specified translation map (all, 650y:651y) not found in
>> properties file
>> Error: Specified translation map (std) not found in properties file
>> Error: Specified translation map (std) not found in properties file
>> Error configuring Indexer from properties file. Exiting...
>>
>> The only thing I changed was:
>>
>> - removed a couple solr indexes (since I'm running on an older version
>> of VuFind)
>>
>> - removed the ^Ms since I'm on unix
>>
>> - changed from the blacklight indexer to vufindindexer
>>
>> I was wondering if you had any "gotcha" ideas before I start delving
>> into code. I was hoping to get a proof-of-concept index built before
>> having to go to this step, but oh well :)
>>
>> Thanks,
>> Chris
>>
>> On Fri, Jun 13, 2008 at 08:48:45AM -0400, Wayne Graham wrote:
>>
>>> Well, the code isn't really new. It's what I wrote for this project with
>>> all the hard coded stuff configurable via a properties file (which I
>>> should have done in the first place). The really nice thing about this
>>> new code base is that it just uses a .properties file to map marc
>>> elements to solr elements (along with a lot of other mappings)...so
>>> changes are much easier! If you want to add your own methods to call,
>>> there is a facility to do this (though that requires a recompile). As
>>> long as you remember your changes, you should be good to go ;)
>>>
>>> Wayne
>>>
>>> Chris Delis wrote:
>>>
>>>> Thanks, Wayne,
>>>>
>>>> I may have to just give the solrmarc project a try.
>>>> I'm not sure if
>>>> it'd be easier using the overridden marc4j libraries in lieu of the
>>>> original, or if it'd be easier just going all-out with solrmarc even
>>>> though it's really "young" (I made quite a few changes to the
>>>> original Java importer vis a vis marc -> solr field mapping). We'll
>>>> see :) I'm just glad that this project exists! I'm sure once this
>>>> project matures, it will make all of this easy as pie.
>>>>
>>>> Chris
>>>>
>>>> On Thu, Jun 12, 2008 at 07:24:53PM -0400, Wayne Graham wrote:
>>>>
>>>>> Hi Chris,
>>>>>
>>>>> How pressed are you for this? The reason I mention this is that with the
>>>>> solrmarc project there are a few patches added into the marc4j library
>>>>> that do a much better job of guessing what encoding the record is
>>>>> actually written in, rather than what the record reports itself as (and
>>>>> hopefully produce better results). The patches are committed in the
>>>>> solrmarc project; I just haven't had time (yet) to pull them into the
>>>>> VuFind trunk. Looking at my schedule, the code probably won't be pulled
>>>>> into VuFind until July, but you may want to grab that code on your own
>>>>> and test (and if you do, please let me know how it goes).
>>>>>
>>>>> http://code.google.com/p/solrmarc/
>>>>>
>>>>> Wayne
>>>>>
>>>>> On Thu, Jun 12, 2008 at 6:17 PM, Chris Delis <ce...@ui...> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> Are there any Voyager customers out there using Voyager's marcexport
>>>>>> tool along with the java importer? If so, are you exporting as MARC21
>>>>>> MARC-8? And how are you "cleaning" your marc records, if at all? I
>>>>>> am having trouble getting the ISOLatin1Filter to work properly in SOLR
>>>>>> and am guessing the problem may have to do with a bad encoding
>>>>>> somewhere. Are there any good tools (which can run in a batch on a
>>>>>> *nix system) someone can recommend? Or is it just better to translate
>>>>>> (via yaz-marcdump or whatever) to MARCXML and modify the java importer
>>>>>> to read MARCXML?
>>>>>>
>>>>>> Thanks!
>>>>>> Chris
>>>>>>
>>>>>> On Wed, May 21, 2008 at 02:29:36PM -0700, Naomi Dushay wrote:
>>>>>>
>>>>>>> There is a C "utf8conditioner" program available at the OAI-PMH web
>>>>>>> site (look under "tools"). It changes bad UTF-8 characters to a
>>>>>>> benign (but unmeaningful) character. The program comes with test
>>>>>>> files with bad UTF-8 characters.
>>>>>>>
>>>>>>> When I worked for the National Science Digital Library, we harvested
>>>>>>> OAI data that had bad UTF-8 chars. It was fairly common.
>>>>>>>
>>>>>>> The multi-byte UTF-8 characters tend to be particularly thorny, as I
>>>>>>> recall.
>>>>>>>
>>>>>>> - Naomi
>>>>>>>
>>>>>>> On May 21, 2008, at 1:13 PM, Andrew Nagy wrote:
>>>>>>>
>>>>>>>> Well I slightly agree - I like putting the burden on the programmer
>>>>>>>> and making things very easy for the implementer. Especially when the
>>>>>>>> programmer is Wayne :)
>>>>>>>>
>>>>>>>> While we are on this topic - I have talked with some folks here as
>>>>>>>> well as other libraries and there seems to be a common issue of
>>>>>>>> records that are nominally in UTF-8 but were not fully converted and
>>>>>>>> are riddled with bad UTF-8 characters.
>>>>>>>>
>>>>>>>> Wayne - do you know of any java toolkits that can help clean up utf-8
>>>>>>>> data during the import?
>>>>>>>>
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: vuf...@li... [mailto:vufind-gen...@li...] On Behalf Of James Farrugia
>>>>>>>>> Sent: Wednesday, May 21, 2008 2:56 PM
>>>>>>>>> To: Wayne Graham
>>>>>>>>> Cc: vuf...@li...
>>>>>>>>> Subject: Re: [VuFind-General] diacritic display -- font problem?
>>>>>>>>>
>>>>>>>>> Hi Wayne,
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> I think the easiest way all around is to put the "burden" of
>>>>>>>>> getting records into UTF-8 (which is what VuFind uses/requires, yes?)
>>>>>>>>> on users rather than developers.
>>>>>>>>>
>>>>>>>>> The simple one-line yaz command with -o marc (thanks, Doug) is
>>>>>>>>> all that's needed, it seems.
>>>>>>>>>
>>>>>>>>> This seems the best way to deal with it (or some other conversion
>>>>>>>>> to UTF-8 before loading into VuFind).
>>>>>>>>>
>>>>>>>>> Jim
>>>>>>>>>
>>>>>>>>>>>> On 5/21/2008 at 2:01 PM, Wayne Graham <ws...@wm...> wrote:
>>>>>>>>>
>>>>>>>>>> Not sure if this will answer your question, but here goes.
>>>>>>>>>>
>>>>>>>>>> The Java that does the indexing has several converters for different
>>>>>>>>>> formats. These include Ansel, ISO 5426 (Latin), and ISO 6937 (ASCII).
>>>>>>>>>> The Ansel converter will convert to and from the MARC-8 format. Right
>>>>>>>>>> now the code to do the indexing doesn't do any conversion... is this
>>>>>>>>>> something you need? If so, we can do an enhancement request.
>>>>>>>>>>
>>>>>>>>>> If you're asking about UTF-8, this is a slightly different answer. By
>>>>>>>>>> virtue that it's Java, String objects are stored in UTF-16. I can't
>>>>>>>>>> really think of a reason to do the extra programming to make it
>>>>>>>>>> UTF-8...
>>>>>>>>>>
>>>>>>>>>> Wayne
>>>>>>>>>>
>>>>>>>>>> James Farrugia wrote:
>>>>>>>>>>
>>>>>>>>>>> Andrew,
>>>>>>>>>>>
>>>>>>>>>>> Does VuFind offer a MARC to UTF-8 converter?
>>>>>>>>>>>
>>>>>>>>>>> Jim
>>>>>>>>>>>
>>>>>>>>>>>>>> On 5/21/2008 at 1:39 PM, Andrew Nagy <and...@vi...> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I just changed the CSS for vufind to no longer use Lucida Grande as
>>>>>>>>>>>> the default font due to the diacritics issues, the default is now
>>>>>>>>>>>> Arial Unicode MS, Arial, Sans-Serif.
>>>>>>>>>>>>
>>>>>>>>>>>> Arial Unicode MS is one of the most unicode compliant fonts and is
>>>>>>>>>>>> installed with windows and OSX 10.5 or later.
>>>>>>>>>>>>
>>>>>>>>>>>> Andrew
>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: vuf...@li... [mailto:vufind-gen...@li...] On Behalf Of Corinna Baksik
>>>>>>>>>>>>> Sent: Wednesday, May 21, 2008 1:16 PM
>>>>>>>>>>>>> To: vuf...@li...
>>>>>>>>>>>>> Subject: [VuFind-General] diacritic display -- font problem?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi - It seems that diacritical marks are not displaying properly. The
>>>>>>>>>>>>> accent displays over the letter to the right of where it should. I
>>>>>>>>>>>>> think this is a font problem as I can save an html page and use a
>>>>>>>>>>>>> different font and it displays correctly.
>>>>>>>>>>>>> For example, in this record the accent over the
>>>>>>>>>>>>> first e in Bibliothèque is displaying over the q:
>>>>>>>>>>>>> http://vufind.org/demo/Record/243957
>>>>>>>>>>>>>
>>>>>>>>>>>>> This happens consistently for different types of accents and different
>>>>>>>>>>>>> letters. I suspect that the source record is in decomposed Unicode,
>>>>>>>>>>>>> otherwise it might display properly. We use Arial Unicode MS in our
>>>>>>>>>>>>> catalog because it displays the most number of diacritics and non-Latin
>>>>>>>>>>>>> characters properly (though it is not without bugs).
>>>>>>>>>>>>>
>>>>>>>>>>>>> corinna
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Corinna Baksik
>>>>>>>>>>>>> Harvard University Library
>>>>>>>>>>>>> Office for Information Systems
>>>>>>>>>>>>> 90 Mt. Auburn St
>>>>>>>>>>>>> Cambridge, MA 02138
>>>>>>>>>>>>>
>>>>>>>>>>>>> Phone: 617-495-3724
>>>>>>>>>>>>> Fax: 617-496-5600
>>>>>>>>>>>>> Email: cor...@ha...
>>>>>>>>>>>>>
>>>>>>>>>>>>> -------------------------------------------------------------------------
>>>>>>>>>>>>> This SF.net email is sponsored by: Microsoft
>>>>>>>>>>>>> Defy all challenges. Microsoft(R) Visual Studio 2008.
>>>>>>>>>>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> VuFind-General mailing list
>>>>>>>>>>>>> VuF...@li...
>>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/vufind-general
>>>>>>>
>>>>>>> Naomi Dushay
>>>>>>> nd...@st...
>>>>>>>
>>>>>> -------------------------------------------------------------------------
>>>>>> Check out the new SourceForge.net Marketplace.
>>>>>> It's the best place to buy or sell services for
>>>>>> just about anything Open Source.
>>>>>> http://sourceforge.net/services/buy/index.php
>>>>>> _______________________________________________
>>>>>> VuFind-General mailing list
>>>>>> VuF...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/vufind-general
>>>>>>
>>>>> --
>>>>> PJ O'Rourke - "You can't get rid of poverty by giving people money."
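
A note on the two encoding problems raised in this thread (bad UTF-8 bytes in records, and decomposed Unicode putting the accent over the wrong letter). What follows is only a minimal Python sketch of the general idea, not what utf8conditioner, marc4j, or solrmarc actually do. It assumes the input is already supposed to be UTF-8 (it does not convert MARC-8), and the function name clean_text is made up for illustration. Undecodable bytes are swapped for U+FFFD, and the result is then composed to NFC so that "e" followed by a combining grave accent becomes a single "è".

```python
import unicodedata


def clean_text(raw: bytes) -> str:
    """Decode bytes that are supposed to be UTF-8 and compose diacritics.

    Hypothetical helper for illustration only; it is not part of VuFind,
    solrmarc, or marc4j.
    """
    # Invalid byte sequences become U+FFFD instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    # NFC merges a base letter and its combining mark into one code point
    # where a precomposed form exists (e.g. "e" + U+0300 -> U+00E8).
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    decomposed = "Bibliothe\u0300que".encode("utf-8")
    cleaned = clean_text(decomposed)
    # ascii() keeps the output readable on terminals without UTF-8 support.
    print(ascii(cleaned), len(cleaned))
    print(ascii(clean_text(b"caf\xe9")))
```

Composing to NFC only sidesteps the rendering problem described in the original post; a font that positions combining marks correctly would not need it.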