From: Barnett, J. <jef...@ya...> - 2008-06-13 13:06:48
We use marcexport as UTF-8 at Yale without difficulty, but so far we have not loaded all 8 million records or most of our non-Latin scripts (we are still in the design and customization stage, and re-indexing too often to make it worth the time). We did find one vendor who was sending non-Roman characters encoded as "&<charname>;" tags, designed to be rendered through stylesheets, which had to be cleaned up.

----------- Original Message -------------------------------------------
Date: Thu, 12 Jun 2008 17:17:55 -0500
From: Chris Delis <ce...@ui...>
Subject: [VuFind-General] "Cleaning" MARC files for use with java importer (was Re: diacritic display -- font problem?)
To: vuf...@li...
Message-ID: <200...@ui...>
Content-Type: text/plain; charset=iso-8859-1

Hello all,

Are there any Voyager customers out there using Voyager's marcexport tool along with the java importer? If so, are you exporting as MARC21 MARC-8? And how are you "cleaning" your MARC records, if at all? I am having trouble getting the ISOLatin1Filter to work properly in Solr and am guessing the problem may have to do with a bad encoding somewhere. Are there any good tools (which can run in a batch on a *nix system) someone can recommend? Or is it better to translate (via yaz-marcdump or whatever) to MARCXML and modify the java importer to read MARCXML?

Thanks!

Chris

On Wed, May 21, 2008 at 02:29:36PM -0700, Naomi Dushay wrote:
> There is a C "utf8conditioner" program available at the OAI-PMH web
> site (look under "tools"). It changes bad UTF-8 characters to a
> benign (but unmeaningful) character. The program comes with test
> files with bad UTF-8 characters.
>
> When I worked for the National Science Digital Library, we harvested
> OAI data that had bad UTF-8 chars. It was fairly common.
>
> The multi-byte UTF-8 characters tend to be particularly thorny, as I
> recall.
>
> - Naomi
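The "&<charname>;" cleanup mentioned at the top of the thread can be sketched in a few lines. This is a minimal, hypothetical sketch, not Yale's actual cleanup script; it assumes the vendor used standard HTML/XML named character references (e.g. &eacute;), which Python's stdlib can resolve directly:

```python
import html

def clean_entities(field: str) -> str:
    """Replace named character references (e.g. &eacute;) with their
    Unicode equivalents. Names the HTML spec does not define are
    left untouched rather than guessed at."""
    return html.unescape(field)

# Hypothetical vendor data: an entity-encoded field value.
print(clean_entities("&eacute;tude"))  # -> étude
```

If the vendor used non-standard tag names, a lookup table mapping each name to its code point would be needed instead of `html.unescape`.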
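The utf8conditioner behavior Naomi describes — substituting a benign character for bad UTF-8 byte sequences — can be approximated with Python's built-in decoder. This is a sketch of the same idea, not the utf8conditioner program itself:

```python
def condition_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, turning any invalid byte sequence into
    U+FFFD (the replacement character) -- benign but unmeaningful,
    much as utf8conditioner does for harvested OAI data."""
    return raw.decode("utf-8", errors="replace")

# A valid sequence passes through; a stray Latin-1 0xE9 is replaced.
print(condition_utf8(b"caf\xc3\xa9"))  # -> café
print(condition_utf8(b"caf\xe9"))      # -> caf\ufffd
```

For MARC records specifically, conditioning the bytes before handing them to the java importer avoids the importer ever seeing a malformed sequence.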