From: Wayne G. <ws...@wm...> - 2008-05-21 20:32:30
Does anyone have a few MARC records with these bad characters in them?

Wayne

Wayne Graham wrote:
> Changed the subject since this seems to be a thread now...
>
> Ok, are we talking bad UTF-8 or MARC-8 characters? What would a bad
> UTF-8 character be? Kanji perhaps? If it's a bad UTF-8, but a valid
> UTF-16 character, there _shouldn't_ be any conversion needed.
>
> If we're talking MARC-8, the Ansel converter goes from an 8-bit to a
> 16-bit character set. If you're going up a character standard, you
> _shouldn't_ lose the data. However, if you're going from, say, UTF-8 to
> ASCII, I could see this being a problem. I'm not sure what happens when
> you go down this path though (UTF-8 to ASCII to UTF-8). I suspect it'll
> work since the Unicode ranges are still there...though I've been proven
> wrong before in my assumptions ;)
>
> Wayne
>
> Andrew Nagy wrote:
>
>> Well, I slightly agree - I like putting the burden on the programmer and
>> making things very easy for the implementer. Especially when the
>> programmer is Wayne :)
>>
>> While we are on this topic - I have talked with some folks here as well
>> as other libraries, and there seems to be a common issue of records that
>> are in UTF-8 format but were not fully converted and are riddled with
>> bad UTF-8 characters.
>>
>> Wayne - do you know of any Java toolkits that can help clean up UTF-8
>> data during the import?
>>
>> Andrew
>>
>>
>>
>>> -----Original Message-----
>>> From: vuf...@li... [mailto:vufind-gen...@li...] On Behalf Of James Farrugia
>>> Sent: Wednesday, May 21, 2008 2:56 PM
>>> To: Wayne Graham
>>> Cc: vuf...@li...
>>> Subject: Re: [VuFind-General] diacritic display -- font problem?
>>>
>>> Hi Wayne,
>>>
>>> Thanks. I think the easiest way all around is to put the "burden" of
>>> getting records into UTF-8 (which is what VuFind uses/requires, yes?)
>>> on users rather than developers.
>>>
>>> The simple one-line yaz command with -o marc (thanks, Doug) is
>>> all that's needed, it seems.
>>>
>>> This seems the best way to deal with it (or some other conversion
>>> to UTF-8 before loading into VuFind).
>>>
>>> Jim
>>>
>>>
>>>
>>>>>> O
>
> _______________________________________________
> VuFind-General mailing list
> VuF...@li...
> https://lists.sourceforge.net/lists/listinfo/vufind-general
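
For anyone following the thread, here is a minimal sketch of the two points raised above: how a "bad UTF-8" byte sequence can be spotted, and what a UTF-8 to ASCII to UTF-8 round trip does to diacritics. It is written in Python for brevity rather than the Java asked about, it is not taken from VuFind or any MARC toolkit, and the function names `find_invalid_utf8` and `ascii_round_trip` are made up for illustration.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, reason) of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on malformed bytes
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


def ascii_round_trip(text: str) -> str:
    """Simulate UTF-8 -> ASCII -> UTF-8; unmappable characters become '?'."""
    ascii_bytes = text.encode("ascii", errors="replace")
    # Every ASCII byte string is also valid UTF-8, so this decode cannot fail.
    return ascii_bytes.decode("utf-8")


name = "Béla Bartók"

# Properly encoded UTF-8 passes the check.
print(find_invalid_utf8(name.encode("utf-8")))

# The same name in Latin-1 bytes, mislabeled as UTF-8, does not.
print(find_invalid_utf8(name.encode("latin-1")))

# Going down to ASCII and back does not restore the accented letters.
print(ascii_round_trip(name))
```

Under these assumptions, the round trip itself succeeds without errors, but any character outside ASCII is replaced on the way down and cannot be recovered on the way back up.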