Re: [Refdb-users] Author + title info in Greek and transliterated form - how ?
From: Luc P. <re...@sk...> - 2003-07-15 08:58:13
Hello Markus,

> You won't be able to do that on a rainy Sunday afternoon.

Especially since rainy Sunday afternoons are rare these days, at least over here ...

OTOH, the alternative is to write something from scratch to suit my needs. So I think I'll stick to RefDB.

I thought I'd go step by step, start modestly and add a field (e.g. author_real_name) to the author table to hold the author's name in UTF-8 (Greek, in my case), with the transliterated form in the existing t_author.author_name. That is the most pressing need. For the time being, I won't care about querying, and I'll keep the titles etc. in the original language/encoding for now.

First step was to patch your sources to work with the new MySQL version 4.1 UTF-8 support. It seems to work, titles come back out as they went in (unless hexdump is lying to me).

First question: Do you care about these patches? If no, skip to next question ;-). If yes, how do you want them? I used 0.9.2, but could redo them on e.g. a CVS version if you want. There is only one gotcha: it seems (?) you have to issue a SET CHARACTER SET <charset> when connecting. I hardcoded it to be utf8, which is fine for me, but for generic use it should really depend on whatever was specified on createdb -E. Oh yes, I didn't patch the docs, either <g>.

So, next question. I thought I'd extend the syntax of the AU record, so I can sneak the Greek version in with add/updateref. Any suggestions on how to do that? "I don't care" is a valid answer <g>. I've been thinking of the following possibilities:

a) AU - <transliterated> (el) <original>
b) AU - <transliterated> (<original>)
c) AU - <transliterated> | <original>

With <transliterated> and <original> each having the same syntax as they do now. In variant a) the "(el)" specifies the language of the <original>. This would allow for more than one variant in the future, e.g. transliterations into French etc. in the same AU record.
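To make the three candidates concrete, here is a rough Python sketch (not RefDB code, which is written in C) of how the value part of an AU record might be split under each variant. The function name split_author, the regular expressions and the sample names are all hypothetical. Returning the same string for both forms when no original is given is an assumption, not settled behaviour.

```python
import re

# Variant a): "<transliterated> (el) <original>".
# Assumption: the language tag is 2-3 lowercase ASCII letters.
_VARIANT_A = re.compile(
    r"^(?P<translit>.+?)\s+\((?P<lang>[a-z]{2,3})\)\s+(?P<orig>.+)$"
)
# Variant b): "<transliterated> (<original>)".
_VARIANT_B = re.compile(r"^(?P<translit>.+?)\s+\((?P<orig>[^()]+)\)$")


def split_author(value):
    """Split the value part of an AU record (hypothetical helper).

    Returns a tuple (transliterated, original, lang); lang is None
    unless variant a) supplied a language tag.
    """
    value = value.strip()

    match = _VARIANT_A.match(value)
    if match:
        return match.group("translit"), match.group("orig"), match.group("lang")

    # Variant c): "<transliterated> | <original>".
    if "|" in value:
        translit, orig = (part.strip() for part in value.split("|", 1))
        return translit, orig, None

    match = _VARIANT_B.match(value)
    if match:
        return match.group("translit"), match.group("orig"), None

    # No original form present: use the same string for both (assumption).
    return value, value, None


if __name__ == "__main__":
    for sample in (
        "Papadopoulos,G. (el) Παπαδόπουλος,Γ.",
        "Papadopoulos,G. (Παπαδόπουλος,Γ.)",
        "Papadopoulos,G. | Παπαδόπουλος,Γ.",
        "Papadopoulos,G.",
    ):
        print(split_author(sample))
```

Note that in this sketch a real name that happens to contain "|" or to end in a parenthesized part would be split as if it carried an original form, so variants b) and c) are only safe if those characters never occur in ordinary author names.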
I guess it all boils down to preventing conflicts with valid stuff in real authors' names, and maybe even to being able to reuse the "extended RIS" data sets with the new MODS-based version, whenever it materializes <g>.

Last question: I would hack process_ris_set, where it handles the AU/Ax/ED records. Split the token, use <transliterated> instead of the entire token everywhere, extend the insert. If no <original> part is present, I'd insert the <transliterated> in both columns. Any comments?

To be clear: I'm just asking for your gut feelings, not for a complete analysis, and I won't hold you accountable <g>. Again, "I don't care" is a valid answer.

Best and TIA,

Luc Pardon
Skopos Consulting
Belgium