Re: [Refdb-users] Author + title info in Greek and transliterated form - how ?

Hello Markus,

 > You won't be able to do that on a rainy Sunday afternoon.

   Especially since rainy Sunday afternoons are rare these days, at 
least over here ... OTOH, the alternative is to write something from 
scratch to suit my needs. So I think I'll stick to RefDB.

   I thought I'd go step by step, start modestly and add a field (e.g. 
author_real_name) to the author table to hold the author's name in UTF-8 
(Greek, in my case), with the transliterated form in the existing 
t_author.author_name. That is the most pressing need. For the time 
being, I won't care about querying, and I'll keep the titles etc. in the 
original language/encoding for now.
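
   A minimal sketch of that first step, for illustration only. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the two-column t_author layout and the sample name are assumptions, not the real RefDB schema. Only the names t_author, author_name and author_real_name come from the message above.

```python
import sqlite3

# SQLite stands in for MySQL here; the real t_author certainly has more
# columns than this assumed two-column layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_author (author_id INTEGER PRIMARY KEY, author_name TEXT)"
)

# The proposed extra field for the original (e.g. Greek) form of the name.
conn.execute("ALTER TABLE t_author ADD COLUMN author_real_name TEXT")

# The transliterated form stays in author_name, the original goes alongside.
conn.execute(
    "INSERT INTO t_author (author_name, author_real_name) VALUES (?, ?)",
    ("Kazantzakis,N.", "Καζαντζάκης,Ν."),
)
row = conn.execute(
    "SELECT author_name, author_real_name FROM t_author"
).fetchone()
```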

   The first step was to patch your sources to work with the UTF-8 
support in the new MySQL version 4.1. It seems to work: titles come back 
out as they went in (unless hexdump is lying to me).

   First question: Do you care about these patches? If not, skip to the 
next question ;-). If yes, how do you want them? I used 0.9.2, but could redo 
them on e.g. a CVS version if you want. There is only one gotcha: it 
seems (?) you have to issue a SET CHARACTER SET <charset> when 
connecting. I hardcoded it to be utf8, which is fine for me, but for 
generic use it should really depend on whatever was specified on 
createdb -E. Oh yes, I didn't patch the docs, either <g>.
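
   One way the hardcoded value could be made to follow the database encoding, as a rough sketch only: the mapping table and the function name set_charset_statement are hypothetical, and how RefDB actually records the createdb -E value is not shown here. The right-hand names (utf8, latin1, greek) are MySQL character set names.

```python
# Hypothetical mapping from encoding names (as one might pass them to
# createdb -E) to MySQL character set names.
MYSQL_CHARSETS = {
    "UTF-8": "utf8",
    "ISO-8859-1": "latin1",
    "ISO-8859-7": "greek",
}


def set_charset_statement(db_encoding):
    """Build the statement to issue right after connecting."""
    key = db_encoding.strip().upper()
    try:
        return "SET CHARACTER SET " + MYSQL_CHARSETS[key]
    except KeyError:
        raise ValueError("no MySQL character set known for %r" % db_encoding)
```

   The statement would then be sent once per connection, before any reference data is read or written.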

  So, next question. I thought I'd extend the syntax of the AU record, 
so I can sneak the Greek version in with add/updateref. Any suggestions 
on how to do that? "I don't care" is a valid answer <g>.

   I've been thinking of the following possibilities:

   a)  AU  - <transliterated> (el) <original>

   b)  AU  - <transliterated> (<original>)

   c)  AU  - <transliterated> | <original>

    With <transliterated> and <original> each having the same syntax as 
they do now.

    In variant a) the "(el)" specifies the language of the <original>. 
This would allow for more than one variant in the future, e.g. 
transliterations into French etc. in the same AU record.
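
   To make variant a) concrete, here is a small sketch of how such a value could be taken apart. It assumes the language tag is always two lowercase letters in parentheses with whitespace on both sides; the function name parse_variant_a and the sample names are made up for the example. A real author name containing something like " (el) " would be misread, which is exactly the kind of conflict to watch for.

```python
import re

# Assumed tag shape: whitespace, "(xx)", whitespace, with xx two lowercase letters.
LANG_TAG = re.compile(r"\s+\(([a-z]{2})\)\s+")


def parse_variant_a(value):
    """Split '<transliterated> (el) <original> ...' into its parts."""
    # re.split with one capture group yields: name, tag, name, tag, name, ...
    parts = LANG_TAG.split(value.strip())
    transliterated = parts[0]
    variants = dict(zip(parts[1::2], parts[2::2]))
    return transliterated, variants
```

   A value without any tag simply comes back with an empty dictionary, so existing AU records would pass through unchanged.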

   I guess it all boils down to preventing conflicts with valid stuff in 
real authors' names, and maybe even to being able to reuse the "extended 
RIS" data sets with the new MODS-based version, whenever it materializes 
<g>.

   Last question: I would hack process_ris_set, where it handles the 
AU/Ax/ED records. Split the token, use <transliterated> instead of the 
entire token everywhere, extend the insert. If no <original> part is 
present, I'd insert the <transliterated> in both columns. Any comments? 
To be clear: I'm just asking for your gut feelings, not for a complete 
analysis, and I won't hold you accountable <g>. Again, "I don't care" is 
a valid answer.
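
   The split-and-fallback logic described above could look roughly like this. It is a Python sketch of the idea, not the C code in process_ris_set, and it assumes variant c) with "|" as the separator; split_author_token is a hypothetical helper name.

```python
def split_author_token(token, sep="|"):
    """Return (transliterated, original) for one AU/Ax/ED value."""
    transliterated, found, original = token.partition(sep)
    transliterated = transliterated.strip()
    original = original.strip()
    # No separator, or nothing after it: use the transliterated form
    # for both columns.
    if not found or not original:
        original = transliterated
    return transliterated, original
```

   The first element would be used wherever the whole token is used today, and the second would go into the new column in the extended insert.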

    Best and TIA,

    Luc Pardon
    Skopos Consulting
    Belgium