[Refdb-users] RE: reversibility patch (cumulative reply)
From: Markus H. <mar...@mh...> - 2004-01-14 22:43:12
Hi,

it's getting too tedious for me to answer all ramifications of this discussion in due detail, so I'll try to pick up a few loose ends from the previous mails. This will hopefully settle a few issues.

- As a meta-note, I believe much of the confusion in this discussion stems from the fact that we have a hard time keeping input formats and formatted author names apart. The input format has to supply as much information as possible in a parseable format. The input format is therefore not a free string. The output formats have to stick to the publisher's specifications even if we consider these specs stupid at times.

- If the discussion is only about so-called "middle names" (I prefer to call them what they are: either middle names or middle initials), then we can indeed reach a conclusion quickly. At least in the life sciences, the following ways to output first and middle names are common practice:

  FM, F.M., F. M., First M, First M., First Middle, F, F., First

  This is independent of how the bearer of that name prefers to read his name. A reference manager that wants to support all variants ideally knows "First" and "Middle" (as separate entities), as all other variants can be derived from them. If you can't track down the full names (or can't verify whether "M" is a name as such or an abbreviation), there's not much you can do but use the abbreviations instead. It is quite true that Doris J. Delorie and DJ Delorie end up with the same formatted name if you use the first output style. However, this is not a bug in RefDB; it is a design decision of the publisher requiring that format. I can't argue about this, I just have to support it.

  The RIS input format is suitable to supply the full names or the abbreviations. It is weak in that it can't distinguish abbreviated names from one-letter non-abbreviated names. It also doesn't support "prime given names" which are not in the first position.
  These flaws will be addressed by switching over to a MODS-based format. BTW the Pubmed XML format (the output of the largest literature database in the life sciences) uses elements along the lines of "first", "middle", "last", "honorific".

- Just like TeX itself, BibTeX was designed by a mathematician for publications in mathematical journals. It is widely used in mathematics, computer science, and engineering. The BibTeX data format is apparently sufficient for publications in these fields. As Bruce pointed out though, we should not use BibTeX as a gold standard. The input format is flawed compared to what XML has to offer. TeX/BibTeX is not accepted by most journals in the life sciences anyway, partly because it does not support the citation and bibliography requirements of these journals.

- As far as I understand it, the ALWD format (a legal citation style asking for the name "exactly as it appears on the front cover or title page") is probably not as flawed as I thought at first. All examples shown in the available docs (I don't own the actual manual, though) use names in the natural order, that is "Franklin D. Roosevelt" or "Luis Lopez Penabad". I think we agree that this is entirely unsuitable as an input format as it is not parseable in any way. We still have to record this formatted string in addition to the parseable data if we want to support ALWD. Needless to say, RIS has no means to do this. A MODS-based input format will.

My conclusions are:

- RIS is and remains flawed. There is no point in fiddling with it as you don't gain much but break a lot and lose compatibility with commercial tools. The best strategy is to accept the limitations and treat the current implementation of RefDB as a "compatibility mode".

- XML is the way to go, along with an improved data model. Something like the following should be sufficient to handle most names.
The following examples assume that you don't have the full information about all name parts and use some abbreviations instead. If you *had* the information, you'd certainly enter "Jessica" instead of "J".

  <name>
    <namePart type="primegiven">Doris</namePart>
    <namePart type="given" abbrev="yes">J</namePart>
    <namePart type="family">Delorie</namePart>
    <displayForm>Doris J. Delorie</displayForm>
  </name>

  <name>
    <namePart type="primegiven">DJ</namePart>
    <namePart type="family">Delorie</namePart>
    <displayForm>DJ Delorie</displayForm>
  </name>

  <name>
    <namePart type="given" abbrev="yes">H</namePart>
    <namePart type="given" abbrev="yes">K</namePart>
    <namePart type="primegiven">Jerry</namePart>
    <namePart type="family">Chun</namePart>
    <displayForm>H.K. Jerry Chun</displayForm>
  </name>

  <name>
    <namePart type="primegiven">Harry</namePart>
    <namePart type="given">S</namePart>
    <namePart type="family">Truman</namePart>
    <displayForm>Harry S. Truman</displayForm>
  </name>

The displayForm element is used nowhere except in the ALWD style and any other style that wants the name exactly as printed on the cited work (this is not necessarily identical with how the author wants his name printed: the actual string may follow the conventions of the publisher of the cited work rather than the author's preference). This is also why Truman has a dot after his middle non-initial: it was spelled that way on that particular book. Please note also that the parseable data make do without any dots.

We'll have to push the MODS people a little in order to support the required attributes. The current MODS implementation is about as flawed as RIS in this respect, but as it is an open standard which is still evolving, we have at least a chance to get this fixed.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
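
To illustrate how the common output variants could be derived from name records like the ones proposed above, here is a minimal Python sketch. It is not RefDB code. The helper names (parse_name, initials, full_given) are made up for this example, and the "primegiven" and "abbrev" attributes are the ones proposed in this mail, not part of current MODS. The sketch also assumes that only parts flagged abbrev="yes" get a dot when spelled out, and it deliberately does not decide how a prime given name like "DJ" should be abbreviated.

```python
import xml.etree.ElementTree as ET

DELORIE = """
<name>
  <namePart type="primegiven">Doris</namePart>
  <namePart type="given" abbrev="yes">J</namePart>
  <namePart type="family">Delorie</namePart>
  <displayForm>Doris J. Delorie</displayForm>
</name>
"""

TRUMAN = """
<name>
  <namePart type="primegiven">Harry</namePart>
  <namePart type="given">S</namePart>
  <namePart type="family">Truman</namePart>
  <displayForm>Harry S. Truman</displayForm>
</name>
"""


def parse_name(xml_text):
    """Return (given_parts, family, display_form) for one <name> element.

    given_parts is a list of (text, is_abbreviated) tuples in document order.
    """
    root = ET.fromstring(xml_text.strip())
    given_parts = []
    family = ""
    for part in root.findall("namePart"):
        text = (part.text or "").strip()
        if not text:
            continue  # ignore empty name parts
        if part.get("type") == "family":
            family = text
        else:
            # "given" and "primegiven" are both treated as given names here
            given_parts.append((text, part.get("abbrev") == "yes"))
    return given_parts, family, root.findtext("displayForm")


def initials(given_parts, dot="", sep=""):
    """Reduce each given name to its first letter (FM, F.M., F. M. styles)."""
    return sep.join(text[0] + dot for text, _ in given_parts)


def full_given(given_parts):
    """Spell out given names; add a dot only to parts flagged as abbreviations."""
    return " ".join(text + "." if abbreviated else text
                    for text, abbreviated in given_parts)


if __name__ == "__main__":
    for record in (DELORIE, TRUMAN):
        given, family, display = parse_name(record)
        print(initials(given), family)                     # FM style
        print(initials(given, dot="."), family)            # F.M. style
        print(initials(given, dot=".", sep=" "), family)   # F. M. style
        print(full_given(given), family)                   # spelled out
        print(display)                                     # as printed on the work
```

Note that the spelled-out form for Truman comes out without a dot because "S" is not flagged as an abbreviation, while displayForm still carries the dot that was printed on that particular book.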