[Refdb-users] Re: The case against <middlename>
From: Markus H. <mar...@mh...> - 2003-12-10 22:40:03
Marc Herbert writes:
> Let me reformulate: "lack of detail is better than wrong details". No
> information is lost by storing all "given" names in <firstname> and
> not parsing them.

You lose the information that a human brain can put into parsing the
name string, using cultural background information that is hard, if not
impossible, to teach to a machine.

> A style sheet that mandates the use of "middlename" is, to put it
> mildly, "culture-specific". If it insists on this, then it should be
> able to extract this information _by itself_, and not spoils the
> global data model because of this peculiarity. It seems this is
> exactly how BibTeX's stylesheets work. References given in a previous
> message seem to show that other formats do it the same way.

Once again, go complain to the publishers of roughly 5000 journals in
the life sciences. I also believe your argument, that a style requiring
the concept of middle names should be able to retrieve the middle name
by itself, is moot. By the same argument you could dump entirely
unparsed strings in any order onto bibliography software and expect it
to figure out how to parse them, since it needs to distinguish between
given and family names, titles, and suffixes. This simply expresses
your dislike of middle names.

> I think this *requirement* is more or less flawed. The more
> reformatting it requires, the more flawed it is, since the more
> (wrong) assumptions it will make concerning "name standardization"
> (i.e., that everybody should have a name that is american-english
> looking). The worst assumption is of course the requirement of a
> <middlename>. Assumptions about dots are also flawed, see for
> instance: <http://www.delorie.com/users/dj/>

Once again, I didn't invent these requirements. I have to support them
if I want to support the 5000+ journals in the life sciences.

> In any case, these dirty issues should not spoil the data model, they
> should be (and can be!) postponed and solved by the stylesheets
> _themselves_. So mistakes appear only in some printings, and there
> are no irreversible mistakes in the data source.

I don't think it is a brilliant idea to have each of 700+ stylesheets
(if we consider only the life sciences for a moment) parse and munge
the names by themselves. Code duplication and bloat would be
inevitable. I'd rather have stupid, simple stylesheets that use the
preparsed names from the application.

> The rationale is here: if middlenames should be kept in the data model
> (sigh), have at least only simple, perfectly reversible data
> transformations in database operations. No dots that magically
> appear or disappear, no variable number of tokens, etc. It's always
> time to do this at the formatting step.

That's too late, as I pointed out elsewhere. You need the normalization
when you enter the data into the database in order to have a consistent
and reliable way to search names.

> ... and this normalization is too complex to be automated, since
> no program can correctly handle all particular cases, thus it should
> be manually carried out by operators.
> I guess this is already the way it goes in most real cases today?

So if you want to import 100 references that a nice colleague just sent
you, you start adding/removing spaces and dots from somewhere between
100 and 1000 author names? Problematic as it may be in borderline
cases, this is a job that *asks* to be automated. If it fails in too
many cases, we have to improve the code.

> But searching for :AU:="Miller,A.*M.*" will give a pretty good result,
> and reveal to the operator the manual normalization work that must be
> completed.

This is what a reference manager should avoid at all costs. Why on
earth should a user be forced to use regular expressions just to find
references by author names? If this is necessary, the data model is
flawed.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
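
To make the import-time normalization argued for above concrete, here is a minimal Python sketch. It is not RefDB's actual code: the function name `normalize_author`, the "Family,Given" input layout (borrowed from the `Miller,A.*M.*` query quoted above), and the canonical output form are all assumptions chosen for illustration. It also bakes in exactly the kind of assumption about dots that the thread disputes, so it will mishandle some names.

```python
import re


def normalize_author(raw: str) -> str:
    """Return a hypothetical canonical form 'Family,Given1 Given2'.

    Assumes the input looks like 'Family,Given names'; this layout and
    the output convention are illustrative, not RefDB's real behavior.
    """
    family, _, given = raw.partition(",")
    # Collapse stray whitespace around and inside the family name.
    family = " ".join(family.split())
    # Split the given names on runs of whitespace and dots.
    tokens = [t for t in re.split(r"[\s.]+", given) if t]
    # Treat a single letter as an initial and give it exactly one dot.
    parts = [t + "." if len(t) == 1 else t for t in tokens]
    if not parts:
        return family
    return family + "," + " ".join(parts)


# Four spellings of the same author, as they might arrive in an import.
variants = ["Miller,A.M.", "Miller, A. M.", "Miller,A M", "Miller ,  A.M"]
normalized = {normalize_author(v) for v in variants}
print(normalized)
```

With every variant stored in one form, an exact-match search on the normalized string finds all four references, and no regular expression is needed at query time.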