Re: [Refdb-users] Re: The case against <middlename>
From: Marc H. <mar...@en...> - 2003-12-19 22:44:30
On Thu, 11 Dec 2003, Markus Hoenicka wrote:

> Marc Herbert writes:
> > I am glad to hear this! Then fix the _automated_ RIS parsing/syntax by
> > adding a comma to it?
> Where would you like to have an additional comma? I'd be reluctant to
> do this anyway as this would break data import from RefMan and
> EndNote, but the RIS syntax uses two commas anyway. One to separate
> the last name from the rest, and one to separate the suffix from the
> rest.

I was suggesting a comma between each firstname or middlename, in order
to have (at least) the same middlename data model in both RISX and RIS,
and an unambiguous RIS syntax.

The reason why it would break import from RefMan seems quite obvious to
me: according to this documentation, RefMan does NOT support so-called
"middlenames".

<http://www.refman.com/support/risformat_tags_02.asp>
"For Firstname, you can use full names, initials, or both."

How do people in life sciences work with RefMan? It would be
interesting to know.

> > I did not know publishers of 5000 life sciences journals were so
> > English-centric and ignorant of foreign cultures. This bug is quite
> > amazing.
> >
>
> It's sad but I don't see it as my job to change this.

So be happy that: it's absolutely not what I was asking for (see
previous messages).

> I just wanted to point out that it does not make much sense to me to
> code half-parsed strings in XML when you have to parse anyway. Why not
> go the extra inch and do it right?

Because the concept of middlenames is not part of any data model
(except risx), but only of some specific _formatting_ needs.

> The middlename handling and abbreviating stuff is not at your
> discretion. If a style requires these modifications it does not make
> any sense to add a switch that will produce incorrect data.

Yes, because other styles will require something else. Thus a "switch"
to satisfy all of them. The "--[not]-life-sciences" switch :-)

> > No it's not too late: you can also play the same game with dots and
> > spaces later at search/formatting time, without subtly and silently
> > modifying the data that the user intentionally input; that is losing
> > information really.
> That is, re-parse the name string each time a query comes in? It
> couldn't come any worse.

I found it very interesting to note that this "so bad" re-parsing is
exactly what happens in _today's_ code, in the case of several
middlenames. Search for "strtok" in:

<http://cvs.sourceforge.net/viewcvs.py/*checkout*/refdb/refdb/src/backend-risx.c?content-type=text%2Fplain&rev=1.20>

I know: you will change this later. But still, it seems to work today.

> > Please never silently and subtly modify user data. At least ask for
> > confirmation! The real world is too complex for any "clever" names
> > standardization algorithm.
>
> I'll be happy to add a section to the docs in all caps and a red box
> around it stating that author names will be normalized for the sake of
> consistency.

Thanks in advance! (I consider this a minimum before modifying user
data).

> > OK: I suggest one *extremely* simple improvement to this code: the
> > ability to disable it, at least at configure time (I will code this
> > for myself in any case).
> This does not make sense as it breaks consistent searching and the
> bibliography formatting.

"Consistent searching" across... different refdb installations!?

> Otherwise this is an example of the beauty of free software. If you
> code this for yourself, everyone can have it his way.

Sure! I will, I will... Time for a "contrib/" directory? :-)

Cheers,

Marc.
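
To make the ambiguity discussed above concrete, here is a minimal
sketch in Python. It is not refdb code: the function names and the
sample author strings are made up for illustration, and the third
function only assumes what a "comma between each firstname or
middlename" syntax might look like (it ignores suffixes entirely).

```python
def split_ris_author(author):
    """Split on the two RIS commas: last name, given part, suffix."""
    parts = [p.strip() for p in author.split(",")]
    parts += [""] * (3 - len(parts))  # pad missing fields with ""
    return parts[0], parts[1], parts[2]


def tokenize_given(given):
    """Whitespace tokenizing, similar in spirit to a strtok() loop.

    A compound first name such as "Jean Marc" is split in two, so the
    second token is indistinguishable from a middle name.
    """
    return given.split()


def split_proposed(author):
    """Hypothetical syntax: every given name is its own comma field."""
    last, *given = [p.strip() for p in author.split(",")]
    return last, [g for g in given if g]


if __name__ == "__main__":
    last, given, suffix = split_ris_author("Dupont, Jean Marc Pierre")
    print(last, tokenize_given(given), suffix)
    print(split_proposed("Dupont,Jean Marc,Pierre"))
```

With the current two-comma syntax, the whitespace split cannot tell
whether "Marc" belongs to the first name or is a middle name; with one
comma per name the input itself carries that information.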