Re: [Refdb-users] Re: The case against <middlename>

On Thu, 11 Dec 2003, Markus Hoenicka wrote:

> Marc Herbert writes:
>  > I am glad to hear this! Then fix the _automated_ RIS parsing/syntax by
>  > adding a comma to it?

> Where would you like to have an additional comma? I'd be reluctant to
> do this anyway as this would break data import from RefMan and
> EndNote, but the RIS syntax uses two commas anyway. One to separate
> the last name from the rest, and one to separate the suffix from the
> rest.

I was suggesting a comma between each firstname or middlename, in
order to have (at least) the same middlename data model in both
RISX and RIS, and an unambiguous RIS syntax.
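
To illustrate the ambiguity, here is a minimal sketch in Python. It is
not refdb code and not the actual RIS grammar: both function names are
hypothetical, the comma-delimited form is only the syntax proposed
above, and suffix handling is left out entirely.

```python
def split_given_names_by_space(given: str) -> list[str]:
    """Space-based splitting: every space starts a new name part,
    so a compound first name cannot be told apart from
    first name + middle name."""
    return given.split()


def split_given_names_by_comma(given: str) -> list[str]:
    """Comma-based splitting (hypothetical proposed syntax): only a
    comma starts a new name part, so spaces inside a part survive."""
    return [part.strip() for part in given.split(",") if part.strip()]


# A compound first name followed by one middle name.
print(split_given_names_by_space("Jean Pierre Marie"))
print(split_given_names_by_comma("Jean Pierre,Marie"))
```

With spaces alone, "Jean Pierre" is always split into two parts; with
the comma delimiter the writer decides where one name ends.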

The reason why it would break import from RefMan seems quite obvious
to me: according to this documentation, RefMan does NOT support
so-called "middlenames".
<http://www.refman.com/support/risformat_tags_02.asp>
"For Firstname, you can use full names, initials, or both."

How do people in life sciences work with RefMan? It would be
interesting to know.

>  > I did not know publishers of 5000 life sciences journals were so
>  > English-centric and ignorant of foreign cultures.  This bug is quite
>  > amazing.
>  >
>
> It's sad but I don't see it as my job to change this.

So be happy: that is absolutely not what I was asking for (see
previous messages).

> I just wanted to point out that it does not make much sense to me to
> code half-parsed strings in XML when you have to parse anyway. Why not
> go the extra inch and do it right?

Because the concept of middlenames is not part of any data model
(except risx), but only of some specific _formatting_ needs.

> The middlename handling and abbreviating stuff is not at your
> discretion. If a style requires these modifications it does not make
> any sense to add a switch that will produce incorrect data.

Yes, because other styles will require something else. Thus a "switch"
to satisfy all of them. The "--[not]-life-sciences" switch :-)

>  > No it's not too late: you can also play the same game with dots and
>  > spaces later at search/formatting time, without subtly and silently
>  > modifying the data that the user deliberately entered; that really is
>  > losing information.

> That is, re-parse the name string each time a query comes in? It
> couldn't come any worse.

I found it very interesting to note that this "so bad" re-parsing is
exactly what happens in _today's_ code, in the case of several
middlenames. Search for "strtok" in:
<http://cvs.sourceforge.net/viewcvs.py/*checkout*/refdb/refdb/src/backend-risx.c?content-type=text%2Fplain&rev=1.20>

I know: you will change this later. But still, it seems to work today.
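
For the record, a minimal sketch of what I mean by doing the work at
formatting time. It is Python rather than C, the function name is
hypothetical, and it is not a transcription of backend-risx.c: it only
assumes that the given names are stored exactly as typed and are
tokenized on whitespace (roughly what strtok does) when a style asks
for initials.

```python
def initials_at_format_time(stored_given_names: str) -> str:
    """Derive abbreviated given names from the stored string.

    The stored string is only read, never rewritten, so whatever the
    user typed is still available to styles that want full names.
    """
    tokens = stored_given_names.split()  # whitespace tokenizing
    return " ".join(token[0].upper() + "." for token in tokens)


stored = "Jean Pierre"                  # kept verbatim in the database
print(initials_at_format_time(stored))  # abbreviated for this style only
print(stored)                           # the original is untouched
```

A style that wants full names simply prints the stored string, so no
switch is needed and nothing is lost at import time.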

>  > Please never silently and subtly modify user data. At least ask for
>  > confirmation! The real world is too complex for any "clever" name
>  > standardization algorithm.
>
> I'll be happy to add a section to the docs in all caps and a red box
> around it stating that author names will be normalized for the sake of
> consistency.

Thanks in advance! (I consider this a minimum before modifying user
data).

>  > OK: I suggest one *extremely* simple improvement to this code: the
>  > ability to disable it, at least at configure time (I will code this
>  > for myself in any case).

> This does not make sense as it breaks consistent searching and the
> bibliography formatting.

"Consistent searching" across... different refdb installations!?

> Otherwise this is an example of the beauty of free software. If you
> code this for yourself, everyone can have it his way.

Sure ! I will, I will...

Time for a "contrib/" directory ? :-)

Cheers,

Marc.