Re: [Refdb-users] Re: The case against <middlename>

Hi Marc,

Marc Herbert writes:
 > The reason why it would break import from RefMan seems quite obvious
 > to me: according to this documentation, RefMan does NOT support
 > so-called "middlenames".
 > <http://www.refman.com/support/risformat_tags_02.asp>
 > "For Firstname, you can use full names, initials, or both."
 > 

You'll have to look a little closer than that, and maybe get some
hands-on experience with these kinds of tools. Middle names are
supported implicitly by assuming the first non-lastname is the first
name and any other non-lastname is a middle name. This is e.g. very
apparent if you look at the RefMan style definitions which support the
formatting of last, first, and middle names (using exactly these
terms).
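
To illustrate the implicit rule, here is a rough Python sketch. It is
not RefMan's or RefDB's actual code. It assumes the RIS author layout
"Lastname,Firstname,Suffix", and the function name split_ris_author
is made up for this example. The first token of the given-name part
is treated as the first name, every further token as a middle name.

```python
import re

def split_ris_author(author):
    """Split 'Last,First Middle...,Suffix' into parts (illustration only)."""
    fields = [f.strip() for f in author.split(",")]
    last = fields[0]
    given = fields[1] if len(fields) > 1 else ""
    suffix = fields[2] if len(fields) > 2 else ""
    # Tokens end at whitespace or after a period, so "J.R.R." gives
    # three tokens and "John Ronald" gives two.
    tokens = re.findall(r"[^\s.]+\.?", given)
    first = tokens[0] if tokens else ""
    middle = tokens[1:]  # everything after the first token
    return {"last": last, "first": first, "middle": middle, "suffix": suffix}

print(split_ris_author("Tolkien,J.R.R."))
print(split_ris_author("King,Martin Luther,Jr."))
```

No explicit middle-name tag is needed in the input for this to work,
which is the point: the distinction is recovered once, at import time.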

 > > I just wanted to point out that it does not make much sense to me to
 > > code half-parsed strings in XML when you have to parse anyway. Why not
 > > go the extra inch and do it right?
 > 
 > Because the concept of middlenames is not part of any data model
 > (except risx), but only of some specific _formatting_ needs.
 > 

We're running in circles, I guess. These specific formatting needs
imply that your data model allows you to distinguish the parts of the
data which need to be formatted differently. You would never expect
the DocBook stylesheets to format a plain text file successfully, but
for some reason you expect this for author names given more or less as
plain text.

 > 
 > > The middlename handling and abbreviating stuff is not at your
 > > discretion. If a style requires these modifications it does not make
 > > any sense to add a switch that will produce incorrect data.
 > 
 > Yes, because other styles will require something else. Thus a "switch"
 > to satisfy all of them. The "--[not]-life-sciences" switch :-)
 > 

I don't see your point here. If all non-life sciences applications do
not require the distinction between first and middle names, their
style specifications will be a little simpler, that's all.

 > > That is, re-parse the name string each time a query comes in? It
 > > couldn't come any worse.
 > 
 > I found very interesting to note that this "so bad" re-parsing is
 > exactly what happens in _today's_ code, in the case of several
 > middlenames. Search for "strtok" in:
 > <http://cvs.sourceforge.net/viewcvs.py/*checkout*/refdb/refdb/src/backend-risx.c?content-type=text%2Fplain&rev=1.20>
 > 
 > I know: you will change this later. But still, it seems to work today.
 > 

No, I was talking about the SQL query that tries to match the incoming
query against the available datasets. This is currently done against
the normalized representation of the full name. No re-parsing happens
at this stage as it would grossly affect the performance.

Things are a little different if we're talking about generating output
from these data. Middle names are currently stored as a list of tokens
in a single field. I believe (that is, I didn't run any benchmarks)
that tokenizing this list for those backends that actually require
this is faster than using an additional table plus joins for all
backends, even for those that don't bother. The backends that you'll
be using most of the time (scrn or html for locating references) use
the normalized representation and hence do not tokenize the middle
name list.
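
A small Python sketch of this trade-off, under loud assumptions: the
record layout, the format of the normalized string, and the part
labels "name", "firstname" and "lastname" are all hypothetical and
only stand in for the real schema. Searching compares against the
stored normalized name, and the middle-name field is tokenized only
for a backend that asks for separate parts.

```python
# Hypothetical author record: middle names kept as a list of tokens
# in a single field, next to a precomputed normalized full name.
author = {
    "normalized": "Tolkien,J.R.R.",  # assumed format, used for matching
    "last": "Tolkien",
    "first": "John",
    "middle": "Ronald Reuel",
}

def name_matches(record, query):
    # Plain comparison against the stored string; nothing is re-parsed.
    return record["normalized"] == query

def name_parts(record, backend):
    # Backends used for locating references take the normalized form as is.
    if backend in ("scrn", "html"):
        return [("name", record["normalized"])]
    # Only backends that need separate parts pay for the tokenizing.
    parts = [("lastname", record["last"]), ("firstname", record["first"])]
    parts += [("middlename", m) for m in record["middle"].split()]
    return parts

print(name_matches(author, "Tolkien,J.R.R."))
print(name_parts(author, "scrn"))
print(name_parts(author, "risx"))
```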

 > >  > OK: I suggest one *extremely* simple improvement to this code: the
 > >  > ability to disable it, at least at configure time (I will code this
 > >  > for myself in any case).
 > 
 > > This does not make sense as it breaks consistent searching and the
 > > bibliography formatting.
 > 
 > "Consistent searching" across... different refdb installations !?
 > 

Consistent searching across all names.

 > 
 > > Otherwise this is an example of the beauty of free software. If you
 > > code this for yourself, everyone can have it his way.
 > 
 > Sure ! I will, I will...
 > 
 > Time for a "contrib/" directory ? :-)
 > 

I'd be very reluctant to add code to a contrib directory that would
not work with the rest of the application.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de