Re: [Refdb-users] Re: The case against <middlename>
From: Markus H. <mar...@mh...> - 2003-12-20 23:39:40
Hi Marc,

Marc Herbert writes:
> The reason why it would break import from RefMan seems quite obvious
> to me: according to this documentation, RefMan does NOT support
> so-called "middlenames".
> <http://www.refman.com/support/risformat_tags_02.asp>
> "For Firstname, you can use full names, initials, or both."

You'll have to look a little closer than that, and maybe get some
hands-on experience with these kinds of tools. Middle names are
supported implicitly by assuming the first non-lastname is the first
name and any other non-lastname is a middle name. This is e.g. very
apparent if you look at the RefMan style definitions, which support the
formatting of last, first, and middle names (using exactly these terms).

> > I just wanted to point out that it does not make much sense to me to
> > code half-parsed strings in XML when you have to parse anyway. Why not
> > go the extra inch and do it right?
>
> Because the concept of middlenames is not part of any data model
> (except risx), but only of some specific _formatting_ needs.

We're running in circles, I guess. These specific formatting needs imply
that your data model allows you to distinguish the parts of the data
that need to be formatted differently. You would never expect the
DocBook stylesheets to format a plain text file successfully, but for
some reason you expect this for author names given more or less as plain
text.

> > The middlename handling and abbreviating stuff is not at your
> > discretion. If a style requires these modifications it does not make
> > any sense to add a switch that will produce incorrect data.
>
> Yes, because other styles will require something else. Thus a "switch"
> to satisfy all of them. The "--[not]-life-sciences" switch :-)

I don't see your point here. If all non-life sciences applications do
not require the distinction between first and middle names, their style
specifications will be a little simpler, that's all.
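
To make the implicit middle-name rule mentioned earlier in this message concrete, here is a minimal sketch in Python. It is not RefDB or RefMan code: the function names `split_ris_author` and `format_author` are hypothetical, and the input layout "Lastname,Firstname,Suffix" is an assumption about how a RIS author field is written. The sketch takes the first token after the last name as the first name and every further token as a middle name, then shows why a style that abbreviates middle names needs that distinction.

```python
import re


def split_ris_author(author):
    """Split an author string assumed to look like 'Lastname,Firstname,Suffix'.

    The first token after the last name is taken as the first name; any
    remaining tokens (full names or initials) are taken as middle names.
    """
    parts = [p.strip() for p in author.split(",")]
    last = parts[0]
    given = parts[1] if len(parts) > 1 else ""
    suffix = parts[2] if len(parts) > 2 else ""
    # Tokens are separated by whitespace and/or periods: "J.R." -> ["J", "R"]
    tokens = [t for t in re.split(r"[.\s]+", given) if t]
    first = tokens[0] if tokens else ""
    middle = tokens[1:]
    return {"last": last, "first": first, "middle": middle, "suffix": suffix}


def format_author(name, abbreviate_middle=True):
    """Format as 'First M. Last'; a hypothetical style that abbreviates middle names."""
    middle = [m[0] + "." if abbreviate_middle else m for m in name["middle"]]
    return " ".join([name["first"], *middle, name["last"]]).strip()
```

For example, `format_author(split_ris_author("Hoenicka,Markus Xaver"))` abbreviates only the middle name. A style that does not care about the distinction can simply ignore the `middle` list.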

> > That is, re-parse the name string each time a query comes in? It
> > couldn't come any worse.
>
> I found it very interesting to note that this "so bad" re-parsing is
> exactly what happens in _today's_ code, in the case of several
> middlenames. Search for "strtok" in:
> <http://cvs.sourceforge.net/viewcvs.py/*checkout*/refdb/refdb/src/backend-risx.c?content-type=text%2Fplain&rev=1.20>
>
> I know: you will change this later. But still, it seems to work today.

No, I was talking about the SQL query that tries to match the incoming
query against the available datasets. This is currently done against the
normalized representation of the full name. No re-parsing happens at
this stage as it would grossly affect the performance.

Things are a little different if we're talking about generating output
from these data. Middle names are currently stored as a list of tokens
in a single field. I believe (that is, I didn't run any benchmarks) that
tokenizing this list for those backends that actually require this is
faster than using an additional table plus joins for all backends, even
for those that don't bother. The backends that you'll be using most of
the time (scrn or html for locating references) use the normalized
representation and hence do not tokenize the middle name list.

> > > OK: I suggest one *extremely* simple improvement to this code: the
> > > ability to disable it, at least at configure time (I will code this
> > > for myself in any case).
>
> > This does not make sense as it breaks consistent searching and the
> > bibliography formatting.
>
> "Consistent searching" across... different refdb installations !?

Consistent searching across all names.

> > Otherwise this is an example of the beauty of free software. If you
> > code this for yourself, everyone can have it his way.
>
> Sure ! I will, I will...
>
> Time for a "contrib/" directory ?
> :-)

I'd be very reluctant to add code to a contrib directory that would not
work with the rest of the application.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de