[Refdb-users] "reversibility" patch

> > I'll be happy to add a section to the docs in all caps and a red box
> > around it stating that author names will be normalized for the sake of
> > consistency.

I could not find this yet in
<http://refdb.sourceforge.net/manual-0.9.4/book1.html>

Explaining "how" they are normalized also seems rather vital to me.

> >  > OK: I suggest one *extremely* simple improvement to this code: the
> >  > ability to disable it, at least at configure time (I will code this
> >  > for myself in any case).

> > Otherwise this is an example of the beauty of free software. If you
> > code this for yourself, everyone can have it his way.

It's done. See: <http://marc.herbert.free.fr/refdb/reversible/> or
below/attached.

Comments welcome (including from you, Markus :-)

BTW, while testing and comparing, I found some quirks that do not seem
to fit _any_ logic (as opposed to: not fit my taste).

Cheers,

Marc.

  ----------------------------------------------------------

              The "reversible" refdb patch

Marc Herbert
$Date: 2004/01/05 21:30:50 $
$Revision: 1.2 $

                  ---- The issue ----

Currently, refdb tries to "normalize" authors' names entered in the
database, in order to avoid false duplicates and maybe to cope with
weird requirements of some bibliographic stylesheets. This means
fiddling with full stops and so-called "middlenames".

I think refdb should either reliably perform this normalization
according to a documented, reviewed and formal specification
-- or not at all. Today it does it in an undocumented way,
silently modifying some user data with potential information loss in
corner cases.

This (short and simple) refdb patch disables all modifications of
user-data, and lets the user decide by himself how names should be
"normalized" (assuming it's both desirable and possible).
Thanks to it, what gets _in_ refdb, gets _out_ untouched.
For instance, if you enter "Harry S Truman" in refdb, you would get back:
- without this patch:      "Harry S. Truman"
- with this patch:         "Harry S Truman" (amazing! and "reversible"...)

Warning: this patch may or may not break further formatting by some
bibliographic stylesheets, depending on whether they expect "normalized"
names from the database. I do not care much about breaking stylesheets
that want you to change the way you write your name (probably in a more
"English" way).  I do not mind if they munge names when formatting for
publication, but pushing this "normalization" up to the database is
not acceptable to me. After all, respectful and less rigid formatting
tools also (co-)exist.  The answer to this question is likely to be in
the following function: backend-dbiba.c:format_firstmiddlename()

By the way, be aware that you should NOT use spaces at the beginning
or at the end of RISX <name>(s), since this will lead to false
duplicates in the database _independently from this patch_. On the
other hand, RIS input (AU - field) is more or less space-insensitive.
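
The false-duplicate effect can be illustrated with a small sketch. It
assumes (this is not taken from the refdb source) that duplicates are
detected by exact string comparison; clean_name is a hypothetical
helper you would apply yourself before submitting RISX data.

```python
def clean_name(text):
    """Strip leading and trailing whitespace from the text of a RISX name element."""
    return text.strip()

# Assumption: the database treats two names as the same author only if
# the stored strings are identical.
entered = ["Truman", " Truman", "Truman "]

distinct_raw = set(entered)                          # three different strings
distinct_clean = {clean_name(n) for n in entered}    # a single string

print(len(distinct_raw), len(distinct_clean))
```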

This patch is compatible with version 0.9.4-pre3, and _not_ with
version 0.9.3. Users who are (still...) satisfied with current refdb
behaviour, and thus not directly interested in this patch, may still be
interested in understanding how their data is modified; just having a
look at this patch will provide detailed answers. The summary of
changes just below also explains it (in English instead of C).

This patch also disables middlename(s) input in the RIS format, due to
a flawed RIS input syntax, and due to their controversial nature (see
http://sourceforge.net/mailarchive/forum.php?forum_id=1798&viewmonth=200312);
all RIS "given names" go together untouched into the "firstname"
database field. On the other hand, RISX <middlename>s are not disabled
by this patch. To disable middlenames in RISX, just... don't use
the tag <middlename>.

            ---- Detailed issues and modifications ----

The SQL database uses 4 (redundant) fields to store author names:
       fullname, lastname, firstname, middlenameS

__________________________
Modifications to RIS input
(i.e., "addref -t ris")

firstname/middlenames parsing is disabled.
- the patch disables fiddling with full stops.
- middlenames are disabled: inside the AU field, the whole "given
name" as delimited by commas, goes into the "firstname" database
field.

                        RIS input examples

                                  Smith,   F.M.N.
                                  Chu,     H.K. Jerry
                                  Truman,  Harry S

                    ->    database results

 official    : "Smith,F.M.N."    "Smith"  "F"          "M N"
 patched     : "Smith,F.M.N."    "Smith"  "F.M.N."

 official    : "Chu,H.K.Jerry"   "Chu"    "H"          "K Jerry "
 patched     : "Chu,H.K.Jerry"   "Chu"    "H.K.Jerry"

 official    : "Truman,Harry S." "Truman" "Harry"      "S "
 patched     : "Truman,Harry S"  "Truman" "Harry S"

(also notice the spurious space ending some middlenames with the
official version).
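
The "patched" rows above can be reproduced with a few lines of Python.
This is only a sketch: the rules below (split at the first comma, drop
surrounding whitespace, drop whitespace that directly follows a full
stop) are inferred from the three examples, not copied from the C code,
and parse_ris_author_patched is a hypothetical name.

```python
import re

def parse_ris_author_patched(value):
    """Split an RIS AU value into (fullname, lastname, firstname).

    Assumed rules, inferred from the examples in the table:
    - the family name is everything before the first comma;
    - whitespace around each part is dropped;
    - whitespace directly after a full stop is dropped.
    No middlename is ever extracted.
    """
    lastname, _, given = value.partition(",")
    lastname = lastname.strip()
    firstname = re.sub(r"\.\s+", ".", given.strip())
    fullname = f"{lastname},{firstname}"
    return fullname, lastname, firstname

for au in ["Smith,   F.M.N.", "Chu,     H.K. Jerry", "Truman,  Harry S"]:
    print(parse_ris_author_patched(au))
```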

___________________________
Modifications to RISX input
(i.e., "addref -t risx")

- full stops "tricks" are disabled

                          RISX input examples

                                "Smith"   "F."      "M."    "N."
                                "Truman"  "Harry"   "S"
                                "Chu"     "H.K."    "Jerry"

                    ->    database results

 official :  "Smith,F.M.N."     "Smith"   "F"      "M N"
 patched  :  "Smith,F. M. N."   "Smith"   "F."     "M. N."

 official :  "Truman,Harry S."  "Truman"  "Harry"  "S"
 patched  :  "Truman,Harry S"   "Truman"  "Harry"  "S"

 official :  "Chu,H.Jerry"      "Chu"     "H"      "Jerry"    (information loss!)
 patched  :  "Chu,H.K. Jerry"   "Chu"     "H.K."   "Jerry"
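
As a sketch of the patched RISX behaviour, the four database fields can
be derived from the input elements by plain concatenation, with no
edits to the strings themselves. The function name and the joining
rules are assumptions that merely reproduce the "patched" rows above.

```python
def store_risx_author_patched(lastname, firstname, middlenames):
    """Return (fullname, lastname, firstname, middlenames_field).

    Assumed rules, inferred from the examples in the table:
    - the middlenames field is the middle names joined by single spaces;
    - fullname is "lastname,first middle middle ...";
    - no full stop is added or removed anywhere.
    """
    middle_field = " ".join(middlenames)
    given = " ".join([firstname] + list(middlenames))
    fullname = f"{lastname},{given}"
    return fullname, lastname, firstname, middle_field

print(store_risx_author_patched("Smith", "F.", ["M.", "N."]))
print(store_risx_author_patched("Truman", "Harry", ["S"]))
print(store_risx_author_patched("Chu", "H.K.", ["Jerry"]))
```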

_______
Outputs

No output except bibtex's is modified.

RIS output dumps "as is" the first field of the SQL database
(fullname).  RISX output uses the 3 other fields (last, first,
middles). It dumps last and firstname untouched, then parses the
"middlenames" field according to spaces before dumping <middlename>
elements. The patch modifies neither RIS nor RISX output.
Most other outputs also work one way or the other, and are not
modified by the patch.
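
The RISX output step described above can be sketched as follows. Only
<name> and <middlename> are mentioned in this text; the <lastname> and
<firstname> element names, like the function name, are assumptions
made for the illustration.

```python
import xml.etree.ElementTree as ET

def risx_name_element(lastname, firstname, middlenames_field):
    """Build a <name> element from the three database fields.

    lastname and firstname are copied untouched; the middlenames field
    is split on whitespace, one <middlename> element per piece.
    """
    name = ET.Element("name")
    ET.SubElement(name, "lastname").text = lastname      # assumed element name
    ET.SubElement(name, "firstname").text = firstname    # assumed element name
    for middle in middlenames_field.split():
        ET.SubElement(name, "middlename").text = middle
    return name

elem = risx_name_element("Smith", "F.", "M. N.")
print(ET.tostring(elem, encoding="unicode"))
```

With the patched database contents for Smith ("F." and "M. N."), this
gives back the same three given-name elements that were entered.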

However, for some unknown reason, bibtex output pulls the fullname
from the database and parses it again, so a small patch was needed
here again to prevent the addition of full stops.

__________
Convertors

The "nmed2ris" convertor also fiddles with authors' names in a similar
way. I cannot say more about this yet, sorry: I do not use the
MED format at all and could not test any modifications.

________
Feedback

Since all this is unfortunately complicated, the probability that I
missed something despite all my efforts is non-zero. I thank you in
advance for any feedback.

___________________________
The Art of Unix Programming

Some food for thought from:
<http://catb.org/~esr/writings/taoup/html/ch01s06.html>

Rule of Transparency: design for visibility to make inspection and
debugging easier.

For a program to demonstrate its own correctness, it needs to be using
input and output formats sufficiently simple so that the proper
relationship between valid input and correct output is easy to check.

Rule of Least Surprise: In interface design, always do the least
surprising thing.