Thread: [Refdb-users] Re: The case against <middlename>
From: Marc H. <mar...@en...> - 2003-12-10 10:23:15
On Tue, 9 Dec 2003, Markus wrote:

> In American English, where the middle initial is probably used most
> widely, the initial can also derive from the mother's maiden name or
> any other family name. You can't treat this as part of the first name.

Native americans in the office next door to mine treat it like this.
They just do not care where this middle name comes from: it is still a
"given" name. I think this formal definition:

<familyname(s)> : universally defined by law
<givenname(s)> : parents (or similar) freely choose them, possibly
                 according to one local tradition or another.

suits your example above.

> > This is the apparent drawback. Suppressing an element means providing
> > less information to subsequent tools. However, I think lack of
> > information is better than incomplete/imprecise information. IMHO,
>
> I beg to differ. No information can't be better than partial
> information. If that were true, we should stop doing research and
> settle with the fact that we'll never know everything precisely.

Let me reformulate: "lack of detail is better than wrong details". No
information is lost by storing all "given" names in <firstname> and
not parsing them.

> At least in life sciences we're not free to choose a stylesheet of our
> liking. If I want to publish in J.Biol.Chem., I'll have to follow the
> citation and bibliography rules of that journal. And if these rules
> tell me to format author names like "Last FM" (last name in full,
> first name and middle name, if available, as initials), then I must be
> able to pull a last, a first and a middle name from the stored data.

A style sheet that mandates the use of "middlename" is, to put it
mildly, "culture-specific". If it insists on this, then it should be
able to extract this information _by itself_, and not spoil the
global data model because of this peculiarity. It seems this is
exactly how BibTeX's stylesheets work. References given in a previous
message seem to show that other formats do it the same way.

> This entirely ignores that bibliography styles *require* rearranging
> and reformatting the name parts. Sticking with your example, journals
> might request:
>
> D Knuth
> D.Knuth
> D. Knuth
> DE Knuth
> D.E. Knuth
> Knuth D
> Knuth, D
> Knuth, D.
> Knuth DE
> Knuth D.E.
> Knuth, DE
> Knuth, D.E.
>
> and maybe another couple of permutations that I forgot. How Mr. Knuth
> would like to read his name is unfortunately irrelevant for the
> purposes of citing and creating bibliographies.

I think this *requirement* is more or less flawed. The more
reformatting it requires, the more flawed it is, since the more
(wrong) assumptions it will make concerning "name standardization"
(i.e., that everybody should have a name that is american-english
looking). The worst assumption is of course the requirement of a
<middlename>. Assumptions about dots are also flawed, see for
instance: <http://www.delorie.com/users/dj/>

However, simple transformations like:

    Donald -> D.

seem sensible (I mean: not so flawed), and would allow most of your
examples above.

In any case, these dirty issues should not spoil the data model, they
should be (and can be!) postponed and solved by the stylesheets
_themselves_. So mistakes appear only in some printings, and there
are no irreversible mistakes in the data source.

> > Still want to hold on to <middlename>s and make as few changes as
> > possible? Then twist the original user input as little as possible, and
> > do only perfectly reversible transformations: name parsing/splitting
> > is based _only_ on spaces (I know no language where the size of space
> > is meaningful), the output always gives those spaces back, and there
> > is no "clever" parsing using dots, dashes or any other sign (can
> > someone affirm that the dot "." is the universal abbreviation sign, in any
> > language?)

> Spaces do not help to distinguish between family and other names.

Agreed! (even if BibTeX has a complex algorithm to do this, but let's
forget it...) I was thinking of a two-step parsing:

1) separate given and family names using a _comma_, just like it is today
2) then further parse each one using _only spaces_

The rationale is here: if middlenames should be kept in the data model
(sigh), have at least only simple, perfectly reversible data
transformations in database operations. No dots that magically
appear or disappear, no variable number of tokens, etc. It's always
time to do this at the formatting step.

> > Users are generally not upset by a software that does NOT add a dot
> > that they forgot, but they get angry when they do not understand
> > at all how and why the software modifies their data, and then they
> > write long emails :-) Moreover, complexity brings bugs; simplicity
> > brings reliability.

> The process is called normalization. If you provide one entry as
> "Miller,AM" and the next one as "Miller,A.M.", these will show up as
> two different authors in the database. Normalization will result in
> "Miller,A.M." in both cases and will map the entries correctly to the
> same author.

... and this normalization is too complex to be automated, since
no program can correctly handle all particular cases, thus it should
be manually carried out by operators. I guess this is already the way
it goes in most real cases today?

> That is, searching for :AU:="Miller,A.M." will not drop
> half of the available entries.

But searching for :AU:="Miller,A.*M.*" will give a pretty good result,
and reveal to the operator the manual normalization work that must be
completed.

Cheers,

Marc.
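
A minimal sketch of the two-step, reversible parsing proposed in the
message above (comma first, then spaces only). The function names
split_name and join_name are hypothetical and not part of RefDB; the
only point illustrated is that splitting on single spaces loses nothing,
so the original string can always be rebuilt.

```python
def split_name(raw):
    """Split 'Family, Given Names' into (family_tokens, given_tokens).

    Step 1 separates family and given names at the first comma.
    Step 2 splits each side on single spaces only; dots, dashes and
    other signs are left untouched.
    """
    family, comma, given = raw.partition(",")
    # str.split(" ") keeps empty strings for leading or repeated spaces,
    # so joining the tokens with " " reproduces the input exactly.
    family_tokens = family.split(" ")
    given_tokens = given.split(" ") if comma else None  # None: no comma seen
    return family_tokens, given_tokens


def join_name(family_tokens, given_tokens):
    """Inverse of split_name: rebuild the original string."""
    family = " ".join(family_tokens)
    if given_tokens is None:
        return family
    return family + "," + " ".join(given_tokens)


if __name__ == "__main__":
    for raw in ["Knuth, Donald E.", "Miller,A.M.", "Miller,AM"]:
        tokens = split_name(raw)
        print(tokens, join_name(*tokens) == raw)
```

Because nothing is added or removed, any dot handling or initialization
can still be applied later, at the formatting step.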
From: Bruce D'A. <bd...@fa...> - 2003-12-10 19:19:31
OK, I just pulled out the Bible of citations: the Chicago Manual of
Style. Nowhere does it make any distinction between first and middle
and last names. Where it discusses names, it uses family and given,
even when discussing styles in the life sciences. To quote:

"In a reference list, especially in the life sciences, initials rather
than full given names are often given...."

I guess I'm just not seeing the problem, Markus. If you have a style
that does not require initialization of given names, then it's
irrelevant whether you have:

<given>James Christopher</given>
<family>Scott</family>

or ...

<given>James C.</given>
<family>Scott</family>

or...

<given>James</given>
<middle>Christopher</middle>
<family>Scott</family>

They would be formatted as complete strings. Likewise, if your style
requires initials, then you get the same output:

Scott, J. C.

If different styles require different spacing between the given name
initials, or different punctuation, then it applies to all the given
names, including the so-called "middle." Right?

Bruce
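
The point about initials can be illustrated with a small sketch. It
assumes that given names are separated by spaces and that the first
character of each token is the initial; both are simplifications, and
the function names initials and family_first are hypothetical.

```python
def initials(given, sep=" ", dot="."):
    """Reduce every space-separated given name to its first character."""
    return sep.join(token[0] + dot for token in given.split())


def family_first(family, given, sep=" ", dot="."):
    """Format a name as 'Family, I. I.' with style-dependent spacing and dots."""
    return family + ", " + initials(given, sep, dot)


if __name__ == "__main__":
    # The three encodings from the message above; the third joins
    # <given> and <middle> before formatting.
    for given in ["James Christopher", "James C.", "James" + " " + "Christopher"]:
        print(family_first("Scott", given))
    # A style without dots or spaces between initials:
    print(family_first("Scott", "James Christopher", sep="", dot=""))
```

Spacing and punctuation are parameters of the style and apply to every
given name alike, which is the argument made in the message. The sketch
does not handle hyphenated or non-space-separated names.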
From: Bruce D'A. <bd...@fa...> - 2003-12-10 23:36:40
On Dec 10, 2003, at 5:31 PM, Markus Hoenicka wrote:

> I am not trying to convince you that middle names are a good thing per
> se or that the Chicago Manual should adopt them. They are used in the
> bibliography styles of journals in the life sciences, and all of these
> journals do have pretty strict rules that do or don't comply with the
> Chicago Manual.

But do they actually explicitly say anything about "middle names"? If
I go to the author info page for the J. of Biological Chemistry, I see
this:

> References
>
> · cited in text by number rather than author and date
>
> · numbered consecutively in the order of appearance in the
> manuscript
>
> · References for journals and books should be in the following
> styles:
>
> 1. MacDonald, G. M., Steenhuis, J. J., and Barry, B. A. (1995) J.
> Biol. Chem. 270, 8420-8428
>
> 2. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular
> Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory,
> Cold Spring Harbor, NY

All this tells me is that given names -- all of them -- ought to be
initialized, with a period, and separated by a space. There is no
distinction between first and middle.

Bruce
From: Markus H. <mar...@mh...> - 2003-12-10 22:40:03
Marc Herbert writes:

> Let me reformulate: "lack of detail is better than wrong details". No
> information is lost by storing all "given" names in <firstname> and
> not parsing them.

You lose the information that a human brain can put into parsing the
name string, using cultural background information that is hard if not
impossible to teach to a machine.

> A style sheet that mandates the use of "middlename" is, to put it
> mildly, "culture-specific". If it insists on this, then it should be
> able to extract this information _by itself_, and not spoil the
> global data model because of this peculiarity. It seems this is
> exactly how BibTeX's stylesheets work. References given in a previous
> message seem to show that other formats do it the same way.

Once again, go complain to the publishers of roughly 5000 journals in
the life sciences. I also believe that your argument is moot that if a
style requires the concept of middle names it should be able to
retrieve the middle name by itself. With the same argument you could
dump entirely unparsed strings in any order onto a bib software and
expect it to figure out how to parse it, as it needs to distinguish
between given and family names, titles and suffixes. This simply
expresses your dislike of middle names.

> I think this *requirement* is more or less flawed. The more
> reformatting it requires, the more flawed it is, since the more
> (wrong) assumptions it will make concerning "name standardization"
> (i.e., that everybody should have a name that is american-english
> looking). The worst assumption is of course the requirement of a
> <middlename>. Assumptions about dots are also flawed, see for
> instance: <http://www.delorie.com/users/dj/>

Once again, I didn't invent these requirements. I have to support them
if I want to support the 5000+ journals in the life sciences.

> In any case, these dirty issues should not spoil the data model, they
> should be (and can be!) postponed and solved by the stylesheets
> _themselves_. So mistakes appear only in some printings, and there
> are no irreversible mistakes in the data source.

I don't think it is a brilliant idea to have each of 700+ stylesheets
(if we consider only the life sciences for a moment) parse and munge
the names by themselves. Code duplication and bloating would be
inevitable. I'd rather have stupid simple stylesheets that use the
preparsed names from the application.

> The rationale is here: if middlenames should be kept in the data model
> (sigh), have at least only simple, perfectly reversible data
> transformations in database operations. No dots that magically
> appear or disappear, no variable number of tokens, etc. It's always
> time to do this at the formatting step.

That's too late as I pointed out elsewhere. You need the normalization
when you enter the data into the database to have a consistent and
reliable way to search names.

> ... and this normalization is too complex to be automated, since
> no program can correctly handle all particular cases, thus it should
> be manually carried out by operators.
> I guess this is already the way it goes in most real cases today?

So if you want to import 100 references that a nice colleague just
sent you, you start adding/removing spaces and dots from somewhere
between 100 and 1000 author names? Problematic as it may be in border
cases, this is a job that *asks* to be automated. If it fails in too
many cases, we have to improve the code.

> But searching for :AU:="Miller,A.*M.*" will give a pretty good result,
> and reveal to the operator the manual normalization work that must be
> completed.

This is what a reference manager should avoid at all costs. Why on
earth should a user be forced to use regular expressions just to find
references by author names? If this is necessary the data model is
flawed.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
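
For illustration, a minimal sketch of the kind of normalization
discussed in this exchange, mapping "Miller,AM" and "Miller,A.M." to
the same string. It is not RefDB's actual code. It rests on one loudly
stated assumption: a given-name token made only of capital letters and
dots is a run of initials. The function name normalize_author is
hypothetical.

```python
import re

# Assumption: tokens such as "AM", "A.M." or "A." are runs of initials.
INITIALS_RUN = re.compile(r"(?:[A-Z]\.?)+")


def normalize_author(raw):
    """Return 'Family,I.I.' for initial-only given names; keep full names."""
    family, comma, given = raw.partition(",")
    if not comma:
        return raw.strip()
    tokens = []
    for token in given.split():
        if INITIALS_RUN.fullmatch(token):
            # Put exactly one dot after each capital letter.
            token = "".join(ch + "." for ch in token if ch != ".")
        tokens.append(token)
    joined = " ".join(tokens)
    # Remove the space between two adjacent dotted initials: "A. M." -> "A.M."
    joined = re.sub(r"(?<=\.) (?=[A-Z]\.)", "", joined)
    return family.strip() + "," + joined


if __name__ == "__main__":
    for raw in ["Miller,AM", "Miller,A.M.", "Miller, A. M.", "Miller,Arthur M."]:
        print(repr(raw), "->", repr(normalize_author(raw)))
    # The regular-expression search suggested as an alternative finds the
    # unnormalized variants too, but also unrelated names:
    pattern = re.compile(r"Miller,A.*M.*")
    print([s for s in ["Miller,AM", "Miller,A.M.", "Miller,Anna Maria"]
           if pattern.fullmatch(s)])
```

The heuristic shows both sides of the argument: it merges the two
"Miller" spellings automatically, but a given name that is written in
capitals without being an abbreviation would be dotted wrongly.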
From: Markus H. <mar...@mh...> - 2003-12-11 00:01:04
Bruce D'Arcus writes:

> But do they actually explicitly say anything about "middle names"? If
> I go to the author info page for the J. of Biological Chemistry, I see
> this:
>
> > References
> >
> > · cited in text by number rather than author and date
> >
> > · numbered consecutively in the order of appearance in the
> > manuscript
> >
> > · References for journals and books should be in the following
> > styles:
> >
> > 1. MacDonald, G. M., Steenhuis, J. J., and Barry, B. A. (1995) J.
> > Biol. Chem. 270, 8420-8428
> >
> > 2. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular
> > Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory,
> > Cold Spring Harbor, NY
>
> All this tells me is that given names -- all of them -- ought to be
> initialized, with a period, and separated by a space. There is no
> distinction between first and middle.

This is an example where the distinction does not come into play.
Other styles require the full first name and initialized middle
name(s). In this case it should be apparent that first and middle
names need to be distinguishable.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
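
A sketch of the style mentioned in the last message (full first name,
initialized middle names), under stated assumptions. The first function
takes first and middle names as separate fields. The second is a
fallback that guesses the boundary by treating the first
space-separated token of a single given-name string as the first name;
that guess is an assumption of this sketch, and it is exactly the step
under dispute in this thread. Both function names are hypothetical.

```python
def first_full_middle_initials(family, first, middle=""):
    """Format as 'First M. Family' from separately stored name parts."""
    middle_initials = " ".join(m[0] + "." for m in middle.split())
    return " ".join(part for part in (first, middle_initials, family) if part)


def from_single_given(family, given):
    """Fallback: assume the first space-separated token is the first name."""
    tokens = given.split()
    first = tokens[0] if tokens else ""
    middle = " ".join(tokens[1:])
    return first_full_middle_initials(family, first, middle)


if __name__ == "__main__":
    print(first_full_middle_initials("Scott", "James", "Christopher"))
    print(from_single_given("Scott", "James Christopher"))
    print(from_single_given("Scott", "James C."))
```

For the "James Christopher Scott" example used earlier in the thread
both routes agree; they can differ whenever a first name itself
contains a space, which the fallback cannot detect.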