refdb-users Mailing List for RefDB (Page 101)
Status: Beta
Brought to you by:
mhoenicka
From: Bruce D'A. <bd...@fa...> - 2004-01-14 19:36:20
|
On Jan 14, 2004, at 12:20 PM, Marc Herbert wrote:
> Conclusion: I would of course never pretend that the majority of
> publications let people write their _given_ name(s) as they want. I
> just don't know. But a short investigation seems to show that at least
> a non-negligeable number of non-negligeable journals does.

This is true. I actually "posted" something on this the other day, but accidentally sent only to Markus. His response was

> I knew that one out of a million styles would ask for this, so we'll
> have to support it. The cleanest way is to record the original string,
> if available, in the displayForm element *in addition* to the
> parseable data. MODS seems well equipped for these bordercases.

I'm not convinced BibTeX is the final authority on these issues, incidentally. The bottom line is that any data model and the formatting engine itself needs to have some concept of a hierarchy of given names, or one must -- by definition -- modify the data or rely on fallible algorithms to get proper output formatting.

When I look at a lot of bibtex data, names are not in a form I would consider ideal; stuff like:

@BOOK{Raymer,
  AUTHOR = "Raymer, Daniel P.",
  TITLE = "Aircraft Design : a Conceptual Approach",
  SERIES = "AIAA Education Series",
  PUBLISHER = {American Institute of Aeronautics and Astronautics},
  ADDRESS = {Washington, D.C.},
  YEAR = 1989}

...or even worse:

@BOOK{Peak,
  AUTHOR = "Peak, D. W.",
  TITLE = "Developments In The Air Cargo Industry",
  PUBLISHER = {ICHCA Abford House},
  ADDRESS = {London, UK},
  YEAR = 1981}

As I've said before, I'd be comfortable myself with abbreviating the middle name in my metadata. I do this all the time, in fact. But I do think it's important to realize the bibtex solution relies on this.

Bruce |
From: Bruce D'A. <bd...@fa...> - 2004-01-14 19:19:28
|
On Jan 14, 2004, at 12:29 PM, Marc Herbert wrote:
>>> It'd be nice if in this discussion we could focus on how to handle this
>>> in XML. Proper markup obviates the need for any mangling, after all.
>
>> Well put. I'd prefer to spend my time implementing the MODS support
>> rather than with fiddling with the RIS format that won't get any
>> better than it is.
>
> Please, am I allowed to spend _my_ time?

Sure, but it would help if your energy and interest in this matter could contribute to an ideal solution to this thorny problem. Part of the problem with RIS is the tagged format itself. XML solves this problem, if properly used, so any real solution is going to have to lie there IMHO, and in an improved data model.

Bruce |
From: <Re...@ao...> - 2004-01-14 18:59:00
|
Hi!

I am using Mac OS X version 10.2.8. I have mysql installed. I have installed all of the refdb starter-kit libs (except for sqlite), only to have the refdb make fail with this error message:

gcc -g -O2 -o refdbc refdbc.o pref.o strfncs.o readln.o page.o refdb-client.o client-commands.o readris.o connect.o tokenize.o getopt.o linklist.o enigma.o cgi.o atoll.o -lreadline -ltermcap
ld: readln.o illegal reference to symbol: _xmalloc defined in indirectly referenced dynamic library /System/Library/PrivateFrameworks/liberty.framework/Versions/A/liberty
make[1]: *** [refdbc] Error 1
make: *** [all-recursive] Error 1

I need to be up and running quickly! Any help is greatly appreciated.

Other starter-kit problems with solutions, based on my OS X build:

btparse-0.33: During ./configure, an error is reported in btparse-0.33/src/lex_auxiliary.c line 161. The problem is actually on line 162:

Orig: || (txt[0] == '"' && txt[len-1] == '"'));
Solution: || (txt[0] == '\"' && txt[len-1] == '\"'));

make test fails, but cd t; simple_test; suggests all is well?

expat-1.95.2 gives me a make error:

ld: can't locate file for: -lcrt0.o
make[1]: *** [xmlwf] Error 1
make: *** [xmlwf] Error 2

Upgrading to expat-1.95.7 sort of fixes it: make reports many errors, but completes. So maybe it tries other options to get past the many reported errors.

libdbi-drivers-0.7.0: I was getting errors involving incorrect host type.

Problem: ./configure --with-mysql results in this error:

checking host system type... configure: error: can not guess host type; you must specify one

Solution: insert host info into command line, as in:

./configure --host=`echo $HOST` --with-mysql

Charlie Rees |
From: Marc H. <mar...@fr...> - 2004-01-14 17:27:16
|
On Sat, 10 Jan 2004, Markus Hoenicka wrote:
> Marc Herbert writes:
> > We had one problem only Janet knew we faced bankruptcy
> >
> > Have you decided? Now consider this string again with differing
> > punctuation:
> >
> > We had one problem: only Janet knew we faced bankruptcy.
> > We had one problem only: Janet knew we faced bankruptcy.
> > We had one problem only, Janet knew: we faced bankruptcy.
> > We had one problem only Janet knew we faced: bankruptcy.
> >
>
> This is analogous to the punctuation in names as separators in an
> input format. The parser in your brain needs the punctuation in order
> to understand the intended meaning of these sentences. The parser in a
> reference manager needs the periods/spaces in order to understand the
> name parts (that is, if you use an odd input format like RIS that does
> not have better means to separate the parts, like XML). That's why they
> need to be consistent and independent of the personal taste of the
> person who carries that name.

Yeah, and that's why periods suck as a separator. Because:

- for RIS, the period is a separator;
- for some publishers, it's a decoration ("formatting");
- for authors, it's a *consistent* way to inform about an abbreviation, independent of their personal taste (well... not for Truman, granted).

This is asking too much of the period sign. Clashes are impossible to avoid. The 1st and 2nd dashes co-exist peacefully in refdb. My patch drops them in favor of dash 3, because I don't care about the _given_ name parsing in refdb. That's all. Don't take offense because I dropped a part of your code. I still enjoy all the rest very much.

> As your example above shows, you can't
> move punctuation around in English sentences just because you'd
> personally like to have it somewhere else. The same applies to names.

Agreed: a period after a capital means an abbreviation, so a lack of period means... no abbreviation! You can't just pop it up or down for RIS or publishers' reasons.

This is my point of view. Of course, the (your) point of view of a RIS parser is totally different and incompatible with mine. So what? I am not trying to convince you that "I am right", just trying to make you discover and understand a different point of view. Please stop trying to prove that my point of view is nonsense (or do it for good). It's just a slightly different use of your software. May I?

Cheers, Marc. |
From: Marc H. <mar...@fr...> - 2004-01-14 17:24:32
|
On Sat, 10 Jan 2004, Markus Hoenicka wrote:
> Marc Herbert writes:
> > In "Harry S Truman", the S is not an abbreviation. There has been a
> > debate whether it should nevertheless be written "S." "for the sake
> > of consistency", at the price of some (admittedly harmless)
> > information loss. See google.
> Citing from the Truman Presidential Museum and Library
> (http://www.trumanlibrary.org/speriod.htm):
>
> "In recent years the question of whether to use a period after the "S"
> in Harry S. Truman's name has become a subject of controversy,
> especially among editors. The evidence provided by Mr. Truman's own
> practice argues strongly for the use of the period. While, as many
> people do, Mr. Truman often ran the letters in his signature together
> in a single stroke, the archives of the Harry S. Truman Library has
> numerous examples of the signature written at various times throughout
> Mr. Truman's lifetime where his use of a period after the "S" is very
> obvious."
> This doesn't mean there can't be other examples of non-abbreviated
> single-letter middlenames, but Truman apparently is not one of them.

I suggest you go on reading the same page, just a couple of lines further:

"In explanation he said that the "S" did not stand for any name"

And a bit later:

"According to The Chicago Manual of Style all initials given with a name should "for convenience and consistency" be followed by a period even if they are not abbreviations of names."

This last sentence says a bunch of interesting things:

- stylesheets that "normalize" on "no-periods" are neither "convenient" nor "consistent" according to this manual (Uh?)
- the last line shows that the Chicago Manual knows enough cases of single letters in names, besides Truman's, to have considered this issue. That answers one of your questions.
- the need for this justification for the "all-periods" policy shows that it was not obvious and that there has been a debate; please tell about what, if not loss of information?

And finally, it's a recommendation made by a manual of *style*, and not a recommendation about the design of bibliographic databases.

> >
> > > An initial is a capital letter by definition.
> >
> > But the reverse is wrong. A capital letter is not an initial by
> > definition. It just may be. A capital letter + a period is an initial
> > by definition.
> > See: <http://www.cogs.susx.ac.uk/local/doc/punctuation/node28.html>
> I disagree. It's an initial followed by an indicator that the previous
> letter is an abbreviation of something else. The difference should be
> apparent if we think of an initial as data and what an output format
> is supposed to do with it. Format (1) outputs initials as they are,
> format (2) renders them using dots:
> (1) DJ Last
> (2) D.J.Last

... which clearly shows that format (1) lost the information that 'D' and 'J' are abbreviations. You have to guess it. Easy for 99% of names. And the remaining 1% does not matter: these people should probably better get a life and normalize their names anyway...

By the way, I am wondering what "uppercase" means in unicode.

> > See: DJ Delorie
>
> This name is handled gracefully by RefDB. It is not mangled in any
> way.

Sure! But the topic here was "is a period information?" (see the Subject:). This example just demonstrates the issue with stylesheets that think periods are just formatting and suppress them, losing information. They cannot tell the difference between:

DJ Delorie -> DJ Delorie (OK)

and

Dorothy J. Delorie -> DJ Delorie (loss of periods/information)

I obviously never asked you to correct these stylesheets! In fact, I gave up asking you to change _anything_ in refdb about this quite some time ago (except for some added words in the documentation).

I just recently sent to the list a patch that avoids losing the information in advance in the database, with admitted and documented shortcomings. Then you felt compelled to demonstrate that this patch is the apocalypse, probably giving it more attention and publicity than it deserves.

> > > And again, RefDB will not support names that can't be expressed in
> > > RIS syntax until a MODS-based data format is implemented.
> >
> > Well... my patched version tries to support them :-> It is its main
> > purpose.
> No, it does not support them. The patch prevents that RefDB
> understands the names. Instead you dumb down the application to a
> state that it returns the same string that you sent in. However, in
> order to do anything useful with the names, RefDB must be able to
> parse them. The patch effectively prevents creating formatted
> bibliographies and export to all data formats that distinguish name
> parts.

At least half of this is plain wrong. My patch only prevents RefDB from parsing the GIVEN NAME, and only FROM RIS INPUT. I suggest you read this page:

<http://marc.herbert.free.fr/refdb/reversible/>

It describes the patch with decent accuracy, contrary to your paragraph above. Reading it may be useful if you want to make comments (but you don't have to make comments).

Cheers, Marc |
From: Marc H. <mar...@fr...> - 2004-01-14 17:20:36
|
On Sat, 10 Jan 2004, Markus Hoenicka wrote:
> Could you provide an example of a publisher in the natural sciences or
> anywhere else whose author name formatting recommendations read: "Use
> whatever the bearer of that name prints on his letterhead?" Besides
> the difficulty to even obtain this information for all 100+ author
> names that an average bibliography carries, I'm not aware of any
> publisher allowing this. The result would be bibliography entries
> like:
>
> F.D. Roosevelt, Truman, Harry S., Chun Wu, Dwight D Eisenhower,
> Schmidt HHHW: A paper about something. Science 56:456, 2000.
>
> Do you think this is acceptable to anyone? Do you think this is
> readable? Is Chun the given name or the family name?

This discussion is and has always been about given names and so-called "middlenames" only. That was even stated in one "Subject:". I find it somewhat dishonest to suddenly pretend that I want to dump the whole difference between family name(s) and given name(s). Of course this would be ridiculous.

I slightly reformulate your question as if it answered my messages:

> Could you provide an example of a publisher in the natural sciences
> or anywhere else whose author name formatting recommendations read:
> "Use whatever the bearer of that GIVEN name prints on his
> letterhead?"                     ^^^^^

I made some quick statistics about BibTeX stylesheets and related investigation to try to answer this question.

BibTeX is the de-facto format & tool to manage bibliographies with LaTeX. LaTeX is this small typesetting system used by millions of people. BibTeX does not know what a "middlename" is (just like most formats). It knows only 4 parts:

- "von"
- last name(s)
- first name(s)
- suffix (e.g. "Jr")

All BibTeX stylesheets I have seen format the _given_ name(s) in one of two ways:

- "as is" from the BibTeX file (their "database") {ff}
- abbreviate it and period-ize it {f.}

The BibTeX code for printing the given name(s) "as is" is {ff}, while the code for abbreviating the given name(s) is {f.} or similar. See:

<http://www.eeng.dcu.ie/local-docs/btxdocs/btxhak/btxhak/node5.html>

Basically, the {ff} code means that the stylesheet does not want to format the given name(s), maybe because it thinks this is too error-prone. It just trusts the typist.

The question is: did I make up category {ff}?

The standard unix LaTeX installation I have (TeTeX) is shipped with 10 stylesheets (excluding variations). Those are the very basic bibliographic stylesheets used by all people that do not care to design their own. Half of them are {ff}: 5 stylesheets (plain, alpha, unsrt, amsplain and amsalpha) format the given name just as given (i.e., they don't format it), while the 5 remaining ones (ieeetr, abbrv, siam, apalike, acm) abbreviate it.

Of course, it's possible that all {ff} stylesheets are minor ones, while all the ones used by professional publications are not. So I looked for some precise examples of {ff} BibTeX stylesheets in the LaTeX archive (CTAN) and also in this compilation:

<http://www.lecb.ncifcrf.gov/~toms/latex.html>

I found at least the following {ff} stylesheets for "real" publications:

- American Mathematical Society
- American Journal of Human Genetics
- Methods in Enzymology
- Journal of Neuroscience

Besides BibTeX stylesheets, I also found some other real world examples of lack of given name(s) formatting:

- All Elsevier's International Federation of Automatic Control (IFAC) journals (stylesheet ifac.bst)
  <http://authors.elsevier.com/getting_published.html?dc=QG3>
- The German DIN 1505 standard seems to let people free to decide how their given name(s) should be written.
  <http://www.phil-fak.uni-duesseldorf.de/ie/competence/09_schriftkom/bibdin.html#top>
- The MLA (Modern Language Association) style seems quite popular and does not "format" the given name(s) either. See BibTeX file "mla.bst" and
  <http://www.english.uiuc.edu/cws/wworkshop/MLA/singleauthor.htm>
  "The author's name should be given as it is listed on the title page of the text."

> > - some less authoritarian publishers/formatting conventions leave more
> > freedom about this, in order to please authors and grant them the
> > right to write their (possibly "weird") name as they want.
> I've never seen this in real life, and I'm glad I didn't.

Again, I was of course talking about GIVEN names only; that did not change from the start of the discussion.

Conclusion: I would of course never pretend that the majority of publications let people write their _given_ name(s) as they want. I just don't know. But a short investigation seems to show that at least a non-negligible number of non-negligible journals does. |
From: Marc H. <Mar...@fr...> - 2004-01-14 15:47:57
|
Of course this problem has been tackled before. But even better, some bibliographic database manager wrote a very interesting article about it:

"The Identification of Authors in the Mathematical Reviews Database"
<http://www.library.ucsb.edu/istl/01-summer/databases.html>

"There was a time when Mathematical Reviews even attempted to "correct" the published form of a name, perhaps believing that some editors and publishers just didn't try hard enough. As a survivor of those days, an internal Mathematical Reviews concept is that of the "preferred name,"...

On Fri, 9 Jan 2004, Marc Herbert wrote:
> The database, being unable to tell which is the "right" spelling, or
> worst, not even in some cases being able to tell if all these writings
> designate the same person, should carefully preserve every character
> from every typist. So the database has no choice but storing the input
> "as is". Preferably pre-parsed, but without any character lost or
> added. All these inputs become (unfortunately, but what can you do?)
> different authors.
>
> Meanwhile, a "clever" algorithm that is aware of most common
> typing-names mistakes in our culture computes a "normalized" (or
> "reduced", or "projected") representation of the given name for each
> record.
> Such a simple algorithm could be for instance:
> - throw away a set of characters (period, hyphen, apostrophe,
> space,...)
> - lowercase all characters
> - dump all diacritics
> - ...
>
> This "projected name" is stored in the author record, besides the
> typist input. It is _indexed_ and used to perform queries. It can be
> used to detect false duplicates easily and efficiently, including at
> input time! (i.e. "Don't you think you should rather write this name
> this way?") |
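[Editorial sketch] The "projected name" steps quoted above (drop a set of punctuation characters, lowercase, remove diacritics) could look roughly like this in Python. This is only an illustration, not code from RefDB or from the patch under discussion: the helper name `project_name` and the exact set of dropped characters are assumptions.

```python
import unicodedata

# Characters thrown away before comparison: period, hyphen, apostrophe, space.
# The exact set is an assumption made for this illustration.
DROPPED = set(".-' ")


def project_name(given_name):
    """Return a normalized lookup key for a given name (hypothetical helper)."""
    # NFKD splits accented letters into a base letter plus combining marks,
    # so the diacritics can be discarded as separate characters.
    decomposed = unicodedata.normalize("NFKD", given_name)
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch not in DROPPED
    ]
    return "".join(kept).lower()


# Several spellings of the same given name collapse onto a single key,
# while the typist's original input would be stored unchanged next to it.
variants = ["H.K. Jerry", "H. K. Jerry", "HK Jerry", "h.k.jerry"]
print({project_name(v) for v in variants})
```

The key would only be used for indexing and duplicate detection; the stored input stays untouched, as the quoted proposal requires.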
From: Bruce D'A. <bd...@fa...> - 2004-01-11 16:38:10
|
Two useful links on names: http://dublincore.org/documents/name-representation/ http://rdfweb.org/topic/NamesInFoaf Bruce |
From: Markus H. <mar...@mh...> - 2004-01-10 01:28:46
|
Bruce D'Arcus writes: > Oh, and another thing: > > It'd be nice if in this discussion we could focus on how to handle this > in XML. Proper markup obviates the need for any mangling, after all. > Well put. I'd prefer to spend my time implementing the MODS support rather than with fiddling with the RIS format that won't get any better than it is. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Markus H. <mar...@mh...> - 2004-01-10 01:28:36
|
Olivier Fisette writes: > I think it would be important for RefDB to be able to format citations and > create bibliographies with sxw files, the format used by OpenOffice.org > Writer, which is currently the most popular open source word processor. Since > the KOffice software suite is currently in the process of adopting > OpenOffice.org file formats, RefDB support for sxw would eventually benefit a > great number of users. > > sxw files use XML, and the file format is extensively documented. > (http://xml.openoffice.org/) Character and paragraph styles could be used > respectively as markers for citation formatting and bibliography insertion. > > OpenOffice.org already has a system for formatting and inserting references, > but it is not sophisticated enough for scientists who want to manage > off-prints collections, browse references, and use the same system for > formatting citations and creating bibliographies using different file > formats. Also, an independant implementation could be used both in > OpenOffice.org Writer, KWord, and other programs through the use of macros. > > What are your opinions on the subject ? > Wish I could support this! But I'm afraid I won't be able to do this myself. If anyone wants to look into this, please do so. You'll get all the support that I can offer. I've been in contact with one of the OO developers a while ago, and we tried to figure out whether it is possible to use RefDB as an external data source for the built-in bib support. This wouldn't have solved the formatting issue, but might have been a start. Unfortunately nothing materialized. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Markus H. <mar...@mh...> - 2004-01-10 01:28:31
|
Marc Herbert writes:
> It is irrelevant to authoritarian stylesheets.
>

Could you provide an example of a publisher in the natural sciences or anywhere else whose author name formatting recommendations read: "Use whatever the bearer of that name prints on his letterhead?" Besides the difficulty to even obtain this information for all 100+ author names that an average bibliography carries, I'm not aware of any publisher allowing this. The result would be bibliography entries like:

F.D. Roosevelt, Truman, Harry S., Chun Wu, Dwight D Eisenhower, Schmidt HHHW: A paper about something. Science 56:456, 2000.

Do you think this is acceptable to anyone? Do you think this is readable? Is Chun the given name or the family name?

> - there is a strong need for a "normalized" representation of names,
> to avoid false duplicates and enhance results of queries.

Agreed.

> - some formatting tools/stylesheets "normalize" your
> names, deciding if and where you should put periods, dashes,
> initializing or not, etc.

This is what I experience with all papers and books that I read at work.

> - some less authoritarian publishers/formatting conventions leave more
> freedom about this, in order to please authors and grant them the
> right to write their (possibly "weird") name as they want.
>

I've never seen this in real life, and I'm glad I didn't.

>
> I think it's technically possible to please everyone, by isolating
> issues. Let's take the example of this problematic name:
> (<http://citeseer.nj.nec.com/context/153368/0>)
>
> Chu, H.K. Jerry (that's the precise way he writes it himself)
>

In a useful input format, this would turn into something like:

<name>
  <familyname>Chu</familyname>
  <givenname type="abbrev">H</givenname>
  <givenname type="abbrev">K</givenname>
  <primegivenname>Jerry</primegivenname>
</name>

This is what the database needs to know in order to do something useful with the name. It does not care whether that guy prefers either of these:

Chu, H. K. Jerry
Chu,H.K.Jerry
H.K.Jerry Chu

or whatever. I'm getting tired of this, but this is about formatting. The XML example above is about input data.

regards, Markus

-- 
Markus Hoenicka mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de |
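[Editorial sketch] The distinction made in the message above, between parsed name data and the several ways a style may print it, can be illustrated in Python. The classes `GivenPart` and `Name` and the function `render_given` are hypothetical names chosen for this sketch; they mirror the XML example in the post and are not RefDB's actual data model.

```python
from dataclasses import dataclass


@dataclass
class GivenPart:
    text: str
    abbreviated: bool = False  # True when the stored text is only an initial


@dataclass
class Name:
    family: str
    given: list  # list of GivenPart, in written order


def render_given(name, period=".", sep=" "):
    """Join the given-name parts; only abbreviated parts receive a period."""
    parts = [p.text + period if p.abbreviated else p.text for p in name.given]
    return sep.join(parts)


# Same structure as the <name> element in the post.
chu = Name("Chu", [GivenPart("H", True), GivenPart("K", True), GivenPart("Jerry")])

# One stored record, three output conventions:
print(f"{chu.family}, {render_given(chu)}")          # Chu, H. K. Jerry
print(f"{chu.family},{render_given(chu, sep='')}")   # Chu,H.K.Jerry
print(f"{render_given(chu, sep='')} {chu.family}")   # H.K.Jerry Chu
```

Because the record says which parts are abbreviations, the choice of periods and spacing can be left entirely to the output style.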
From: Markus H. <mar...@mh...> - 2004-01-10 01:28:28
|
Marc Herbert writes:
> In "Harry S Truman", the S is not an abbreviation. There has been a
> debate whether it should nevertheless be written "S." "for the sake
> of consistency", at the price of some (admittedly harmless)
> information loss. See google.
>

Citing from the Truman Presidential Museum and Library (http://www.trumanlibrary.org/speriod.htm):

"In recent years the question of whether to use a period after the "S" in Harry S. Truman's name has become a subject of controversy, especially among editors. The evidence provided by Mr. Truman's own practice argues strongly for the use of the period. While, as many people do, Mr. Truman often ran the letters in his signature together in a single stroke, the archives of the Harry S. Truman Library has numerous examples of the signature written at various times throughout Mr. Truman's lifetime where his use of a period after the "S" is very obvious."

This doesn't mean there can't be other examples of non-abbreviated single-letter middlenames, but Truman apparently is not one of them.

>
> > An initial is a capital letter by definition.
>
> But the reverse is wrong. A capital letter is not an initial by
> definition. It just may be. A capital letter + a period is an initial
> by definition.
> See: <http://www.cogs.susx.ac.uk/local/doc/punctuation/node28.html>
>

I disagree. It's an initial followed by an indicator that the previous letter is an abbreviation of something else. The difference should be apparent if we think of an initial as data and what an output format is supposed to do with it. Format 1 outputs initials as they are, format 2 renders them using dots:

FM Last
F.M.Last

The dots (or the lack of dots) are the formatting, the capital letter is the data.

> See: DJ Delorie

This name is handled gracefully by RefDB. It is not mangled in any way.

> This period says: "this letter before stands for an abbreviation".
> It's formatting, carrying an information. Some stylesheets may not
> care about this information, prefering esthetics, while some others
> stylesheets may care. But a database or a format should better stay
> _neutral_ and postpone the decision, so to please _everyone_, not just
> one side.

This is not the point. The RIS format cannot make the distinction. An XML format specifically designed for this purpose will be able to.

> The reasons why I do not want to use the RIS middlenames period-based
> syntax are quite obvious above. Periods carry some information, and
> middlenames are culture-specific.
>

Fine, so let's wait until the MODS-based data model materializes.

>
> > I'm surprised that this seems new to you.
>
> Well, I must admit that I found the period-based RIS syntax a bit
> weird when discovering it at first in refdb's manual, but
> I unfortunately overlooked the potential implications at this time.
> Especially since I did not see it later anywhere else. By the way, do
> you have pointers to some other "official" RIS specification? Are
> others' definition strictly identical?
>

To the best of my knowledge there is no other official spec except the help files, the PDF manual, and the example databases that they ship with the program. In addition it is helpful to see what the individual styles do to the data upon output. So it's rather reverse engineering than a useful spec.

> BTW, how do you avoid false duplicates in this case? By asking every
> bibliographer to use the real, original cyrillic spelling?

No, by asking them to settle on one transliteration. Using MODS, however, the cyrillic spelling is an option too as it has some means to carry the transliteration.

> > And again, RefDB will not support names that can't be expressed in
> > RIS syntax until a MODS-based data format is implemented.
>
> Well... my patched version tries to support them :-> It is its main
> purpose.
>

No, it does not support them. The patch prevents that RefDB understands the names. Instead you dumb down the application to a state that it returns the same string that you sent in. However, in order to do anything useful with the names, RefDB must be able to parse them. The patch effectively prevents creating formatted bibliographies and export to all data formats that distinguish name parts.

You basically try to send a program written in Perl through a C compiler. You notice that the C parser can't handle the Perl code, so you decide to disable the parser and hope that the compiler will be able to figure out the grammar all by itself. This is not going to work. The only fix is to rewrite the program in C syntax.

> Wow... this is becoming harder and harder to understand (I mean:
> authorinfo.c r1.4, lines 70s) Could you document this new "hyphenated
> double initials" format carefully please? Is the hyphen mandatory? Is
> "H.K." now legal input? Or just "H-K" is? Is this RISX output legal
> RISX input? etc.

There is not much to document. A firstname and a middlename are abbreviated without a hyphen because there is none in the first place:

Franklin Delano -> F.D. (RIS)
                   <firstname>F</firstname><middlename>D</middlename> (RISX)

A hyphenated double name retains the hyphen in the initialized form because there is a hyphen in the first place:

Karl-Heinz -> K.-H. (RIS)
              <firstname>K-H</firstname> (RISX)

That's all.

regards, Markus

-- 
Markus Hoenicka mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de |
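[Editorial sketch] The initialization rule described at the end of the message above (separate given names yield separate initials, a hyphenated double name keeps its hyphen) can be written down in a few lines of Python. This is a reading of the two examples in the post, not the code in authorinfo.c; the function names `initials` and `to_ris` are made up for the illustration.

```python
def initials(given):
    """Return one initial string per space-separated given name.

    A hyphenated double name keeps its hyphen: "Karl-Heinz" becomes "K-H",
    which is the form shown for the RISX <firstname> element in the post.
    """
    return [
        "-".join(part[0].upper() for part in word.split("-") if part)
        for word in given.split()
    ]


def to_ris(given):
    """Render the initials with periods, following the RIS examples."""
    return "".join(i.replace("-", ".-") + "." for i in initials(given))


for name in ("Franklin Delano", "Karl-Heinz"):
    print(name, "->", initials(name), to_ris(name))
```

With these two inputs the sketch yields F.D. and K.-H. for RIS, and the lists ['F', 'D'] and ['K-H'] for the firstname/middlename split.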
From: Markus H. <mar...@mh...> - 2004-01-10 01:28:25
|
Marc Herbert writes: > We had one problem only Janet knew we faced bankruptcy > > Have you decided? Now consider this string again with differing > punctuation: > > We had one problem: only Janet knew we faced bankruptcy. > We had one problem only: Janet knew we faced bankruptcy. > We had one problem only, Janet knew: we faced bankruptcy. > We had one problem only Janet knew we faced: bankruptcy. > This is analogous to the punctuation in names as separators in an input format. The parser in your brain needs the punctuation in order to understand the intended meaning of these sentences. The parser in a reference manager needs the periods/spaces in order to understand the name parts (that is, if you use an odd input format like RIS that does not have better means to separate the parts, like XML). That's why they need to be consistent and independent of the personal taste of the person who carries that name. As your example above shows, you can't move punctuation around in English sentences just because you'd personally like to have it somewhere else. The same applies to names. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Bruce D'A. <bd...@fa...> - 2004-01-09 17:47:57
|
On Jan 9, 2004, at 12:37 PM, Olivier Fisette wrote:

> I think it would be important for RefDB to be able to format citations
> and
> create bibliographies with sxw files, the format used by OpenOffice.org
> Writer, which is currently the most popular open source word
> processor. Since
> the KOffice software suite is currently in the process of adopting
> OpenOffice.org file formats, RefDB support for sxw would eventually
> benefit a
> great number of users.

Good idea! Also good news about KWord, which I've been testing a bit on OS X.

Previously I advocated using refdb as a basis for the bibliographic module project at OOo.

http://bibliographic.openoffice.org

I doubt that'll happen, but certainly at some point it ought to be possible to use refdb with OOo. I'm not convinced the citation support in the file format is currently well-suited to processing by refdb though. Hopefully that'll change...

Bruce |
From: Bruce D'A. <bd...@fa...> - 2004-01-09 17:42:50
|
Oh, and another thing:

It'd be nice if in this discussion we could focus on how to handle this in XML. Proper markup obviates the need for any mangling, after all.

One person on the MODS list suggested this as a way to deal with a name like "S. Michael Smith":

<namePart type="given">Steven</namePart>
<namePart type="primegiven">Michael</namePart>
<namePart type="family">Smith</namePart>

I suggested as an alternative:

<namePart type="given" level="2">Steven</namePart>
<namePart type="given" level="1">Michael</namePart>
<namePart type="family">Smith</namePart>

There are others still, e.g.:

<namePart>Steven</namePart>
<namePart type="given">Michael</namePart>
<namePart type="family">Smith</namePart>

While not ideal, it does introduce a distinction between the two names that could be subject to processing.

Here's a similar example I posted on my blog using vcard in an RDF representation:

<bqs:Person rdf:parseType="Resource">
  <vCard:N rdf:parseType="Resource">
    <vCard:Family>Snyders</vCard:Family>
    <vCard:Given>D</vCard:Given>
    <vCard:Other>J</vCard:Other>
  </vCard:N>
  <vCard:ORG rdf:parseType="Resource">
    <vCard:Orgname>
      Vanderbilt University School of Medicine
    </vCard:Orgname>
    <vCard:Orgunit>Department of Pharmacology</vCard:Orgunit>
  </vCard:ORG>
</bqs:Person>

Bruce |
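To make the "subject to processing" point concrete, here is a minimal Python sketch of how a formatter might consume the second markup variant above. Several things are assumptions for the sake of the example: the wrapping `<name>` element, the absence of XML namespaces, the reading of `level="1"` as "the name the person goes by", and the function name `format_name`. The `level` attribute is the suggestion made in this message, not part of the MODS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element around the namePart variant with "level".
SNIPPET = """
<name>
  <namePart type="given" level="2">Steven</namePart>
  <namePart type="given" level="1">Michael</namePart>
  <namePart type="family">Smith</namePart>
</name>
"""


def format_name(xml_text: str) -> str:
    """Render given names, initializing every one that is not level 1."""
    root = ET.fromstring(xml_text)
    family = root.find("namePart[@type='family']").text
    rendered = []
    for part in root.findall("namePart"):
        if part.get("type") != "given":
            continue
        if part.get("level") == "1":
            # Assumed meaning: the primary given name is spelled out.
            rendered.append(part.text)
        else:
            rendered.append(part.text[0] + ".")
    return " ".join(rendered + [family])


print(format_name(SNIPPET))
```

Because the distinction lives in the markup, the formatter never has to guess from periods or spaces which given name should be abbreviated.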
From: Olivier F. <oli...@rs...> - 2004-01-09 17:37:16
|
I think it would be important for RefDB to be able to format citations and create bibliographies with sxw files, the format used by OpenOffice.org Writer, which is currently the most popular open source word processor. Since the KOffice software suite is currently in the process of adopting OpenOffice.org file formats, RefDB support for sxw would eventually benefit a great number of users.

sxw files use XML, and the file format is extensively documented. (http://xml.openoffice.org/) Character and paragraph styles could be used respectively as markers for citation formatting and bibliography insertion. OpenOffice.org already has a system for formatting and inserting references, but it is not sophisticated enough for scientists who want to manage off-print collections, browse references, and use the same system for formatting citations and creating bibliographies using different file formats. Also, an independent implementation could be used in OpenOffice.org Writer, KWord, and other programs through the use of macros.

What are your opinions on the subject?

--
Olivier Fisette, B. Sc.
Protein Biosynthesis Research Laboratory
Department of Biochemistry and Microbiology
Laval University
Office 3208, Charles-Eugène-Marchand building
Québec (Québec), Canada, G1K7P4
Telephone : +1 (418) 656-2131, ext. 6273
Electronic mail : oli...@rs... |
From: Rich S. <rsh...@ap...> - 2004-01-09 17:28:43
|
On Fri, 9 Jan 2004, Marc Herbert wrote:

> See: DJ Delorie
> <http://www.delorie.com/users/dj/>
> Please note that my legal first name really is "DJ". It is not
> correct to insert a space between the D and the J, or to put periods
> after them as if they were initials, or to make either of them lower
> case. They are not initials, and I have no middle name. Honest.
>
> Many computers will automatically replace DJ with something else,
> thinking that the operator typed it in wrong. They should be
> considered broken, and repaired. I usually encounter these in credit
> card companies and utilities.

Poor guy. Either his parents didn't want a son or they have (had?) a strange sense of humor. Life's tough enough for a kid without adding to his burden.

Rich

--
Dr. Richard B. Shepard, President
Applied Ecosystem Services, Inc. (TM)
<http://www.appl-ecosys.com> |
From: Bruce D'A. <bd...@fa...> - 2004-01-09 17:09:25
|
On Jan 9, 2004, at 11:59 AM, Marc Herbert wrote:

> So punctuation seems quite far from formatting... if not formatting
> then punctuation is data? What is the purpose of punctuation?

Punctuation with respect to bibliographic formatting is different than otherwise though, because it is often strictly defined in the style itself. Bibliographic data is more regular than written text...

Bruce |
From: Marc H. <mar...@fr...> - 2004-01-09 17:04:56
|
On Wed, 7 Jan 2004, Marc Herbert wrote:

> > The database contains the name parts, plus a normalized
> > representation for speeding up queries that happens to look like some
> > formatted representation. When creating a bibliography, RefDB then has
> > to assemble the name parts in a fashion that matches the requirements
> > of the publisher.
> >
> > It is irrelevant how the cited author or the author
> > writing the paper would like to represent that name.

It is irrelevant to authoritarian stylesheets.

> if some information is lost in this great process, whatever its
> noble purpose is, some author may _never_ see his name printed as he
> would like to, even when some stylesheets allow it.

Please understand the arguments below as a very general discussion about "what should be stored in a bibliographic database" and NOT anymore as a discussion about "what should refdb do" or "what do you think about the RIS format". Or else please start a new thread. Thanks in advance.

Let me try to sum up the issues.

- there is a strong need for a "normalized" representation of names, to avoid false duplicates and enhance results of queries.
- some formatting tools/stylesheets "normalize" your names, deciding if and where you should put periods, dashes, initializing or not, etc.
- some less authoritarian publishers/formatting conventions leave more freedom about this, in order to please authors and grant them the right to write their (possibly "weird") name as they want.

I think it's technically possible to please everyone, by isolating issues. Let's take the example of this problematic name:

(<http://citeseer.nj.nec.com/context/153368/0>)
Chu, H.K. Jerry (that's the precise way he writes it himself)

Depending on the typist (errare humanum est), the given name becomes:

- HK Jerry
- H.-K. Jerry
- Hsiao Keng Jerry
- Hsiaokeng Jerry
- etc.

[of course, he could become much more severely mistyped, and then the reasoning below will be less efficient/interesting. But anyway nothing will work for severe cases except firing the typist].

The database, being unable to tell which is the "right" spelling, or worse, not even in some cases being able to tell if all these writings designate the same person, should carefully preserve every character from every typist. So the database has no choice but to store the input "as is". Preferably pre-parsed, but without any character lost or added. All these inputs become (unfortunately, but what can you do?) different authors.

Meanwhile, a "clever" algorithm that is aware of the most common name-typing mistakes in our culture computes a "normalized" (or "reduced", or "projected") representation of the given name for each record. So only two different ones here:

- hkjerry
- hsiaokengjerry

Such a simple algorithm could be for instance:

- throw away a set of characters (period, hyphen, apostrophe, space,...)
- lowercase all characters
- dump all diacritics
- ...

This "projected name" is stored in the author record, beside the typist's input. It is _indexed_ and used to perform queries. It can be used to detect false duplicates easily and efficiently, including at input time! (i.e. "Don't you think you should rather write this name this way?")

The sample algorithm above is just... an example. Obviously, the "cleverness" of the algorithm deserves more discussion (and another thread). This algorithm could be easily configurable, for instance depending on cultural specificities.

Even better, there could be several projections used by the database, covering different scenarios. For instance, a concurrent "abbreviating" algorithm that takes only capitals could run and give:

- HKJ

thus collapsing many more different inputs, and offering the client a very efficient "search using initials" additional feature.

Stylesheets can pick up all the information they need (preferably pre-parsed), and are free to normalize names as they want to at publishing time.

Comments? |
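The sample projection described above fits in a few lines of Python. This is a minimal sketch only: the function names and the exact set of discarded characters are assumptions, and dropping diacritics via Unicode decomposition is one possible way to "dump" them, not necessarily what a real implementation would choose.

```python
import unicodedata

# Characters thrown away by the projection (an assumed, configurable set).
DROPPED = set(". -'")


def project_name(given: str) -> str:
    """Reduce a given name to a search key: no punctuation, no diacritics, lowercase."""
    decomposed = unicodedata.normalize("NFKD", given)
    kept = [
        ch for ch in decomposed
        if ch not in DROPPED and not unicodedata.combining(ch)
    ]
    return "".join(kept).lower()


def project_initials(given: str) -> str:
    """Alternative 'abbreviating' projection that keeps only the capitals."""
    decomposed = unicodedata.normalize("NFKD", given)
    return "".join(ch for ch in decomposed if ch.isupper())


variants = [
    "H.K. Jerry",
    "HK Jerry",
    "H.-K. Jerry",
    "Hsiao Keng Jerry",
    "Hsiaokeng Jerry",
]
keys = {project_name(v) for v in variants}
print(sorted(keys))
print(project_initials("H.K. Jerry"))
```

The five typist variants collapse to the two keys listed in the message, while the stored input stays untouched. Note that the capitals-only projection still depends on how the typist split the name: "Hsiaokeng Jerry" has only two capitals, so it does not collapse with "Hsiao Keng Jerry" under that projection.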
From: Marc H. <Mar...@fr...> - 2004-01-09 16:59:06
|
On Wed, 7 Jan 2004, Markus Hoenicka wrote:

> The dot is no information. It is formatting. Please separate data from
> formatting.

Generally speaking, I don't think there is a sharp line between "formatting" and "information". Most of the time, formatting carries information, it's a way to represent information. Of course, margin size is quite far from that. But for instance, at the beginning of most computer books you'll find something like this:

- text using this font <courier> *means* that it is...
- etc.

Would you say that punctuation for instance is formatting? Most people don't call punctuation "formatting", because punctuation is much more the responsibility of the author than of the publisher, punctuation is much more about _meaning_ than esthetics, see for instance:

"Why Learn to Punctuate?"
<http://www.cogs.susx.ac.uk/local/doc/punctuation/node02.html>
(emphasis mine)

  If your reader has to wade through your strange punctuation, she will have trouble following your *meaning*; at worst, she may be genuinely unable to understand what you've written. If you think I'm exaggerating, consider the following string of words, and try to decide what it's supposed to *mean*:

  We had one problem only Janet knew we faced bankruptcy

  Have you decided? Now consider this string again with differing punctuation:

  We had one problem: only Janet knew we faced bankruptcy.
  We had one problem only: Janet knew we faced bankruptcy.
  We had one problem only, Janet knew: we faced bankruptcy.
  We had one problem only Janet knew we faced: bankruptcy.

  Are you satisfied that all four of these have completely different *meanings*?

So punctuation seems quite far from formatting... if not formatting then punctuation is data? What is the purpose of punctuation? To provide structure to sentences and paragraphs. Now what is the purpose of "formatting" headings and indentation? The same: to represent structure information, just at a higher level. So we have on one side of the line of the "classification": data, and on the other side: formatting, but both with the exact same purpose! I find this weird.

The only difference I see is that punctuation has been standardized for ages, while there are many different ways to format headings. So maybe we have a classification criterion here: "standardized" information is data, while "formatting" is choice?

Then let's get back for a second to the use of a period as a means to indicate an abbreviation: is it "standardized" or "free"? Some stylesheets want to enforce a standard about this (in one way or the other). Others do not dare to touch this; they leave the decision to the author. Should a database be on the "enforcers" side, or stay neutral?

In a good work relationship between a publisher and an author, there is no clear demarcation line between the work of each other. Oh sure, the author should never tell the publisher about the sizes of margins, nor should the publisher ever tell the author about his formulas, but there are a whole lot of things less clearly separated than that. A publisher may often correct orthography and grammar (which is also structure, btw). Is this a "formatting" job?

Of course, the issue at stake here is "just" about names, and not about formatting in general, so let's forget most of the above. Nevertheless, I wanted to underline that an opposition between data and formatting is not self-evident, generally speaking.

For those who want to know a bit more about "why the human brain cannot cope with unstructured information", I suggest this very famous article <http://www.well.com/user/smalin/miller.html#recoding>

Cheers,
Marc. |
From: Marc H. <Mar...@fr...> - 2004-01-07 22:41:45
|
On Mon, 5 Jan 2004, Marc Herbert wrote:

> It's done. See: <http://marc.herbert.free.fr/refdb/reversible/> or
> below/attached.

> BTW, while testing and comparing, I found some quirks that do not seem
> to fit _any_ logic (as opposed to: not fit my taste).

One more point: the "quirks" above were all detailed in the rest of this previous message. In order to improve my English, I would be happy to know how you understood it before :-)

"Reply-To: me" since this is progressively going off-topic.

Cheers,
Marc. |
From: Markus H. <mar...@mh...> - 2004-01-07 20:24:56
|
Bruce D'Arcus writes:
>
> On Jan 7, 2004, at 2:47 PM, Markus Hoenicka wrote:
>
> > I'm afraid not even this would help. It's not that the publishers
> > wouldn't care about the names of the cited authors, but to them a
> > consistent bibliography formatting is more important than individual
> > wishes.
>
> But the issue at hand is separating out metadata concerns (proper
> coding of data) from formatting concerns (proper formatting of data).
> They are not the same, and having well-parsed and accurate
> authoritative data would certainly help matters.
>

Ah, I see. Good point.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de |
From: Bruce D'A. <bd...@fa...> - 2004-01-07 20:00:34
|
On Jan 7, 2004, at 2:47 PM, Markus Hoenicka wrote:

> I'm afraid not even this would help. It's not that the publishers
> wouldn't care about the names of the cited authors, but to them a
> consistent bibliography formatting is more important than individual
> wishes.

But the issue at hand is separating out metadata concerns (proper coding of data) from formatting concerns (proper formatting of data). They are not the same, and having well-parsed and accurate authoritative data would certainly help matters.

Bruce |
From: Markus H. <mar...@mh...> - 2004-01-07 19:49:25
|
Bruce D'Arcus writes:
> In an ideal world, there'd be a central repository with definitive
> listings, available as a web service. Until that day, though (maybe
> never), this will always be a difficult issue.
>

I'm afraid not even this would help. It's not that the publishers wouldn't care about the names of the cited authors, but to them a consistent bibliography formatting is more important than individual wishes.

regards,
Markus

--
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de |
From: Bruce D'A. <bd...@fa...> - 2004-01-07 18:00:55
|
On Jan 7, 2004, at 9:26 AM, Marc Herbert wrote:

> You really do not understand that, if some information is lost in this
> great process, whatever its noble purpose is, some author may _never_
> see his name printed as he would like to, even when some stylesheets
> allow it.

The issues here are rather larger and more complicated than they appear on the face of it, and I think it's best to recognize this on both sides of the issue.

One way to highlight this is to ask this simple question:

How do you -- the person entering name data -- know exactly how the author intends their name to be represented?

It's rarely adequate to look at a heading in an article, or in a reference list, for example.

In an ideal world, there'd be a central repository with definitive listings, available as a web service. Until that day, though (maybe never), this will always be a difficult issue.

Bruce |