Re: [Refdb-users] Parsing error ? with addref
From: Daniel O'D. <dan...@ul...> - 2006-06-25 00:52:35
Thanks Markus and Rich,

The single letter keywords are partially a legacy data issue and partially disciplinary. In manuscript and textual studies some manuscripts are known by 1 or 2 letter sigla: Corpus Christi College Manuscript 41 is known as b1 to most Anglo-Saxonists. Also you could have an article about the Anglo-Saxon word æ. In previous databases, we keyworded things like this because searching all fields was more difficult. My plan in refdb is to rationalise things.

The multiple keywords are also a legacy of some script a student ran through once.

I'd read the data mangling section but missed the significance of what refdb was doing. I'll turn it off until the data is in better shape, I think!

-d

On Sun, 2006-25-06 at 00:02 +0200, Markus Hoenicka wrote:
> Rich Shepard writes:
> > As far as the keyword order is concerned, that's a PostgreSQL thing. Rows
> > (tuples) are returned from a query in no particular order. Unless the query
> > has an ORDER BY clause, we'll see this every time. It's not any sort of a
> > bug or issue of concern.
>
> This is correct. It applies to all database engines as the SQL
> standard does not mandate a particular order of the returned datasets
> unless the ORDER BY clause is used. RefDB does not use this clause
> here as the order of keywords is not relevant in RIS. It uses the
> clause for author names as their order in the RIS dataset is relevant.
>
> > the name. What I don't see in your command line is the option to write the
> > returned records to a file, e.g., '-o hereiam.ris'. I wonder if you'll see
> > the same output in a file that you see on screen,
>
> The output is the same except for the summary which is sent to
> stderr. You see it on the screen as it displays both stuff sent to
> stdout (the data) and to stderr (the summary). If you send the output
> to a file, it will contain the part sent to stdout only.
>
> > > The problem is the following keyword section
> > >
> > > KW  - l
> > > KW  - m
> > > KW  - o
> > > KW  - ld
> > > KW  - h
> > > KW  - æ
> > > KW  - Cædmon
> > >
> > > These are all unique keywords drawn from other entries in the original
> > > RIS file. They show up multiple times in the refdb generated ris files.
>
> This is caused by the automatic keyword scan which is turned on by
> default. RefDB scans the titles and abstracts of new entries for
> keywords already known to the database. This is a very useful feature
> in most cases. However, if you indeed use single-letter keywords, the
> purpose of the automatic keyword scan is pretty much defeated as
> almost all entries will end up containing these keywords. If having
> single-letter keywords is indeed useful and necessary for you, you
> should switch off the automatic keyword scan by setting
>
> keyword_scan f
>
> in /usr/local/etc/refdb/refdbrc.
>
> BTW the RefDB handbook contains a section called "Input data mangling"
> which explains how RefDB may alter your data.
>
> regards,
> Markus
>
--
Daniel Paul O'Donnell
Associate Professor and Chair of English
Director, Digital Medievalist Project <http://www.digitalmedievalist.org/>
University of Lethbridge
Lethbridge AB T1K 3M4
Canada
Vox +1 403 329-2377
Fax +1 403 382-7191
:@caedmon/ubuntu
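
For anyone reading this thread later, here is a rough Python sketch of why single-letter keywords end up attached to almost every entry once an automatic keyword scan is on. It is not RefDB's code: the function name `scan_keywords`, the sample title, and the plain case-insensitive substring match are all assumptions made for illustration, and RefDB's real matching rules may well differ.

```python
def scan_keywords(text, known_keywords):
    """Return the known keywords found in text.

    Assumption: a simple case-insensitive substring match. This is only
    an illustration and is not taken from RefDB's implementation.
    """
    lowered = text.lower()
    return [kw for kw in known_keywords if kw.lower() in lowered]


# Keywords from the thread; the title is a made-up example.
known = ["l", "m", "o", "ld", "h", "æ", "Cædmon"]
title = "Old English homilies in Corpus Christi College Manuscript 41"

print(scan_keywords(title, known))  # ['l', 'm', 'o', 'ld', 'h']
```

Under that assumption, the one- and two-letter sigla match nearly any title, while a longer keyword such as Cædmon is only picked up where it actually occurs. That is the situation in which setting keyword_scan f makes sense until the keyword list is cleaned up.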