refdb-devel Mailing List for RefDB (Page 16)
Status: Beta
Brought to you by:
mhoenicka
From: Markus H. <mar...@mh...> - 2006-07-12 21:07:38
|
Hi David, David Nebauer writes: > From the 'ucs' package documentation: > > The simplest use of this package is to add > \usepackage{ucs} > \usepackage[utf8x]{inputenc} > to your header. You may even omit the first line in many cases. > > > It worked for me "out of the box". I installed the 'ucs' package > (apt-get install latex-ucs), added those two lines to the preamble, ran > 'latex test' and, presto, gloriously rendered unicode. > There seems to be something else for Unicode support. Ever heard of this? Is that supposed to work without installing additional files? (cited from: http://mail.nl.linux.org/linux-utf8/2004-04/msg00000.html) ------ In mid February, the LaTeX project team released a new version that now supports UTF-8. For details, see ftp://ftp.tex.ac.uk/tex-archive/macros/latex/base/utf8ienc.dtx You now can finally simply replace \usepackage[latin1]{inputenc} with \usepackage[utf8]{inputenc} and can this way move all your non-ASCII LaTeX input to UTF-8 as well. -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: David N. <dav...@sw...> - 2006-07-12 14:22:59
|
Hi Markus,

> It is. Unfortunately the konwert sources are a bit hard on the eyes because
> they're in Polish but I'll try to steal from that anyway.

A lot of the heavy lifting is done by the (executable) filters. On my system they live in '/usr/share/konwert/filters/'. UTF8-ascii is a bash script:

--------------------------------------------------------------------------------------
#!/bin/bash -

VARIANT_bg=' Щ SHT щ sht '
VARIANT_de=' Ä AE Ö OE Ü UE ä ae ö oe ü ue '
VARIANT_hr=' Đ DJ đ dj '
VARIANT_vi=' À A` Á A'\'' Â A^ Ã A~ È E` É E'\'' Ê E^ Ì I` Í I'\'' Ò O` Ó O'\'' Ô O^ Õ O~ Ù U` Ú U'\'' Ý Y'\'' à a` á a'\'' â a^ ã a~ è e` é e'\'' ê e^ ì i` í i'\'' ò o` ó o'\'' ô o^ õ o~ ù u` ú u'\'' ý y'\'' Ă A( ă a( Đ DD đ dd Ĩ I~ ĩ i~ Ũ U~ ũ u~ '
VARIANT1_bg=' Ъ Y ъ y '
VARIANT1_ua=' И Y и y '

REPLACE='?'
MIME=us-ascii

if [ "$FILTERM" = out ]
then
    NPOJED=
else
    NPOJED=1
fi
FORMAT=
HTMLCHAR=
POPRAWKI=

for A in $ARG
do
    case "$A" in
    (1)               NPOJED=;;
    (html)            FORMAT=html;;
    (htmldec|htmlhex) FORMAT=html; HTMLCHAR=${A#html};;
    (tex)             FORMAT=tex;;
    (*)
        if [ -x "${0%/*}/../aux/argcharset/$A" ]
        then
            POPRAWKI=${POPRAWKI:+$POPRAWKI | }${0%/*}/../aux/argcharset/$A
        fi
        VARIANT=VARIANT_$A; APPROX="${!VARIANT} $APPROX"
        VARIANT=VARIANT1_$A; APPROX1="${!VARIANT} $APPROX1"
        ;;
    esac
done

if [ "$POPRAWKI" ]
then
    "$SHELL" -c "$POPRAWKI"
else
    cat
fi |
case "$FORMAT" in
(html)
    "${0%/*}/../aux/fixmeta" us-ascii |
    if [ "$HTMLCHAR" ]
    then
        "${0%/*}/UTF8-html$HTMLCHAR"
    else
        trs -e '\}\[@&<>\] @' \
            ${NPOJED:+-e} ${NPOJED:+"$APPROX"} \
            -e "$APPROX1" \
            ${NPOJED:+-f} ${NPOJED:+"${0%/*}/../aux/UTF8-ascii"} \
            -f "${0%/*}/../aux/UTF8-ascii1" \
            -e "\300\-\377 ${REPLACE:-?} \200\-\277 \!" |
        trs -e '@@ @ @& & @< < @> > & &amp; < &lt; > &gt;'
    fi
    ;;
(tex)
    trs -e '\}\[@\#$%&\\^_{|}~\] @' \
        -f "${0%/*}/../aux/UTF8-tex" \
        -e "$APPROX" \
        -e "$APPROX1" \
        -f "${0%/*}/../aux/UTF8-ascii" \
        -f "${0%/*}/../aux/UTF8-ascii1" \
        -e "\300\-\377 ${REPLACE:-?} \200\-\277 \!" |
    trs -e '@@ @ @\# \# @$ $ @% % @& & @\\ \\ @^ ^ @_ _ @{ { @| | @} } @~ ~ \# \\\# $ \\$ % \\% & \\& \\ $\\backslash$ ^ \\^{} _ \\_ { \\{ | $|$ } \\} ~ \\~{}'
    ;;
(*)
    trs ${NPOJED:+-e} ${NPOJED:+"$APPROX"} \
        -e "$APPROX1" \
        ${NPOJED:+-f} ${NPOJED:+"${0%/*}/../aux/UTF8-ascii"} \
        -f "${0%/*}/../aux/UTF8-ascii1" \
        -e "\300\-\377 ${REPLACE:-?} \200\-\277 \!"
    ;;
esac
--------------------------------------------------------------------------------------

There's bash wizardry in there I can't even begin to fathom.

Regards,
David. |
From: David N. <dav...@sw...> - 2006-07-12 14:17:46
|
Hi Markus, >> I'm experiencing all kinds of difficulty using the latest svn refdb >> build with LaTeX/BibTeX. 'runbib' will not extract records in BibTeX >> format unless citations are in the previous '\cite{[dbname-]IDcitekey}' >> format. >> > I hardly dare to ask, but are you sure you installed the svn version and > restarted refdbd? I install from custom deb packages so refdbd is stopped and started as part of the debian package upgrade process. My source tree is at revision 81. I checked everything, including the source code changes you made at version 72 to alter the citation format. Finally I remembered some advice you gave recently about checking for multiple running instances of refdbd. Sure enough, I had an extra instance running, probably from a debugging exercise where I was running refdbd in standalone mode. Once stopped the old behaviour went away. Problem solved (he says sheepishly). FWIW, I can confirm the new citation format is working correctly. Regards, David. |
From: Damien J. D. <D.J...@cs...> - 2006-07-12 13:10:26
|
> RefDB uses bibliography styles even for generating LaTeX bibliographies. Oh. Right! > should be able to generate at least lowercased (except the first char), I don't think I'd be able to use that because presumably it would lowercase proper nouns, acronyms and subtitle headings. > all-caps, and mixed case output. The latter is fine except, as before, when your article title capitalises verbs and improper nouns. I'm happy to do it this way though, I'll just make sure no capitalised verbs and improper nouns get into my database and hope not to encounter citation styles that require them. Cheers Damien |
From: Markus H. <mar...@mh...> - 2006-07-12 12:30:12
|
Damien Jade Duff <D.J...@cs...> was heard to say: > trying to second guess bibtex, just set the whole title as verbatim - e.g. > > TITLE = {{Why AM and EURISKO appear to work}}, > > Folks who use latex would probably have to turn non-proper nouns that > are capitalised into lowercase before putting them in to RefDB and hope > to seldom encounter a citation style that requires these things to be > reproduced verbatim from the original article (I don't think I have yet). > RefDB uses bibliography styles even for generating LaTeX bibliographies. These styles currently only take care of the capitalization of titles. IIRC you get the best results if you add your data using mixed-case and then pick the proper style for output. If we combine that with the curly brackets as shown above, you should be able to generate at least lowercased (except the first char), all-caps, and mixed case output. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
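The combination described above (mixed-case storage, a style picked at output time, and curly brackets protecting verbatim passages) can be sketched roughly as follows. This is only an illustration in Python, not RefDB's style code; the function name `render_title` and the style names 'lower', 'upper' and 'mixed' are made up for the example, and nested braces are passed through rather than interpreted.

```python
import re

# Tokens: a flat {...} group, a run of ordinary text, or a stray brace.
_TOKEN = re.compile(r"\{[^{}]*\}|[^{}]+|[{}]")


def render_title(title: str, style: str) -> str:
    """Render a mixed-case title as 'lower', 'upper' or 'mixed'.

    Text inside {...} is copied unchanged, which is how BibTeX is told
    to leave a passage alone.
    """
    out = []
    for match in _TOKEN.finditer(title):
        chunk = match.group(0)
        if chunk.startswith("{"):
            out.append(chunk)            # protected group: keep verbatim
        elif style == "upper":
            out.append(chunk.upper())
        elif style == "lower":
            out.append(chunk.lower())
        else:                            # 'mixed': keep the stored capitalisation
            out.append(chunk)
    text = "".join(out)
    if style == "lower" and text and not text.startswith("{"):
        text = text[0].upper() + text[1:]  # single leading capital
    return text


if __name__ == "__main__":
    stored = "Why {AM} and {EURISKO} Appear to Work"
    for style in ("lower", "upper", "mixed"):
        print(style, "->", render_title(stored, style))
```

Proper nouns that are not wrapped in braces are still lowercased in the 'lower' style, which is exactly the limitation Damien raises.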
From: Markus H. <mar...@mh...> - 2006-07-12 12:20:19
|
Hi David, David Nebauer <dav...@sw...> was heard to say: > Use of this (or a similar) tool would result in the much more > satisfactory, and easy to remember, default citekey of 'Hassler2006'. > It should be fairly simple to add this additional conversion step. > It is. Unfortunately the konwert sources are a bit hard on the eyes because they're in Polish but I'll try to steal from that anyway. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Markus H. <mar...@mh...> - 2006-07-12 12:13:26
|
Hi David, David Nebauer <dav...@sw...> was heard to say: > I'm experiencing all kinds of difficulty using the latest svn refdb > build with LaTeX/BibTeX. 'runbib' will not extract records in BibTeX > format unless citations are in the previous '\cite{[dbname-]IDcitekey}' > format. Using the new citation format on my system results in '999:0 > retrieved:0 failed'. Would you mind checking on your system? If the > new format works for you it must be a problem at my end and I'll work up > a test case. I hardly dare to ask, but are you sure you installed the svn version and restarted refdbd? I just checked the svn code, and all changes are in place. The current svn code certainly does not look for "-ID" but for ":" as a database separator. I'm sure that the new format works on my FreeBSD box. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Damien J. D. <D.J...@cs...> - 2006-07-12 11:28:02
|
Markus Hoenicka wrote: > David Nebauer writes: > > > TITLE = {Why {AM} and {EURISKO} appear to work}, > > > > Alternately, you could save the title in plain Unicode with that > > capitalisation and refdb's BibTeX output filter would wrap any > > abnormally capitalised words in braces. That way your reference can be > > used in either DocBook XML/SGML or LaTeX output. > > > > If it is a matter of wrapping uppercased words in curly brackets, this is > certainly doable. Is it likely to have uppercased words in bibtex data > which are *not* supposed to be rendered in all-caps? > > regards, > Markus > Gidday I imagine only where the style demands it (e.g. APA). Some journals have titles with pretty much everything (both proper and non-proper nouns and subtitles and acronyms) capitalised. On the other hand, APA requires nouns that aren't proper to go lower case when citing - proper nouns and subtitles can be uppercase. I don't know how this is managed in DocBook styles or EndNote etc, but presumably titles are used verbatim because I can't imagine any logic funky enough to figure out whether a word is a proper noun or not without some extra markup. If we're going to export all capitals to bibtex verbatim sans markup then I think we might as well, as a first approximation, rather than trying to second guess bibtex, just set the whole title as verbatim - e.g. TITLE = {{Why AM and EURISKO appear to work}}, Folks who use LaTeX would probably have to turn non-proper nouns that are capitalised into lowercase before putting them into RefDB and hope to seldom encounter a citation style that requires these things to be reproduced verbatim from the original article (I don't think I have yet). The same may possibly apply to BOOKTITLE, JOURNAL, SERIES, SCHOOL, PUBLISHER, INSTITUTION etc. My 2nd pennies worth. Regards Damien |
From: David N. <dav...@sw...> - 2006-07-12 09:43:15
|
Hi Markus,

> > One interesting consequence of this is that author names may contain
> > non-ascii characters. If, when new references are added to refdb, there
> > is no citation key specified, the citekey is constructed by mangling
> > primary author surname and year. If citekey is restricted to ascii
> > characters then non-ascii author surname characters would have to be
> > stripped or converted (e.g., ä -> a, ß -> ss).
>
> Currently non-ascii characters are simply stripped. You always have
> the option to specify a citation key explicitly when adding a
> reference, using any reasonable translation of the foreign characters
> to ascii.

I'd like to focus on this point again. I personally allow refdb to generate the citekey for me, mainly because it will automatically append 'a', 'b', etc. if there is danger of duplication. Automatically stripping non-ascii characters from authors with foreign characters will lead to some unusual results. A recent publication from our old workhorse 'Häßler' might produce the citekey 'Hler2006'.

There are tools around which attempt to convert sensibly from unicode to ascii. Here is an example using the tool 'konwert':

---------------------------------------------------------------------------------------
$ cat name
Häßler, Günter
$ cat name | konwert UTF8-ascii
Hassler, Gunter
$
---------------------------------------------------------------------------------------

Use of this (or a similar) tool would result in the much more satisfactory, and easy to remember, default citekey of 'Hassler2006'. It should be fairly simple to add this additional conversion step.

Regards,
David. |
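For illustration, here is a rough Python sketch of such a conversion step. It is not what konwert or refdbd do internally: it assumes Unicode NFKD decomposition plus a small hand-written table for letters that do not decompose, and the names `to_ascii`, `default_citekey` and `_SPECIAL` are invented for the example.

```python
import unicodedata

# Letters that NFKD normalisation leaves untouched; the table is an
# illustrative assumption and would need extending for real use.
_SPECIAL = {
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Đ": "D", "đ": "d", "Ł": "L", "ł": "l",
}


def to_ascii(text: str) -> str:
    """Approximate text in plain ASCII instead of stripping characters."""
    text = "".join(_SPECIAL.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)   # 'ä' -> 'a' + U+0308
    # Dropping every non-ASCII character also removes the combining marks.
    return "".join(ch for ch in decomposed if ch.isascii())


def default_citekey(surname: str, year: int) -> str:
    """Build a surname+year key from the ASCII approximation."""
    base = "".join(ch for ch in to_ascii(surname) if ch.isalnum())
    return f"{base}{year}"


if __name__ == "__main__":
    print(to_ascii("Häßler, Günter"))
    print(default_citekey("Häßler", 2006))
```

Characters that neither decompose nor appear in the table are still dropped, so the worst case is no worse than plain stripping.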
From: David N. <dav...@sw...> - 2006-07-12 07:56:52
|
Hi Markus, > Markus Hoenicka writes: > > \cite{citekey} > > \cite{dbname:citekey} > > Just to let y'all know that the current Subversion version supports > the above mentioned citation format in LaTeX documents. > I'm experiencing all kinds of difficulty using the latest svn refdb build with LaTeX/BibTeX. 'runbib' will not extract records in BibTeX format unless citations are in the previous '\cite{[dbname-]IDcitekey}' format. Using the new citation format on my system results in '999:0 retrieved:0 failed'. Would you mind checking on your system? If the new format works for you it must be a problem at my end and I'll work up a test case. Regards, David. |
From: Markus H. <mar...@mh...> - 2006-07-11 21:26:08
|
David Nebauer writes: > > TITLE = {Why {AM} and {EURISKO} appear to work}, > > Alternately, you could save the title in plain Unicode with that > capitalisation and refdb's BibTeX output filter would wrap any > abnormally capitalised words in braces. That way your reference can be > used in either DocBook XML/SGML or LaTeX output. > If it is a matter of wrapping uppercased words in curly brackets, this is certainly doable. Is bibtex data likely to contain uppercased words which are *not* supposed to be rendered in all-caps? regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: David N. <dav...@sw...> - 2006-07-11 19:51:35
|
Hi Damien, Damien Jade Duff wrote: > I'd like to see the following kind of latex markup reproduced (not > stripped entirely): > TITLE = {Why {AM} and {EURISKO} appear to work}, Alternately, you could save the title in plain Unicode with that capitalisation and refdb's BibTeX output filter would wrap any abnormally capitalised words in braces. That way your reference can be used in either DocBook XML/SGML or LaTeX output. Regards, David. |
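As a rough idea of what such an output filter might do, here is a Python sketch. The heuristic is an assumption made for the example: a word counts as "abnormally capitalised" if it has an uppercase letter anywhere after its first character. The name `protect_caps` is hypothetical and this is not RefDB code.

```python
import re

# A "word" here is a run of letters (Unicode-aware, no digits or underscores).
_WORD = re.compile(r"[^\W\d_]+")


def protect_caps(title: str) -> str:
    """Wrap words such as AM, EURISKO or BibTeX in braces for BibTeX output."""
    def wrap(match):
        word = match.group(0)
        # Uppercase beyond the first letter marks the word as one to protect.
        if any(ch.isupper() for ch in word[1:]):
            return "{" + word + "}"
        return word
    return _WORD.sub(wrap, title)


if __name__ == "__main__":
    print("TITLE = {" + protect_caps("Why AM and EURISKO appear to work") + "},")
```

The stored title stays plain Unicode, so the same record can still feed DocBook output. Input that already contains braces would be wrapped a second time, so a filter like this only makes sense once markup has been stripped on import. Capitalised proper nouns such as "Darwin" are not detected by this heuristic.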
From: Damien J. D. <D.J...@cs...> - 2006-07-11 17:57:38
|
Gidday

As a novice LaTeX user it seems okay; I can't envisage any difficulties.

Probably jumping the gun a bit here, but I'd like to see the following kind of LaTeX markup reproduced (not stripped entirely):

BIBTEX:
TITLE = {Why {AM} and {EURISKO} appear to work},

Current RIS:
TI - Why {AM} and {EURISKO} appear to work

Current RISX:
<title type="full">Why {AM} and {EURISKO} appear to work</title>

The reason is that bibtex will capitalise your bibliography according to the current bibliography style (generally with a single leading capital), and you need to inform bibtex that certain passages are to be formatted verbatim.

Other than that, I can't envisage the need for keeping LaTeX markup - I've been manually stripping everything else.

I enter most of my references using risx (I use bibtex and RIS where provided but usually edit them manually before submitting them to RefDB), and I don't appear to have any entities coming back (probably because I don't know how to use entities). Except I think entities are used in some of the URLs - I don't know if they're necessary or not, they're just there.

Peace
Damien

David Nebauer wrote:
> Hi Markus,
>
>> I take from this discussion:
>>
>> 1) Use a bib2ris post-processing script (or rewrite bib2ris to contain such
>> code) which strips markup like boldface, superscript etc. and translates
>> foreign characters entered as LaTeX constructs to their Unicode equivalents.
>>
>> 2) Modify the code to prevent XML entities to show up in LaTeX output.
>>
>> 3) Add code to escape the LaTeX command characters in the LaTeX output.
>>
>> The second point is a bit tricky. References imported from RIS usually do not
>> contain entities, but references imported from risx are likely to do. Either I
>> convert these entities during import, or I remove them during LaTeX export. The
>> former seems cleaner to me, and I think this is what you had in mind.
>
> Yes, in my view the storage format is Unicode without markup:
>
> BibTeX -------               ---------> DocBook
>              |               |
>              |               |
> RIS ---------+--> STORAGE ----
>              |   (Unicode)   |
>              |               |
> RISX ---------               ---------> LaTeX
>
> Whatever the input format, all references end up in the same storage
> format (Unicode sans markup). This would require stripping out XML
> entities and LaTeX markup. With luck you can use existing tools to do
> this. The stored references can then be output in either DocBook- or
> LaTeX-compatible format. This seems to be to be an elegant way of
> dealing with the mishmash of input and output formats.
>
> Regards,
> David. |
From: David N. <dav...@sw...> - 2006-07-11 16:37:20
|
Hi Markus,

> I take from this discussion:
>
> 1) Use a bib2ris post-processing script (or rewrite bib2ris to contain such
> code) which strips markup like boldface, superscript etc. and translates
> foreign characters entered as LaTeX constructs to their Unicode equivalents.
>
> 2) Modify the code to prevent XML entities to show up in LaTeX output.
>
> 3) Add code to escape the LaTeX command characters in the LaTeX output.
>
> The second point is a bit tricky. References imported from RIS usually do not
> contain entities, but references imported from risx are likely to do. Either I
> convert these entities during import, or I remove them during LaTeX export. The
> former seems cleaner to me, and I think this is what you had in mind.

Yes, in my view the storage format is Unicode without markup:

BibTeX -------               ---------> DocBook
             |               |
             |               |
RIS ---------+--> STORAGE ----
             |   (Unicode)   |
             |               |
RISX ---------               ---------> LaTeX

Whatever the input format, all references end up in the same storage format (Unicode sans markup). This would require stripping out XML entities and LaTeX markup. With luck you can use existing tools to do this. The stored references can then be output in either DocBook- or LaTeX-compatible format. This seems to me to be an elegant way of dealing with the mishmash of input and output formats.

Regards,
David. |
From: Markus H. <mar...@mh...> - 2006-07-11 15:54:02
|
Hi David, David Nebauer <dav...@sw...> was heard to say: > If you instead go with my idea to store as unicode you don't need to > know anything about the eventual output format when you store the > reference. Indeed, the user doesn't have to know at that time. The > same references can be used for either DocBook or LaTeX. You can easily > add in other output formats later and all you have to do is write > another output filter. > I'm all with you here. I just wanted to get opinions from real-world LaTeX users whether or not it makes sense to preserve the markup. > Your point is true but I say it is a small loss. Using LaTeX formatting > codes means your references can never be used for any other format > without hacking in some kind of conversion. RefDB is designed to be a > long-term reference database enabling the contained references to be > used all kinds of interesting ways. Use of format-specific markup > limits your future choices. As a minor example it prevents their use in > DocBook documents. True, but I assumed that only those might want to keep the markup who use RefDB solely for LaTeX. > > Another issue is the ability of library and indexing systems to handle > such formatting complexities as superscripting, subscripting and font > changes. You know far more about such things than I, but I would guess > even the most complex article title is reduced to canonical ascii for > storage in many cataloguing systems. I presume the algorithms for such > simplification are fairly predictable. Anyone searching for the journal > article by title would be easily able to predict the stored character > sequence. I would endeavour to suggest the simplified form of title > would be entirely acceptable in any kind of bibliography. > > In any event, how would such a complex title be stored in plain ascii? > Or Unicode? Or even XML (imagine the attempt to use MathML in a title > string!)? > The database which I use mostly (www.pubmed.org) indeed "ascii-izes" the titles. 
The tagged format uses plain ASCII with a pretty crude transliteration, whereas the XML format uses Unicode. > As mentioned above, I am unconvinced about the utility of keeping > boldface, italics, superscript and subscript-type markup. As for > foreign characters, almost any foreign character can be represented in I'm afraid I didn't express my thoughts very well here. What I was talking about is that a reference imported from bibtex may contain markup like "Title with an {\bf emphasized} word". It is not sufficient to escape characters but we have to remove the "{\bf " and the "}" sequences before we import the reference. This is what one of the scripts that you pointed me to as well as tex2mail do. > To allow attribute values to contain both single and double quotes, > the apostrophe or single-quote character (') may be represented as > "&apos;", and the double-quote character (") as "&quot;". > > > The relevant portion states, "The right angle bracket (>) *may* be > represented using the string '&gt;'," but "*must*, for compatibility, be > escaped using '&gt;' or a character reference when it appears in the > string ']]>'." (emphases mine) > > The last paragraph in the quote refers to straight single and double > quotation mark entities. > But it appears to talk about attribute values. XML output from RefDB never puts quotes into attribute values, so we're left with &,<,>. > It worked for me "out of the box". I installed the 'ucs' package > (apt-get install latex-ucs), added those two lines to the preamble, ran > 'latex test' and, presto, gloriously rendered unicode. > This is great news indeed. I will have to mention this in the manual. I take from this discussion: 1) Use a bib2ris post-processing script (or rewrite bib2ris to contain such code) which strips markup like boldface, superscript etc. and translates foreign characters entered as LaTeX constructs to their Unicode equivalents. 2) Modify the code to prevent XML entities from showing up in LaTeX output. 
3) Add code to escape the LaTeX command characters in the LaTeX output. The second point is a bit tricky. References imported from RIS usually do not contain entities, but references imported from risx are likely to do. Either I convert these entities during import, or I remove them during LaTeX export. The former seems cleaner to me, and I think this is what you had in mind. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Markus H. <mar...@mh...> - 2006-07-10 21:46:40
|
Hi David,

David Nebauer writes:
> I've also been thinking further about this. It seems to me the issue is
> what formats to use when inputting and outputting:
>
>                               ------------> output for XML
>  input data                   |
> -------------> STORAGE -------|
>                FORMAT         |
>                               ------------> output for BibTeX/LaTeX
>
> A simple scheme seems to me to consist of the following:
> - input and store all data as unicode
> - during output perform needed translations

This may be a necessary tradeoff. So far, the LaTeX diehards were able to use many LaTeX constructs in e.g. the author names or the titles (think of italics, superscripts, or subscripts which are not uncommon in e.g. physics or chemistry papers). This may be a pain in the neck to search afterwards, but at least you could do it. If we follow the simplified scheme you outline above, you can no longer use these LaTeX hacks. I'd like to hear from the LaTeX users (I'm not one of them currently) how important it is to include LaTeX markup into the data.

> The *minimum* necessary translations needed are:
>
> 1. BibTeX/LaTeX
>
> Convert the following control characters to appropriate escape sequences:
> # $ % & ~ _ ^ \ { }

I'm afraid there's more to it. We have to remove lots of commands like the above mentioned boldface, italics, superscript, subscript and such. These commands do not make any sense in the context of SGML/XML. We also have to translate foreign characters (\"{a}, {\ss} and similar constructs). Part of this translation can be achieved through tex2mail, although it does not seem to create UTF-8 (but see below).

> 2. XML/SGML
>
> Convert the two illegal characters to their respective entities:
> & <
>
> While not illegal, it is customary also to convert the following characters:
> > ' "

I was under the impression that &, <, and > always have to be replaced as these are part of the XML markup. Why and to what would you like to convert ' and "?

> The question then arises as to whether any other translation is
> necessary and/or desirable. In theory no other translation is
> necessary. LaTeX can process "raw" unicode using the 'ucs' package.

This is good news. My only LaTeX book dates back to 1999, and Unicode does not seem to be mentioned. The transformations would be so much simpler if we didn't have to create LaTeX commands to represent foreign or special characters.

> The XML standard states, "Legal characters are tab, carriage return,
> line feed, and the legal graphic characters of Unicode and ISO/IEC 10646".

This is pretty much what RefDB currently outputs.

> Having said that, it may be desirable to translate non-ascii characters
> into decimal numeric character references (e.g., '&#226;') for XML or,
> for LaTeX, appropriate escape sequences. Perhaps this could be optional?

I think it is common to leave the non-ascii characters in the xml file and use the proper charset declaration (UTF-8 by default). IMHO character entities do not have any advantage over UTF-8. I'm not sure about LaTeX output. How hard is it to make the use of the ucs package mandatory for RefDB users? Once it is installed, it is as simple as inserting one line at the top of your document, isn't it?

> One interesting consequence of this is that author names may contain
> non-ascii characters. If, when new references are added to refdb, there
> is no citation key specified, the citekey is constructed by mangling
> primary author surname and year. If citekey is restricted to ascii
> characters then non-ascii author surname characters would have to be
> stripped or converted (e.g., ä -> a, ß -> ss).

Currently non-ascii characters are simply stripped. You always have the option to specify a citation key explicitly when adding a reference, using any reasonable translation of the foreign characters to ascii.

> escapechars     converts non-ASCII (UTF-8, Latin-1 etc.) files to ASCII
>                 with XML or TeX escape sequences
> latex2utf8txt   converts LaTeX files to UTF-8 text, removes line
>                 breaks from paragraphs

Thanks for the pointers. I've downloaded these scripts and will give them a try. If the latter works as advertized, it could be used as a post-processing filter after bib2ris (or, if I'll ever end up having too much time on my hands, I could reimplement bib2ris in Perl and integrate the conversion code). The former is a bit trickier as the conversion should run in refdbd. However, the script looks simple enough that I might be able to recode the algorithm in C.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de |
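The XML side of this exchange (replace the markup characters, leave non-ASCII text alone and declare UTF-8) can be sketched in a few lines of Python. This only illustrates the idea and is not how refdbd, which is written in C, produces its output; the helper names `xml_text` and `xml_title_document` are made up for the example.

```python
from xml.sax.saxutils import escape


def xml_text(value: str) -> str:
    """Escape &, < and > for element content; everything else is kept."""
    return escape(value)


def xml_title_document(title: str) -> bytes:
    """Return a tiny UTF-8 encoded document with the title as content."""
    doc = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<title>" + xml_text(title) + "</title>\n"
    )
    return doc.encode("utf-8")  # non-ASCII letters stay as UTF-8 bytes


if __name__ == "__main__":
    print(xml_text("Häßler & Sons <1>"))
    print(xml_title_document("Häßler & Sons").decode("utf-8"))
```

Quotes are left untouched here because, as noted in the thread, they only matter inside attribute values.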
From: Markus H. <mar...@mh...> - 2006-07-08 22:23:16
|
Markus Hoenicka writes: > \cite{citekey} > \cite{dbname:citekey} > Just to let y'all know that the current Subversion version supports the above mentioned citation format in LaTeX documents. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
From: Markus H. <mar...@mh...> - 2006-07-08 22:21:12
|
David Nebauer writes:
> There's something else. You may recall some time ago all the trouble
> taken to ensure entities such as &mdash; and &amp; are preserved in
> database reference entries and subsequently then preserved throughout
> DocBook processing. Many of my references include entities in document
> titles. Well, those entities are now appearing in the bibtex entries
> created by runbib. As you noted, the raw ampersands choke LaTeX. Is
> there any way of converting those xml-safe entities to LaTeX equivalents
> as runbib exports them? In the case of '&mdash;' that would be '---'.

I thought about this a bit more. I'm afraid this is going to get far more complex than I thought in the first place. We need to:

- replace XML entities that stem from risx documents or which were deliberately used in RIS data. E.g. '&mdash;' -> '---'

- backslash-escape LaTeX command characters unless, and that's the catch, they are used as LaTeX commands. A LaTeX-only user may rightfully expect e.g. author names like 'H\"{a}{\ss}ler' (as imported from a bibtex file) to be processed correctly, or e.g. '{\bf emphasized}' words in titles. refdbd would have to acquire a thorough knowledge of LaTeX commands to cope with this.

- translate foreign letters and letters with diacritics to their TeX equivalents from, and that's the catch here, any supported character encoding. The same TeX representation of such a letter may be encoded as a variety of one to three-byte sequences in different uni- or multibyte character sets.

Is anyone aware of a library or a tool that implements these transformations? I'm only aware of tex2mail which does the reverse.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de |
From: Markus H. <mar...@mh...> - 2006-07-08 14:35:03
|
David Nebauer <dav...@sw...> was heard to say: > From one of the many guides on LaTeX: > > You can use any of the standard characters that you find on your > keyboard, except the following 10 symbols: > { } % & $ # _ ^ ~ \ > These symbols may only occur in LATEX commands. > > > The first seven of the characters shown above are included as literals > by escaping them with a backslash. Thanks for that. > There's something else. You may recall some time ago all the trouble > taken to ensure entities such as &mdash; and &amp; are preserved in > database reference entries and subsequently then preserved throughout > DocBook processing. Many of my references include entities in document > titles. Well, those entities are now appearing in the bibtex entries > created by runbib. As you noted, the raw ampersands choke LaTeX. Is > there any way of converting those xml-safe entities to LaTeX equivalents > as runbib exports them? In the case of '&mdash;' that would be '---'. > I don't think this is too hard. I'll have to adapt the character replacement routine which currently converts the input to XML-safe strings to replace the entities back to something that LaTeX can grok. Most of the work is to compile a table that defines the required translations. regards, Markus -- Markus Hoenicka mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka") http://www.mhoenicka.de |
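A table-driven replacement of the kind described above might look roughly like this. The sketch is in Python rather than refdbd's C, the entity table is a small illustrative sample rather than RefDB's actual table, and the choice of `\textasciitilde{}`, `\textasciicircum{}` and `\textbackslash{}` for the three characters that cannot simply be backslash-escaped is an assumption. It treats every command character as a literal, so it only suits data that carries no LaTeX markup.

```python
import re

# XML entities -> LaTeX (illustrative sample only).
_ENTITIES = {
    "&mdash;": "---",
    "&ndash;": "--",
    "&amp;": r"\&",
    "&lt;": r"\textless{}",
    "&gt;": r"\textgreater{}",
}

# The ten LaTeX command characters; seven take a plain backslash escape.
_ESCAPES = {
    "{": r"\{", "}": r"\}", "%": r"\%", "&": r"\&",
    "$": r"\$", "#": r"\#", "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
    "\\": r"\textbackslash{}",
}

# Entities come first in the alternation so '&mdash;' wins over a bare '&'.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in list(_ENTITIES) + list(_ESCAPES))
)


def to_latex(text: str) -> str:
    """Translate entities and escape command characters in a single pass."""
    def substitute(match):
        token = match.group(0)
        return _ENTITIES[token] if token in _ENTITIES else _ESCAPES[token]
    return _PATTERN.sub(substitute, text)


if __name__ == "__main__":
    print(to_latex("Cats &amp; dogs &mdash; 100% fun_times"))
```

Doing everything in one pass matters: escaping in a second step would mangle the backslashes and braces that the first step had just produced.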
From: Markus H. <mar...@mh...> - 2006-07-08 14:21:51
David Nebauer <dav...@sw...> was heard to say:
> IIRC, it is illegal to include colons in citation keys. I seem to
> recall they are automatically stripped out. It is currently possible
> to include colons in database names. But, if you made it illegal to
> include a colon in a database name that gives you a ready-made
> delimiter to use in citations.
>
>
> I think that suggestion makes more sense.
>

Well, if I understand *that* correctly, you suggest using these forms in
LaTeX documents:

\cite{citekey}
\cite{dbname:citekey}

This is more compact than the "-ID" kludge that I currently use. The only
downside, if at all, is that this citation syntax is different from the
one used in the full style in SGML/XML documents (where, as noted
previously, we must not use a colon). However, this might only confuse
those who work with both LaTeX and SGML/XML.

Unless someone else has objections, I'll implement your suggestion
shortly.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
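For illustration, splitting the two proposed forms could be as simple as the Python sketch below. It assumes, as discussed, that colons never occur in citation keys and would be rejected in database names. The function name `split_citation` is hypothetical and this is not the refdbd implementation.

```python
from typing import Optional, Tuple


def split_citation(cite: str) -> Tuple[Optional[str], str]:
    """Split 'dbname:citekey' into (dbname, citekey).

    The database name is None when the citation has no colon.
    """
    dbname, sep, key = cite.partition(":")
    if not sep:
        return None, cite
    return dbname, key
```

Because the colon cannot appear on either side, hyphens in keys or database names no longer matter: `split_citation("my-db:Smith-Jones2005")` yields `("my-db", "Smith-Jones2005")`.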
From: Markus H. <mar...@mh...> - 2006-07-08 12:14:07
David Nebauer <dav...@sw...> was heard to say:
> I'm afraid the current scheme is unusable. It is possible for citation
> keys to contain hyphens -- in fact, the default key for a hyphenated
> author surname contains a hyphen. Try using a hyphenated citation key
> and watch the fun. Even better, combine a database name with the
> hyphenated citation key -- that introduces another hyphen. Even more fun.
>

That's why the code does not rely on the hyphen as a separator, but on the
sequences "ID" and "-ID", which are checked for in this particular order
from left to right. Unless a citation is malformed, you can have as many
hyphens or even "-ID" sequences in your citation keys as you like:

\cite{IDMILLER-IDRUM-2005}
      **
\cite{dbname-IDMILLER-IDRUM-2005}
            ***

The '*' characters mark the database name prefix separator in both cases.
Unless I'm dense this is foolproof as far as citation keys are concerned.
Trouble may arise when you use database names like "IDBASE" or
"DATA-IDBASE". RefDB would have to reject these names in order to avoid
trouble.

> refdb allows you to create databases whose names contain hyphens. I
> haven't tried including a hyphenated database with a hyphenated citation
> key in the one citation -- I'm too scared. Interestingly, while I can
> create a database with a hyphenated name I can't delete it (at least
> with an sqlite backend) -- the deletedb operation fails.

I'll have to investigate this. SQLite databases are deleted on the
filesystem level by using an unlink() system call - I can't imagine why
that would fail with a hyphen in the filename.

>
> IIRC, it is illegal to include colons in citation keys. I seem to
> recall they are automatically stripped out. It is currently possible to
> include hyphens in database names. But, if you made it illegal to
> include a hyphen in a database name that gives you a ready-made
> delimiter to use in citations.
>

Yes, colons are not allowed in IDREF attributes (xref linkend).
And yes, refdbd indeed strips out colons in citation keys to avoid
creating invalid output.

The remainder of your suggestion is less clear to me. If I understand
correctly, you suggest using the hyphen under the assumption that database
names never have hyphens (this could indeed be enforced). But how do you
distinguish between a citation key prefixed with a database name and a
sole citation key containing a hyphen? As in:

dbname-citekey
cite-key

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
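The "ID" / "-ID" check described in this message can be sketched in Python as follows. This is one reading of the description, not the actual refdbd code, and the function name `parse_id_citation` is hypothetical.

```python
from typing import Optional, Tuple


def parse_id_citation(cite: str) -> Tuple[Optional[str], str]:
    """Split a citation of the form 'IDkey' or 'dbname-IDkey'.

    Returns (dbname, key); dbname is None for the plain 'IDkey' form.
    """
    # Check 1: a leading "ID" means there is no database prefix.
    if cite.startswith("ID"):
        return None, cite[2:]
    # Check 2: otherwise the first "-ID" from the left ends the prefix.
    dbname, sep, key = cite.partition("-ID")
    if not sep:
        raise ValueError("malformed citation: %r" % cite)
    return dbname, key
```

Hyphens and further "-ID" sequences inside the key are harmless because only the leftmost separator is used. Database names such as "IDBASE" or "DATA-IDBASE" are misread, though: `parse_id_citation("IDBASE-IDAgnew0")` returns `(None, "BASE-IDAgnew0")`, which is why such names would have to be rejected.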
From: David N. <dav...@sw...> - 2006-07-08 11:24:53
Hi Markus,

> In order to
> support multiple databases I had to revert this to the original format where
> the citation key is prefixed with "ID" or "dbname-ID".
>
> I'm open for suggestions if you know a better way to safely distinguish database
> names from the citation key proper.
>

I'm afraid the current scheme is unusable. It is possible for citation
keys to contain hyphens -- in fact, the default key for a hyphenated
author surname contains a hyphen. Try using a hyphenated citation key and
watch the fun. Even better, combine a database name with the hyphenated
citation key -- that introduces another hyphen. Even more fun.

refdb allows you to create databases whose names contain hyphens. I
haven't tried including a hyphenated database with a hyphenated citation
key in the one citation -- I'm too scared. Interestingly, while I can
create a database with a hyphenated name I can't delete it (at least with
an sqlite backend) -- the deletedb operation fails.

IIRC, it is illegal to include colons in citation keys. I seem to recall
they are automatically stripped out. It is currently possible to include
hyphens in database names. But, if you made it illegal to include a hyphen
in a database name that gives you a ready-made delimiter to use in
citations.

Regards,
David.
From: Markus H. <mar...@mh...> - 2006-07-08 09:28:29
David Nebauer <dav...@sw...> was heard to say:
> This is a test of the RefDB application used in conjunction with
> vim-latexsuite. Here is a reference \cite{Agnew0}. Here is another
> \cite{Weckert0}.
>

Before my latest patch, these citations probably did work. I implemented
this version a while ago to move to a citation syntax familiar to LaTeX
users, i.e. use the citation key in curly brackets. However, this does not
allow a safe distinction of citation keys with and without a database
part. In order to support multiple databases I had to revert this to the
original format where the citation key is prefixed with "ID" or
"dbname-ID". The following is supposed to work:

This is a test of the RefDB application used in conjunction with
vim-latexsuite. Here is a reference \cite{IDAgnew0}. Here is another
\cite{otherdb-IDWeckert0}.

I'm open for suggestions if you know a better way to safely distinguish
database names from the citation key proper.

While testing the code I came across a problem with bibliography entries
which contain ampersands. The ampersand is a special character in
LaTeX/bibtex and needs to be escaped in the bibtex output. I'll look into
this shortly.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
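As a rough illustration of the ampersand fix, the Python sketch below backslash-escapes a few special characters in a bibtex field value. It is a naive heuristic, not what refdbd does. The function name `escape_bibtex_value` is hypothetical, and braces, backslashes, '^' and '~' are deliberately left alone on the assumption that they belong to intentional LaTeX markup.

```python
import re

# Escape & % $ # _ unless the character is already preceded by a backslash.
_NEEDS_ESCAPE = re.compile(r"(?<!\\)([&%$#_])")


def escape_bibtex_value(text: str) -> str:
    """Backslash-escape a small set of LaTeX special characters."""
    return _NEEDS_ESCAPE.sub(r"\\\1", text)
```

Known limitations of this sketch: it also escapes '$' and '_' inside deliberate math such as `$x_1$`, and it skips a character that follows an escaped backslash.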
From: David N. <dav...@sw...> - 2006-07-08 05:25:36
Hi Markus,

> I've fixed refdbd to support multiple databases in bibtex too.

Now it is not working for me at all. runbib is retrieving no information.
Here is an annotated transcript:

------------------------------------------------------------------------------------------
>> Here are the tex-related files <<
$ ls
-rw-r--r-- 1 david david  155 2006-07-08 14:13 test.aux
-rw-r--r-- 1 david david  275 2006-07-08 14:06 test.bbl
-rw-r--r-- 1 david david    1 2006-07-08 14:22 test.bib
-rw-r--r-- 1 david david  914 2006-07-08 14:06 test.blg
-rw-r--r-- 1 david david  696 2006-07-08 14:13 test.dvi
-rw-r--r-- 1 david david 3017 2006-07-08 14:13 test.log
-rw-r--r-- 1 david david  480 2006-07-08 14:06 test.tex
-rw-r--r-- 1 david david  178 2006-07-06 15:36 test.tex~

>> Here are the references in the tex document <<
$ cat test.tex
% File: test.tex
% Created: Thu Jul 06 03:00 PM 2006 C
% Last Change: Thu Jul 06 03:00 PM 2006 C
%
\documentclass[a4paper]{article}
\usepackage{natbib}
\author{David Nebauer}
\title{Test Document}
\begin{document}
\maketitle
\section{Introduction}
This is a test of the RefDB application used in conjunction with
vim-latexsuite. Here is a reference \cite{Agnew0}. Here is another
\cite{Weckert0}.
\bibliographystyle{plainnat}
\bibliography{test}
\end{document}

>> Let me prove the references exist <<
$ refdbc -C getref -d refs_computing :CK:=Agnew0
ID*:17 (2000) Key: Agnew0 Agnew,Grace Government Access to Encryption Keys
999:1 retrieved:0 failed
$ refdbc -C getref -d refs_computing :CK:=Weckert0
ID*:23 (1997) Key: Weckert0 Weckert,J. Intellectual Property Rights and Computer Software Business Ethics: A European Review 6(2):102-109
999:1 retrieved:0 failed

>> Let me show the test.aux file includes those references <<
$ cat test.aux
\relax
\citation{Agnew0}
\citation{Weckert0}
\bibstyle{plainnat}
\bibdata{test}
\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}}

>> Here is the runbib command that returns no results <<
$ runbib -d refs_computing -S bibtex-full -t bibtex test
999:0 retrieved:0 failed

>> The corresponding refdbib command also returns nothing <<
$ refdbib -d refs_computing -S bibtex-full -t bibtex test.aux > test.bib
999:0 retrieved:0 failed

>> The test.bib file remains empty! <<
$ cat test.bib

$
------------------------------------------------------------------------------------------

I ran refdbd standalone at log setting 7. Here is the feedback generated
when running the runbib or refdbib command as above:

------------------------------------------------------------------------------------------
adding client 127.0.0.1 on fd 5
server waiting n_max_fd=5
try to read from client
serving client on fd 5 with protocol version 4
012-58-51-27
send pseudo-random string to client
parent removing client on fd 5
server waiting n_max_fd=4
gettexbib -u david -w xxxxxxxxxxxxxxxxxxxxxxxxxxx -d refs_computing -s bibtex-full 19
dbi is up
localhost david daviduser refs_computing sqlite /var/lib/refdb/db refdb
connected to database server using database: refdb
Main database looks ok: refdb
localhost david daviduser refs_computing sqlite /var/lib/refdb/db refs_computing
SELECT meta_app,meta_type,meta_dbversion from t_meta
connected to database server using database: refs_computing
command processing done, finish dialog now
child finished client on fd 5
child exited with code 0
server waiting n_max_fd=4
------------------------------------------------------------------------------------------

I'm not familiar with the 'gettexbib' command. The '19' initially looked a
little strange but I had a quick dive into refdbib.c and it looks like
that is a legitimate parameter -- the command buffer string length.

Adding the database name to each reference as per the manual makes no
difference.

Regards,
David.

P.S. The refdb-users list is rejecting my posts sporadically with the
claim my ISP is not providing a postmaster address, so I'm copying all my
posts to your personal email address.
From: Markus H. <mar...@mh...> - 2006-07-07 23:12:47
Markus Hoenicka writes:
> I didn't try lately, but the reverse might be true. Once upon a time the
> DocBook/TEI code also allowed using more than one database. I'll check.
>

I've checked the situation with DocBook and TEI. Both support database
names in citations using the full format. refdbxp does not support
database names, presumably because the short citation format has no means
to encode a database name.

I've fixed refdbd to support multiple databases in bibtex too.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de