Thread: [Refdb-devel] latex bibliographies with multiple databases

[Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-06 21:59:35

David Nebauer writes:
 > When I specify no database in the document -- with all references from 
 > one database -- and specify that database as a runbib parameter, there 
 > is no problem.  If, however, I use the second method of specifying 
 > '\cite{<database>-<reference>}' the process fails.
 > 

I'll set up a test case and see what happens. I recall it might be
necessary to specify a default database anyway (i.e. use the -d switch
of runbib) even if you specify a database in each citation. Does the
problem persist if you set a default database?

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-12 07:56:52

Hi Markus,

> Markus Hoenicka writes:
>  > \cite{citekey}
>  > \cite{dbname:citekey}
>
> Just to let y'all know that the current Subversion version supports
> the above mentioned citation format in LaTeX documents.
>   

I'm experiencing all kinds of difficulty using the latest svn refdb 
build with LaTeX/BibTeX.  'runbib' will not extract records in BibTeX 
format unless citations are in the previous '\cite{[dbname-]IDcitekey}' 
format.  Using the new citation format on my system results in '999:0 
retrieved:0 failed'.  Would you mind checking on your system?  If the 
new format works for you it must be a problem at my end and I'll work up 
a test case.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-12 09:43:15

Hi Markus,

>  > One interesting consequence of this is that author names may contain
>  > non-ascii characters.  If, when new references are added to refdb, there
>  > is no citation key specified, the citekey is constructed by mangling
>  > primary author surname and year.  If citekey is restricted to ascii
>  > characters then non-ascii author surname characters would have to be
>  > stripped or converted (e.g., ä -> a, ß -> ss).
>
> Currently non-ascii characters are simply stripped. You always have
> the option to specify a citation key explicitly when adding a
> reference, using any reasonable translation of the foreign characters
> to ascii.
>

I'd like to focus on this point again.  I personally allow refdb to
generate the citekey for me, mainly because it will automatically append
'a', 'b', etc. if there is danger of duplication.  Automatically
stripping non-ascii characters from authors with foreign characters will
lead to some unusual results.  A recent publication from our old
workhorse 'Häßler' might produce the citekey 'Hler2006'.

There are tools around which attempt to convert sensibly from unicode to
ascii.  Here is an example using the tool 'konwert':
---------------------------------------------------------------------------------------
$ cat name
Häßler, Günter
$ cat name | konwert UTF8-ascii
Hassler, Gunter
$
---------------------------------------------------------------------------------------

Use of this (or a similar) tool would result in the much more
satisfactory, and easy to remember, default citekey of 'Hassler2006'.
It should be fairly simple to add this additional conversion step.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-08-16 21:42:34

Hi David,

David Nebauer writes:
 > I'd like to focus on this point again.  I personally allow refdb to
 > generate the citekey for me, mainly because it will automatically append
 > 'a', 'b', etc. if there is danger of duplication.  Automatically
 > stripping non-ascii characters from authors with foreign characters will
 > lead to some unusual results.  A recent publication from our old
 > workhorse 'Häßler' might produce the citekey 'Hler2006'.
 >

I've tried to resolve this problem by running the citekeys through an
iconv conversion (from UTF-8 to ASCII with transliteration switched
on). Only then are invalid characters stripped from the strings. I
hope this will improve the automatically created citation keys.

iconv uses a latex-style transliteration of umlauts and other
non-ASCII characters. E.g. our beloved 'Häßler' is converted to
'H"assler'. RefDB has to strip the '"' from the latter as it must not
appear in XML attribute values, hence you'll end up with 'Hassler2006'
instead of the abovementioned 'Hler2006'. It is still not the correct
German transliteration, which would call for 'Haessler2006', but I
think we're close enough without having to hand-code a boatload of
special cases.
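
The transliterate-then-strip idea described above can be sketched in a few
lines of Python. This is only an illustration, not the RefDB code (which is
C and calls iconv): it swaps in Unicode decomposition for iconv's
transliteration, and the names SPECIAL, ascii_fold and make_citekey are
made up for the example.

```python
import unicodedata

# Hypothetical fallback table for letters that Unicode decomposition
# cannot split into an ASCII base letter plus an accent.
SPECIAL = {"ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o"}


def ascii_fold(text):
    """Transliterate text to ASCII; drop whatever cannot be mapped."""
    out = []
    for ch in text:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # NFKD turns 'ä' into 'a' followed by a combining diaeresis;
        # keeping only the ASCII part leaves the base letter.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(out)


def make_citekey(surname, year):
    """Build a default citekey from a surname and a year (assumed scheme)."""
    folded = ascii_fold(surname)
    # Only now strip characters that are not wanted in a key.
    key = "".join(c for c in folded if c.isalnum() or c == "-")
    return f"{key}{year}"


print(make_citekey("Häßler", 2006))
```

As with iconv, the result for 'Häßler' is 'Hassler2006' rather than the
German 'Haessler2006'; a language-specific table like konwert's VARIANT_de
would be needed for that.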

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-12 12:13:26

Hi David,

David Nebauer <dav...@sw...> was heard to say:

> I'm experiencing all kinds of difficulty using the latest svn refdb
> build with LaTeX/BibTeX.  'runbib' will not extract records in BibTeX
> format unless citations are in the previous '\cite{[dbname-]IDcitekey}'
> format.  Using the new citation format on my system results in '999:0
> retrieved:0 failed'.  Would you mind checking on your system?  If the
> new format works for you it must be a problem at my end and I'll work up
> a test case.

I hardly dare to ask, but are you sure you installed the svn version and
restarted refdbd? I just checked the svn code, and all changes are in place.
The current svn code certainly does not look for "-ID" but for ":" as a
database separator. I'm sure that the new format works on my FreeBSD box.

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-12 14:17:46

Hi Markus,

>> I'm experiencing all kinds of difficulty using the latest svn refdb
>> build with LaTeX/BibTeX.  'runbib' will not extract records in BibTeX
>> format unless citations are in the previous '\cite{[dbname-]IDcitekey}'
>> format.
>>     
> I hardly dare to ask, but are you sure you installed the svn version and
> restarted refdbd?

I install from custom deb packages so refdbd is stopped and started as 
part of the debian package upgrade process.  My source tree is at 
revision 81.

I checked everything, including the source code changes you made at 
version 72 to alter the citation format.  Finally I remembered some 
advice you gave recently about checking for multiple running instances 
of refdbd.  Sure enough, I had an extra instance running, probably from 
a debugging exercise where I was running refdbd in standalone mode.  
Once stopped the old behaviour went away.

Problem solved (he says sheepishly).

FWIW, I can confirm the new citation format is working correctly.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-12 12:20:19

Hi David,

David Nebauer <dav...@sw...> was heard to say:


> Use of this (or a similar) tool would result in the much more
> satisfactory, and easy to remember,  default citekey of 'Hassler2006'.
> It should be a fairly simple to add this additional conversion step.
>

It is. Unfortunately the konwert sources are a bit hard on the eyes because
they're in Polish but I'll try to steal from that anyway.

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-12 14:22:59

Hi Markus,

> It is. Unfortunately the konwert sources are a bit hard on the eyes because
> they're in Polish but I'll try to steal from that anyway.

A lot of the heavy lifting is done by the (executable) filters. On my
system they live in '/usr/share/konwert/filters/'.

UTF8-ascii is a bash script:
---------------------------------------------------------------------------------------
#!/bin/bash -

VARIANT_bg='
Щ SHT
щ sht
' VARIANT_de='
Ä AE
Ö OE
Ü UE
ä ae
ö oe
ü ue
' VARIANT_hr='
Đ DJ
đ dj
' VARIANT_vi='
À A`
Á A'\''
Â A^
Ã A~
È E`
É E'\''
Ê E^
Ì I`
Í I'\''
Ò O`
Ó O'\''
Ô O^
Õ O~
Ù U`
Ú U'\''
Ý Y'\''
à a`
á a'\''
â a^
ã a~
è e`
é e'\''
ê e^
ì i`
í i'\''
ò o`
ó o'\''
ô o^
õ o~
ù u`
ú u'\''
ý y'\''
Ă A(
ă a(
Đ DD
đ dd
Ĩ I~
ĩ i~
Ũ U~
ũ u~
' VARIANT1_bg='
Ъ Y
ъ y
' VARIANT1_ua='
И Y
и y
' REPLACE='?' MIME=us-ascii

if [ "$FILTERM" = out ]
then
NPOJED=
else
NPOJED=1
fi
FORMAT=
HTMLCHAR=
POPRAWKI=
for A in $ARG
do
case "$A" in
(1) NPOJED=;;
(html) FORMAT=html;;
(htmldec|htmlhex) FORMAT=html; HTMLCHAR=${A#html};;
(tex) FORMAT=tex;;
(*)
if [ -x "${0%/*}/../aux/argcharset/$A" ]
then
POPRAWKI=${POPRAWKI:+$POPRAWKI | }${0%/*}/../aux/argcharset/$A
fi
VARIANT=VARIANT_$A; APPROX="${!VARIANT} $APPROX"
VARIANT=VARIANT1_$A; APPROX1="${!VARIANT} $APPROX1"
;;
esac
done

if [ "$POPRAWKI" ]
then
"$SHELL" -c "$POPRAWKI"
else
cat
fi |
case "$FORMAT" in
(html)
"${0%/*}/../aux/fixmeta" us-ascii |
if [ "$HTMLCHAR" ]
then
"${0%/*}/UTF8-html$HTMLCHAR"
else
trs -e '\}\[@&<>\] @' \
${NPOJED:+-e} ${NPOJED:+"$APPROX"} \
-e "$APPROX1" \
${NPOJED:+-f} ${NPOJED:+"${0%/*}/../aux/UTF8-ascii"} \
-f "${0%/*}/../aux/UTF8-ascii1" \
-e "\300\-\377 ${REPLACE:-?} \200\-\277 \!" |
trs -e '@@ @ @& & @< < @> > & &amp; < &lt; > &gt;'
fi
;;
(tex)
trs -e '\}\[@\#$%&\\^_{|}~\] @' \
-f "${0%/*}/../aux/UTF8-tex" \
-e "$APPROX" \
-e "$APPROX1" \
-f "${0%/*}/../aux/UTF8-ascii" \
-f "${0%/*}/../aux/UTF8-ascii1" \
-e "\300\-\377 ${REPLACE:-?} \200\-\277 \!" |
trs -e '@@ @ @\# \# @$ $ @% % @& & @\\ \\ @^ ^ @_ _ @{ { @| | @} } @~ ~
\# \\\# $ \\$ % \\% & \\& \\ $\\backslash$ ^ \\^{} _ \\_ { \\{ | $|$ }
\\} ~ \\~{}'
;;
(*)
trs ${NPOJED:+-e} ${NPOJED:+"$APPROX"} \
-e "$APPROX1" \
${NPOJED:+-f} ${NPOJED:+"${0%/*}/../aux/UTF8-ascii"} \
-f "${0%/*}/../aux/UTF8-ascii1" \
-e "\300\-\377 ${REPLACE:-?} \200\-\277 \!"
;;
esac
---------------------------------------------------------------------------------------

There's bash wizardry in there I can't even begin to fathom.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-24 10:40:27

Hi Markus,

Markus Hoenicka wrote:
> Actually we can still use ISO-8859-1 or whatever as the RIS input
> format as refdbd internally converts it to UTF-8 if the database uses
> this encoding. Forcing UTF-8 for RIS data actually makes only sense if
> people use both bibtex and RIS data.

The advantage of making UTF-8 the default encoding at each step in the 
life cycle is you don't have to remember when to specify UTF-8 encoding 
and when not to.  Still, the most important thing is making sure to 
document clearly what the default encoding is at each step so the user 
knows what is happening.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-24 10:57:02

David Nebauer <dav...@sw...> was heard to say:

> Hi Markus,
>
> Markus Hoenicka wrote:
> > Actually we can still use ISO-8859-1 or whatever as the RIS input
> > format as refdbd internally converts it to UTF-8 if the database uses
> > this encoding. Forcing UTF-8 for RIS data actually makes only sense if
> > people use both bibtex and RIS data.
>
> The advantage of making UTF-8 the default encoding at each step in the
> life cycle is you don't have to remember when to specify UTF-8 encoding
> and when not to.  Still, the most important thing is making sure to
> document clearly what the default encoding is at each step so the user
> knows what is happening.
>

This is approximately what I intended to say, but I guess I was too tired to be
as clear as I should. I plan to set all defaults to UTF-8, but users are still
free to configure their systems differently if they have a good reason.
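
The kind of conversion refdbd is described as doing internally for RIS
input can be illustrated with a short Python sketch. This is not refdbd's
code; the function name to_utf8 and its default are assumptions made for
the example.

```python
def to_utf8(raw, source_encoding="utf-8"):
    """Decode raw input bytes from source_encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# 'Häßler' as it would arrive in an ISO-8859-1 encoded RIS file.
latin1_bytes = "Häßler".encode("iso-8859-1")
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")

print(latin1_bytes)
print(utf8_bytes)
```

The same name is six bytes in ISO-8859-1 and eight in UTF-8, which is why
the input encoding has to be known (or documented as a default) at every
step.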

regards,
Markus



[Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-06 22:42:45

Markus Hoenicka writes:
 > I'll set up a test case and see what happens. I recall it might be
 > necessary to specify a default database anyway (i.e. use the -d switch
 > of runbib) even if you specify a database in each citation. Does the
 > problem persist if you set a default database?
 > 

Upon checking the code I noticed that I had to disable support for
using more than one database for some reason. It seems to be related
to the fact that I wanted to use the citation key as reference without
an ID prefix or something. However, this way I can't safely separate a
database part from the citation key proper. I'll have to figure out
how I can make this work again.

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-07 00:02:34

Hi Markus,

> Upon checking the code I noticed that I had to disable support for
> using more than one database for some reason.
> I'll have to figure out something how I can make this work again.

This would be a *very* nice feature to have.  For what it's worth, 
however, forcing the use of a single database per document would simply 
bring LaTeX support in line with DocBook document support.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-07 07:01:00

David Nebauer <dav...@sw...> was heard to say:

> This would be a *very* nice feature to have.  For what it's worth,
> however, forcing the use of a single database per document would simply
> bring LaTeX support in line with DocBook document support.
>

I didn't try lately, but the reverse might be true. Once upon a time the
DocBook/TEI code also allowed using more than one database. I'll check.

regards,
Markus



Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-07 23:12:47

Markus Hoenicka writes:
 > I didn't try lately, but the reverse might be true. Once upon a time the
 > DocBook/TEI code also allowed using more than one database. I'll check.
 > 

I've checked the situation with DocBook and TEI. Both support database
names in citations using the full format. refdbxp does not support
database names, presumably because the short citation format has no
means to encode a database name.

I've fixed refdbd to support multiple databases in bibtex too.

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-08 05:25:36

Hi Markus,

> I've fixed refdbd to support multiple databases in bibtex too.

Now it is not working for me at all.

runbib is retrieving no information.  Here is an annotated transcript:

------------------------------------------------------------------------------------------
 >> Here are the tex-related files <<

$ ls
-rw-r--r-- 1 david david  155 2006-07-08 14:13 test.aux
-rw-r--r-- 1 david david  275 2006-07-08 14:06 test.bbl
-rw-r--r-- 1 david david    1 2006-07-08 14:22 test.bib
-rw-r--r-- 1 david david  914 2006-07-08 14:06 test.blg
-rw-r--r-- 1 david david  696 2006-07-08 14:13 test.dvi
-rw-r--r-- 1 david david 3017 2006-07-08 14:13 test.log
-rw-r--r-- 1 david david  480 2006-07-08 14:06 test.tex
-rw-r--r-- 1 david david  178 2006-07-06 15:36 test.tex~

 >> Here are the references in the tex document <<

$ cat test.tex
%        File: test.tex
%     Created: Thu Jul 06 03:00 PM 2006 C
% Last Change: Thu Jul 06 03:00 PM 2006 C
%
\documentclass[a4paper]{article}
\usepackage{natbib}
\author{David Nebauer}
\title{Test Document}
\begin{document}
\maketitle

\section{Introduction}

This is a test of the RefDB application used in conjunction with 
vim-latexsuite.  Here is a reference \cite{Agnew0}.  Here is another 
\cite{Weckert0}.

\bibliographystyle{plainnat}
\bibliography{test}

\end{document}


 >> Let me prove the references exist <<

$ refdbc -C getref -d refs_computing :CK:=Agnew0
ID*:17 (2000)
Key: Agnew0
Agnew,Grace
Government Access to Encryption Keys
 

999:1 retrieved:0 failed
$ refdbc -C getref -d refs_computing :CK:=Weckert0
ID*:23 (1997)
Key: Weckert0
Weckert,J.
Intellectual Property Rights and Computer Software
Business Ethics: A European Review 6(2):102-109

999:1 retrieved:0 failed

 >> Let me show the test.aux file includes those references <<

$ cat test.aux
\relax
\citation{Agnew0}
\citation{Weckert0}
\bibstyle{plainnat}
\bibdata{test}
\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}}

 >> Here is the runbib command that returns no results <<

$ runbib -d refs_computing -S bibtex-full -t bibtex test
999:0 retrieved:0 failed

 >> The corresponding refdbib command also returns nothing <<

$ refdbib -d refs_computing -S bibtex-full -t bibtex test.aux > test.bib
999:0 retrieved:0 failed

 >> The test.bib file remains empty! <<

$ cat test.bib

$
------------------------------------------------------------------------------------------

I ran refdbd standalone at log setting 7.  Here is the feedback 
generated when running the runbib or refdbib command as above:

------------------------------------------------------------------------------------------
adding client 127.0.0.1 on fd 5
server waiting n_max_fd=5
try to read from client
serving client on fd 5 with protocol version 4
012-58-51-27
send pseudo-random string to client
parent removing client on fd 5
server waiting n_max_fd=4
gettexbib  -u david -w xxxxxxxxxxxxxxxxxxxxxxxxxxx -d refs_computing -s 
bibtex-full 19
dbi is up
localhost
david
daviduser
refs_computing

sqlite
/var/lib/refdb/db

refdb
connected to database server using database:
refdb
Main database looks ok:
refdb
localhost
david
daviduser
refs_computing

sqlite
/var/lib/refdb/db

refs_computing
SELECT meta_app,meta_type,meta_dbversion from t_meta
connected to database server using database:
refs_computing
command processing done, finish dialog now
child finished client on fd 5
child exited with code 0
server waiting n_max_fd=4
------------------------------------------------------------------------------------------

I'm not familiar with the 'gettexbib' command.  The '19' initially 
looked a little strange but I had a quick dive into refdbib.c and it 
looks like that is a legitimate parameter -- the command buffer string 
length.

Adding the database name to each reference as per the manual makes no 
difference.

Regards,
David.

P.S. The refdb-users list is rejecting my posts sporadically with the
claim my ISP is not providing a postmaster address, so I'm copying all 
my posts to your personal email address.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 09:28:29

David Nebauer <dav...@sw...> was heard to say:

> This is a test of the RefDB application used in conjunction with
> vim-latexsuite.  Here is a reference \cite{Agnew0}.  Here is another
> \cite{Weckert0}.
>

Before my latest patch, these citations probably did work. I implemented this
version a while ago to move to a citation syntax familiar to LaTeX users, i.e.
use the citation key in curly brackets. However, this does not allow a safe
distinction of citation keys with and without a database part. In order to
support multiple databases I had to revert this to the original format where
the citation key is prefixed with "ID" or "dbname-ID". The following is
supposed to work:

 This is a test of the RefDB application used in conjunction with
 vim-latexsuite.  Here is a reference \cite{IDAgnew0}.  Here is another
 \cite{otherdb-IDWeckert0}.

I'm open for suggestions if you know a better way to safely distinguish database
names from the citation key proper.

While testing the code I came across a problem with bibliography entries which
contain ampersands. The ampersand seems to be a control character in
LaTeX/bibtex and needs to be escaped in the bibtex output. I'll look into this
shortly.

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-08 11:24:53

Hi Markus,

> In order to
> support multiple databases I had to revert this to the original format where
> the citation key is prefixed with "ID" or "dbname-ID".
>
> I'm open for suggestions if you know a better way to safely distinguish database
> names from the citation key proper.
>   

I'm afraid the current scheme is unusable.  It is possible for citation 
keys to contain hyphens -- in fact, the default key for a hyphenated
author surname contains a hyphen.  Try using a hyphenated citation key 
and watch the fun.  Even better, combine a database name with the 
hyphenated citation key -- that introduces another hyphen.  Even more fun.

refdb allows you to create databases whose names contain hyphens.  I 
haven't tried including a hyphenated database with a hyphenated citation 
key in the one citation -- I'm too scared.  Interestingly, while I can 
create a database with a hyphenated name I can't delete it (at least 
with an sqlite backend) -- the deletedb operation fails.

IIRC, it is illegal to include colons in citation keys.  I seem to 
recall they are automatically stripped out.  It is currently possible to 
include hyphens in database names.  But, if you made it illegal to 
include a hyphen in a database name that gives you a ready-made 
delimiter to use in citations.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 12:14:07

David Nebauer <dav...@sw...> was heard to say:

> I'm afraid the current scheme is unusable.  It is possible for citation
> keys to contains hyphens -- in fact, the default key for a hyphenated
> author surname contains a hyphen.  Try using a hyphenated citation key
> and watch the fun.  Even better, combine a database name with the
> hyphenated citation key -- that introduces another hyphen.  Even more fun.
>

That's why the code does not rely on the hyphen as a separator, but on the
sequences "ID" and "-ID", which are checked for in this particular order from
left to right. Unless  a citation is malformed, you can have as many hyphens or
even "-ID" sequences in your citation keys as you like:

\cite{IDMILLER-IDRUM-2005}
      **
\cite{dbname-IDMILLER-IDRUM-2005}
            ***

The '*' mark the database name prefix separator in both cases. Unless I'm dense
this is foolproof as far as citation keys are concerned. Trouble may arise when
you use database names like "IDBASE" or "DATA-IDBASE". RefDB would have to
reject these names in order to avoid trouble.
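
The left-to-right check for "ID" and then "-ID" described above can be
sketched in Python as follows. This is an illustration only, not the
refdbd C code; the function name split_id_citation is made up.

```python
def split_id_citation(citation):
    """Split an 'ID...' or 'dbname-ID...' citation into (dbname, citekey).

    dbname is None when the citation carries no database part.
    """
    # First check: a leading "ID" means there is no database part.
    if citation.startswith("ID"):
        return None, citation[2:]
    # Second check: the first "-ID" from the left ends the database name.
    dbname, sep, citekey = citation.partition("-ID")
    if sep and dbname and citekey:
        return dbname, citekey
    raise ValueError(f"malformed citation: {citation!r}")


print(split_id_citation("IDMILLER-IDRUM-2005"))
print(split_id_citation("dbname-IDMILLER-IDRUM-2005"))
# A database called "DATA-IDBASE" is cut at the wrong place:
print(split_id_citation("DATA-IDBASE-IDkey"))
```

The last call shows the weak spot mentioned above: a database name that
starts with "ID" or contains "-ID" is split in the wrong place, so such
names would have to be rejected.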

> refdb allows you to create databases whose names contain hyphens.  I
> haven't tried including a hyphenated database with a hyphenated citation
> key in the one citation -- I'm too scared.  Interestingly, while I can
> create a database with a hyphenated name I can't delete it (at least
> with an sqlite backend) -- the deletedb operation fails.

I'll have to investigate this. SQLite databases are deleted on the filesystem
level by using an unlink() system call - I can't imagine why that would fail
with a hyphen in the filename.

>
> IIRC, it is illegal to include colons in citation keys.  I seem to
> recall they are automatically stripped out.  It is currently possible to
> include hyphens in database names.  But, if you made it illegal to
> include a hyphen in a database name that gives you a ready-made
> delimiter to use in citations.
>

Yes, colons are not allowed in IDREF attributes (xref linkend). And yes, refdbd
indeed strips out colons in citation keys to avoid creating invalid output.

The remainder of your suggestion is less clear to me. If I understand correctly,
you suggest to use the hyphen under the assumption that database names never
have hyphens (this could indeed be enforced). But how do you distinguish
between a citation key prefixed with a database name and a sole citation key
containing a hyphen? As in:

dbname-citekey
cite-key

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 14:21:51

David Nebauer <dav...@sw...> was heard to say:

>     IIRC, it is illegal to include colons in citation keys. I seem to
>     recall they are automatically stripped out. It is currently possible
>     to include colons in database names. But, if you made it illegal to
>     include a colon in a database name that gives you a ready-made
>     delimiter to use in citations.
>
>
> I think that suggestion makes more sense.
>

Well, if I understand *that* correctly, you suggest to use these forms in LaTeX
documents:

\cite{citekey}
\cite{dbname:citekey}

This is more compact than the "-ID" kludge that I currently use. The only
downside, if at all, is that this citation syntax is different from the one
used in the full style in SGML/XML documents (where, as noted previously, we
must not use a colon). However, this might only confuse those who work with
both LaTeX and SGML/XML. Unless someone else has objections, I'll implement
your suggestion shortly.
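
A minimal Python sketch of how the proposed colon syntax could be parsed,
assuming colons never occur in citation keys or database names. The
function name split_citation and the default_db parameter (standing in
for the runbib -d database) are made up for the example.

```python
def split_citation(citation, default_db=None):
    """Split 'citekey' or 'dbname:citekey' into (dbname, citekey)."""
    dbname, sep, citekey = citation.partition(":")
    if not sep:
        # No colon: the whole string is the citation key.
        return default_db, citation
    if not dbname or not citekey or ":" in citekey:
        raise ValueError(f"malformed citation: {citation!r}")
    return dbname, citekey


print(split_citation("Agnew0", default_db="refs_computing"))
print(split_citation("otherdb:Miller-Rum-2005"))
```

Because the colon cannot appear on either side, hyphens in keys and in
database names need no special handling.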

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 22:23:16

Markus Hoenicka writes:
 > \cite{citekey}
 > \cite{dbname:citekey}
 > 

Just to let y'all know that the current Subversion version supports
the above mentioned citation format in LaTeX documents.

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 14:35:03

David Nebauer <dav...@sw...> was heard to say:

>  From one of the many guides on LaTeX:
>
>     You can use any of the standard characters that you find on your
>     keyboard, except the following 10 symbols:
>            { } % & $ # _ ^ ~ \
>     These symbols may only occur in LATEX commands.
>
>
> The first seven of the characters shown above are included as literals
> by escaping them with a backslash.

Thanks for that.

> There's something else.  You may recall some time ago all the trouble
> taken to ensure entities such as &mdash; and &amp; are preserved in
> database reference entries and subsequently then preserved throughout
> DocBook processing.  Many of my references include entities in document
> titles.  Well, those entities are now appearing in the bibtex entries
> created by runbib.  As you noted, the raw ampersands choke LaTeX.  Is
> there any way of converting those xml-safe entities to LaTeX equivalents
> as runbib exports them?  In the case of '&mdash;' that would be '---'.
>

I don't think this is too hard. I'll have to adapt the character replacement
routine which currently converts the input to XML-safe strings to replace the
entities back to something that LaTeX can grok. Most of the work is to compile
a table that defines the required translations.
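
Such a translation table could look roughly like this in Python. It is a
sketch only: the table holds a handful of entries chosen for illustration,
and the names ENTITY_TO_LATEX and entities_to_latex are made up.

```python
import re

# Illustrative subset; a real table would need many more entries.
ENTITY_TO_LATEX = {
    "&mdash;": "---",
    "&ndash;": "--",
    "&amp;": r"\&",
    "&lt;": "$<$",
    "&gt;": "$>$",
}

_ENTITY = re.compile(r"&[A-Za-z]+;")


def entities_to_latex(text):
    """Replace known XML entities with LaTeX equivalents.

    Unknown entities are left untouched, so their raw ampersand would
    still upset LaTeX; a real implementation has to decide what to do
    with those.
    """
    return _ENTITY.sub(
        lambda m: ENTITY_TO_LATEX.get(m.group(0), m.group(0)), text
    )


print(entities_to_latex("Law &amp; Order &mdash; a review"))
```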

regards,
Markus



Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 22:21:12

David Nebauer writes:
 > There's something else.  You may recall some time ago all the trouble 
 > taken to ensure entities such as &mdash; and &amp; are preserved in 
 > database reference entries and subsequently then preserved throughout 
 > DocBook processing.  Many of my references include entities in document 
 > titles.  Well, those entities are now appearing in the bibtex entries 
 > created by runbib.  As you noted, the raw ampersands choke LaTeX.  Is 
 > there any way of converting those xml-safe entities to LaTeX equivalents 
 > as runbib exports them?  In the case of '&mdash;' that would be '---'.
 > 

I thought about this a bit more. I'm afraid this is going to get far
more complex than I thought in the first place. We need to:

- replace XML entities that stem from risx documents or which were
  deliberately used in RIS data. E.g. '&mdash;' -> '---'

- backslash-escape LaTeX command characters unless, and that's the
  catch, they are used as LaTeX commands. A LaTeX-only user may
  rightfully expect e.g. author names like 'H\"{a}{\ss}ler' (as
  imported from a bibtex file) to be processed correctly, or
  e.g. '{\bf emphasized}' words in titles. refdbd would have to
  acquire a thorough knowledge of LaTeX commands to cope with this.

- translate foreign letters and letters with diacritics to their TeX
  equivalents from, and that's the catch here, any supported character
  encoding. The same TeX representation of such a letter may be
  encoded as a variety of one to three-byte sequences in different
  uni- or multibyte character sets.

Is anyone aware of a library or a tool that implements these
transformations? I'm only aware of tex2mail which does the reverse.
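
For the third item, one possible approach is to decode the input to
Unicode first, which takes care of the many byte encodings, and then map
characters to TeX. The Python sketch below assumes that approach; it
covers only five accents and a few special letters, and the names ACCENTS,
LETTERS and to_tex are made up.

```python
import unicodedata

# Combining mark -> TeX accent command character (illustrative subset).
ACCENTS = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis / umlaut
}

# Letters that do not decompose into base letter plus accent.
LETTERS = {"ß": r"{\ss}", "ø": r"{\o}", "Ø": r"{\O}",
           "æ": r"{\ae}", "Æ": r"{\AE}"}


def to_tex(text):
    """Replace accented and special letters with TeX commands."""
    out = []
    for ch in text:
        if ch in LETTERS:
            out.append(LETTERS[ch])
            continue
        # NFD yields the base letter followed by its combining marks.
        base, *marks = unicodedata.normalize("NFD", ch)
        if len(marks) == 1 and marks[0] in ACCENTS:
            out.append("\\" + ACCENTS[marks[0]] + "{" + base + "}")
        else:
            # Unaccented or not covered by the tables: pass through.
            out.append(ch)
    return "".join(out)


print(to_tex("Häßler, Günter"))
```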

regards,
Markus


Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-10 21:46:40

Hi David,

David Nebauer writes:
 > I've also been thinking further about this.  It seems to me the issue is
 > what formats to use when inputting and outputting:
 >
 >
 >                                  ------------> output for XML
 >  input data                     |
 > ------------->  STORAGE  -------|
 >                 FORMAT          |
 >                                  ------------> output for BibTeX/LaTeX
 >
 >
 > A simple scheme seems to me to consist of the following:
 >  - input and store all data as unicode
 >  - during output perform needed translations
 >

This may be a necessary tradeoff. So far, the LaTeX diehards were able
to use many LaTeX constructs in e.g. the author names or the titles
(think of italics, superscripts, or subscripts which are not uncommon
in e.g. physics or chemistry papers). This may be a pain in the neck
to search afterwards, but at least you could do it. If we follow the
simplified scheme you outline above, you can no longer use these LaTeX
hacks. I'd like to hear from the LaTeX users (I'm not one of them
currently) how important it is to include LaTeX markup into the data.

 > The *minimum* necessary translations needed are:
 >
 > 1. BibTeX/LaTeX
 >
 > Convert the following control characters to appropriate escape sequences:
 >     # $ % & ~ _ ^ \ { }
 >

I'm afraid there's more to it. We have to remove lots of commands like
the above mentioned boldface, italics, superscript, subscript and
such. These commands do not make any sense in the context of
SGML/XML. We also have to translate foreign characters (\"{a}, {\ss} and
similar constructs). Part of this translation can be achieved through
tex2mail, although it does not seem to create UTF-8 (but see below).

 > 2. XML/SGML
 >
 > Convert the two illegal characters to their respective entities:
 >     & <
 >
 > While not illegal, it is customary also to convert the following characters:
 >     > ' "

I was under the impression that &amp;, &lt;, and &gt; always have to
be replaced as these are part of the XML markup. Why and to what would
you like to convert ' and "?

 >
 >
 > The question then arises as to whether any other translation is
 > necessary and/or desirable.  In theory no other translation is
 > necessary.  LaTeX can process "raw" unicode using the 'ucs' package.

This is good news. My only LaTeX book dates back to 1999, and Unicode
does not seem to be mentioned. The transformations would be so much
simpler if we didn't have to create LaTeX commands to represent foreign
or special characters.

 > The XML standard states, "Legal characters are tab, carriage return,
 > line feed, and the legal graphic characters of Unicode and ISO/IEC 10646".
 >

This is pretty much what RefDB currently outputs.

 > Having said that, it may be desirable to translate non-ascii characters
 > into decimal numeric character references (e.g., '&#226;') for XML or,
 > for LaTeX, appropriate escape sequences.  Perhaps this could be optional?
 >

I think it is common to leave the non-ascii characters in the xml file
and use the proper charset declaration (UTF-8 by default). IMHO
character entities do not have any advantage over UTF-8. I'm not sure
about LaTeX output. How hard is it to make the use of the ucs package
mandatory for RefDB users? Once it is installed, it is as simple as
inserting one line at the top of your document, isn't it?
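
For comparison, the optional conversion to decimal numeric character references that David mentions could look roughly like the sketch below. It assumes Python and relies on the standard "xmlcharrefreplace" codec error handler; to_ascii_xml is a hypothetical name and says nothing about how refdbd would do this in C:

```python
def to_ascii_xml(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference.

    Note: this does not escape '&' or '<'; that is a separate step.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_xml("ch\u00e2teau"))  # prints: ch&#226;teau
```

The output is plain ASCII, which is the only practical advantage over leaving the characters as UTF-8 with a proper charset declaration.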

 > One interesting consequence of this is that author names may contain
 > non-ascii characters.  If, when new references are added to refdb, there
 > is no citation key specified, the citekey is constructed by mangling
 > primary author surname and year.  If citekey is restricted to ascii
 > characters then non-ascii author surname characters would have to be
 > stripped or converted (e.g., ä -> a, ß -> ss).
 >

Currently non-ascii characters are simply stripped. You always have
the option to specify a citation key explicitly when adding a
reference, using any reasonable translation of the foreign characters
to ascii.
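
The "convert" alternative to stripping can be sketched as follows, assuming Python. The key layout (surname followed by year), the SPECIAL table and the name make_citekey are assumptions made for illustration only; they do not describe how RefDB actually builds citation keys:

```python
import re
import unicodedata

# Hypothetical special cases for letters that have no Unicode decomposition.
SPECIAL = {"\u00df": "ss", "\u00e6": "ae", "\u00f8": "o", "\u0142": "l"}


def make_citekey(surname: str, year: int) -> str:
    """Build an ASCII-only key from a surname and a year."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in surname)
    # NFKD splits e.g. 'a with diaeresis' into 'a' plus a combining mark,
    # and the ASCII encode step then drops the combining mark.
    decomposed = unicodedata.normalize("NFKD", replaced)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Remove anything that is not a letter or digit (spaces, apostrophes).
    return re.sub(r"[^A-Za-z0-9]", "", ascii_only) + str(year)


print(make_citekey("Sch\u00e4fer", 2006))  # prints: Schafer2006
print(make_citekey("Wei\u00df", 1999))     # prints: Weiss1999
```

Compared with plain stripping, this keeps the key recognisable ("Schafer" rather than "Schfer") at the cost of maintaining a small table.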

 >     escapechars    converts non-ASCII (UTF-8, Latin-1 etc.) files to
 >                    ASCII with XML or TeX escape sequences
 >     latex2utf8txt  converts LaTeX files to UTF-8 text, removes line
 >                    breaks from paragraphs

Thanks for the pointers. I've downloaded these scripts and will give
them a try. If the latter works as advertised, it could be used as a
post-processing filter after bib2ris (or, if I'll ever end up having
too much time on my hands, I could reimplement bib2ris in Perl and
integrate the conversion code). The former is a bit trickier as the
conversion should run in refdbd. However, the script looks simple
enough that I might be able to recode the algorithm in C.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-11 15:54:02

Hi David,

David Nebauer <dav...@sw...> was heard to say:

> If you instead go with my idea to store as unicode you don't need to
> know anything about the eventual output format when you store the
> reference.  Indeed, the user doesn't have to know at that time.  The
> same references can be used for either DocBook or LaTeX.  You can easily
> add in other output formats later and all you have to do is write
> another output filter.
>

I'm all with you here. I just wanted to get opinions from real-world LaTeX users
whether or not it makes sense to preserve the markup.

> Your point is true but I say it is a small loss.  Using LaTeX formatting
> codes means your references can never be used for any other format
> without hacking in some kind of conversion.  RefDB is designed to be a
> long-term reference database enabling the contained references to be
> used in all kinds of interesting ways.  Use of format-specific markup
> limits your future choices.  As a minor example it prevents their use in
> DocBook documents.

True, but I assumed that only those who use RefDB solely for LaTeX might want
to keep the markup.

>
> Another issue is the ability of library and indexing systems to handle
> such formatting complexities as superscripting, subscripting and font
> changes.  You know far more about such things than I, but I would guess
> even the most complex article title is reduced to canonical ascii for
> storage in many cataloguing systems.  I presume the algorithms for such
> simplification are fairly predictable.  Anyone searching for the journal
> article by title would be easily able to predict the stored character
> sequence.  I would endeavour to suggest the simplified form of title
> would be entirely acceptable in any kind of bibliography.
>
> In any event, how would such a complex title be stored in plain ascii?
> Or Unicode?  Or even XML (imagine the attempt to use MathML in a title
> string!)?
>

The database which I use mostly (www.pubmed.org) indeed "ascii-izes" the titles.
The tagged format uses plain ASCII with a pretty crude transliteration, whereas
the XML format uses Unicode.

> As mentioned above, I am unconvinced about the utility of keeping
> boldface, italics, superscript and subscript-type markup.  As for
> foreign characters, almost any foreign character can be represented in

I'm afraid I didn't express my thoughts very well here. What I was talking about
is that a reference imported from bibtex may contain markup like

"Title with an {\bf emphasized} word"

It is not sufficient to escape characters; we also have to remove the "{\bf "
and the "}" sequences before we import the reference. This is what one of the
scripts that you pointed me to as well as tex2mail do.
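
A rough sketch of that stripping step, assuming Python and a short, hypothetical list of font commands (strip_markup and both patterns are illustrative; neither tex2mail nor the scripts mentioned above are claimed to work this way):

```python
import re

# Font switches in the old "{\bf text}" style; hypothetical, incomplete list.
OLD_STYLE = re.compile(r"\{\\(?:bf|it|em|sl|sc|tt|rm)\s+([^{}]*)\}")
# Commands in the "\textbf{text}" style; equally incomplete.
NEW_STYLE = re.compile(r"\\(?:textbf|textit|emph|textsc|texttt)\{([^{}]*)\}")


def strip_markup(title: str) -> str:
    """Remove simple font markup and keep the enclosed text."""
    previous = None
    while previous != title:  # repeat so that nested groups are unwrapped
        previous = title
        title = OLD_STYLE.sub(r"\1", title)
        title = NEW_STYLE.sub(r"\1", title)
    return title


print(strip_markup(r"Title with an {\bf emphasized} word"))
# prints: Title with an emphasized word
```

Regular expressions only cope with simple cases like this one; math mode or user-defined macros would need a real TeX-aware parser.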

>     To allow attribute values to contain both single and double quotes,
>     the apostrophe or single-quote character (') may be represented as
>     "&apos;", and the double-quote character (") as "&quot;".
>
>
> The relevant portion states, "The right angle bracket (>) *may* be
> represented using the string '&gt;'," but "*must*, for compatibility, be
> escaped using '&gt;' or a character reference when it appears in the
> string ']]>'." (emphases mine)
>
> The last paragraph in the quote refers to straight single and double
> quotation mark entities.
>

But it appears to talk about attribute values. XML output from RefDB never puts
quotes into attribute values, so we're left with &,<,>.
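
In Python terms (an assumption; RefDB itself is written in C) the distinction looks roughly like this. xml.sax.saxutils.escape handles exactly &, < and > by default, and quotes only come into play through the optional entities argument, which matters for attribute values alone. The two function names are hypothetical:

```python
from xml.sax.saxutils import escape


def xml_text(value: str) -> str:
    """Escape character data: only &, < and > are replaced."""
    return escape(value)


def xml_attribute_text(value: str) -> str:
    """Escape an attribute value, additionally replacing both quote types."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})


print(xml_text("AT&T <R&D> 'x'"))        # prints: AT&amp;T &lt;R&amp;D&gt; 'x'
print(xml_attribute_text('say "hi"'))    # prints: say &quot;hi&quot;
```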

> It worked for me "out of the box".  I installed the 'ucs' package
> (apt-get install latex-ucs), added those two lines to the preamble, ran
> 'latex test' and, presto, gloriously rendered unicode.
>

This is great news indeed. I will have to mention this in the manual.

I take from this discussion:

1) Use a bib2ris post-processing script (or rewrite bib2ris to contain such
code) which strips markup like boldface, superscript etc. and translates
foreign characters entered as LaTeX constructs to their Unicode equivalents.

2) Modify the code to prevent XML entities from showing up in LaTeX output.

3) Add code to escape the LaTeX command characters in the LaTeX output.

The second point is a bit tricky. References imported from RIS usually do not
contain entities, but references imported from risx are likely to. Either I
convert these entities during import, or I remove them during LaTeX export. The
former seems cleaner to me, and I think this is what you had in mind.
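
Converting entities during import could be sketched like this, assuming Python. html.unescape resolves numeric character references and the HTML named entities; whether risx data uses other, DTD-defined entity names is not covered here, and entities_to_unicode is a hypothetical name:

```python
import html


def entities_to_unicode(field: str) -> str:
    """Replace character references and known named entities with Unicode."""
    return html.unescape(field)


print(entities_to_unicode("M&#252;ller &amp; S&ouml;hne"))
```

After this step the stored field contains plain Unicode, so the LaTeX export no longer has to know about XML entities at all.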

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-11 16:37:20

Hi Markus,

> I take from this discussion:
>
> 1) Use a bib2ris post-processing script (or rewrite bib2ris to contain such
> code) which strips markup like boldface, superscript etc. and translates
> foreign characters entered as LaTeX constructs to their Unicode equivalents.
>
> 2) Modify the code to prevent XML entities from showing up in LaTeX output.
>
> 3) Add code to escape the LaTeX command characters in the LaTeX output.
>
> The second point is a bit tricky. References imported from RIS usually do not
> contain entities, but references imported from risx are likely to. Either I
> convert these entities during import, or I remove them during LaTeX export. The
> former seems cleaner to me, and I think this is what you had in mind.

Yes, in my view the storage format is Unicode without markup:


BibTeX -------               ---------> DocBook
             |               |
             |               |
RIS ---------+--> STORAGE ----
             |   (Unicode)   |
             |               |
RISX ---------               ---------> LaTeX



Whatever the input format, all references end up in the same storage 
format (Unicode sans markup).  This would require stripping out XML 
entities and LaTeX markup.  With luck you can use existing tools to do 
this.  The stored references can then be output in either DocBook- or 
LaTeX-compatible format.  This seems to me to be an elegant way of 
dealing with the mishmash of input and output formats.
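
On the output side, a LaTeX filter for markup-free Unicode storage mainly has to escape the command characters listed earlier in the thread (# $ % & ~ _ ^ \ { }). A minimal sketch, assuming Python; LATEX_ESCAPES and escape_latex are hypothetical names and the replacement commands are one common choice, not the only one:

```python
# Replacement for each LaTeX command character; one possible choice.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape LaTeX command characters in a stored, markup-free field."""
    # Working character by character avoids escaping the backslashes and
    # braces that the replacements themselves introduce.
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


print(escape_latex("50% of R&D_costs"))  # prints: 50\% of R\&D\_costs
```

A DocBook filter would be the counterpart that escapes &, < and > instead, which is what keeps the storage format independent of either output.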

Regards,
David.
