refdb-devel Mailing List for RefDB (Page 16)

Brought to you by: mhoenicka

refdb-devel — List for RefDB developers


Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-12 21:07:38

Hi David,

David Nebauer writes:
 >  From the 'ucs' package documentation:
 > 
 >     The simplest use of this package is to add
 >         \usepackage{ucs}
 >         \usepackage[utf8x]{inputenc}
 >     to your header. You may even omit the first line in many cases.
 > 
 > 
 > It worked for me "out of the box".  I installed the 'ucs' package 
 > (apt-get install latex-ucs), added those two lines to the preamble, ran 
 > 'latex test' and, presto, gloriously rendered unicode.
 > 

There seems to be something else for Unicode support. Ever heard of
this? Is that supposed to work without installing additional files?

(cited from:
http://mail.nl.linux.org/linux-utf8/2004-04/msg00000.html)

------
In mid February, the LaTeX project team released a new version that now
supports UTF-8. For details, see

  ftp://ftp.tex.ac.uk/tex-archive/macros/latex/base/utf8ienc.dtx

You now can finally simply replace

  \usepackage[latin1]{inputenc}

with

  \usepackage[utf8]{inputenc}

and can this way move all your non-ASCII LaTeX input to UTF-8 as well.


-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-12 14:22:59

Hi Markus,

> It is. Unfortunately the konwert sources are a bit hard on the eyes because
> they're in Polish but I'll try to steal from that anyway.

A lot of the heavy lifting is done by the (executable) filters. On my
system they live in '/usr/share/konwert/filters/'.

UTF8-ascii is a bash script:
--------------------------------------------------------------------------------
#!/bin/bash -

VARIANT_bg='
Щ SHT
щ sht
' VARIANT_de='
Ä AE
Ö OE
Ü UE
ä ae
ö oe
ü ue
' VARIANT_hr='
Đ DJ
đ dj
' VARIANT_vi='
À A`
Á A'\''
Â A^
Ã A~
È E`
É E'\''
Ê E^
Ì I`
Í I'\''
Ò O`
Ó O'\''
Ô O^
Õ O~
Ù U`
Ú U'\''
Ý Y'\''
à a`
á a'\''
â a^
ã a~
è e`
é e'\''
ê e^
ì i`
í i'\''
ò o`
ó o'\''
ô o^
õ o~
ù u`
ú u'\''
ý y'\''
Ă A(
ă a(
Đ DD
đ dd
Ĩ I~
ĩ i~
Ũ U~
ũ u~
' VARIANT1_bg='
Ъ Y
ъ y
' VARIANT1_ua='
И Y
и y
' REPLACE='?' MIME=us-ascii

if [ "$FILTERM" = out ]
then
NPOJED=
else
NPOJED=1
fi
FORMAT=
HTMLCHAR=
POPRAWKI=
for A in $ARG
do
case "$A" in
(1) NPOJED=;;
(html) FORMAT=html;;
(htmldec|htmlhex) FORMAT=html; HTMLCHAR=${A#html};;
(tex) FORMAT=tex;;
(*)
if [ -x "${0%/*}/../aux/argcharset/$A" ]
then
POPRAWKI=${POPRAWKI:+$POPRAWKI | }${0%/*}/../aux/argcharset/$A
fi
VARIANT=VARIANT_$A; APPROX="${!VARIANT} $APPROX"
VARIANT=VARIANT1_$A; APPROX1="${!VARIANT} $APPROX1"
;;
esac
done

if [ "$POPRAWKI" ]
then
"$SHELL" -c "$POPRAWKI"
else
cat
fi |
case "$FORMAT" in
(html)
"${0%/*}/../aux/fixmeta" us-ascii |
if [ "$HTMLCHAR" ]
then
"${0%/*}/UTF8-html$HTMLCHAR"
else
trs -e '\}\[@&<>\] @' \
${NPOJED:+-e} ${NPOJED:+"$APPROX"} \
-e "$APPROX1" \
${NPOJED:+-f} ${NPOJED:+"${0%/*}/../aux/UTF8-ascii"} \
-f "${0%/*}/../aux/UTF8-ascii1" \
-e "\300\-\377 ${REPLACE:-?} \200\-\277 \!" |
trs -e '@@ @ @& & @< < @> > & &amp; < &lt; > &gt;'
fi
;;
(tex)
trs -e '\}\[@\#$%&\\^_{|}~\] @' \
-f "${0%/*}/../aux/UTF8-tex" \
-e "$APPROX" \
-e "$APPROX1" \
-f "${0%/*}/../aux/UTF8-ascii" \
-f "${0%/*}/../aux/UTF8-ascii1" \
-e "\300\-\377 ${REPLACE:-?} \200\-\277 \!" |
trs -e '@@ @ @\# \# @$ $ @% % @& & @\\ \\ @^ ^ @_ _ @{ { @| | @} } @~ ~
\# \\\# $ \\$ % \\% & \\& \\ $\\backslash$ ^ \\^{} _ \\_ { \\{ | $|$ }
\\} ~ \\~{}'
;;
(*)
trs ${NPOJED:+-e} ${NPOJED:+"$APPROX"} \
-e "$APPROX1" \
${NPOJED:+-f} ${NPOJED:+"${0%/*}/../aux/UTF8-ascii"} \
-f "${0%/*}/../aux/UTF8-ascii1" \
-e "\300\-\377 ${REPLACE:-?} \200\-\277 \!"
;;
esac
--------------------------------------------------------------------------------

There's bash wizardry in there I can't even begin to fathom.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-12 14:17:46

Hi Markus,

>> I'm experiencing all kinds of difficulty using the latest svn refdb
>> build with LaTeX/BibTeX.  'runbib' will not extract records in BibTeX
>> format unless citations are in the previous '\cite{[dbname-]IDcitekey}'
>> format.
>>     
> I hardly dare to ask, but are you sure you installed the svn version and
> restarted refdbd?

I install from custom deb packages so refdbd is stopped and started as 
part of the debian package upgrade process.  My source tree is at 
revision 81.

I checked everything, including the source code changes you made at 
version 72 to alter the citation format.  Finally I remembered some 
advice you gave recently about checking for multiple running instances 
of refdbd.  Sure enough, I had an extra instance running, probably from 
a debugging exercise where I was running refdbd in standalone mode.  
Once stopped the old behaviour went away.

Problem solved (he says sheepishly).

FWIW, I can confirm the new citation format is working correctly.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Damien J. D. <D.J...@cs...> - 2006-07-12 13:10:26

> RefDB uses bibliography styles even for generating LaTeX bibliographies. 

Oh. Right!

> should be able to generate at least lowercased (except the first char),

I don't think I'd be able to use that because presumably it would 
lowercase proper nouns, acronyms and subtitle headings.

> all-caps, and mixed case output.

The latter is fine except, as before, when your article title 
capitalises verbs and improper nouns. I'm happy to do it this way 
though, I'll just make sure no capitalised verbs and improper nouns get 
into my database and hope not to encounter citation styles that require 
them.

Cheers
Damien

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-12 12:30:12

Damien Jade Duff <D.J...@cs...> was heard to say:

> trying to second guess bibtex, just set the whole title as verbatim - e.g.
>
>     TITLE = {{Why AM and EURISKO appear to work}},
>
> Folks who use latex would probably have to turn non-proper nouns that
> are capitalised into lowercase before putting them in to RefDB and hope
> to seldom encounter a citation style that requires these things to be
> reproduced verbatim from the original article (I don't think I have yet).
>

RefDB uses bibliography styles even for generating LaTeX bibliographies. These
styles currently only take care of the capitalization of titles. IIRC you get
the best results if you add your data using mixed-case and then pick the proper
style for output. If we combine that with the curly brackets as shown above, you
should be able to generate at least lowercased (except the first char),
all-caps, and mixed case output.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-12 12:20:19

Hi David,

David Nebauer <dav...@sw...> was heard to say:


> Use of this (or a similar) tool would result in the much more
> satisfactory, and easy to remember,  default citekey of 'Hassler2006'.
> It should be fairly simple to add this additional conversion step.
>

It is. Unfortunately the konwert sources are a bit hard on the eyes because
they're in Polish but I'll try to steal from that anyway.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-12 12:13:26

Hi David,

David Nebauer <dav...@sw...> was heard to say:

> I'm experiencing all kinds of difficulty using the latest svn refdb
> build with LaTeX/BibTeX.  'runbib' will not extract records in BibTeX
> format unless citations are in the previous '\cite{[dbname-]IDcitekey}'
> format.  Using the new citation format on my system results in '999:0
> retrieved:0 failed'.  Would you mind checking on your system?  If the
> new format works for you it must be a problem at my end and I'll work up
> a test case.

I hardly dare to ask, but are you sure you installed the svn version and
restarted refdbd? I just checked the svn code, and all changes are in place.
The current svn code certainly does not look for "-ID" but for ":" as a
database separator. I'm sure that the new format works on my FreeBSD box.
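
As a rough illustration of the key format described here (a sketch only, not taken from the RefDB sources; the function names are made up, and support for several comma-separated keys in one \cite{} is an assumption based on ordinary LaTeX usage), splitting at the first ":" could look like this in Python:

```python
import re

# Hypothetical helpers, not RefDB code.
CITE_RE = re.compile(r"\\cite\{([^}]*)\}")


def split_cite_key(key, default_db=None):
    """Return (dbname, citekey) for 'citekey' or 'dbname:citekey'."""
    dbname, sep, citekey = key.partition(":")
    if not sep:
        # No separator: the whole key is the citekey
        return default_db, key
    return dbname, citekey


def find_citations(tex, default_db=None):
    """Yield (dbname, citekey) for every \\cite{...} in a LaTeX string."""
    for match in CITE_RE.finditer(tex):
        for key in match.group(1).split(","):
            yield split_cite_key(key.strip(), default_db)
```

A key without a colon simply falls back to whatever default database the caller passes in.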

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Damien J. D. <D.J...@cs...> - 2006-07-12 11:28:02

Markus Hoenicka wrote:
> David Nebauer writes:
>  > >   TITLE = {Why {AM} and {EURISKO} appear to work},
>  > 
>  > Alternately, you could save the title in plain Unicode with that 
>  > capitalisation and refdb's BibTeX output filter would wrap any 
>  > abnormally capitalised words in braces.  That way your reference can be 
>  > used in either DocBook XML/SGML or LaTeX output.
>  > 
> 
> If it is a matter of wrapping uppercased words in curly brackets, this
> is certainly doable. Is it likely to have uppercased words in bibtex data
> which are *not* supposed to be rendered in all-caps?
> 
> regards,
> Markus
> 

Gidday

I imagine only where the style demands it (e.g. APA). Some journals have
titles with pretty much everything (both proper and non-proper nouns and
subtitles and acronyms) capitalised. On the other hand, APA requires
nouns that aren't proper to go lower case when citing - proper nouns and
subtitles can be uppercase.

I don't know how this is managed in Docbook styles or Endnote etc, but 
presumably titles are used verbatim because I can't imagine any logic 
funky enough to figure out whether a word is a proper noun or not 
without some extra markup.

If we're going to export all capitals to bibtex verbatim sans markup 
then I think we might as well, as a first approximation, rather than 
trying to second guess bibtex, just set the whole title as verbatim - e.g.

    TITLE = {{Why AM and EURISKO appear to work}},

Folks who use latex would probably have to turn non-proper nouns that 
are capitalised into lowercase before putting them in to RefDB and hope 
to seldom encounter a citation style that requires these things to be 
reproduced verbatim from the original article (I don't think I have yet).

The same may possibly apply to BOOKTITLE, JOURNAL, SERIES, SCHOOL, 
PUBLISHER, INSTITUTION etc.

My 2nd pennies worth.

Regards
Damien

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-12 09:43:15

Hi Markus,

>  > One interesting consequence of this is that author names may contain
>  > non-ascii characters.  If, when new references are added to refdb, there
>  > is no citation key specified, the citekey is constructed by mangling
>  > primary author surname and year.  If citekey is restricted to ascii
>  > characters then non-ascii author surname characters would have to be
>  > stripped or converted (e.g., ä -> a, ß -> ss).
>
> Currently non-ascii characters are simply stripped. You always have
> the option to specify a citation key explicitly when adding a
> reference, using any reasonable translation of the foreign characters
> to ascii.
>

I'd like to focus on this point again.  I personally allow refdb to
generate the citekey for me, mainly because it will automatically append
'a', 'b', etc. if there is danger of duplication.  Automatically
stripping non-ascii characters from authors with foreign characters will
lead to some unusual results.  A recent publication from our old
workhorse 'Häßler' might produce the citekey 'Hler2006'.

There are tools around which attempt to convert sensibly from unicode to
ascii.  Here is an example using the tool 'konwert':
--------------------------------------------------------------------------------
$ cat name
Häßler, Günter
$ cat name | konwert UTF8-ascii
Hassler, Gunter
$
--------------------------------------------------------------------------------

Use of this (or a similar) tool would result in the much more
satisfactory, and easy to remember, default citekey of 'Hassler2006'.
It should be fairly simple to add this additional conversion step.
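
To make the idea concrete, here is a minimal Python sketch of such a conversion step. It is not how konwert or refdbd work: it leans on Unicode decomposition from the standard library plus a tiny hand-picked table, and both function names and the surname-plus-year scheme are assumptions for illustration.

```python
import unicodedata

# Illustrative only: letters that do not decompose into "base + accent".
# A real tool such as konwert ships far larger tables.
SPECIAL = {
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Đ": "D", "đ": "d", "Ł": "L", "ł": "l",
}


def to_ascii(text):
    """Approximate text in ASCII instead of dropping non-ASCII letters."""
    out = []
    for ch in text:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # NFKD splits e.g. "ä" into "a" plus a combining diaeresis;
        # keeping only the ASCII part leaves the base letter.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(out)


def default_citekey(surname, year):
    """Hypothetical citekey scheme: ASCII surname letters plus year."""
    letters = "".join(c for c in to_ascii(surname) if c.isalnum())
    return f"{letters}{year}"
```

With this, default_citekey("Häßler", 2006) gives 'Hassler2006' rather than 'Hler2006'. Characters with no decomposition and no table entry are still dropped, so the table would need extending for other scripts.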

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-12 07:56:52

Hi Markus,

> Markus Hoenicka writes:
>  > \cite{citekey}
>  > \cite{dbname:citekey}
>
> Just to let y'all know that the current Subversion version supports
> the above mentioned citation format in LaTeX documents.
>   

I'm experiencing all kinds of difficulty using the latest svn refdb 
build with LaTeX/BibTeX.  'runbib' will not extract records in BibTeX 
format unless citations are in the previous '\cite{[dbname-]IDcitekey}' 
format.  Using the new citation format on my system results in '999:0 
retrieved:0 failed'.  Would you mind checking on your system?  If the 
new format works for you it must be a problem at my end and I'll work up 
a test case.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-11 21:26:08

David Nebauer writes:
 > >   TITLE = {Why {AM} and {EURISKO} appear to work},
 > 
 > Alternately, you could save the title in plain Unicode with that 
 > capitalisation and refdb's BibTeX output filter would wrap any 
 > abnormally capitalised words in braces.  That way your reference can be 
 > used in either DocBook XML/SGML or LaTeX output.
 > 

If it is a matter of wrapping uppercased words in curly brackets, this
is certainly doable. Is it likely to have uppercased words in bibtex data
which are *not* supposed to be rendered in all-caps?

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-11 19:51:35

Hi Damien,

Damien Jade Duff wrote:
> I'd like to see the following kind of latex markup reproduced (not 
> stripped entirely):
>   TITLE = {Why {AM} and {EURISKO} appear to work},

Alternately, you could save the title in plain Unicode with that 
capitalisation and refdb's BibTeX output filter would wrap any 
abnormally capitalised words in braces.  That way your reference can be 
used in either DocBook XML/SGML or LaTeX output.
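
A rough sketch of what such an output filter might do, in Python for brevity. The rule used here (a word counts as "abnormally capitalised" if it has a capital letter after its first character) is an assumption, and it expects the stored title to contain no braces already.

```python
import re

WORD = re.compile(r"\w+")


def protect_caps(title):
    """Wrap words like AM, EURISKO or RefDB in braces for BibTeX."""
    def wrap(match):
        word = match.group(0)
        # A capital anywhere after the first character marks the word
        # as one BibTeX must not recase.
        if any(c.isupper() for c in word[1:]):
            return "{" + word + "}"
        return word
    return WORD.sub(wrap, title)
```

This catches acronyms and mixed-case names but not ordinary proper nouns such as 'London', which look like any other capitalised word.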

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Damien J. D. <D.J...@cs...> - 2006-07-11 17:57:38

Gidday

As a novice latex user it seems okay; I can't envisage any difficulties.

Probably jumping the gun a bit here, but I'd like to see the following 
kind of latex markup reproduced (not stripped entirely):

BIBTEX:
   TITLE = {Why {AM} and {EURISKO} appear to work},

Current RIS:
TI  - Why {AM} and {EURISKO} appear to work

Current RISX:
       <title type="full">Why {AM} and {EURISKO} appear to work</title>

The reason is that bibtex will capitalise your bibliography according to 
the current bibliography style (generally with a single leading 
capital), and you need to inform bibtex that certain passages are to be 
formatted verbatim.

Other than that, I can't envisage the need for keeping latex markup - 
I've been manually stripping everything else. I enter most of my 
references using risx (I use bibtex and RIS where provided but usually 
edit them manually before submitting them to RefDB), and I don't appear 
to have any entities coming back (probably because I don't know how to
use entities).
Except I think entities are used in some of the URLs - I don't know if 
they're necessary or not, they're just there.

Peace
Damien

David Nebauer wrote:
> Hi Markus,
> 
> 
>>I take from this discussion:
>>
>>1) Use a bib2ris post-processing script (or rewrite bib2ris to contain such
>>code) which strips markup like boldface, superscript etc. and translates
>>foreign characters entered as LaTeX constructs to their Unicode equivalents.
>>
>>2) Modify the code to prevent XML entities from showing up in LaTeX output.
>>
>>3) Add code to escape the LaTeX command characters in the LaTeX output.
>>
>>The second point is a bit tricky. References imported from RIS usually do not
>>contain entities, but references imported from risx are likely to do. Either I
>>convert these entities during import, or I remove them during LaTeX export. The
>>former seems cleaner to me, and I think this is what you had in mind.
> 
> 
> Yes, in my view the storage format is Unicode without markup:
> 
> 
> BibTeX -------               ---------> DocBook
>              |               |
>              |               |
> RIS ---------+--> STORAGE ----
>              |   (Unicode)   |
>              |               |
> RISX ---------               ---------> LaTeX
> 
> 
> 
> Whatever the input format, all references end up in the same storage 
> format (Unicode sans markup).  This would require stripping out XML 
> entities and LaTeX markup.  With luck you can use existing tools to do 
> this.  The stored references can then be output in either DocBook- or 
> LaTeX-compatible format.  This seems to me to be an elegant way of
> dealing with the mishmash of input and output formats.
> 
> Regards,
> David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-11 16:37:20

Hi Markus,

> I take from this discussion:
>
> 1) Use a bib2ris post-processing script (or rewrite bib2ris to contain such
> code) which strips markup like boldface, superscript etc. and translates
> foreign characters entered as LaTeX constructs to their Unicode equivalents.
>
> 2) Modify the code to prevent XML entities from showing up in LaTeX output.
>
> 3) Add code to escape the LaTeX command characters in the LaTeX output.
>
> The second point is a bit tricky. References imported from RIS usually do not
> contain entities, but references imported from risx are likely to do. Either I
> convert these entities during import, or I remove them during LaTeX export. The
> former seems cleaner to me, and I think this is what you had in mind.

Yes, in my view the storage format is Unicode without markup:


BibTeX -------               ---------> DocBook
             |               |
             |               |
RIS ---------+--> STORAGE ----
             |   (Unicode)   |
             |               |
RISX ---------               ---------> LaTeX



Whatever the input format, all references end up in the same storage 
format (Unicode sans markup).  This would require stripping out XML 
entities and LaTeX markup.  With luck you can use existing tools to do 
this.  The stored references can then be output in either DocBook- or 
LaTeX-compatible format.  This seems to me to be an elegant way of
dealing with the mishmash of input and output formats.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-11 15:54:02

Hi David,

David Nebauer <dav...@sw...> was heard to say:

> If you instead go with my idea to store as unicode you don't need to
> know anything about the eventual output format when you store the
> reference.  Indeed, the user doesn't have to know at that time.  The
> same references can be used for either DocBook or LaTeX.  You can easily
> add in other output formats later and all you have to do is write
> another output filter.
>

I'm all with you here. I just wanted to get opinions from real-world LaTeX users
whether or not it makes sense to preserve the markup.

> Your point is true but I say it is a small loss.  Using LaTeX formatting
> codes means your references can never be used for any other format
> without hacking in some kind of conversion.  RefDB is designed to be a
> long-term reference database enabling the contained references to be
> used in all kinds of interesting ways.  Use of format-specific markup
> limits your future choices.  As a minor example it prevents their use in
> DocBook documents.

True, but I assumed that only those might want to keep the markup who use RefDB
solely for LaTeX.

>
> Another issue is the ability of library and indexing systems to handle
> such formatting complexities as superscripting, subscripting and font
> changes.  You know far more about such things than I, but I would guess
> even the most complex article title is reduced to canonical ascii for
> storage in many cataloguing systems.  I presume the algorithms for such
> simplification are fairly predictable.  Anyone searching for the journal
> article by title would be easily able to predict the stored character
> sequence.  I would endeavour to suggest the simplified form of title
> would be entirely acceptable in any kind of bibliography.
>
> In any event, how would such a complex title be stored in plain ascii?
> Or Unicode?  Or even XML (imagine the attempt to use MathML in a title
> string!)?
>

The database which I use mostly (www.pubmed.org) indeed "ascii-izes" the titles.
The tagged format uses plain ASCII with a pretty crude transliteration, whereas
the XML format uses Unicode.

> As mentioned above, I am unconvinced about the utility of keeping
> boldface, italics, superscript and subscript-type markup.  As for
> foreign characters, almost any foreign character can be represented in

I'm afraid I didn't express my thoughts very well here. What I was talking about
is that a reference imported from bibtex may contain markup like

"Title with an {\bf emphasized} word"

It is not sufficient to escape characters but we have to remove the "{\bf " and
the "}" sequences before we import the reference. This is what one of the
scripts that you pointed me to as well as tex2mail do.
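
For illustration, a small Python sketch of that removal step. It is not the code of tex2mail or of the scripts mentioned; the list of font commands is an assumption and far from complete.

```python
import re

# Matches one innermost group of the form {\cmd text}, e.g. {\bf word}.
# The command list is an illustrative assumption, not a full inventory.
FONT_GROUP = re.compile(r"\{\\(?:bf|it|em|sl|sc|tt|rm|sf)\s+([^{}]*)\}")


def strip_font_markup(text):
    """Remove old-style font switches, keeping the enclosed text."""
    previous = None
    while previous != text:  # repeat so nested groups are unwrapped too
        previous = text
        text = FONT_GROUP.sub(r"\1", text)
    return text
```

Applied to the title above, this leaves "Title with an emphasized word". Commands that take arguments, such as \textbf{...}, would need their own pattern.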

>     To allow attribute values to contain both single and double quotes,
>     the apostrophe or single-quote character (') may be represented as
>     "&apos;", and the double-quote character (") as "&quot;".
>
>
> The relevant portion states, "The right angle bracket (>) *may* be
> represented using the string '&gt;'," but "*must*, for compatibility, be
> escaped using '&gt;' or a character reference when it appears in the
> string ']]>'." (emphases mine)
>
> The last paragraph in the quote refers to straight single and double
> quotation mark entities.
>

But it appears to talk about attribute values. XML output from RefDB never puts
quotes into attribute values, so we're left with &,<,>.
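
For element content that leaves three characters to handle. A minimal Python sketch, using the standard library's xml.sax.saxutils.escape rather than anything from RefDB (the wrapper name is made up):

```python
from xml.sax.saxutils import escape


def xml_text(value):
    """Escape &, < and > for use as XML element content.

    Quotes are left alone, which is fine as long as the value
    never ends up inside an attribute.
    """
    return escape(value)
```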

> It worked for me "out of the box".  I installed the 'ucs' package
> (apt-get install latex-ucs), added those two lines to the preamble, ran
> 'latex test' and, presto, gloriously rendered unicode.
>

This is great news indeed. I will have to mention this in the manual.

I take from this discussion:

1) Use a bib2ris post-processing script (or rewrite bib2ris to contain such
code) which strips markup like boldface, superscript etc. and translates
foreign characters entered as LaTeX constructs to their Unicode equivalents.

2) Modify the code to prevent XML entities from showing up in LaTeX output.

3) Add code to escape the LaTeX command characters in the LaTeX output.

The second point is a bit tricky. References imported from RIS usually do not
contain entities, but references imported from risx are likely to do. Either I
convert these entities during import, or I remove them during LaTeX export. The
former seems cleaner to me, and I think this is what you had in mind.
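
Converting during import could be as small as the following Python sketch. It uses html.unescape from the standard library, which follows HTML5 rules and is therefore more permissive than strict XML; the function name is made up and nothing here reflects how refdbd actually imports risx.

```python
import html


def entities_to_unicode(text):
    """Replace named and numeric character references by the
    characters they stand for, so that storage holds plain Unicode."""
    return html.unescape(text)
```

After this step '&mdash;' is stored as the dash character itself and '&amp;' as a bare '&', so each output filter only has to escape for its own format.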

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-10 21:46:40

Hi David,

David Nebauer writes:
 > I've also been thinking further about this.  It seems to me the issue is
 > what formats to use when inputting and outputting:
 >
 >
 >                                  ------------> output for XML
 >  input data                     |
 > ------------->  STORAGE  -------|
 >                 FORMAT          |
 >                                  ------------> output for BibTeX/LaTeX
 >
 >
 > A simple scheme seems to me to consist of the following:
 >  - input and store all data as unicode
 >  - during output perform needed translations
 >

This may be a necessary tradeoff. So far, the LaTeX diehards were able
to use many LaTeX constructs in e.g. the author names or the titles
(think of italics, superscripts, or subscripts which are not uncommon
in e.g. physics or chemistry papers). This may be a pain in the neck
to search afterwards, but at least you could do it. If we follow the
simplified scheme you outline above, you can no longer use these LaTeX
hacks. I'd like to hear from the LaTeX users (I'm not one of them
currently) how important it is to include LaTeX markup into the data.

 > The *minimum* necessary translations needed are:
 >
 > 1. BibTeX/LaTeX
 >
 > Convert the following control characters to appropriate escape sequences:
 >     # $ % & ~ _ ^ \ { }
 >

I'm afraid there's more to it. We have to remove lots of commands like
the above mentioned boldface, italics, superscript, subscript and
such. These commands do not make any sense in the context of
SGML/XML. We also have to translate foreign characters (\"{a}, {\ss} and
similar constructs). Part of this translation can be achieved through
tex2mail, although it does not seem to create UTF-8 (but see below).

 > 2. XML/SGML
 >
 > Convert the two illegal characters to their respective entities:
 >     & <
 >
 > While not illegal, it is customary also to convert the following characters:
 >     > ' "

I was under the impression that &, <, and > always have to be replaced
(by &amp;, &lt;, and &gt;) as these are part of the XML markup. Why and
to what would you like to convert ' and "?

 >
 >
 > The question then arises as to whether any other translation is
 > necessary and/or desirable.  In theory no other translation is
 > necessary.  LaTeX can process "raw" unicode using the 'ucs' package.

This is good news. My only LaTeX book dates back to 1999, and Unicode
does not seem to be mentioned. The transformations would be so much
simpler if we didn't have to create LaTeX commands to represent foreign
or special characters.

 > The XML standard states, "Legal characters are tab, carriage return,
 > line feed, and the legal graphic characters of Unicode and ISO/IEC 10646".
 >

This is pretty much what RefDB currently outputs.

 > Having said that, it may be desirable to translate non-ascii characters
 > into decimal numeric character references (e.g., '&#226;') for XML or,
 > for LaTeX, appropriate escape sequences.  Perhaps this could be optional?
 >

I think it is common to leave the non-ascii characters in the xml file
and use the proper charset declaration (UTF-8 by default). IMHO
character entities do not have any advantage over UTF-8. I'm not sure
about LaTeX output. How hard is it to make the use of the ucs package
mandatory for RefDB users? Once it is installed, it is as simple as
inserting one line at the top of your document, isn't it?

 > One interesting consequence of this is that author names may contain
 > non-ascii characters.  If, when new references are added to refdb, there
 > is no citation key specified, the citekey is constructed by mangling
 > primary author surname and year.  If citekey is restricted to ascii
 > characters then non-ascii author surname characters would have to be
 > stripped or converted (e.g., ä -> a, ß -> ss).
 >

Currently non-ascii characters are simply stripped. You always have
the option to specify a citation key explicitly when adding a
reference, using any reasonable translation of the foreign characters
to ascii.

 >     escapechars    converts non-ASCII (UTF-8, Latin-1 etc.) files to
 >                    ASCII with XML or TeX escape sequences
 >     latex2utf8txt  converts LaTeX files to UTF-8 text, removes line
 >                    breaks from paragraphs

Thanks for the pointers. I've downloaded these scripts and will give
them a try. If the latter works as advertized, it could be used as a
post-processing filter after bib2ris (or, if I'll ever end up having
too much time on my hands, I could reimplement bib2ris in Perl and
integrate the conversion code). The former is a bit trickier as the
conversion should run in refdbd. However, the script looks simple
enough that I might be able to recode the algorithm in C.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 22:23:16

Markus Hoenicka writes:
 > \cite{citekey}
 > \cite{dbname:citekey}
 > 

Just to let y'all know that the current Subversion version supports
the above mentioned citation format in LaTeX documents.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 22:21:12

David Nebauer writes:
 > There's something else.  You may recall some time ago all the trouble 
 > taken to ensure entities such as &mdash; and &amp; are preserved in 
 > database reference entries and subsequently then preserved throughout 
 > DocBook processing.  Many of my references include entities in document 
 > titles.  Well, those entities are now appearing in the bibtex entries 
 > created by runbib.  As you noted, the raw ampersands choke LaTeX.  Is 
 > there any way of converting those xml-safe entities to LaTeX equivalents 
 > as runbib exports them?  In the case of '&mdash;' that would be '---'.
 > 

I thought about this a bit more. I'm afraid this is going to get far
more complex than I thought in the first place. We need to:

- replace XML entities that stem from risx documents or which were
  deliberately used in RIS data. E.g. '&mdash;' -> '---'

- backslash-escape LaTeX command characters unless, and that's the
  catch, they are used as LaTeX commands. A LaTeX-only user may
  rightfully expect e.g. author names like 'H\"{a}{\ss}ler' (as
  imported from a bibtex file) to be processed correctly, or
  e.g. '{\bf emphasized}' words in titles. refdbd would have to
  acquire a thorough knowledge of LaTeX commands to cope with this.

- translate foreign letters and letters with diacritics to their TeX
  equivalents from, and that's the catch here, any supported character
  encoding. The same TeX representation of such a letter may be
  encoded as a variety of one to three-byte sequences in different
  uni- or multibyte character sets.
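
The third point might be approached roughly as follows. This is a sketch in Python, not RefDB code: it sidesteps the "one to three-byte sequences" problem by decoding the input to Unicode first, then splits each letter into base letter plus combining mark. The tables ACCENTS and SPECIALS are a deliberately tiny, hypothetical subset; anything not listed passes through unchanged.

```python
import unicodedata

# Combining mark -> TeX accent command (illustrative subset only).
ACCENTS = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis / umlaut
}

# Letters that have no base+accent decomposition (illustrative subset only).
SPECIALS = {"\u00df": r"{\ss}", "\u00f8": r"{\o}", "\u00e6": r"{\ae}"}


def to_tex(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, then rewrite accented letters as TeX sequences."""
    out = []
    for ch in raw.decode(encoding):
        if ch in SPECIALS:
            out.append(SPECIALS[ch])
            continue
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and parts[1] in ACCENTS:
            out.append("\\" + ACCENTS[parts[1]] + "{" + parts[0] + "}")
        else:
            out.append(ch)  # unknown characters are left as they are
    return "".join(out)


if __name__ == "__main__":
    name = "H\u00e4\u00dfler"
    print(to_tex(name.encode("latin-1"), "latin-1"))
    print(to_tex(name.encode("utf-8"), "utf-8"))
```

The same TeX string results from either encoding, which is the point: the table only ever needs to know about Unicode characters, not about byte sequences.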

Is anyone aware of a library or a tool that implements these
transformations? I'm only aware of tex2mail which does the reverse.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 14:35:03

David Nebauer <dav...@sw...> was heard to say:

>  From one of the many guides on LaTeX:
>
>     You can use any of the standard characters that you find on your
>     keyboard, except the following 10 symbols:
>            { } % & $ # _ ^ ~ \
>     These symbols may only occur in LaTeX commands.
>
>
> The first seven of the characters shown above are included as literals
> by escaping them with a backslash.

Thanks for that.
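
For illustration, the backslash rule for those first seven characters can be sketched in a few lines of Python. The function name is hypothetical, and the sketch escapes blindly: it neither handles the remaining three characters (^ ~ \) nor tries to recognise text that is already meant as a LaTeX command.

```python
import re

# The seven characters that become literals when prefixed with a backslash.
_SPECIAL = re.compile(r"([{}%&$#_])")


def escape_latex_literals(text: str) -> str:
    """Prefix each of { } % & $ # _ with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


if __name__ == "__main__":
    print(escape_latex_literals("R&D budget: 100% of $5"))
```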

> There's something else.  You may recall some time ago all the trouble
> taken to ensure entities such as &mdash; and &amp; are preserved in
> database reference entries and subsequently then preserved throughout
> DocBook processing.  Many of my references include entities in document
> titles.  Well, those entities are now appearing in the bibtex entries
> created by runbib.  As you noted, the raw ampersands choke LaTeX.  Is
> there any way of converting those xml-safe entities to LaTeX equivalents
> as runbib exports them?  In the case of '&mdash;' that would be '---'.
>

I don't think this is too hard. I'll have to adapt the character replacement
routine which currently converts the input to XML-safe strings to replace the
entities back to something that LaTeX can grok. Most of the work is to compile
a table that defines the required translations.
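
Such a table could look roughly like this. A Python sketch only, not the refdbd routine: the '&mdash;' mapping comes from the request above, the other two entries are assumed for illustration, and entities missing from the table are left untouched.

```python
import re

# Entity -> LaTeX replacement (hypothetical starter table).
ENTITY_TO_LATEX = {
    "&mdash;": "---",
    "&ndash;": "--",
    "&amp;": r"\&",
}

_ENTITY = re.compile(r"&[A-Za-z]+;")


def entities_to_latex(text: str) -> str:
    """Replace known XML entities; unknown ones pass through unchanged."""
    return _ENTITY.sub(
        lambda m: ENTITY_TO_LATEX.get(m.group(0), m.group(0)), text
    )


if __name__ == "__main__":
    print(entities_to_latex("Law &amp; Order&mdash;A Review"))
```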

regards,
Markus


-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 14:21:51

David Nebauer <dav...@sw...> was heard to say:

>     IIRC, it is illegal to include colons in citation keys. I seem to
>     recall they are automatically stripped out. It is currently possible
>     to include colons in database names. But, if you made it illegal to
>     include a colon in a database name that gives you a ready-made
>     delimiter to use in citations.
>
>
> I think that suggestion makes more sense.
>

Well, if I understand *that* correctly, you suggest using these forms in LaTeX
documents:

\cite{citekey}
\cite{dbname:citekey}

This is more compact than the "-ID" kludge that I currently use. The only
downside, if at all, is that this citation syntax is different from the one
used in the full style in SGML/XML documents (where, as noted previously, we
must not use a colon). However, this might only confuse those who work with
both LaTeX and SGML/XML. Unless someone else has objections, I'll implement
your suggestion shortly.
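
Parsing the proposed syntax is then trivial, which is part of its appeal. A sketch in Python (the real code would live in refdbd; the function name is hypothetical), assuming colons are rejected in database names and stripped from citation keys:

```python
def split_citation(cite: str):
    """Split 'dbname:citekey' into (dbname, citekey); dbname is None if absent."""
    dbname, sep, key = cite.partition(":")
    if not sep:
        return None, cite  # no colon: plain citation key, default database
    return dbname, key


if __name__ == "__main__":
    print(split_citation("Agnew0"))
    print(split_citation("otherdb:Weckert0"))
```

Hyphens in either part no longer matter, because the colon can only ever be the delimiter.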

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 12:14:07

David Nebauer <dav...@sw...> was heard to say:

> I'm afraid the current scheme is unusable.  It is possible for citation
> keys to contain hyphens -- in fact, the default key for a hyphenated
> author surname contains a hyphen.  Try using a hyphenated citation key
> and watch the fun.  Even better, combine a database name with the
> hyphenated citation key -- that introduces another hyphen.  Even more fun.
>

That's why the code does not rely on the hyphen as a separator, but on the
sequences "ID" and "-ID", which are checked for in this particular order from
left to right. Unless a citation is malformed, you can have as many hyphens or
even "-ID" sequences in your citation keys as you like:

\cite{IDMILLER-IDRUM-2005}
      **
\cite{dbname-IDMILLER-IDRUM-2005}
            ***

The '*' mark the database name prefix separator in both cases. Unless I'm dense
this is foolproof as far as citation keys are concerned. Trouble may arise when
you use database names like "IDBASE" or "DATA-IDBASE". RefDB would have to
reject these names in order to avoid trouble.
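
The check order described above can be sketched like this. It is a Python illustration of the scheme, not the refdbd code, and the function name is hypothetical:

```python
def split_id_citation(cite: str):
    """Split 'IDkey' or 'dbname-IDkey' into (dbname, key)."""
    if cite.startswith("ID"):
        return None, cite[2:]          # leading "ID": no database part
    dbname, sep, key = cite.partition("-ID")  # leftmost "-ID" wins
    if not sep:
        raise ValueError("malformed citation: " + cite)
    return dbname, key


if __name__ == "__main__":
    print(split_id_citation("IDMILLER-IDRUM-2005"))
    print(split_id_citation("dbname-IDMILLER-IDRUM-2005"))
    # A database called "DATA-IDBASE" is split in the wrong place:
    print(split_id_citation("DATA-IDBASE-IDKEY"))
```

The last call shows the failure mode with unlucky database names: the leftmost "-ID" sits inside the database name, so the name is cut short.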

> refdb allows you to create databases whose names contain hyphens.  I
> haven't tried including a hyphenated database with a hyphenated citation
> key in the one citation -- I'm too scared.  Interestingly, while I can
> create a database with a hyphenated name I can't delete it (at least
> with an sqlite backend) -- the deletedb operation fails.

I'll have to investigate this. SQLite databases are deleted on the filesystem
level by using an unlink() system call - I can't imagine why that would fail
with a hyphen in the filename.
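
A quick way to rule out the filesystem call itself, sketched in Python with a throwaway directory instead of the real database directory. It only exercises unlink() on a hyphenated filename; it says nothing about what deletedb does before it gets that far. The function name and the sample database name are made up.

```python
import os
import tempfile


def create_and_delete(dbname: str) -> bool:
    """Create an empty file called dbname, unlink it, report whether it is gone."""
    with tempfile.TemporaryDirectory() as dbdir:
        path = os.path.join(dbdir, dbname)
        with open(path, "wb"):
            pass                      # empty stand-in for an SQLite database file
        os.unlink(path)
        return not os.path.exists(path)


if __name__ == "__main__":
    print(create_and_delete("refs-computing"))
```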

>
> IIRC, it is illegal to include colons in citation keys.  I seem to
> recall they are automatically stripped out.  It is currently possible to
> include hyphens in database names.  But, if you made it illegal to
> include a hyphen in a database name that gives you a ready-made
> delimiter to use in citations.
>

Yes, colons are not allowed in IDREF attributes (xref linkend). And yes, refdbd
indeed strips out colons in citation keys to avoid creating invalid output.

The remainder of your suggestion is less clear to me. If I understand correctly,
you suggest using the hyphen under the assumption that database names never
have hyphens (this could indeed be enforced). But how do you distinguish
between a citation key prefixed with a database name and a sole citation key
containing a hyphen? As in:

dbname-citekey
cite-key

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-08 11:24:53

Hi Markus,

> In order to
> support multiple databases I had to revert this to the original format where
> the citation key is prefixed with "ID" or "dbname-ID".
>
> I'm open to suggestions if you know a better way to safely distinguish database
> names from the citation key proper.
>   

I'm afraid the current scheme is unusable.  It is possible for citation 
keys to contain hyphens -- in fact, the default key for a hyphenated 
author surname contains a hyphen.  Try using a hyphenated citation key 
and watch the fun.  Even better, combine a database name with the 
hyphenated citation key -- that introduces another hyphen.  Even more fun.

refdb allows you to create databases whose names contain hyphens.  I 
haven't tried including a hyphenated database with a hyphenated citation 
key in the one citation -- I'm too scared.  Interestingly, while I can 
create a database with a hyphenated name I can't delete it (at least 
with an sqlite backend) -- the deletedb operation fails.

IIRC, it is illegal to include colons in citation keys.  I seem to 
recall they are automatically stripped out.  It is currently possible to 
include hyphens in database names.  But, if you made it illegal to 
include a hyphen in a database name that gives you a ready-made 
delimiter to use in citations.

Regards,
David.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-08 09:28:29

David Nebauer <dav...@sw...> was heard to say:

> This is a test of the RefDB application used in conjunction with
> vim-latexsuite.  Here is a reference \cite{Agnew0}.  Here is another
> \cite{Weckert0}.
>

Before my latest patch, these citations probably did work. I implemented this
version a while ago to move to a citation syntax familiar to LaTeX users, i.e.
use the citation key in curly brackets. However, this does not allow a safe
distinction of citation keys with and without a database part. In order to
support multiple databases I had to revert this to the original format where
the citation key is prefixed with "ID" or "dbname-ID". The following is
supposed to work:

 This is a test of the RefDB application used in conjunction with
 vim-latexsuite.  Here is a reference \cite{IDAgnew0}.  Here is another
 \cite{otherdb-IDWeckert0}.

I'm open to suggestions if you know a better way to safely distinguish database
names from the citation key proper.

While testing the code I came across a problem with bibliography entries which
contain ampersands. The ampersand seems to be a control character in
LaTeX/bibtex and needs to be escaped in the bibtex output. I'll look into this
shortly.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

Re: [Refdb-devel] latex bibliographies with multiple databases

From: David N. <dav...@sw...> - 2006-07-08 05:25:36

Hi Markus,

> I've fixed refdbd to support multiple databases in bibtex too.

Now it is not working for me at all.

runbib is retrieving no information.  Here is an annotated transcript:

------------------------------------------------------------------------------------------
 >> Here are the tex-related files <<

$ ls
-rw-r--r-- 1 david david  155 2006-07-08 14:13 test.aux
-rw-r--r-- 1 david david  275 2006-07-08 14:06 test.bbl
-rw-r--r-- 1 david david    1 2006-07-08 14:22 test.bib
-rw-r--r-- 1 david david  914 2006-07-08 14:06 test.blg
-rw-r--r-- 1 david david  696 2006-07-08 14:13 test.dvi
-rw-r--r-- 1 david david 3017 2006-07-08 14:13 test.log
-rw-r--r-- 1 david david  480 2006-07-08 14:06 test.tex
-rw-r--r-- 1 david david  178 2006-07-06 15:36 test.tex~

 >> Here are the references in the tex document <<

$ cat test.tex
%        File: test.tex
%     Created: Thu Jul 06 03:00 PM 2006 C
% Last Change: Thu Jul 06 03:00 PM 2006 C
%
\documentclass[a4paper]{article}
\usepackage{natbib}
\author{David Nebauer}
\title{Test Document}
\begin{document}
\maketitle

\section{Introduction}

This is a test of the RefDB application used in conjunction with 
vim-latexsuite.  Here is a reference \cite{Agnew0}.  Here is another 
\cite{Weckert0}.

\bibliographystyle{plainnat}
\bibliography{test}

\end{document}


 >> Let me prove the references exist <<

$ refdbc -C getref -d refs_computing :CK:=Agnew0
ID*:17 (2000)
Key: Agnew0
Agnew,Grace
Government Access to Encryption Keys
 

999:1 retrieved:0 failed
$ refdbc -C getref -d refs_computing :CK:=Weckert0
ID*:23 (1997)
Key: Weckert0
Weckert,J.
Intellectual Property Rights and Computer Software
Business Ethics: A European Review 6(2):102-109

999:1 retrieved:0 failed

 >> Let me show the test.aux file includes those references <<

$ cat test.aux
\relax
\citation{Agnew0}
\citation{Weckert0}
\bibstyle{plainnat}
\bibdata{test}
\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}}

 >> Here is the runbib command that returns no results <<

$ runbib -d refs_computing -S bibtex-full -t bibtex test
999:0 retrieved:0 failed

 >> The corresponding refdbib command also returns nothing <<

$ refdbib -d refs_computing -S bibtex-full -t bibtex test.aux > test.bib
999:0 retrieved:0 failed

 >> The test.bib file remains empty! <<

$ cat test.bib

$
------------------------------------------------------------------------------------------
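
For anyone wanting to check independently which keys a tool should find in test.aux, the \citation entries can be pulled out with a few lines of Python. This is a sketch for cross-checking only, not how refdbib reads the file; the sample text is copied from the transcript above and the function name is hypothetical.

```python
import re

_CITATION = re.compile(r"\\citation\{([^}]*)\}")

SAMPLE_AUX = r"""\relax
\citation{Agnew0}
\citation{Weckert0}
\bibstyle{plainnat}
\bibdata{test}
"""


def citation_keys(aux_text: str):
    """Return the citation keys in order of first appearance, without duplicates."""
    keys = []
    for group in _CITATION.findall(aux_text):
        for key in group.split(","):   # \citation{a,b} may list several keys
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys


if __name__ == "__main__":
    print(citation_keys(SAMPLE_AUX))
```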

I ran refdbd standalone at log setting 7.  Here is the feedback 
generated when running the runbib or refdbib command as above:

------------------------------------------------------------------------------------------
adding client 127.0.0.1 on fd 5
server waiting n_max_fd=5
try to read from client
serving client on fd 5 with protocol version 4
012-58-51-27
send pseudo-random string to client
parent removing client on fd 5
server waiting n_max_fd=4
gettexbib  -u david -w xxxxxxxxxxxxxxxxxxxxxxxxxxx -d refs_computing -s bibtex-full 19
dbi is up
localhost
david
daviduser
refs_computing

sqlite
/var/lib/refdb/db

refdb
connected to database server using database:
refdb
Main database looks ok:
refdb
localhost
david
daviduser
refs_computing

sqlite
/var/lib/refdb/db

refs_computing
SELECT meta_app,meta_type,meta_dbversion from t_meta
connected to database server using database:
refs_computing
command processing done, finish dialog now
child finished client on fd 5
child exited with code 0
server waiting n_max_fd=4
------------------------------------------------------------------------------------------

I'm not familiar with the 'gettexbib' command.  The '19' initially 
looked a little strange but I had a quick dive into refdbib.c and it 
looks like that is a legitimate parameter -- the command buffer string 
length.

Adding the database name to each reference as per the manual makes no 
difference.

Regards,
David.

P.S. The refdb-users list is rejecting my posts sporadically with the 
claim my ISP is not providing a postmaster address, so I'm copying all 
my posts to your personal email address.

Re: [Refdb-devel] latex bibliographies with multiple databases

From: Markus H. <mar...@mh...> - 2006-07-07 23:12:47

Markus Hoenicka writes:
 > I didn't try lately, but the reverse might be true. Once upon a time the
 > DocBook/TEI code also allowed using more than one database. I'll check.
 > 

I've checked the situation with DocBook and TEI. Both support database
names in citations using the full format. refdbxp does not support
database names, presumably because the short citation format has no
means to encode a database name.

I've fixed refdbd to support multiple databases in bibtex too.

regards,
Markus

-- 
Markus Hoenicka
mar...@ca...
(Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de

6 messages have been excluded from this view by a project administrator.
