Thread: Re: [Refdb-devel] latex bibliographies with multiple databases (Page 2)
From: Damien J. D. <D.J...@cs...> - 2006-07-11 17:57:38
Gidday

As a novice latex user it seems okay; I can't envisage any difficulties. Probably jumping the gun a bit here, but I'd like to see the following kind of latex markup reproduced (not stripped entirely):

BIBTEX:
TITLE = {Why {AM} and {EURISKO} appear to work},

Current RIS:
TI  - Why {AM} and {EURISKO} appear to work

Current RISX:
<title type="full">Why {AM} and {EURISKO} appear to work</title>

The reason is that bibtex will capitalise your bibliography according to the current bibliography style (generally with a single leading capital), and you need to inform bibtex that certain passages are to be formatted verbatim. Other than that, I can't envisage the need for keeping latex markup - I've been manually stripping everything else.

I enter most of my references using risx (I use bibtex and RIS where provided but usually edit them manually before submitting them to RefDB), and I don't appear to have any entities coming back (probably because I don't know how to use entities). Except I think entities are used in some of the URLs - I don't know if they're necessary or not, they're just there.

Peace
Damien

David Nebauer wrote:
> Hi Markus,
>
>> I take from this discussion:
>>
>> 1) Use a bib2ris post-processing script (or rewrite bib2ris to contain such
>> code) which strips markup like boldface, superscript etc. and translates
>> foreign characters entered as LaTeX constructs to their Unicode equivalents.
>>
>> 2) Modify the code to prevent XML entities from showing up in LaTeX output.
>>
>> 3) Add code to escape the LaTeX command characters in the LaTeX output.
>>
>> The second point is a bit tricky. References imported from RIS usually do not
>> contain entities, but references imported from risx are likely to. Either I
>> convert these entities during import, or I remove them during LaTeX export.
>> The former seems cleaner to me, and I think this is what you had in mind.
>
> Yes, in my view the storage format is Unicode without markup:
>
>   BibTeX -------                 ---------> DocBook
>                 |               |
>                 |               |
>   RIS ---------+--> STORAGE ----
>                 |   (Unicode)   |
>                 |               |
>   RISX ---------                ---------> LaTeX
>
> Whatever the input format, all references end up in the same storage
> format (Unicode sans markup). This would require stripping out XML
> entities and LaTeX markup. With luck you can use existing tools to do
> this. The stored references can then be output in either DocBook- or
> LaTeX-compatible format. This seems to me to be an elegant way of
> dealing with the mishmash of input and output formats.
>
> Regards,
> David.
>
> _______________________________________________
> Refdb-devel mailing list
> Ref...@li...
> https://lists.sourceforge.net/lists/listinfo/refdb-devel
From: David N. <dav...@sw...> - 2006-07-11 19:51:35
Hi Damien,

Damien Jade Duff wrote:
> I'd like to see the following kind of latex markup reproduced (not
> stripped entirely):
> TITLE = {Why {AM} and {EURISKO} appear to work},

Alternately, you could save the title in plain Unicode with that capitalisation and refdb's BibTeX output filter would wrap any abnormally capitalised words in braces. That way your reference can be used in either DocBook XML/SGML or LaTeX output.

Regards,
David.
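The output filter David proposes might look roughly like the sketch below. This is only an illustration of the idea, not RefDB's actual code; the function name and the exact rule for what counts as "abnormally capitalised" are assumptions.

```python
import re

def protect_caps(title):
    """Wrap words with unexpected capitals in braces so that BibTeX
    bibliography styles leave their case untouched."""
    out = []
    for i, word in enumerate(title.split()):
        # "Abnormal": a capital after the first letter of a word, or a
        # leading capital on any word other than the first.
        abnormal = re.search(r"[A-Z]", word[1:]) or (i > 0 and word[:1].isupper())
        out.append("{" + word + "}" if abnormal else word)
    return " ".join(out)

print(protect_caps("Why AM and EURISKO appear to work"))
# Why {AM} and {EURISKO} appear to work
```

Running this over a plain-Unicode title reproduces exactly the braced BibTeX form Damien asked for, without storing any markup in the database.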
From: Markus H. <mar...@mh...> - 2006-07-11 21:26:08
David Nebauer writes:
> > TITLE = {Why {AM} and {EURISKO} appear to work},
>
> Alternately, you could save the title in plain Unicode with that
> capitalisation and refdb's BibTeX output filter would wrap any
> abnormally capitalised words in braces. That way your reference can be
> used in either DocBook XML/SGML or LaTeX output.

If it is a matter of wrapping uppercased words in curly brackets, this is certainly doable. Is it likely to have uppercased words in bibtex data which are *not* supposed to be rendered in all-caps?

regards,
Markus

--
Markus Hoenicka
mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
From: Damien J. D. <D.J...@cs...> - 2006-07-12 11:28:02
Markus Hoenicka wrote:
> David Nebauer writes:
> > > TITLE = {Why {AM} and {EURISKO} appear to work},
> >
> > Alternately, you could save the title in plain Unicode with that
> > capitalisation and refdb's BibTeX output filter would wrap any
> > abnormally capitalised words in braces. That way your reference can be
> > used in either DocBook XML/SGML or LaTeX output.
>
> If it is a matter of wrapping uppercased words in curly brackets, this
> is certainly doable. Is it likely to have uppercased words in bibtex data
> which are *not* supposed to be rendered in all-caps?

Gidday

I imagine only where the style demands it (e.g. APA). Some journals have titles with pretty much everything (both proper and non-proper nouns, subtitles and acronyms) capitalised. On the other hand, APA requires nouns that aren't proper to go lowercase when citing - proper nouns and subtitles can be uppercase. I don't know how this is managed in Docbook styles or Endnote etc., but presumably titles are used verbatim, because I can't imagine any logic funky enough to figure out whether a word is a proper noun or not without some extra markup.

If we're going to export all capitals to bibtex verbatim sans markup, then I think we might as well, as a first approximation, rather than trying to second-guess bibtex, just set the whole title as verbatim - e.g.

TITLE = {{Why AM and EURISKO appear to work}},

Folks who use latex would probably have to turn non-proper nouns that are capitalised into lowercase before putting them into RefDB, and hope to seldom encounter a citation style that requires these things to be reproduced verbatim from the original article (I don't think I have yet). The same may possibly apply to BOOKTITLE, JOURNAL, SERIES, SCHOOL, PUBLISHER, INSTITUTION etc.

My second two pennies' worth.

Regards
Damien
From: Markus H. <mar...@mh...> - 2006-07-12 12:30:12
Damien Jade Duff <D.J...@cs...> was heard to say:
> trying to second guess bibtex, just set the whole title as verbatim - e.g.
>
> TITLE = {{Why AM and EURISKO appear to work}},
>
> Folks who use latex would probably have to turn non-proper nouns that
> are capitalised into lowercase before putting them in to RefDB and hope
> to seldom encounter a citation style that requires these things to be
> reproduced verbatim from the original article (I don't think I have yet).

RefDB uses bibliography styles even for generating LaTeX bibliographies. These styles currently only take care of the capitalization of titles. IIRC you get the best results if you add your data using mixed case and then pick the proper style for output. If we combine that with the curly brackets as shown above, you should be able to generate at least lowercased (except the first char), all-caps, and mixed-case output.

regards,
Markus

--
Markus Hoenicka
mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
From: Damien J. D. <D.J...@cs...> - 2006-07-12 13:10:26
> RefDB uses bibliography styles even for generating LaTeX bibliographies.

Oh. Right!

> should be able to generate at least lowercased (except the first char),

I don't think I'd be able to use that because presumably it would lowercase proper nouns, acronyms and subtitle headings.

> all-caps, and mixed case output.

The latter is fine except, as before, when your article title capitalises verbs and improper nouns. I'm happy to do it this way though, I'll just make sure no capitalised verbs and improper nouns get into my database and hope not to encounter citation styles that require them.

Cheers
Damien
From: David N. <dav...@sw...> - 2006-07-30 05:50:45
Damien Jade Duff wrote:
>> all-caps, and mixed case output.
>
> The latter is fine except, as before, when your article title
> capitalises verbs and improper nouns. I'm happy to do it this way
> though, I'll just make sure no capitalised verbs and improper nouns get
> into my database and hope not to encounter citation styles that require
> them.

This has been bothering me. If the default model for encoding is:

XML LaTeX | | | | INPUT unicode unicode \ / \ / \ / \ / v STORAGE unicode / \ / \ / \ / \ OUTPUT convert convert | | | | v v XML LaTeX

There will undoubtedly be users like Damien who make the choice to include markup in their bibliographic data. Although it effectively traps them into one output format, the trade-off is greater control over how that data is eventually displayed.

Would it not be simple, given the above model, to have a command-line switch for runbib and refdbib that skips the conversion step? That seems to me an easy way to accommodate the wishes of everybody.

Regards,
David.
From: David N. <dav...@sw...> - 2006-07-30 21:29:24
David Nebauer wrote:
> If the default model for encoding is:
>
> XML LaTeX | | | | INPUT unicode unicode \ / \ / \ / \ / v STORAGE unicode / \ / \ / \ / \ OUTPUT convert convert | | | | v v XML LaTeX

Spaces were stripped. That should look like:

...........XML......LaTeX
............|.........|
............|.........|
INPUT.....unicode.unicode
.............\......./
..............\...../
...............\.../
................\./
.................v
STORAGE.......unicode
................/.\
.............../...\
............../.....\
............./.......\
OUTPUT....convert.convert
............|.........|
............|.........|
............v.........v
...........XML......LaTeX

Regards,
David.
From: David N. <dav...@sw...> - 2006-08-17 08:39:48
Hi Damien,

Damien Jade Duff wrote:
> Incidentally, how do docbook users deal with this capitalisation
> issue? i.e. capitalisation of acronyms and proper nouns versus of
> subtitles and improper nouns. Anyone?

The bibliography style options for TITLE are:

case:  ASIS ICAPS LOWER UPPER
style: BOLD BOLDITALIC BOLDITULINE BOLDULINE ITALIC ITULINE NONE SUB SUPER ULINE

As you can see from the case options, your only choice for preserving mixed case is to use ASIS, which of course means "as is". In that situation it is up to the person inputting the reference in the first place to get the case right.

Does that answer your question?

Regards,
David.
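For illustration, the four case options might behave like the sketch below. This is a hypothetical reimplementation, not RefDB's code; in particular the exact ICAPS behaviour (initial-capitalising every word) is an assumption.

```python
def apply_case(title, case):
    """Apply one of the TITLE case options: ASIS, ICAPS, LOWER, UPPER."""
    if case == "ASIS":   # leave the input untouched
        return title
    if case == "LOWER":
        return title.lower()
    if case == "UPPER":
        return title.upper()
    if case == "ICAPS":  # assumed: initial-capitalise every word
        return " ".join(w[:1].upper() + w[1:].lower() for w in title.split())
    raise ValueError("unknown case option: " + case)

# Note how every option except ASIS destroys acronym capitalisation,
# which is exactly the problem Damien raised:
print(apply_case("Why AM and EURISKO appear to work", "ICAPS"))
# Why Am And Eurisko Appear To Work
```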
From: Damien J. D. <D.J...@cs...> - 2006-08-17 14:22:37
>> Incidentally, how do docbook users deal with this capitalisation
>> issue? i.e. capitalisation of acronyms and proper nouns versus of
>> subtitles and improper nouns. Anyone?
>
> The bibliography style options for TITLE are
> case: ASIS ICAPS LOWER UPPER
> style: BOLD BOLDITALIC BOLDITULINE BOLDULINE ITALIC ITULINE NONE SUB SUPER ULINE
>
> As you can see from the case options your only choice for preserving
> mixed case is to use ASIS, which of course means "as is". In that
> situation it is up to the person inputting the reference in the first
> place to get the case right.
>
> Does that answer your question?

Yes, thank you.

Since Docbook users get by without this extra markup, I think we latex users should be able to get by without it too. I have no complaints. Though it seems logically possible, I doubt I will ever actually find an instance where anything more complicated than the above options is required.

Peace
Damien
From: Damien J. D. <D.J...@cs...> - 2006-08-16 15:49:57
Gidday gidday

As for entering unicode via jedit, there is a graphical plugin for it:
http://plugins.jedit.org/plugins/?CharacterMap
But I haven't been able to get it to work with unicode, and it seems that the current implementation is a tad broken:
http://community.jedit.org/?q=node/view/1628&pollresults%5B1252%5D=1
We have to wait for a fix to be uploaded.

The good thing about using unicode as storage is that it's a blank slate, and that means that if the community decides that some kind of markup is in fact useful, the maintainer(s!) can choose to support it at a later point via whatever method they like, and in theory they(he!) know(s) exactly what kind of data is in the database at any given time: i.e. more control over what is coming in and out. See, I wouldn't be surprised if the required-capitalisation markup was asked for at some time in the future when a user wants to use latex+docbook but exploit the latex markup, but it can be done from a unicode starting point anyway.

My 8.2 cents' worth.

Incidentally, how do docbook users deal with this capitalisation issue? i.e. capitalisation of acronyms and proper nouns versus of subtitles and improper nouns. Anyone?

Peace
Damien

David Nebauer wrote:
> Damien Jade Duff wrote:
>>> all-caps, and mixed case output.
>>
>> The latter is fine except, as before, when your article title
>> capitalises verbs and improper nouns. I'm happy to do it this way
>> though, I'll just make sure no capitalised verbs and improper nouns get
>> into my database and hope not to encounter citation styles that require
>> them.
>
> This has been bothering me. If the default model for encoding is:
>
> ...........XML......LaTeX
> ............|.........|
> ............|.........|
> INPUT.....unicode.unicode
> .............\......./
> ..............\...../
> ...............\.../
> ................\./
> .................v
> STORAGE.......unicode
> ................/.\
> .............../...\
> ............../.....\
> ............./.......\
> OUTPUT....convert.convert
> ............|.........|
> ............|.........|
> ............v.........v
>
> There will undoubtedly be users like Damien who make the choice to
> include markup in their bibliographic data. Although it effectively
> traps them into one output format the trade-off is greater control over
> how that data is eventually displayed.
>
> Would not it be simple, given the above model, to have a command line
> switch for runbib and refdbib that skips the conversion step? That seems
> to me an easy way to accommodate the wishes of everybody.
>
> Regards,
> David.
From: Markus H. <mar...@mh...> - 2006-07-19 20:36:39
Hi David,

David Nebauer writes:
> Yes, in my view the storage format is Unicode without markup:
>
>   BibTeX -------                 ---------> DocBook
>                 |               |
>                 |               |
>   RIS ---------+--> STORAGE ----
>                 |   (Unicode)   |
>                 |               |
>   RISX ---------                ---------> LaTeX

I've done a little source code reading and testing in order to find out how RefDB mangles these kinds of input and output data. My results are as follows:

1) BibTeX input
bib2ris appears to work ok with UTF-8 encoded bibtex data. You can import the resulting RIS data as long as the input encoding is set to UTF-8 (the current default is ISO-8859-1, but it certainly makes sense to change that). If your bibtex data is plain ASCII with foreign and special characters encoded as LaTeX commands, the bib2ris output should be sent through the new refdb_latex2utf8txt script. I don't know whether it really has 100% coverage of the character-related LaTeX commands, but it is easy to extend if the need arises. With this in mind we can import bibtex data as plain Unicode.

2) RIS input
We'd have to educate users to author their RIS datasets in UTF-8, and to run RIS data from web sources (like Pubmed) through iconv before adding them to RefDB. All it takes is to set the default input encoding of refdbd for RIS data to UTF-8 (see above). Currently there are no provisions to translate entities or LaTeX commands, but if used correctly there should be no need for such hacks. The result is, as above, plain Unicode.

3) risx input
I've rediscovered a nice feature of expat (which refdbd uses to parse all incoming XML data). The output data of expat are always UTF-8, with all entities expanded to their Unicode equivalents. Thus no extra conversion step is required to get rid of entities and to store plain Unicode.

4) SGML/XML output (bibliographies, db31/tei/html backends)
"<>&" are replaced with their corresponding entities. In addition, the current code contains replacements for — ‘ and ’ (the em dash and the curly quotes). I know that I was asked to add these, but I can't remember the context. I wonder whether it would make more sense to keep these characters as Unicode.

5) LaTeX output
There are currently no attempts to escape LaTeX command characters. I'm about to add this code.

6) other output (RIS, screen)
No replacements. If you retrieve data as UTF-8, you'll get what you want.

As always, I might have missed some border cases. If you experience different behaviour, please let me know.

One thing that should be discussed is how easy it is for RefDB users to author UTF-8 data, be it RIS, bibtex, or XML. You can always insert the numeric form into XML data (e.g. &#x00B1;) but I'm afraid this won't work for the other data formats. As an Emacs user I've got Norm Walsh's xmlunicode.el (http://nwalsh.com/emacs/xmlchars/) which allows you to select characters from a pop-up list or from the minibuffer with entity-name completion, and which also defines an input mode that offers on-the-fly replacement of entities. Is there similar support available for other editors (vim, jedit) which should be mentioned in the manual?

regards,
Markus

--
Markus Hoenicka
mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
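The escaping Markus describes for point 5 could look roughly like the sketch below. This is an illustration of the general technique, not the code actually committed to RefDB; the replacement strings for backslash, tilde and caret are the common LaTeX idioms.

```python
# LaTeX command characters and their usual escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "#": r"\#", "$": r"\$", "%": r"\%",
    "&": r"\&", "_": r"\_", "{": r"\{", "}": r"\}",
}

def escape_latex(text):
    """Escape characters that LaTeX would otherwise interpret as commands."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)

print(escape_latex("R&D costs 100% more_than expected"))
# R\&D costs 100\% more\_than expected
```

A character-by-character mapping like this avoids the ordering pitfall of sequential string replacements (where escaping "&" could otherwise mangle the backslash inserted by an earlier substitution).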
From: Markus H. <mar...@mh...> - 2006-07-19 20:42:20
Markus Hoenicka writes:
> 2) RIS input
> We'd have to educate users to author their RIS datasets in UTF-8, and
> to run RIS data from web sources (like Pubmed) through iconv before
> adding them to RefDB. All it takes is to set the default input
> encoding of refdbd for RIS data to UTF-8 (see above). Currently there
> are no provisions to translate entities or LaTeX commands, but if used
> correctly there should be no need to use such hacks. The result is, as
> above, plain Unicode.

Actually we can still use ISO-8859-1 or whatever as the RIS input format, as refdbd internally converts it to UTF-8 if the database uses this encoding. Forcing UTF-8 for RIS data only makes sense if people use both bibtex and RIS data.

regards,
Markus

--
Markus Hoenicka
mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
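The iconv step recommended for web-sourced RIS data is just a re-encode; a minimal sketch of the same operation (the file names and the helper name are only examples):

```python
def ris_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Re-encode a RIS file (e.g. ISO-8859-1 data from Pubmed) as UTF-8,
    equivalent to `iconv -f ISO-8859-1 -t UTF-8` on the command line."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

After this step the RIS data can be fed to refdbd with its default input encoding set to UTF-8, as discussed above.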
From: David N. <dav...@sw...> - 2006-07-24 10:36:30
Hi Markus,

Markus Hoenicka wrote:
> 1) BibTeX input
> [W]e can import bibtex data as plain Unicode.
>
> 2) RIS input
> All it takes is to set the default input
> encoding of refdbd for RIS data to UTF-8.
>
> 3) risx input
> [N]o extra conversion step is required to get rid of
> entities and to store plain Unicode.

All looks too easy so far.

> 4) SGML/XML output (bibliographies, db31/tei/html backends)
> "<>&" are replaced with their corresponding entities. In addition, the
> current code contains replacements for — ‘ and ’. I
> know that I was asked to add these, but I can't remember the
> context. I wonder whether it would make more sense to keep these
> characters as Unicode.

I fear I may be partly the cause. I was using refdb purely for docbook, and since I didn't use UTF-8 encoding for my references, the only way to preserve characters like the em dash was to protect them as entities throughout the reference's life cycle. They were not only protected during output, as you mention above, but were protected at input also. In moving to a more sensible unicode-based system, however, it no longer makes any sense to replace those characters with entities.

> 5) LaTeX output
> There are currently no attempts to escape LaTeX command
> characters. I'm about to add this code.

I see this code arrived today.

> 6) other output (RIS, screen)
> No replacements. If you retrieve data as UTF-8, you'll get what you want.
>
> One thing that should be discussed is how easy it is for RefDB users
> to author UTF-8 data, be it RIS, bibtex, or XML. Is there [Unicode input] support
> available for other editors (vim, jedit) which should be mentioned in
> the manual?

In Vim, many unicode characters (and certainly all the commonly used ones) are entered by means of digraphs (using two or more keystrokes to specify one character). The mnemonics for these are fairly intuitive, like 'a:' for a-umlaut. Any unicode character can be entered with 'Ctrl-v uxxxx' where 'xxxx' is the character code.

Regards,
David.
From: Markus H. <mar...@mh...> - 2006-07-24 11:04:20
David Nebauer <dav...@sw...> was heard to say:
> > 4) SGML/XML output (bibliographies, db31/tei/html backends)
> > "<>&" are replaced with their corresponding entities. In addition, the
> > current code contains replacements for — ‘ and ’. I
> > know that I was asked to add these, but I can't remember the
> > context. I wonder whether it would make more sense to keep these
> > characters as Unicode.
>
> I fear I may be partly the cause. I was using refdb purely for
> docbook and since I didn't use UTF-8 encoding for my references the only
> way to preserve characters like the em dash was to protect them as entities
> throughout the reference's life cycle. They were not only protected
> during output, as you mention above, but were protected at input also.
> In moving to a more sensible unicode-based system, however, it no longer
> makes any sense to replace those characters with entities.

I see. Then I'll remove these entities again.

> > 5) LaTeX output
> > There are currently no attempts to escape LaTeX command
> > characters. I'm about to add this code.
>
> I see this code arrived today.

Yes. I didn't get round to announcing it, but please give it a real-world test to see whether it works ok.

> Many unicode characters (and certainly all the commonly used ones) are
> entered by means of digraphs (using two or more keystrokes to specify
> one character). The mnemonics for these are fairly intuitive, like 'a:'
> for a-umlaut. Any unicode character can be entered with 'Ctrl-v uxxxx'
> where 'xxxx' is the character code.

Is there a link to some doc that explains this, by any chance? I thought about adding something like a tip box to the docs that briefly explains how to deal with Unicode characters for the most popular editors. I'd like to add URLs for further information.

regards,
Markus

--
Markus Hoenicka
mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
From: David N. <dav...@sw...> - 2006-07-24 15:38:51
Hi Markus,

Markus Hoenicka wrote:
>> Many unicode characters (and certainly all the commonly used ones) are
>> entered by means of digraphs (using two or more keystrokes to specify
>> one character). The mnemonics for these are fairly intuitive, like 'a:'
>> for a-umlaut. Any unicode character can be entered with 'Ctrl-v uxxxx'
>> where 'xxxx' is the character code.
>
> Is there a link to some doc that explains this, by any chance? I thought about
> adding something like a tip box to the docs that briefly explains how to deal
> with Unicode characters for the most popular editors. I'd like to add URLs for
> further information.

Vim documentation is the ultimate triumph of substance over style. In aggregate it contains every fact you could or would ever want to know about Vim. The problem is that it's almost impossible to find the information you want, and on the rare occasion you do, it is so dry and technical as to be a foreign language altogether.

There actually appear to be three general methods of entering unicode characters:

1. Digraphs
I mentioned these in my previous post. This is the easiest method to learn and remember. Documentation is here: <http://vimdoc.sourceforge.net/htmldoc/digraph.html>. The same documentation is available within Vim by typing ':h digraphs'. Type ':digraphs' for a list of digraphs.

2. Keymaps
Frankly, I've skimmed this topic a few times and can't make head nor tail of it. It claims unicode characters can be entered as combinations of other characters (sounds somewhat like digraphs but apparently is different). Documentation is here: <http://vimdoc.sourceforge.net/htmldoc/mbyte.html>. The same documentation is available within Vim by typing ':h multibyte'.

3. Direct entry
This is done by 'Ctrl-v u xxxx' where 'xxxx' is the hex number of a unicode character. Documentation is included in the multi-byte help at <http://vimdoc.sourceforge.net/htmldoc/mbyte.html#utf-8-typing> or within Vim by typing ':h utf-8-typing'.

Regards,
David.
From: Markus H. <mar...@mh...> - 2006-07-12 21:07:38
Hi David,

David Nebauer writes:
> From the 'ucs' package documentation:
>
>   The simplest use of this package is to add
>     \usepackage{ucs}
>     \usepackage[utf8x]{inputenc}
>   to your header. You may even omit the first line in many cases.
>
> It worked for me "out of the box". I installed the 'ucs' package
> (apt-get install latex-ucs), added those two lines to the preamble, ran
> 'latex test' and, presto, gloriously rendered unicode.

There seems to be something else for Unicode support. Ever heard of this? Is that supposed to work without installing additional files?

(cited from: http://mail.nl.linux.org/linux-utf8/2004-04/msg00000.html)
------
In mid February, the LaTeX project team released a new version that now supports UTF-8. For details, see

  ftp://ftp.tex.ac.uk/tex-archive/macros/latex/base/utf8ienc.dtx

You now can finally simply replace

  \usepackage[latin1]{inputenc}

with

  \usepackage[utf8]{inputenc}

and can this way move all your non-ASCII LaTeX input to UTF-8 as well.
------

--
Markus Hoenicka
mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
From: Markus H. <mar...@mh...> - 2006-07-12 23:31:35
Hi David,

David Nebauer writes:
> latex2utf8txt converts LaTeX files to UTF-8 text, removes line
> breaks from paragraphs

I used parts of this script to hack a tex2mail replacement. It isn't more than a collection of regular expression substitutions which is to be used as a post-processing filter of bib2ris output. I've added it to the subversion repository, but it is not yet included in the build system. Please test it against your data and modify it as needed, or let me know what else should be covered. Either update your svn sources, or visit the svn web interface:

http://svn.sourceforge.net/viewcvs.cgi/refdb/refdb/trunk/scripts/refdb_latex2utf8txt?view=log

The script does the following:

- replace foreign characters encoded as {\..} constructs with their UTF-8 counterparts
- remove {\xy ...} commands, leaving only the enclosed text
- remove non-escaped curly brackets
- unescape escaped command characters: # $ % & ~ _ ^ \ { }
- convert the LaTeX dashes '--' and '---' to '-'

regards,
Markus

--
Markus Hoenicka
mar...@ca... (Spam-protected email: replace the quadrupeds with "mhoenicka")
http://www.mhoenicka.de
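For readers who don't want to dig through the script, the substitution steps listed above can be sketched in a few lines. This is a toy reimplementation with only a handful of accent mappings; the real refdb_latex2utf8txt covers far more constructs.

```python
import re

# A small sample of {\..} accent constructs; the real script maps many more.
ACCENTS = {
    r'{\"a}': "ä", r'{\"o}': "ö", r'{\"u}': "ü",
    r"{\'e}": "é", r"{\`e}": "è",
}

def latex_to_utf8(text):
    # 1. Replace foreign characters encoded as {\..} constructs.
    for construct, char in ACCENTS.items():
        text = text.replace(construct, char)
    # 2. Remove {\xy ...} commands, leaving only the enclosed text.
    text = re.sub(r"\{\\[a-zA-Z]+ +([^{}]*)\}", r"\1", text)
    # 3. Remove non-escaped curly brackets.
    text = re.sub(r"(?<!\\)[{}]", "", text)
    # 4. Unescape escaped command characters: # $ % & ~ _ ^ \ { }
    text = re.sub(r"\\([#$%&~_^{}\\])", r"\1", text)
    # 5. Convert the LaTeX dashes '---' and '--' to '-'.
    return text.replace("---", "-").replace("--", "-")

print(latex_to_utf8(r'M{\"o}ller, 100\% {\em done} -- almost'))
# Möller, 100% done - almost
```

The ordering matters: braces are stripped before the escapes are undone, so an escaped `\{` survives step 3 (thanks to the lookbehind) and only becomes a literal `{` in step 4.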
From: David N. <dav...@sw...> - 2006-07-06 23:58:18
Hi Markus,

> I'll set up a test case and see what happens. I recall it might be
> necessary to specify a default database anyway (i.e. use the -d switch
> of runbib) even if you specify a database in each citation. Does the
> problem persist if you set a default database?

Yes. I tried it with and without specifying a default database for 'runbib'.

Regards,
David.