Wrong chars in reference table manager

JabRef is a graphical application for managing bibliographical data

Brought to you by: dudelsack, ifsteinm, kbx, mortenalver, olly98

#1117 Wrong chars in reference table manager

Milestone: next release

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2013-03-29

Created: 2012-12-05

Creator: Daniel Kaneider

Private: No

Hi,

I imported the BibTeX entry of an ACM paper by copying and pasting it into the source view of the reference manager, e.g. from http://dl.acm.org/citation.cfm?doid=502348.502353

The entry preview shows the author formatted correctly:
Guimbretière (note the French letter near the end).

I would expect to see the same name also in the table of the reference manager. But instead it is:
Guimbreti\`{e}re

The same behaviour also occurs on other fields such as 'title'.

I'm not sure whether there is some hidden preference to turn this behaviour off, but it should not be the default.

Best,
Daniel

Discussion

Oscar Gustafsson - 2012-12-15

JabRef now converts a lot of characters into their corresponding LaTeX codes automatically. The upside is of course less manual work to get a nice-looking reference at the end, independent of font encoding. The downside is that they are also shown like that in the main table. While it is possible to convert some of them back into a suitable encoding, this may be error-prone and time-consuming.

Hence, at the moment, the main table shows the raw BibTeX field content, while the preview shows a formatted entry (although this conversion can probably be improved as well).
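To make the difference between the raw field content and a formatted view concrete, here is a minimal Python sketch of a display-only conversion. It is not JabRef's code: the function name `latex_to_display` and the small accent table are assumptions for illustration, and only a handful of accent commands are covered.

```python
import re
import unicodedata

# Unicode combining marks for a few LaTeX accent commands (illustrative subset).
COMBINING = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis / umlaut
}

# Backslash, accent character, then one letter with or without braces.
ACCENT = re.compile(r"""\\([`'^~"])\s*(?:\{([A-Za-z])\}|([A-Za-z]))""")


def latex_to_display(text: str) -> str:
    """Return a reader-friendly copy of text; the stored value is untouched."""
    def repl(match):
        letter = match.group(2) or match.group(3)
        # Compose base letter + combining mark into a single code point.
        return unicodedata.normalize("NFC", letter + COMBINING[match.group(1)])
    return ACCENT.sub(repl, text)


print(latex_to_display(r"Guimbreti\`{e}re"))  # Guimbretière
```

Anything the table does not know (for example `\c{c}`) is left as-is, so the sketch degrades to the current behaviour rather than guessing.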

Can be improved, but hardly a bug.

Oscar Gustafsson - 2012-12-15

Not sure exactly what you refer to:

If I use "Add new entry from plain text" it behaves as you would like.

If I use "export as bibtex" from ACM, they are friendly enough to convert the HTML entities to LaTeX entities.

If I import, JabRef converts the HTML entities to LaTeX.

To further clarify, JabRef reads:

Fran&ccedil;ois Guimbreti&egrave;re

and while the preview window shows HTML, this is clearly not what you would like to see in the main table.
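The import-time conversion from HTML entities to LaTeX described above can be sketched roughly as follows. This is an assumption-laden illustration in Python, not JabRef's importer: `html_entities_to_latex` and the four-entry mapping are hypothetical, and a real table would cover far more characters.

```python
import html

# Illustrative subset of a character-to-LaTeX table (hypothetical).
CHAR_TO_LATEX = {
    "ç": r"\c{c}",
    "è": r"\`{e}",
    "é": r"\'{e}",
    "ö": r'\"{o}',
}


def html_entities_to_latex(text: str) -> str:
    """Decode HTML entities, then replace known characters with LaTeX codes."""
    decoded = html.unescape(text)  # "&egrave;" becomes "è"
    return "".join(CHAR_TO_LATEX.get(ch, ch) for ch in decoded)


print(html_entities_to_latex("Fran&ccedil;ois Guimbreti&egrave;re"))
# Fran\c{c}ois Guimbreti\`{e}re
```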

The "even more correct" thing would be to look at the current character encoding and, based on that, replace the HTML entities with the corresponding characters. However, this would require the LaTeX document to use the same encoding, which may not always be possible and would clearly make the .bib files less portable. On the other hand, it would be better in that sorting etc. would work transparently: at the moment \'{e} is sorted as e, but e.g. \"{o} is the last letter of the Swedish alphabet and should not be sorted with o.
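The sorting point can be illustrated with two sort keys. This is only a sketch in Python: `naive_key` mimics the "accent is ignored" behaviour described above, while `swedish_key` uses a hand-written Swedish alphabet. Both function names and the tiny LaTeX table are assumptions for illustration; a real implementation would use a proper collation library.

```python
import re

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

# Lower-case LaTeX codes for the three extra Swedish letters.
LATEX_TO_CHAR = {r'\"{o}': "ö", r'\"{a}': "ä", r"\r{a}": "å"}

# Any accent command applied to one braced letter, e.g. \'{e} or \"{O}.
ACCENT = re.compile(r"""\\(?:[`'^~"]|[A-Za-z])\{([A-Za-z])\}""")


def naive_key(name: str) -> str:
    """Drop the accent command and keep the base letter."""
    return ACCENT.sub(r"\1", name).lower()


def swedish_key(name: str) -> list:
    """Rank letters by the Swedish alphabet; unknown characters sort last."""
    lowered = name.lower()
    for code, char in LATEX_TO_CHAR.items():
        lowered = lowered.replace(code, char)
    return [RANK.get(ch, len(RANK)) for ch in lowered]


names = ["Zetterberg", r'\"{O}berg', "Olsson"]
print(sorted(names, key=naive_key))    # ['\\"{O}berg', 'Olsson', 'Zetterberg']
print(sorted(names, key=swedish_key))  # ['Olsson', 'Zetterberg', '\\"{O}berg']
```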

Unicode will of course provide a solution for this, but at the moment the Unicode support in plain BibTeX is limited.

Morten Omholt Alver - 2012-12-15

If the issue is the table display (which shows the field contents literally), it is possible to convert everything shown in the table to a reader-friendly format. I've tested it once, but was worried about performance.

Oliver Kopp - 2012-12-15

Maybe we could make that feature configurable? The default would be ON (as most users should have a fast PC). If a user encounters performance issues, they can disable the feature.

Another alternative would be to add "convert to UTF-8" to the cleanup functionality. That, in turn, could cause trouble for the next five years or so with Springer etc...
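One way such a switch could look, sketched in Python under stated assumptions: the flag `convert_for_display` and the functions `to_display` and `cell_text` are hypothetical names, not JabRef preferences or APIs. Memoising the conversion is one possible answer to the performance worry, since a table repaints the same cell values many times.

```python
import re
import unicodedata
from functools import lru_cache

# Very small accent table, enough for the example (illustrative only).
COMBINING = {"`": "\u0300", "'": "\u0301", '"': "\u0308"}
ACCENT = re.compile(r"""\\([`'"])\{([A-Za-z])\}""")


@lru_cache(maxsize=4096)
def to_display(raw: str) -> str:
    """Convert LaTeX accents to Unicode; results are cached per raw string."""
    return ACCENT.sub(
        lambda m: unicodedata.normalize("NFC", m.group(2) + COMBINING[m.group(1)]),
        raw,
    )


def cell_text(raw: str, convert_for_display: bool = True) -> str:
    """Text for a table cell; with the option off, the raw field is shown."""
    return to_display(raw) if convert_for_display else raw


print(cell_text(r"Guimbreti\`{e}re"))                             # Guimbretière
print(cell_text(r"Guimbreti\`{e}re", convert_for_display=False))  # Guimbreti\`{e}re
```

With the option off, nothing is converted and nothing is cached, so the cost falls back to what the table does today.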

Daniel Kaneider - 2012-12-17

I think Morten got the problem. It's actually a UI-related bug/feature request, call it what you like. I don't want to change how the entries are stored, only how they are displayed in the table.
The table currently shows the raw data, which is very reader-unfriendly. Given that editing is not possible in that table (which is fine), it should not be a big problem to make it user-friendly, i.e. convert it. I don't know how this affects performance, but an option that defaults to ON would be OK, too.

To answer the conversion-related topics: I don't think a special cleanup or UTF-8 conversion is necessary, nor do I think such a feature would be useful at all, since the raw data should be saved in a computation-friendly and character-encoding-independent way.
