#195 BibTeX Export Formatting Improvement

Milestone: Next full release
Status: pending
Owner: Oliver Kopp
Priority: 5
Updated: 2013-01-13
Created: 2013-01-11
Creator: Jan Kubovy
Private: No

The proposed patch updates the formatting of exported BibTeX to improve readability. The motivation was to be able to use this tool without messing up our Git repository, where we keep our BibTeX files.

1. Strings

There are four types of strings: author, institution, publisher and other. The type of a string is identified by the first character of its key:
- a: author
- i: institution
- p: publisher
- everything else: other

For the first three types the second character should be uppercase. Since BibTeX ignores case, this is only to improve readability of the BibTeX file.

The author and editor fields of a BibTeX entry both use the author string type.
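The classification rule above can be sketched as follows. This is only an illustration, not the patch's code: the class and method names are hypothetical, and the sketch assumes that a key whose second character is not uppercase falls back to "other" (which matches the aDoe/adoe example given later in the discussion).

```java
/** Hypothetical sketch of the @String type rule; names are illustrative only. */
final class StringTypeSketch {

    enum Type { AUTHOR, INSTITUTION, PUBLISHER, OTHER }

    /**
     * Classifies a @String key by its first character.
     * Assumption: the second character must be uppercase, otherwise the key is OTHER.
     */
    static Type classify(String key) {
        if (key == null || key.length() < 2 || !Character.isUpperCase(key.charAt(1))) {
            return Type.OTHER;
        }
        switch (key.charAt(0)) {
            case 'a': return Type.AUTHOR;
            case 'i': return Type.INSTITUTION;
            case 'p': return Type.PUBLISHER;
            default:  return Type.OTHER;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("aDoe")); // author-style key
        System.out.println(classify("adoe")); // second character is lowercase
    }
}
```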

2. Entries

2.1 Keys

A new key generating type "authEtAl" was added. It is the same as "auth.etal", except that the authors are not separated by "." and, in case of more than 2 authors, "EtAl" is appended instead of ".etal".
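A minimal sketch of the "authEtAl" pattern, under the assumption that "auth.etal" uses the last name of the first author, followed by the last name of the second author when there are exactly two authors. The class name, the method name and the sample names are hypothetical; the input is assumed to be a list of already extracted last names.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "authEtAl" key pattern; not the patch's actual code. */
final class AuthEtAlSketch {

    /** Builds the key part from author last names (assumed to be extracted already). */
    static String authEtAl(List<String> lastNames) {
        if (lastNames.isEmpty()) {
            return "";
        }
        StringBuilder key = new StringBuilder(lastNames.get(0));
        if (lastNames.size() == 2) {
            key.append(lastNames.get(1));   // no "." separator between the two authors
        } else if (lastNames.size() > 2) {
            key.append("EtAl");             // instead of ".etal"
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(authEtAl(Arrays.asList("Doe", "Smith")));
        System.out.println(authEtAl(Arrays.asList("Doe", "Smith", "Jones")));
    }
}
```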

2.1.1 Institutions

An author or editor may be an institution rather than a person. In that case the key generator builds very long keys, e.g. for "The Attributed Graph Grammar System (AGG)" -> "TheAttributedGraphGrammarSystemAGG".

An institution name should be inside {} brackets. If the institution name also includes its abbreviation, this abbreviation should also be in {} brackets. For the previous example the value should look like: "{The Attributed Graph Grammar System ({AGG})}".

If an institution includes its abbreviation, i.e. "...({XYZ})", the first such abbreviation should be used as the key part for that author.
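One way to pick up the first "({XYZ})" abbreviation is a regular expression. The sketch below is an assumption about how this could be done, not the patch's implementation; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extracts the first "({ABBR})" abbreviation from an institution value. */
final class InstitutionAbbreviationSketch {

    // Matches "({...})" and captures the text between the inner braces.
    private static final Pattern ABBREVIATION = Pattern.compile("\\(\\{([^{}]+)\\}\\)");

    /** Returns the first abbreviation, or null if the value contains none. */
    static String firstAbbreviation(String value) {
        Matcher matcher = ABBREVIATION.matcher(value);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAbbreviation("{The Attributed Graph Grammar System ({AGG})}"));
    }
}
```

With the example value from above, the key part becomes "AGG" instead of "TheAttributedGraphGrammarSystemAGG".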

If an institution does not include its abbreviation, the key should be generated from its name in the following way:

The institution value can contain the institution name, a part of the institution, an address, etc. These pieces of information should be separated by commas. The name of the institution and any part of the institution should come at the beginning, while the address and secondary information should come at the end.

Each part is examined separately:

  1. We remove all tokens of a part that are one of the defined ignore words (the, press), that end with a dot (ltd., co., ...), or whose first character is lowercase (of, on, di, ...).
  2. We detect the type of the part: university, technology institute, department, school, rest
    • University: "Uni[NameOfTheUniversity]"
    • Department: an abbreviation of all words beginning with an uppercase letter, except for the words /d[ei]part.*/, school, faculty
    • School: same as department
    • Rest: If there are fewer than 3 tokens in such a part, the result is the concatenation of those tokens; otherwise the result is built from the first letters of the words starting with an uppercase letter.
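The token filter from step 1 and the "rest" rule from step 2 could look roughly like this. It is a sketch under assumptions: the part is assumed to have its {} brackets already stripped, tokens are split on whitespace, the ignore-word list contains only the two words named above, and all names (including the sample institution) are hypothetical. The university, department and school rules are not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of token filtering and the "rest" rule for one institution part. */
final class InstitutionPartSketch {

    // Assumption: only the two ignore words named in the description.
    private static final Set<String> IGNORE_WORDS = new HashSet<>(Arrays.asList("the", "press"));

    /** Step 1: drops ignore words, tokens ending with a dot, and tokens starting lowercase. */
    static List<String> filterTokens(String part) {
        List<String> kept = new ArrayList<>();
        for (String token : part.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            boolean ignored = IGNORE_WORDS.contains(token.toLowerCase(Locale.ROOT));
            boolean endsWithDot = token.endsWith(".");
            boolean startsLowercase = Character.isLowerCase(token.charAt(0));
            if (!ignored && !endsWithDot && !startsLowercase) {
                kept.add(token);
            }
        }
        return kept;
    }

    /** "Rest" rule: concatenate fewer than 3 tokens, otherwise take uppercase initials. */
    static String restKey(List<String> tokens) {
        StringBuilder key = new StringBuilder();
        for (String token : tokens) {
            if (tokens.size() < 3) {
                key.append(token);
            } else if (Character.isUpperCase(token.charAt(0))) {
                key.append(token.charAt(0));
            }
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(restKey(filterTokens("The Attributed Graph Grammar System")));
        System.out.println(restKey(filterTokens("Acme Software Ltd.")));
    }
}
```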

Parts are concatenated together in the following way:

  • If there is a university part, use it; otherwise use the rest part.
  • If there is a school part, append it.
  • If there is a department part and it is not the same as the school part, append it.

The rest part is only the first part that does not match any other type. All other parts (address, ...) are ignored.
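The concatenation rules can be expressed in a few lines. In this sketch the four part keys are assumed to have been computed already and are passed as nullable strings; the class name, the method name and the sample values are hypothetical.

```java
/** Hypothetical sketch of how the part keys of an institution are joined. */
final class InstitutionKeyJoinSketch {

    /** Each argument is the key of the corresponding part, or null if there is no such part. */
    static String join(String university, String school, String department, String rest) {
        StringBuilder key = new StringBuilder();
        if (university != null) {
            key.append(university);         // the university part wins over the rest part
        } else if (rest != null) {
            key.append(rest);
        }
        if (school != null) {
            key.append(school);
        }
        if (department != null && !department.equals(school)) {
            key.append(department);         // skipped when identical to the school part
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        System.out.println(join("UniExample", "SE", "CS", null));
        System.out.println(join(null, "EE", "EE", "Acme"));
    }
}
```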

2.2 Fields

There are three types of fields an entry can have based on the entry type:

  • required
  • optional
  • unofficial

In the BibTeX file these field groups are separated by an empty line.

To improve readability of the output BibTeX file, the first field in an entry is always the title, followed by the required group ordered by field name, then the optional group ordered by field name, and finally the unofficial group ordered by field name, if such a group exists.
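The ordering can be sketched as below. This is an illustration under assumptions, not the patch's code: the title is assumed to head the first group, field names are assumed to be lowercase strings sorted in their natural order, empty groups are skipped, and the class and method names are hypothetical. A writer would emit an empty line between the returned groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of the field output order; names are illustrative only. */
final class FieldOrderSketch {

    /** Returns the fields sorted by name, with "title" removed. */
    private static List<String> sortedWithoutTitle(Collection<String> fields) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(fields));
        sorted.remove("title");
        return sorted;
    }

    /** Returns the field groups in output order: title first, then required, optional, unofficial. */
    static List<List<String>> orderFields(Collection<String> required,
                                          Collection<String> optional,
                                          Collection<String> unofficial) {
        List<List<String>> groups = new ArrayList<>();
        List<String> first = new ArrayList<>();
        first.add("title");                               // assumption: title heads the first group
        first.addAll(sortedWithoutTitle(required));
        groups.add(first);
        for (Collection<String> group : Arrays.asList(optional, unofficial)) {
            List<String> sorted = sortedWithoutTitle(group);
            if (!sorted.isEmpty()) {
                groups.add(sorted);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(orderFields(
                Arrays.asList("year", "title", "author", "journal"),
                Arrays.asList("volume", "pages"),
                Arrays.asList("keywords")));
    }
}
```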

2.3 Values

The option to choose between '"' and '{', '}' as value delimiters was added to Preferences -> Advanced -> Export Options.

The option to include a complete list of required and optional entry fields, even if those are not filled out, was added to Preferences -> Advanced -> Export Options.

3. Other

In the exported BibTeX file the Strings are grouped by type and ordered by key in each group. Each group is separated by an empty line. The equal signs are aligned according to the longest string key.
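A rough sketch of this layout follows. Several points are assumptions rather than facts from the patch: the group order (author, institution, publisher, other), the natural string ordering of keys, the '"' delimiter, and all class and method names. The sketch also does not handle strings that refer to other strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of writing grouped, sorted and aligned @String lines. */
final class StringBlockSketch {

    /** 0 = author, 1 = institution, 2 = publisher, 3 = other (assumed group order). */
    static int groupOf(String key) {
        if (key.length() >= 2 && Character.isUpperCase(key.charAt(1))) {
            switch (key.charAt(0)) {
                case 'a': return 0;
                case 'i': return 1;
                case 'p': return 2;
                default:  return 3;
            }
        }
        return 3;
    }

    static String write(Map<String, String> strings) {
        List<TreeMap<String, String>> groups = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            groups.add(new TreeMap<String, String>());   // TreeMap keeps keys sorted
        }
        int longest = 0;
        for (Map.Entry<String, String> e : strings.entrySet()) {
            groups.get(groupOf(e.getKey())).put(e.getKey(), e.getValue());
            longest = Math.max(longest, e.getKey().length());
        }
        StringBuilder out = new StringBuilder();
        for (TreeMap<String, String> group : groups) {
            if (group.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append('\n');                        // empty line between groups
            }
            for (Map.Entry<String, String> e : group.entrySet()) {
                out.append("@String { ").append(e.getKey());
                for (int i = e.getKey().length(); i < longest; i++) {
                    out.append(' ');                     // pad to the longest key
                }
                out.append(" = \"").append(e.getValue()).append("\" }\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> strings = new TreeMap<>();
        strings.put("pSpringer", "Springer");
        strings.put("aDoe", "Doe, John");
        strings.put("jan", "January");
        System.out.print(write(strings));
    }
}
```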

Entries are sorted by title. Fields in each entry are aligned as well. The entry types and the field names are camel-case.

1 Attachments

Discussion

  • Oliver Kopp
    2013-01-12

    • status: open --> pending
    • assigned_to: Oliver Kopp
     
  • Oliver Kopp
    2013-01-12

    Applied at the master branch. Private mail for further discussion is on its way.

     
  • This patch makes a lot of changes at the core level (the BibTeX format), so we need to make sure it doesn't cause regressions:
    1. It affects the ordering of strings in the BibTeX file. Does it honor BibTeX restrictions on string order? Strings referring to other strings must come after those referred to.
    2. Case sensitive handling of string names. I don't remember, but there may be some places in JabRef where names are expected to be lower case only.
    3. The ability to choose between '"' and '{', '}' as value delimiters makes me worried. Does this play well with all the internal handling when reading or writing files?
    4. "Entries are sorted by title.". We already have options for controlling the sorting of entries, so doesn't this stomp all over the existing options?
    5. "The entry types and the field names are camel-case." In what way is this camel-casing applied or used? JabRef doesn't support case sensitive field names, and there are many places you could get in trouble if you try to change this.

     
  • Jan Kubovy
    2013-01-12

    Ad1: It doesn't honor BibTeX restrictions on string order. This has to be done.

    Ad2: The worst thing that can happen is that a @String will be identified as OTHER and not AUTHOR, INSTITUTION or PUBLISHER. This will not cause any real problem, since the BibtexString.Type is used only for grouping @strings while writing them into a file. Example:

    @String { aDoe = "Doe, John" } will be internally identified as author.
    @String { adoe = "Doe, John 2" } will be internally identified as other.

    For BibTeX both are the same, so in the example above, if "aDoe" or "adoe" is used somewhere, the value "Doe, John 2" will be applied.

    Ad3: Writing files yes. I am assuming that reading should work too since BibTeX supports both ways.

    Ad4: You mean the "Save/Export in current table sort order"? I was thinking that sorting by title may be important enough to get its own option :-) But this is not included in this patch. I wrote it in the description by mistake. See ticket #196.

    Ad5: I believe that there are no problems with that. This is applied only when writing a file, not before, and those values are not changed internally. BibTeX is case-insensitive, so JabRef is obviously converting everything to lowercase (or uppercase), and this behavior was not changed.

     
  • Jan Kubovy
    2013-01-12

    Ad1: Sorry, I was wrong again: It does honor BibTeX restrictions on string order. This is ok too.

     
  • 1-3: Great!

    4: Ok, I got the impression that the patch would overrule the sort order selection to sort according to title. Adding it as an option is a good idea.

    5: Ok, so the camel-casing is hardcoded for the standard entry types. Applying it at the writing only should be safe. As you say, the type and entry type names are lower-cased internally.