
#6 Formatting needs to be cleaned up in import scripts

open-accepted
None
5
2011-07-10
2011-07-10
Russianspi
No

Currently, the data in my dataset comes in with odd formatting markers used by Toolbox (for example: "example |fi{(m)}, example2 |fi{(m)}"). These need to be dealt with to produce uncluttered language data.

Discussion

  • Russianspi

    Russianspi - 2011-07-10
    • status: open --> open-accepted
  • Russianspi

    Russianspi - 2011-07-10

    We can either strip those formatting markers out or separate them into appropriate "formatted" database fields (formatted for HTML). As I dig into making these changes, I think I'll go with the second option, but I reserve the right to change my mind.
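A minimal sketch of the second option in Python, assuming an SQLite table: the raw Toolbox string is converted twice, once with the markers stripped (plain text for searching) and once with them turned into HTML for display. The table name `entry`, the column names `gloss` and `gloss_html`, and the helpers `strip_markers` and `to_html` are hypothetical, and only the `|fi{...}` marker from the example above is handled.

```python
import html
import re
import sqlite3

# Matches the italic marker seen in this ticket, e.g. "|fi{(m)}".
ITALIC = re.compile(r"\|fi\{([^}]*)\}")


def strip_markers(raw: str) -> str:
    """Option 1: drop the markup but keep the enclosed text."""
    return ITALIC.sub(r"\1", raw)


def to_html(raw: str) -> str:
    """Option 2: escape the text, then turn the markup into <i> tags."""
    return ITALIC.sub(r"<i>\1</i>", html.escape(raw, quote=False))


conn = sqlite3.connect(":memory:")
# Hypothetical schema: plain text for searching, HTML for display.
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, gloss TEXT, gloss_html TEXT)")

raw = "example |fi{(m)}, example2 |fi{(m)}"
conn.execute(
    "INSERT INTO entry (gloss, gloss_html) VALUES (?, ?)",
    (strip_markers(raw), to_html(raw)),
)
print(conn.execute("SELECT gloss, gloss_html FROM entry").fetchone())
# ('example (m), example2 (m)', 'example <i>(m)</i>, example2 <i>(m)</i>')
```

Keeping both columns means the first option is never lost: the plain field is still available even if the HTML field is the one shown to users.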

  • Russianspi

    Russianspi - 2011-07-10

    Here is an explanation of the formatting, emailed to me by the translator:

    The vertical bar signals a formatting code. We mostly have |fi{xx} to signal that the xx is in italics (253 times). There are a few |fb{xx} in the \nt fields (for bold), but I think those are not in the data you have available.
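The bar codes described here can be handled with a single regular expression. This is a hedged Python sketch, not the import script itself: it assumes every bar code has the shape `|f` + one letter + `{...}` with no nested braces, maps `i` and `b` to HTML `<i>` and `<b>`, and leaves any other letter untouched. `split_bar_codes` is a hypothetical helper that returns both a plain-text and an HTML version of a field.

```python
import html
import re

# "|f" + one-letter style code + "{...}" (assumes no nested braces).
BAR_CODE = re.compile(r"\|f([a-z])\{([^}]*)\}")

# Style codes mentioned in the email; any other letter is left as-is.
HTML_TAGS = {"i": "i", "b": "b"}


def split_bar_codes(raw: str) -> tuple[str, str]:
    """Return (plain_text, html_text) for a field using |fi{...} / |fb{...}."""

    def plain(m: re.Match) -> str:
        # Keep only the enclosed text for known codes.
        return m.group(2) if m.group(1) in HTML_TAGS else m.group(0)

    def formatted(m: re.Match) -> str:
        tag = HTML_TAGS.get(m.group(1))
        if tag is None:
            return m.group(0)  # unknown code: leave for manual review
        return f"<{tag}>{m.group(2)}</{tag}>"

    # Escape first so stray "<" or "&" in the data cannot break the HTML.
    escaped = html.escape(raw, quote=False)
    return BAR_CODE.sub(plain, raw), BAR_CODE.sub(formatted, escaped)
```

For instance, `split_bar_codes("x |fi{(m)} & |fb{y}")` returns `("x (m) & y", "x <i>(m)</i> &amp; <b>y</b>")`.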

    Other in-line formatting codes (without the bar) are:

    fi:xxxx italics on the word xxxx (turned off at the word break)
    fi:{xxx…xxx} italics on the word or phrase in the curly brackets {}

    fb:{xxx} bold on the word or phrase in the brackets {}

    fv:xxxx vernacular font (usually bold) on the word xxxx (turned off at the word break)
    fv:{xxx…xxx} vernacular font (usually bold) on the word or phrase in the brackets {}

    fn:{xxx…xxx} national language (Spanish) font (usually non-bold) on the word or phrase in the brackets {}
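The bar-less codes need two patterns, one for the braced phrase form and one for the single-word form. A Python sketch under stated assumptions: a "word break" is taken to be whitespace (so trailing punctuation stays inside the formatted word), the word form is accepted only for `fi` and `fv` as listed above, a code must not be glued to a preceding letter or digit, and `fv`/`fn` become `<span>` elements with hypothetical CSS classes `vernacular` and `national` so the font choice can live in a stylesheet. `split_inline_codes` is likewise a hypothetical helper.

```python
import html
import re

# Brace form for fi/fb/fv/fn, word form for fi/fv only (as listed).
# Assumption: a "word" runs up to the next whitespace or brace.
INLINE = re.compile(
    r"(?<!\w)f([ibvn]):\{([^}]*)\}"  # fi:{...} fb:{...} fv:{...} fn:{...}
    r"|(?<!\w)f([iv]):([^\s{}]+)"    # fi:word fv:word
)

# fi/fb map to HTML tags; fv/fn use spans with hypothetical CSS classes.
OPEN_CLOSE = {
    "i": ("<i>", "</i>"),
    "b": ("<b>", "</b>"),
    "v": ('<span class="vernacular">', "</span>"),
    "n": ('<span class="national">', "</span>"),
}


def split_inline_codes(raw: str) -> tuple[str, str]:
    """Return (plain_text, html_text) for a field using bar-less codes."""

    def parts(m: re.Match) -> tuple[str, str]:
        # Groups 1-2 come from the brace form, groups 3-4 from the word form.
        if m.group(1):
            return m.group(1), m.group(2)
        return m.group(3), m.group(4)

    def plain(m: re.Match) -> str:
        return parts(m)[1]

    def formatted(m: re.Match) -> str:
        code, text = parts(m)
        start, end = OPEN_CLOSE[code]
        return start + text + end

    escaped = html.escape(raw, quote=False)
    return INLINE.sub(plain, raw), INLINE.sub(formatted, escaped)
```

For example, `split_inline_codes("a < fi:b")` returns `("a < b", "a &lt; <i>b</i>")`. An undocumented combination such as `fb:word` is left as-is for manual review.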
