Currently, the data in my dataset comes in with odd formatting markers (like: "example |fi{(m)}, example2 |fi{(m)}") used for Toolbox. These need to be dealt with to have uncluttered language data.
We can choose to strip those formatting markers out, or to separate them into appropriate "formatted" database fields (formatted for HTML). As I dig into making these changes, I think I'll go with the second option, but I reserve the right to change my mind.
Here is an explanation of the formatting, emailed to me by the translator:
The vertical bar signals a formatting code. We mostly have |fi{xx} to signal that the xx is in italics (253 times). There are a few |fb{xx} in the \nt fields (for bold), but I think those are not in the data you have available.
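To make the two options concrete for the bar codes, here is a minimal sketch in Python. It assumes the codes are never nested and that only |fi{...} and |fb{...} occur with the bar; the function names `strip_markers` and `to_html` are my own placeholders, not anything from Toolbox.

```python
import html
import re

# Matches |fi{...} or |fb{...}; assumes no nested curly brackets inside the code.
BAR_CODE = re.compile(r"\|f([ib])\{([^{}]*)\}")


def strip_markers(text):
    """Option 1: drop the formatting code and keep only the enclosed text."""
    return BAR_CODE.sub(r"\2", text)


def to_html(text):
    """Option 2: turn |fi{xx} into <i>xx</i> and |fb{xx} into <b>xx</b>."""
    # Escape &, < and > first so the raw data cannot inject markup.
    escaped = html.escape(text, quote=False)
    # The letter after "f" (i or b) doubles as the HTML tag name.
    return BAR_CODE.sub(
        lambda m: "<{0}>{1}</{0}>".format(m.group(1), m.group(2)), escaped
    )


sample = "example |fi{(m)}, example2 |fi{(m)}"
print(strip_markers(sample))
print(to_html(sample))
```

Keeping both functions means the plain field and the "formatted" field can be filled from the same raw value, so switching options later costs little.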
Other in-line formatting codes (without the bar) are:
fi:xxxx italics on the word xxxx (turned off at the word break)
fi:{xxx…xxx} italics on the word or phrase in the curly brackets {}
fb:{xxx} bold on the word or phrase in the brackets {}
fv:xxxx vernacular font (usually bold) on the word xxxx (turned off at the word break)
fv:{xxx…xxx} vernacular font (usually bold) on the word or phrase in the brackets {}
fn:{xxx…xxx} national language (Spanish) font (usually non-bold) on the word or phrase in the brackets {}
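The codes without the bar could be handled along the same lines. This is only a sketch under a few assumptions: a "word" ends at the first non-word character, curly brackets are never nested, and the CSS class names `vernacular` and `national` are hypothetical names I picked for the two fonts. The function name `inline_to_html` is likewise a placeholder.

```python
import html
import re

# Code -> (opening tag, closing tag). The span class names are assumptions.
TAGS = {
    "fi": ("<i>", "</i>"),
    "fb": ("<b>", "</b>"),
    "fv": ('<span class="vernacular">', "</span>"),
    "fn": ('<span class="national">', "</span>"),
}

# Matches fi:word or fi:{a phrase} (likewise fb, fv, fn).
# Group 2 is the bracketed phrase, group 3 the single word.
# Note: this also accepts fb:word and fn:word, which the list above does not mention.
INLINE = re.compile(r"\b(f[ibvn]):(?:\{([^{}]*)\}|(\w+))")


def inline_to_html(text):
    """Replace colon-style formatting codes with HTML tags."""

    def repl(match):
        open_tag, close_tag = TAGS[match.group(1)]
        phrase = match.group(2)
        body = phrase if phrase is not None else match.group(3)
        return f"{open_tag}{body}{close_tag}"

    # Escape &, < and > before adding our own markup.
    return INLINE.sub(repl, html.escape(text, quote=False))


print(inline_to_html("see fi:ibid and fv:{wasi hatun} or fn:{casa grande}"))
```

If vernacular words in the data contain apostrophes or hyphens, the single-word pattern `\w+` would cut them short and would need adjusting.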