From: Zbigniew B. (Gandalf) <zbr...@mo...> - 2016-04-20 12:07:53
Hi,

I work at Mozilla on I18n and L10n. We're currently looking for a localization format to use for our localization framework (codename: L20n). The format will be used as a base for all our products and technologies as we're looking to transition away from legacy formats like .properties and .DTD.

During our research we evaluated MessageFormat and found a number of shortcomings that made us conclude that in its current form it doesn't address all of our needs. We see MessageFormat as a base for our experiments. Building on top of it, we've arrived at a format which is able to solve all the cases we have encountered over the last 10 years of working with multilingual software developed by Mozilla, such as Firefox, Thunderbird, Firefox Mobile, Firefox OS, Firefox for iOS etc. It fits our needs, and we'd like to ask for feedback from a wider audience that might be interested in improving MessageFormat -- the ICU.

Our basic principle was to develop a format that would fit into the Web model and align well with formats such as HTML, CSS, JS and APIs such as ECMA402. For that reason we put a lot of focus on things such as syntax readability and editability by hand. While we hope that most localizers will end up using some localization software to aid them with the task of localization, we stick to the rule of least power [0] and we believe that the format should be fully readable and writeable in a regular text editor. We draw inspiration from text formats such as TOML.

We focused on developing a full file format (instead of a single-message format). The ability to *read* files in this format without prior syntax familiarity was our first principle. We're OK with the expectation that some syntax familiarity will be required for editing messages and creating new ones. Ultimately, we think of this format as a localization equivalent of what CSS did for the styling information in the HTML stack.

Below is the list of topics that we see as differences between our syntax (codename FTL) and MessageFormat that we'd like to bring to the ICU-Design group's attention for consideration.

You can find the FTL syntax description here:
https://github.com/stasm/l20n-syntax-experiments/blob/master/ftl/grammar.ebnf

Per-file syntax vs. Per-message syntax
======================================

While it is perfectly possible to design any syntax on a per-message level, we believe that it creates a set of problems that cannot be resolved within that limited context.

First, per-message syntax naturally means that any file the user is reading or editing is a combination of two syntaxes, and it comes with the cognitive load of understanding and operating on two levels. The user has to consider things like escaping, error reporting, and "mental parsing" on both levels. We decided that this is a blocker for us and chose to design a single syntax to store lists of messages.

Additionally, this decision allowed us to bring in many smaller features like per-message comments, sections and inter-message references.

Lastly, we are interested in building a very strong error recovery and error reporting model for our format, to aid users in building high-quality localizations and to counter the natural complexity that comes when non-developers try to address more complex aspects of linguistics.

In its simplest form an FTL file looks like this:

--- example.ftl ---
key1 = Value 1
key2 = Value 2

Multi-line translations
=======================

We consider multi-line messages in MessageFormat syntax to be hard to read and write. We designed the FTL syntax to facilitate intuitive multi-line syntax. Example:

--- example.ftl ---
key1 =
  | This is
  | a multiline
  | message.

This is especially useful when the localization message is used to translate a fragment of text with some markup.
For instance, the translation could include some HTML markup:

--- example.ftl ---
key1 =
  | <p>
  |   <strong>ProductName</strong>
  |   is available via <a>this link</a> starting today!
  | </p>

Per-message comments
====================

Per-message comments are incredibly useful in helping translators understand the context in which the message is being used and the external arguments that the developer passes into the translation.

--- example.ftl ---
# This is a comment attached to a message
# @usage: Notification bar message
# @arg $num - Number of unread messages
key1 = Unread emails: { $num }

Familiar built-in call expressions
==================================

MessageFormat call expressions are position-based and require a good understanding of the syntax (and the argument order) to read or write them. FTL reuses a lot of syntax from Excel and JavaScript, allowing users to build on any prior familiarity with other call expression syntaxes:

--- example.ftl ---
key1 = Your balance is: { NUMBER($num, style: "currency", currency: "USD") }

Compound L10n Entities
======================

While MessageFormat allows only for a single value, in L20n (and FTL) a single localization entity may be used to localize a compound object, like a UI widget (represented by an HTML element or a Web Component, or a Joomla object etc.):

--- example.ftl ---
key1 = This is a value
  [html/aria-label] This is an aria-label
  [html/title] Description of the title

--- example.html ---
<p l10n-id="key1"></p>

--- rendered result ---
<p l10n-id="key1"
  aria-label="This is an aria-label"
  title="Description of the title">
  This is a value
</p>

This allows language bindings (like HTML bindings) to use the "traits" of the entity together with its value. It means that we keep the elements of the same "widget" together for localization purposes instead of trying to create separate localization messages for separate aspects of the widget.
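
To illustrate how a binding layer might consume such a compound entity, here is a minimal Python sketch. It is not the L20n implementation: the `apply_entity` helper and the dictionary shape used for the parsed entity are assumptions made only for this example, and whitespace inside the element is not preserved.

```python
from html import escape


def apply_entity(tag, l10n_id, entity):
    """Render an element from an entity's value and its html/* traits.

    `entity` is a hypothetical parsed form of an FTL entity:
    {"value": str, "traits": {"namespace/name": str}}.
    """
    attrs = [("l10n-id", l10n_id)]
    for name, text in entity["traits"].items():
        namespace, _, attr = name.partition("/")
        # Only traits in the "html" namespace become HTML attributes.
        if namespace == "html":
            attrs.append((attr, text))
    rendered = " ".join(f'{k}="{escape(v, quote=True)}"' for k, v in attrs)
    return f"<{tag} {rendered}>{escape(entity['value'])}</{tag}>"


# Hand-written stand-in for the parsed `key1` entity from example.ftl.
key1 = {
    "value": "This is a value",
    "traits": {
        "html/aria-label": "This is an aria-label",
        "html/title": "Description of the title",
    },
}

result = apply_entity("p", "key1", key1)
```

The point of the sketch is that the value and all traits travel together as one unit, so the binding never has to look up separate messages for the label and the title.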

Message variants and entity references
======================================

Additionally, FTL allows us to use the same trait system to define additional information on the message. Every translation is free to choose what these are exactly.

--- example.ftl ---
brandShortName =
  *[nominative] Firefox
   [accusative] Firefoksa
   [locative] Firefoksowi

This additional data can be used from within other messages:

--- example.ftl ---
key1 = Pomóż { brandShortName[locative] }

In the above example, we referred to one entity from another. This is extremely useful for all languages that use declensions or genders, differentiate between animate and inanimate nouns, etc.

Syntax designed to separate what's translatable
===============================================

While, as I said initially, we hope that syntax highlighting and tools will help localizers, we also wanted to make sure that it's easy for localizers to spot which parts of the translation are to be localized and which are parts of the syntax.

Consider MF vs FTL examples:

------ MF ------
Cart: {itemCount} {itemCount, plural,
  one {item}
  other {items}
}

------ FTL ------
key1 = Cart { $itemCount } { PLURAL($itemCount) ->
  [one] item
  [other] items
}

The FTL syntax clearly denotes what's not localizable by putting sigils around those parts of the message. MessageFormat confusingly uses the same syntax (plain English words) for both instructions and translations.
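
The select expression in the FTL example above can be read as "compute a key, then pick the matching variant". A minimal Python sketch of that reading follows. The function names are hypothetical, and `english_plural_category` is a deliberately simplified stand-in for a real plural-rules implementation (it only handles English integers); none of this is taken from the actual FTL resolver.

```python
def english_plural_category(n):
    """Simplified assumption: English cardinal rule for integers only."""
    return "one" if n == 1 else "other"


def select(variants, key, default):
    """Pick the variant for `key`, falling back to the `default` variant."""
    return variants.get(key, variants[default])


def format_cart(item_count):
    """Mirror of: key1 = Cart { $itemCount } { PLURAL($itemCount) -> ... }"""
    noun = select(
        {"one": "item", "other": "items"},
        english_plural_category(item_count),
        default="other",
    )
    return f"Cart {item_count} {noun}"


single = format_cart(1)
several = format_cart(3)
```

Note that only the strings `item` and `items` are translatable here; everything else corresponds to the sigil-delimited parts of the FTL message.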

Gender of nouns
===============

A combination of entity references, traits and the select expression allows us to build gender-specific messages, which we found useful when dealing with terms like brand names:

--- example.ftl ---
brandShortName = Aurora
  [_gender] feminine

unknownError = Aurora { brandShortName[_gender] ->
  [masculine] wykonał nieprawidłową operację
  [feminine] wykonała nieprawidłową operację
}

Trailing, leading spaces
========================

While we believe that in almost all cases users don't want to think about trailing or leading spaces, it's valuable to have a solution in the syntax for when those are meaningful. FTL solves this by providing double-quoted strings as an option:

--- example.ftl ---
key1 = " This message has leading and trailing spaces "

Implicit formatters
===================

FTL is designed to allow for some implicitness:

--- example.ftl ---
# This is an implicit version of { LIST($user1, $user2) }
list-of-users = { $user1, $user2 } like you

# This is implicitly running $num through PLURAL($num)
select-value = { $num ->
  [0] You have no emails
  [1] You have one email
  [few] You have { $num } emails
}

# This is implicitly running { NUMBER($num) }
number-string = Your lucky number is { $num }

If the user needs to pass specific formatting options, they may of course use the explicit version.

Conclusion
==========

This is an overview of the differences that we see between MessageFormat and FTL. We believe that we were able to come up with a format that is easier to read and write than MF, both in its simplest form and in more complex scenarios. It solves many long-standing syntactic issues like multi-line messages, quote delimiters, escaping and argument passing, while at the same time adding features that make the syntax more powerful and flexible.

We'd be interested in getting feedback from the ICU design group on the proposal and in working with ICU on improving MessageFormat if the ICU design group finds any of the solutions attractive.

You can play with our syntax here:
http://stasm.github.io/l20n-tinker/ftl/

Greetings,
zb.

[0] https://www.w3.org/2001/tag/doc/leastPower.html