languagesys-discussion Mailing List for LanguageSys
Status: Inactive
Brought to you by:
mc_breit
2005 | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct (3) | Nov | Dec
From: Florian B. <fl...@ph...> - 2005-10-16 19:48:36
What do you think about a language-specific serialization method? I think we should implement serialization within the parser, so that we can cache the parsed XML and don't have to parse it every time. The serialization should be specific to the programming language in use, so that we keep the load as low as possible. This also means that the serialized data is created and maintained by the parser.
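The caching idea above could be sketched as follows in Python. This is only an illustration under stated assumptions: the XML layout (`<string id="...">` elements), the function names `parse_language_file` and `load_strings`, and the `.cache` suffix are all hypothetical, and Python's `pickle` stands in for whatever "language-specific serialization" the parser would use in a given language.

```python
import os
import pickle
import xml.etree.ElementTree as ET


def parse_language_file(xml_path):
    """Parse a (hypothetical) language file into a dict of id -> text."""
    root = ET.parse(xml_path).getroot()
    # Assumed layout: <language><string id="greeting">Hello</string></language>
    return {node.get("id"): (node.text or "") for node in root.iter("string")}


def load_strings(xml_path, cache_path=None):
    """Return the parsed strings, reusing the serialized cache while it is fresh."""
    cache_path = cache_path or xml_path + ".cache"
    # The cache counts as fresh if it exists and is at least as new as the XML.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(xml_path)):
        with open(cache_path, "rb") as fh:
            # pickle must only be used on files the parser itself wrote.
            return pickle.load(fh)
    # Cache missing or stale: parse the XML and rewrite the cache.
    strings = parse_language_file(xml_path)
    with open(cache_path, "wb") as fh:
        pickle.dump(strings, fh)
    return strings
```

Because the parser both writes and reads the cache, callers only ever call `load_strings` and never touch the serialized file directly.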
From: <eg...@sw...> - 2005-10-16 19:31:50
Hi,

I think that the list is not bad. For the first version I think it's enough.

Gregor Wegberg

-----Original Message-----
From: lan...@li... [mailto:lan...@li...] On Behalf Of Florian Breit
Sent: Sunday, 9 October 2005 18:24
To: lan...@li...
Subject: [LSys-Discuss] Classification and further definition of the project it's goals
From: Florian B. <fl...@ph...> - 2005-10-09 16:19:32
Hi folks!

Since we want to start developing the rule sets and guidelines for our I18N project, we need to define its boundaries with other projects, especially in the area of L10N. The main focus of our work should be T9N, or rather M17N, but of course we have to take all other aspects of internationalization into consideration so that we cooperate well with other projects such as Pango.

I therefore suggest that we collect feature proposals for the time being. If a solution that works across platforms already exists for a problem, we can still cross that feature off the list.

Here are my feature proposals:

o LanguageCodes as recommended by RFC 3066 (e.g. en-US).

o The ability to transform POSIX locales (e.g. en_US) into RFC 3066-style LanguageCodes.

o A representation of the relationships between languages, especially for finding alternative languages when the "language of choice" is not available.

o Loading and using language files. XML would be a good choice for this. In my opinion LDML is not the best option here because of its size (at least in its original version as available at unicode.org), but maybe we can support LDML too (this can be added later, when there is a need for it).

o Handling of language-specific data via unique IDs (direct identification, as in INI files) _and_ string combination (like GNU Gettext, but more like substitution of language data, not just identification by the string itself).

o Generic number formatters, also defined by an ID, that can be used in many different cases. For instance: the number format for "currency" is "%1.%2,%3", where %1 (.) is the thousands separator, %2 (,) is the decimal separator and %3 is the rest. This should be implemented in a way that allows many forms of number transformation, such as plain numbers with thousands separators, currencies, or anything else.

o Arguments for language data, for example "Hallo %1!" (de-DE) becomes "Hello %1!" (en), where %1 is a defined argument, for example a number or a string, that replaces the placeholder. It should also be possible to cover different cases ("0 numbers", "1 number", "2 numbers", ...) via patterns (spoken example: %1 <= 7: use "foo", %1 < 100: use "bar", otherwise: use "faz").

o All data should be managed in UTF-8, so that there is no trouble with the different character sets. It should be possible to register something like "filters" to produce character sets other than UTF-8 for _output_. The input (e.g. from the language files) should always be UTF-8. (Later the output could, for example, pass through an ANSI filter for a terminal, or be integrated and rendered via Pango.)

So, are there any further proposals for the basic features we want to provide at the start? Anything that we should not do, or that we should do differently?

Regards,
Florian Breit

P.S.: Sorry for being late, but I have had so much to do over the last few days and so little time.
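Three of the proposals above (POSIX-to-RFC 3066 conversion, alternative languages, and pattern-based arguments) could be sketched roughly like this in Python. Every name here (`posix_to_rfc3066`, `fallback_chain`, `select_text`, `format_message`) is hypothetical, the rule format is an assumption, and the fallback is plain tag truncation, which is only one simple way to model "relationships between languages".

```python
import operator
import re

# Comparison operators allowed in pattern rules (an assumed rule format).
OPS = {"<=": operator.le, "<": operator.lt, "==": operator.eq}


def posix_to_rfc3066(locale_name):
    """Turn a POSIX locale such as 'en_US.UTF-8' into a tag such as 'en-US'."""
    # Drop the optional codeset (.UTF-8) and modifier (@euro) parts.
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    if base in ("C", "POSIX", ""):
        return None  # no language information to convert
    language, _, region = base.partition("_")
    return language.lower() + ("-" + region.upper() if region else "")


def fallback_chain(tag, default="en"):
    """List the tags to try in order, from most to least specific."""
    parts = tag.split("-")
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def select_text(rules, otherwise, value):
    """Return the text of the first rule whose condition holds for value."""
    for op, bound, text in rules:
        if OPS[op](value, bound):
            return text
    return otherwise


def format_message(template, *args):
    """Replace %1, %2, ... with the arguments in a single pass."""
    def substitute(match):
        index = int(match.group(1))
        # Leave unknown placeholders untouched instead of failing.
        return str(args[index - 1]) if 1 <= index <= len(args) else match.group(0)
    return re.sub(r"%(\d+)", substitute, template)


# The spoken example from the message: %1 <= 7 -> "foo", %1 < 100 -> "bar".
RULES = [("<=", 7, "foo"), ("<", 100, "bar")]
```

With these definitions, `fallback_chain(posix_to_rfc3066("de_AT"))` yields the tags to try for an Austrian German user, and `select_text(RULES, "faz", n)` picks the variant for a given number before `format_message` fills in the argument.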