From: Guenter M. <mi...@us...> - 2010-09-24 13:50:21
|
Dear Docutils developers,

currently, if a language is not supported by Docutils with a matching
module, conversion fails with ImportError.

I propose to change this to a warning, because in cases where no
auto-generated text is used the output would still be valid (with the
language tag given to the HTML output or used for hyphenation etc. in
LaTeX).

Also, I propose a :lang: role:

* in LaTeX and HTML it is possible to mark regions of the document as
  belonging to a different language.

  In HTML, this could be used to select a different font or a different
  voice in a screenreader.

  In LaTeX, the language setting also affects the hyphenation algorithm.

Günter |
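A minimal sketch of the "warn instead of fail" behaviour proposed above, assuming language support lives in one importable module per language. The function name `load_language_module` and its `package` parameter are placeholders for illustration, not Docutils API.

```python
import importlib
import warnings


def load_language_module(language_code, package):
    """Import ``package.<language_code>``; warn and return None if missing.

    Both parameters are hypothetical: ``package`` stands for whatever
    package holds the per-language modules.
    """
    # Language tags use "-", Python module names use "_".
    module_name = f"{package}.{language_code.replace('-', '_')}"
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Conversion can continue: documents without auto-generated
        # text do not need the module at all.
        warnings.warn(
            f'no language module for "{language_code}"; '
            "auto-generated text is unavailable"
        )
        return None
```

A caller would check for `None` before asking for translated labels, so a document that never triggers auto-generated text still converts cleanly.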
From: Alan G I. <ai...@am...> - 2010-09-24 14:19:19
|
On 9/24/2010 9:46 AM, Guenter Milde wrote:
> I propose a :lang: role:

Might a role for each desired language be preferable?
Otherwise this is no more than an almost meaningless "span",
unless additional info is added elsewhere. E.g., what if
I am writing in English, comparing Spanish and French words?

fwiw,
Alan |
From: David G. <go...@py...> - 2010-09-24 14:31:08
|
On Fri, Sep 24, 2010 at 19:16, Guenter Milde <mi...@us...> wrote:
> currently, if a language is not supported by Docutils with a matching
> module, conversion fails with ImportError.
>
> I propose to change this to a warning, because in cases where no
> auto-generated text is used the output would still be valid (with the
> language tag given to the HTML output or used for hyphenation etc. in
> LaTeX)

No objection.

What would the default behavior be, if there is auto-generated text? English?

> Also, I propose a :lang: role:
>
> * in LaTeX and HTML it is possible to mark regions of the document as
>   belonging to a different language.
>
>   In HTML, this could be used to select a different font or a different
>   voice in a screenreader.
>
>   In LaTeX, the language setting also affects the hyphenation algorithm.

Smells like a solution looking for a problem. If this is a real issue,
please provide examples.

Alan's multiple-language concern is also valid.

How would your proposal be different/better than declaring roles in
the document? E.g.

    .. role:: spanish

or

    .. role:: lang-spanish

or

    .. role:: lang-es

--
David Goodger <http://python.net/~goodger> |
From: Guenter M. <mi...@us...> - 2010-09-25 22:02:00
|
On 2010-09-24, David Goodger wrote:
> On Fri, Sep 24, 2010 at 19:16, Guenter Milde <mi...@us...> wrote:
>> currently, if a language is not supported by Docutils with a matching
>> module, conversion fails with ImportError.

>> I propose to change this to a warning, because in cases where no
>> auto-generated text is used the output would still be valid (with the
>> language tag given to the HTML output or used for hyphenation etc. in
>> LaTeX)

> No objection.

> What would the default behavior be, if there is auto-generated text? English?

I think English is a reasonable default.

>> Also, I propose a :lang: role:

>> * in LaTeX and HTML it is possible to mark regions of the document as
>>   belonging to a different language.

>>   In HTML, this could be used to select a different font or a different
>>   voice in a screenreader.

>>   In LaTeX, the language setting also affects the hyphenation algorithm.

> Smells like a solution looking for a problem. If this is a real issue,
> please provide examples.

HTML has a "lang" attribute:

  Language information specified via the lang attribute may be used by a
  user agent to control rendering in a variety of ways. Some situations
  where author-supplied language information may be helpful include:

  * Assisting search engines
  * Assisting speech synthesizers
  * Helping a user agent select glyph variants for high quality typography
  * Helping a user agent choose a set of quotation marks
  * Helping a user agent make decisions about hyphenation, ligatures,
    and spacing
  * Assisting spell checkers and grammar checkers

  -- http://www.w3.org/TR/html401/struct/dirlang.html

Practical examples:

Ensure that the German name is hyphenated according to German grammar
rules (if word-wrap is active) and pronounced correctly by a
screen-reader:

  To go to the main station you need to change tram at
  <span lang='de'>Hohenzollernplatz</span>.

Ensure that LaTeX uses the right font encoding:

  We wanted to go to the {\selectlanguage{russian} красная площадь}.

Both should be possible without resorting to 'raw'.

> Alan's multiple-language concern is also valid.

If I understand it right, it's one of the reasons for my proposal: I want
to be able to mark a word, a quote or a section as being in a different
language than the rest of the document.

I agree that a role might not be the best/only approach to this problem:

a) roles do not have arguments

b) it would not solve marking e.g. a quote.

Alternatives:

a) class arguments (lang-<language tag>)

   e.g.: ``.. class:: lang-de`` would set lang="de" for the following
   object.

b) lang directive

   e.g.: ``.. lang:: de`` would set lang="de" for the following object.

c) :lang: attributes to all relevant rst objects

     .. line-block::
        :lang: de

        ein deutsches Gedicht

> How would your proposal be different/better than declaring roles in
> the document? E.g.

>     .. role:: spanish

> or

>     .. role:: lang-spanish

> or

>     .. role:: lang-es

This would set a class attribute on a span that still requires a style
setting to be understood by the user agent.

How about a base-role "lang" that can be used to create custom language
setting roles::

  .. role:: es(lang)

Günter |
From: David G. <go...@py...> - 2010-09-26 01:56:29
|
On Sun, Sep 26, 2010 at 03:31, Guenter Milde <mi...@us...> wrote:
> How about a base-role "lang" that can be used to create custom language
> setting roles::
>
>   .. role:: es(lang)

+1

--
David Goodger <http://python.net/~goodger> |
From: Guenter M. <mi...@us...> - 2010-09-29 09:42:21
|
On 2010-09-25, Guenter Milde wrote:
> On 2010-09-24, David Goodger wrote:
>> On Fri, Sep 24, 2010 at 19:16, Guenter Milde <mi...@us...> wrote:
>>> currently, if a language is not supported by Docutils with a matching
>>> module, conversion fails with ImportError.

>>> I propose to change this to a warning, because in cases where no
>>> auto-generated text is used the output would still be valid (with the
>>> language tag given to the HTML output or used for hyphenation etc. in
>>> LaTeX)

>> No objection.

>> What would the default behavior be, if there is auto-generated text?
>> English?

> I think English is a reasonable default.

Implemented.

Alternatively, we could throw an error with a helpful message in
these occasions, e.g.

  Default "contents" title for language %s missing, please specify an
  explicit title.

or

  "attention" title for language %s missing, please use a generic
  admonition with explicit title.

In my opinion, this fits better in the general Docutils philosophy of
"never fail silently". What do you think?

***********************************************************************

>>> * in LaTeX and HTML it is possible to mark regions of the document as
>>>   belonging to a different language.

> HTML has a "lang" attribute

LaTeX has a \selectlanguage macro (directive).

My first considerations started from the LaTeX perspective (a directive
to switch the language).

However, from the Docutils perspective, the HTML model (language as a
feature of the objects rather than a "language switching" object) seems
the better approach.

I see two possible implementations to represent language info in the
document tree:

a) a new `common attribute` "lang" (or "language")

b) use the `classes` attribute with a "specific class value"

Alternative b) would be less invasive, taking advantage of
`classes` element features:

  The purpose of the attribute is to indicate an "is-a" variant
  relationship, to allow an extensible way of defining sub-classes of
  existing elements. It can be used to carry context forward between a
  Docutils Reader and Writer,
  ...

  The classes attribute's contents should be ignorable. Writers that are
  not familiar with the variant expressed should be able to ignore the
  attribute.

  -- docutils/docs/ref/doctree.html#classes

Specification of e.g. the language of a quote becomes straightforward::

  .. line-block::
     :class: language-el-polyton

     Πάτερ ἡμῶν ὁ ἐν τοῖς οὐρανοῖς·
     ἁγιασθήτω τὸ ὄνομά σου·
     ἐλθέτω ἡ βασιλεία σου·

or

  .. class:: language-de

    Wer immer strebend sich bemüht, den können wir erlösen.

    -- Goethe (Faust I)

Also, custom roles would automatically "do the right thing"::

  .. role:: language-de

  To go to the main station you need to change tram at
  :language-de:`Hohenzollernplatz`.

However, the specification also reads:

  It should not be used to carry formatting instructions or arbitrary
  content.

  -- docutils/docs/ref/doctree.html#classes

which to me suggests a separate "language" attribute.

Opinions?

Günter |
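The two behaviours weighed at the start of this message (a silent-ish English fallback versus an error with a helpful message) can be contrasted in a short sketch. The `LABELS` table, the `get_label` function, its `strict` switch and `MissingLabelError` are all hypothetical names invented for illustration; only the wording of the error message is taken from the message above.

```python
import warnings

# Hypothetical label tables standing in for the per-language modules.
LABELS = {
    "en": {"contents": "Contents", "attention": "Attention!"},
    "de": {"contents": "Inhalt", "attention": "Achtung!"},
}


class MissingLabelError(LookupError):
    """Raised in strict mode when a label has no translation."""


def get_label(name, language_code, strict=False):
    """Return the auto-generated text ``name`` for ``language_code``."""
    try:
        return LABELS[language_code][name]
    except KeyError:
        if strict:
            # "Never fail silently": tell the author how to work around it.
            raise MissingLabelError(
                f'Default "{name}" title for language {language_code} '
                "missing, please specify an explicit title."
            ) from None
        # Lenient mode: warn and fall back to the English label.
        warnings.warn(
            f'no "{name}" label for language {language_code}, using English'
        )
        return LABELS["en"][name]
```

With `strict=False` a document in an unsupported language still converts but may mix English labels into the output; with `strict=True` the author has to supply explicit titles, which matches the "never fail silently" argument.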
From: David G. <dgo...@gm...> - 2010-09-29 13:33:36
|
On Wed, Sep 29, 2010 at 5:41 AM, Guenter Milde <mi...@us...> wrote:
> On 2010-09-25, Guenter Milde wrote:
>> On 2010-09-24, David Goodger wrote:
>>> On Fri, Sep 24, 2010 at 19:16, Guenter Milde <mi...@us...>
>>> wrote:
>>>> currently, if a language is not supported by Docutils with a matching
>>>> module, conversion fails with ImportError.
>
>>>> I propose to change this to a warning, because in cases where no
>>>> auto-generated text is used the output would still be valid (with the
>>>> language tag given to the HTML output or used for hyphenation etc. in
>>>> LaTeX)
>
>>> No objection.
>
>>> What would the default behavior be, if there is auto-generated text?
>>> English?
>
>> I think English is a reasonable default.
>
> Implemented.
>
> Alternatively, we could throw an error with a helpful message in
> these occasions, e.g.
>
>   Default "contents" title for language %s missing, please specify an
>   explicit title.
>
> or
>
>   "attention" title for language %s missing, please use a generic
>   admonition with explicit title.
>
> In my opinion, this fits better in the general Docutils philosophy of
> "never fail silently". What do you think?

+0, as you like.

> ***********************************************************************
>
>>>> * in LaTeX and HTML it is possible to mark regions of the document as
>>>>   belonging to a different language.
>
>
>> HTML has a "lang" attribute
>
> LaTeX has a \selectlanguage macro (directive).
>
> My first considerations started from the LaTeX perspective (a directive
> to switch the language).
>
> However, from the Docutils perspective, the HTML model (language as a
> feature of the objects rather than a "language switching" object) seems
> the better approach.
>
> I see two possible implementations to represent language info in the
> document tree:
>
> a) a new `common attribute` "lang" (or "language")

That would also require a mechanism to set it, on a phrase-level
(inline). I don't think the gain is worth the effort (but you well
may).

> b) use the `classes` attribute with a "specific class value"
>
> Alternative b) would be less invasive, taking advantage of
> `classes` element features:
>
>   The purpose of the attribute is to indicate an "is-a" variant
>   relationship, to allow an extensible way of defining sub-classes of
>   existing elements. It can be used to carry context forward between a
>   Docutils Reader and Writer,
>   ...
>
>   The classes attribute's contents should be ignorable. Writers that are
>   not familiar with the variant expressed should be able to ignore the
>   attribute.
>
>   -- docutils/docs/ref/doctree.html#classes
>
> Specification of e.g. the language of a quote becomes straightforward::
>
>   .. line-block::
>      :class: language-el-polyton
>
>      Πάτερ ἡμῶν ὁ ἐν τοῖς οὐρανοῖς·
>      ἁγιασθήτω τὸ ὄνομά σου·
>      ἐλθέτω ἡ βασιλεία σου·
>
> or
>
>   .. class:: language-de
>
>     Wer immer strebend sich bemüht, den können wir erlösen.
>
>     -- Goethe (Faust I)
>
>
> Also, custom roles would automatically "do the right thing"::
>
>   .. role:: language-de
>
>   To go to the main station you need to change tram at
>   :language-de:`Hohenzollernplatz`.
>
>
> However, the specification also reads:
>
>   It should not be used to carry formatting instructions or arbitrary
>   content.
>
>   -- docutils/docs/ref/doctree.html#classes
>
> which to me suggests a separate "language" attribute.

I don't follow the reasoning. There are no formatting instructions or
arbitrary content. The class describes the content: the language of
this text is German. That's a perfectly reasonable "is-a" relation.

--
David Goodger <http://python.net/~goodger> |
From: Guenter M. <mi...@us...> - 2010-09-30 07:05:07
|
On 2010-09-29, David Goodger wrote:
> On Wed, Sep 29, 2010 at 5:41 AM, Guenter Milde <mi...@us...> wrote:
>> On 2010-09-25, Guenter Milde wrote:

...

>>>> What would the default behavior be, if there is auto-generated text?
>>>> English?

...

>> Alternatively, we could throw an error with a helpful message

...

> +0, as you like.

Added to the TODO list.

>> ***********************************************************************

...

>> I see two possible implementations to represent language info in the
>> document tree:

>> a) a new `common attribute` "lang" (or "language")

> That would also require a mechanism to set it, on a phrase-level
> (inline). I don't think the gain is worth the effort (but you well
> may).

>> b) use the `classes` attribute with a "specific class value"

...

>> Also, custom roles would automatically "do the right thing"::

...

>> However, the specification also reads:

>>   It should not be used to carry formatting instructions or arbitrary
>>   content.

>>   -- docutils/docs/ref/doctree.html#classes

>> which to me suggests a separate "language" attribute.

> I don't follow the reasoning. There are no formatting instructions or
> arbitrary content.

The reasoning is: "Content that belongs to the 'lang' attribute might be
considered misplaced in the 'class' attribute, better ask before coding."

> The class describes the content: the language of
> this text is German. That's a perfectly reasonable "is-a" relation.

I am glad to read this, as this way "the gain is worth the effort":

* code in the HTML writer to recognize class arguments with pattern
  arg.startswith('language-') and place them in a 'lang' attribute.

* code in the LaTeX writer to insert the right \selectlanguage commands to
  switch to the specified language (easy) and back (a bit more complex).

* documentation, that class attribute values of the form
  'language-' + <BCP 47 language tag>
  are recognized for language identification by some writers.

Günter |
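The writer-side work listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the Docutils writers' code: the helper names `split_language_classes`, `html_span` and `latex_inline` are made up, and `BABEL_NAMES` is a hypothetical, deliberately tiny mapping from language tags to Babel language names. The LaTeX helper simply mirrors the grouped `{\selectlanguage{...} ...}` form shown earlier in the thread and does no LaTeX escaping.

```python
from html import escape

PREFIX = "language-"

# Hypothetical, incomplete mapping from language tags to Babel names.
BABEL_NAMES = {"de": "ngerman", "en": "english", "ru": "russian"}


def split_language_classes(classes):
    """Return (lang, remaining) for a list of class values.

    ``lang`` is the tag of the first 'language-<tag>' class (or None);
    all 'language-<tag>' classes are removed from ``remaining``.
    """
    lang = None
    remaining = []
    for cls in classes:
        if cls.startswith(PREFIX) and len(cls) > len(PREFIX):
            if lang is None:
                lang = cls[len(PREFIX):]
        else:
            remaining.append(cls)
    return lang, remaining


def html_span(text, classes):
    """Render inline text, moving the language class into 'lang'."""
    lang, remaining = split_language_classes(classes)
    attrs = ""
    if remaining:
        attrs += f' class="{escape(" ".join(remaining))}"'
    if lang:
        attrs += f' lang="{escape(lang)}"'
    return f"<span{attrs}>{escape(text)}</span>"


def latex_inline(text, classes):
    """Wrap inline text in a group that switches the language.

    Unknown languages are passed through unchanged, in line with the
    rule that class values must be ignorable.
    """
    lang, _ = split_language_classes(classes)
    if lang not in BABEL_NAMES:
        return text
    return "{\\selectlanguage{%s}%s}" % (BABEL_NAMES[lang], text)
```

For instance, `html_span("Hohenzollernplatz", ["language-de"])` produces the `<span lang="de">` markup from the earlier practical example. Block-level elements are the harder case hinted at in the message, since the writer then has to switch back to the document language explicitly rather than rely on a group.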