Re: [Docutils-users] Is rst2odt -l cs supposed to work?

Dear Dave,

On 2017-05-16, Dave Kuhlman wrote:

> I did the commit.

> And, before I did so, I took Matěj's suggestion about using
> ``locale.normalize(lang)``, so now you can use::

>     $ rst2odt.py -l cs somedoc.txt somedoc.odt
>     $ rst2odt.py -l es somedoc.txt somedoc.odt

> to get Czech and Spanish (Spain).

Fine.
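For reference, the fallback Dave describes relies on the standard library's alias table. ``locale.normalize()`` turns a bare language code into a full locale name, and the region is the part between "_" and ".". This is a minimal sketch; ``region_from_locale`` is a hypothetical helper, not part of Docutils, and the encoding suffix may differ between Python versions:

```python
import locale


def region_from_locale(language_code):
    """Return the region part of locale.normalize(language_code), or None.

    Hypothetical helper, e.g. 'cs' -> 'cs_CZ.ISO8859-2' -> 'CZ'.
    """
    parts = locale.normalize(language_code).split('_')
    if len(parts) < 2:
        # Codes missing from the alias table come back unchanged,
        # without a region part.
        return None
    # Drop the encoding suffix, e.g. 'CZ.ISO8859-2' -> 'CZ'.
    return parts[1].split('.')[0]


for code in ('cs', 'es', 'de'):
    print(code, locale.normalize(code), region_from_locale(code))
```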

> And, of course, you can override the region, for example::

>     $ rst2odt.py -l en-GB somedoc.txt somedoc.odt
>     $ rst2odt.py -l es-mx somedoc.txt somedoc.odt

> to get British English and Mexican Spanish.


I suggest the patch below to also allow BCP 47 tags such as

  de-Latf-AT        # 2nd subtag is a script, region in 3rd position
  de-latf           # 2nd subtag is a script, no region given
  de-1901           # 2nd subtag is a variant (here: spelling), no region given
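The region search in the patch can be tried on these examples in isolation. The following is a standalone sketch that mirrors the loop in the first hunk (``split_language_tag`` is a hypothetical name). Like the patch, it only recognizes two-letter alphabetic region subtags, so numeric regions such as "419" are not detected:

```python
def split_language_tag(tag):
    """Return (language, region) for a BCP 47-style tag.

    Hypothetical standalone version of the loop in the patch: the first
    subtag is the primary language; the first two-letter alphabetic
    subtag after it is taken as the region.
    """
    subtags = tag.replace('_', '-').split('-')
    language = subtags[0].lower()
    region = None
    for subtag in subtags[1:]:
        if len(subtag) == 2 and subtag.isalpha():
            region = subtag.upper()
            break
        elif len(subtag) == 1:
            # A singleton ('x', 'u', ...) starts an extension or private-use
            # sequence, so no region subtag can follow it.
            break
    return language, region


for tag in ('de-Latf-AT', 'de-latf', 'de-1901', 'es-mx', 'en_GB'):
    print(tag, split_language_tag(tag))
```

A region of None is what triggers the ``locale.normalize()`` fallback in the patch.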
  

Further changes:

The RuntimeError raised when locale.normalize() fails to find a region subtag
is replaced with a warning: a missing region subtag does not prevent export
of a functional output document.

The RuntimeError for an empty "self.visitor.language_code" is removed, on the
assumption that a user who passes ``--language=""`` wants no language written
into the output, which is exactly what happens in this case.
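Taken together, these two changes give the following control flow for a bare language code. This is a minimal sketch, not the writer's code: ``resolve_language`` is hypothetical, and ``warnings.warn()`` stands in for ``self.document.reporter.warning()``:

```python
import locale
import warnings


def resolve_language(language_code):
    """Return (language, region), or None if no language is to be written.

    Hypothetical sketch of the patched control flow for a bare language
    code; warnings.warn() stands in for the Docutils reporter.
    """
    if not language_code:
        # --language="" (or no code at all): write no language attributes.
        return None
    language = language_code.replace('_', '-').split('-')[0].lower()
    parts = locale.normalize(language).split('_')
    region = parts[1].split('.')[0] if len(parts) > 1 else None
    if region is None:
        # Formerly a RuntimeError; a missing region still allows a
        # functional output document, so only warn.
        warnings.warn('Could not find region with locale.normalize() '
                      'for %r' % language_code)
    return language, region
```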

From the function "languages.normalize_language_tag()", we only need the
replacement of "_" by "-", which is better done with the string method
``str.replace()``.


Günter


Dir: /home/milde/Code/Python/docutils-svn/docutils/docutils/writers/odf_odt/

Index: __init__.py
===================================================================

--- __init__.py	(Revision 8069)
+++ __init__.py	(working copy)
@@ -572,38 +572,35 @@
         s1 = self.get_stylesheet()
         # Set default language in document to be generated.
         # Language is specified by the -l/--language command line option.
-        # Allowed values are "ll", "ll-rr" or "ll_rr", where ll is language
-        # and rr is region.  If region is omitted, we use
+        # The format is described in BCP 47.  If region is omitted, we use
         # local.normalize(ll) to obtain a region.
         language_code = None
         region_code = None
-        if len(self.visitor.normalized_language_code) > 0:
-            language_ids = self.visitor.normalized_language_code[0].split('-')
-            if len(language_ids) == 2:
-                language_code = language_ids[0]
-                region_code = language_ids[1]
-            elif len(language_ids) == 1:
-                language_code = language_ids[0]
+        if self.visitor.language_code:
+            language_ids = self.visitor.language_code.replace('_', '-')
+            language_ids = language_ids.split('-')
+            # first tag is primary language tag
+            language_code = language_ids[0].lower()
+            # 2-letter region subtag may follow in 2nd or 3rd position
+            for subtag in language_ids[1:]:
+                if len(subtag) == 2 and subtag.isalpha():
+                    region_code = subtag.upper()
+                    break
+                elif len(subtag) == 1:
+                    break  # a singleton subtag is never followed by a region
+            if region_code is None:
                 rcode = locale.normalize(language_code)
                 rcode = rcode.split('_')
                 if len(rcode) > 1:
-                    rcode = rcode[1]
-                    rcode = rcode.split('.')
-                    if len(rcode) >= 1:
-                        region_code = rcode[0]
+                    rcode = rcode[1].split('.')
+                    region_code = rcode[0]
                 if region_code is None:
-                    raise RuntimeError(
+                    self.document.reporter.warning(
                         'invalid language-region.  '
                         'Could not find region with locale.normalize().  '
                         'If language is supplied, then you must specify '
-                        'both lanauge and region (ll-rr).  Examples: '
-                        'es-mx (Spanish, Mexico), en-au (English, Australia).')
-        else:
-            raise RuntimeError(
-                'invalid language-region. '
-                'Format must be "ll-rr" or "ll_rr", where ll is language '
-                'and rr is region. '
-                'See https://en.wikipedia.org/wiki/IETF_language_tag')
+                        'both language and region (ll-RR).  Examples: '
+                        'es-MX (Spanish, Mexico), en-AU (English, Australia).')
         # Update the style ElementTree with the language and region.
         # Note that we keep a reference to the modified node because
         # it is possible that ElementTree will throw away the Python
@@ -888,8 +885,6 @@
         self.language = languages.get_language(
             self.language_code,
             document.reporter)
-        self.normalized_language_code = languages.normalize_language_tag(
-            self.language_code)
         self.format_map = {}
         if self.settings.odf_config_file:
             from ConfigParser import ConfigParser
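For comparison, the removed lines in the first hunk split the first normalized tag on "-" and only understood one or two subtags, always taking a second subtag as the region. A hypothetical reconstruction of just that branch (``old_split`` is not a Docutils function, and it takes an already-normalized tag) shows how the BCP 47 examples above fared before the patch:

```python
def old_split(tag):
    """Hypothetical reconstruction of the removed pre-patch branch.

    Only one or two subtags were understood; a second subtag was always
    taken as the region.
    """
    ids = tag.split('-')
    if len(ids) == 2:
        return ids[0], ids[1]
    if len(ids) == 1:
        # Region left for the locale.normalize() fallback.
        return ids[0], None
    # Three or more subtags: neither branch matched, nothing was set.
    return None, None


for tag in ('es-mx', 'de-latf', 'de-latf-at'):
    print(tag, old_split(tag))
```

So ``de-latf`` produced the bogus region "latf", and ``de-latf-at`` left both codes unset. The patched loop finds no region for the first (falling back to ``locale.normalize()``) and "AT" for the second.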