#141 (trivial) need regular way to replace MakeIdentifier (especially pyxbgen)

PyXB 1.1.4
closed
None
fixed
Generation Model
trivial
PyXB 1.1.3
enhancement
2012-06-15
2012-06-11
hhsprings
No

I have an alternative MakeIdentifier for my personal use (intended only for Japanese).
For us, non-ASCII characters are NOT meaningless, so it does us no good to have them replaced by emptystring_xxx.
My alternative MakeIdentifier is as follows (but don't worry about its details):

# alter_make_identifier
import re

_AllAsciiMatch_re = re.compile(r'^[ -~]+$')
_UnderscoreSubstitute_re = re.compile(r'[- .]')
_NonIdentifier_re = re.compile(r'[^a-zA-Z0-9_]')
_PrefixUnderscore_re = re.compile(r'^_+')
_PrefixDigit_re = re.compile(r'^\d+')
_CamelCase_re = re.compile(r'_\w')

import MeCab, romkan
_tagger1 = MeCab.Tagger("-Owakati")
_tagger2 = MeCab.Tagger("-Oyomi")

def MakeIdentifier (s, camel_case=False):
    s = unicode(s)
    if not _AllAsciiMatch_re.match(s):
        # Non-ASCII (e.g. kanji): segment with MeCab (-Owakati), get the
        # katakana reading (-Oyomi), and romanize it with romkan; any
        # space-separated pieces are joined with '_'.
        result = _tagger2.parse(_tagger1.parse(s.encode('utf-8')))
        s = "_".join(map(romkan.to_roma, result.decode('utf-8').split(" ")))
    # Turn '-', ' ' and '.' into '_', drop every other character outside
    # [a-zA-Z0-9_], then strip leading underscores.
    s = _PrefixUnderscore_re.sub('', _NonIdentifier_re.sub('', _UnderscoreSubstitute_re.sub('_', s)))
    if camel_case:
        # '_x' -> 'X', so foo_bar becomes fooBar
        s = _CamelCase_re.sub(lambda _m: _m.group(0)[1].upper(), s)
    if _PrefixDigit_re.match(s):
        # Python identifiers cannot start with a digit
        s = 'n' + s
    if 0 == len(s):
        s = 'emptyString'
    return s

(This MakeIdentifier converts identifiers made up of kanji characters into ASCII.)
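To see what the ASCII-only part of this function produces, here is a minimal sketch of just that branch, runnable without MeCab or romkan. It reuses the regular expressions above; the function name `ascii_identifier` is hypothetical and is not part of PyXB. It also shows the problem being reported: a purely kanji name collapses to `emptyString`.

```python
import re

# The same patterns used by the alternative MakeIdentifier above.
_UnderscoreSubstitute_re = re.compile(r'[- .]')
_NonIdentifier_re = re.compile(r'[^a-zA-Z0-9_]')
_PrefixUnderscore_re = re.compile(r'^_+')
_PrefixDigit_re = re.compile(r'^\d+')
_CamelCase_re = re.compile(r'_\w')


def ascii_identifier(s, camel_case=False):
    """ASCII-only part of the alternative MakeIdentifier (no transliteration)."""
    # '-', ' ' and '.' become '_'; anything else outside [a-zA-Z0-9_] is dropped.
    s = _NonIdentifier_re.sub('', _UnderscoreSubstitute_re.sub('_', s))
    # Leading underscores are removed.
    s = _PrefixUnderscore_re.sub('', s)
    if camel_case:
        # '_x' -> 'X', so 'rail_line' becomes 'railLine'.
        s = _CamelCase_re.sub(lambda m: m.group(0)[1].upper(), s)
    if _PrefixDigit_re.match(s):
        # A leading digit is not allowed, so prefix with 'n'.
        s = 'n' + s
    return s or 'emptyString'
```

For example, `ascii_identifier('3d.model')` gives `n3d_model`, while `ascii_identifier('鉄道')` gives `emptyString`, because every kanji character is stripped.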

I can use it by copying pyxbgen locally and editing it like this:

#!/usr/bin/env python

import pyxb.utils.utility
import alter_make_identifier
# Install the replacement before the rest of PyXB is imported.
pyxb.utils.utility.MakeIdentifier = alter_make_identifier.MakeIdentifier
import pyxb.xmlschema
import pyxb.binding.generate
import pyxb.utils.domutils
import os.path
import sys
import optparse

# ... the rest is exactly the same as the original pyxbgen
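The replacement works by assigning a new function to a module attribute. In plain Python, code that looks the function up as `module.MakeIdentifier` at call time sees a later assignment, but a name bound earlier with `from module import MakeIdentifier` keeps the old function. That is one plausible reason to patch before the other imports; whether PyXB's own modules bind the name that way is not established here. The sketch uses a throwaway stand-in module (`util_demo`, hypothetical) rather than PyXB itself.

```python
import sys
import types

# Hypothetical stand-in for pyxb.utils.utility.
util = types.ModuleType('util_demo')
util.MakeIdentifier = lambda s: 'original:' + s
sys.modules['util_demo'] = util


def load_consumer():
    # Mimics a module that runs `from util_demo import MakeIdentifier`
    # when it is imported: the current function object is bound by name.
    from util_demo import MakeIdentifier
    return MakeIdentifier


early = load_consumer()                       # bound before the patch
util.MakeIdentifier = lambda s: 'patched:' + s
late = load_consumer()                        # bound after the patch
```

Here `early` still refers to the original function, while `late` and any call through `util.MakeIdentifier` use the patched one.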

I would like a regular way to replace MakeIdentifier without copying pyxbgen
(but I have no idea how it should be done).

Discussion

  • Peter A. Bigot

    Peter A. Bigot - 2012-06-11
    • status changed from new to accepted

    OK, that shouldn't be too difficult. I'll try to get it into 1.1.4 sometime in the next couple weeks.

  • Peter A. Bigot

    Peter A. Bigot - 2012-06-14
    • status changed from accepted to closed
    • resolution set to fixed

    commit 6c81ed77dd03c0f528708db070d84cc8237a3b43
    Author: Peter A. Bigot <pabigot@…>
    Date: Thu Jun 14 13:13:48 2012 -0500

    trac/141: need regular way to replace MakeIdentifier

    Provide a function to pre-process what's given to MakeIdentifier before it
    strips out invalid characters and otherwise transforms the string.

    The interface to this isn't ideal, and in particular there's no way to
    override the standard behavior using an unmodified pyxbgen. The solution
    will be to use a modified pyxbgen specific to the environment; a crude
    example is present in the test case, and a more refined one will be in the
    unicode_jp example yet to be included.
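The pre-processing idea in the first commit can be illustrated with a minimal sketch: a hook runs on the raw string before invalid characters are stripped, so a transliterator gets to see the kanji first. The names `set_identifier_preprocessor` and `make_identifier` are hypothetical; this is not PyXB's actual interface, and the stripping step is simplified.

```python
import re

_NonIdentifier_re = re.compile(r'[^a-zA-Z0-9_]')

# Hypothetical hook: a callable str -> str, or None for no pre-processing.
_identifier_preprocessor = None


def set_identifier_preprocessor(fn):
    """Install fn to run before invalid characters are stripped.

    Returns the previously installed hook so a caller can restore it."""
    global _identifier_preprocessor
    previous = _identifier_preprocessor
    _identifier_preprocessor = fn
    return previous


def make_identifier(s):
    """Simplified stand-in for MakeIdentifier with a pre-processing step."""
    if _identifier_preprocessor is not None:
        # e.g. transliterate kanji to romaji before anything is discarded
        s = _identifier_preprocessor(s)
    s = _NonIdentifier_re.sub('', s.replace('-', '_'))
    return s or 'emptyString'
```

An environment-specific script (in the spirit of the pyxbgen_jp example mentioned above) would install its hook once, before generating any bindings.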

    commit 5c4f4c5c3d80209540c612e82c6d7e762b3d7b70
    Author: Peter A. Bigot <pabigot@…>
    Date: Thu Jun 14 14:24:46 2012 -0500

    Add demo of PyXB processing Japanese OpenGIS documents

    Way cool.

    :000000 100644 0000000... 463e27c... A examples/unicode_jp/README.txt
    :000000 100644 0000000... f434a60... A examples/unicode_jp/check.py
    :000000 100644 0000000... 20c5d51... A examples/unicode_jp/data/euc-jp/FG-GML-13-RailCL25000-20080331-0001.xml
    :000000 100644 0000000... 5b03078... A examples/unicode_jp/data/euc-jp/FGD_GMLSchema.xsd
    :000000 100644 0000000... ba2cb6d... A examples/unicode_jp/data/iso-2022-jp/FG-GML-13-RailCL25000-20080331-0001.xml
    :000000 100644 0000000... 17cf8be... A examples/unicode_jp/data/iso-2022-jp/FGD_GMLSchema.xsd
    :000000 100644 0000000... aeca7bf... A examples/unicode_jp/data/readme.txt
    :000000 100644 0000000... f23a71a... A examples/unicode_jp/data/shift_jis/FG-GML-13-RailCL25000-20080331-0001.xml
    :000000 100644 0000000... ff0d3d6... A examples/unicode_jp/data/shift_jis/FGD_GMLSchema-ss.jpg
    :000000 100644 0000000... fa6461a... A examples/unicode_jp/data/shift_jis/FGD_GMLSchema.xsd
    :000000 100644 0000000... 714eb82... A examples/unicode_jp/data/shift_jis/readme.txt
    :000000 100644 0000000... 3da0866... A examples/unicode_jp/data/utf-8/FG-GML-13-RailCL25000-20080331-0001.xml
    :000000 100644 0000000... db8a20a... A examples/unicode_jp/data/utf-8/FGD_GMLSchema.xsd
    :000000 100755 0000000... fbd00e0... A examples/unicode_jp/pyxbgen_jp
    :000000 100755 0000000... f64001b... A examples/unicode_jp/test.sh

  • hhsprings

    hhsprings - 2012-06-15

    I was surprised that you also investigated MeCab and romkan.
    All tests passed with no problems.

    Thank you, great job!

  • Peter A. Bigot

    Peter A. Bigot - 2012-06-15

    Replying to hhsprings:

    I was surprised that you also investigated MeCab and romkan.

    Your transliteration code was very interesting, and it only took 15 minutes to install the packages and find romkan.py. I think it makes a much more impressive demo. It also makes clear that Python 3 support and Unicode identifiers are very important for a usable system.
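The point about Unicode identifiers can be made concrete. Python 3 accepts non-ASCII letters in identifiers (PEP 3131), so a Python 3 identifier maker could keep kanji instead of transliterating them. The sketch below is an assumption about how that might look, not PyXB code; `unicode_identifier` is a hypothetical name, and the keyword check is an addition not present in the code above.

```python
import keyword
import re

_UnderscoreSubstitute_re = re.compile(r'[- .]')


def unicode_identifier(s):
    """Python 3 sketch: keep every character Python allows in an identifier."""
    s = _UnderscoreSubstitute_re.sub('_', s)
    # ('a' + ch).isidentifier() is True when ch may appear after the first
    # character of an identifier (letters, digits, '_', kanji, kana, ...).
    s = ''.join(ch for ch in s if ('a' + ch).isidentifier())
    s = s.lstrip('_')
    if not s:
        return 'emptyString'
    if not s.isidentifier():
        # e.g. a leading digit, which is allowed later but not first
        s = 'n' + s
    if keyword.iskeyword(s):
        # Avoid reserved words such as 'class'.
        s += '_'
    return s
```

With this approach `unicode_identifier('鉄道 中心線')` keeps the kanji and yields `鉄道_中心線`, which Python 3 accepts as a name.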

    All tests passed with no problems.

    Thank you, great job!

    Dou itashimashite (you're welcome). This was a great test case that brought back memories (I took a semester of Japanese after grad school fifteen years ago, but never used it and have forgotten almost everything).

    By the way, I can currently credit you only as "hhsprings" in the example README; if you would like credit under your real name, please send me email through SourceForge. I would not include your email address, just your name, and only if you wish.
