g2p.py script. ValueError: symbol out of range: 65536

Help
Mikki
2015-10-13
  • Mikki

    Mikki - 2015-10-13

    Hello \o/

    System: Windows 7, 64-bit, Cygwin, Python 2.7

    Train data (example):

    статьей statjej
    широкополая shayraykapolayja
    нетрудно nitrudna
    кадровик kaydravik
    боле bayle
    валяюсь valajurs

    I have already applied the fix described at http://www.voxforge.org/home/forums/message-boards/audio-discussions/sequitur-g2p--symbol-out-of-range-error/4.

    Even so, once the "token ID" (not sure about the term) reaches 65536, training fails with:

    Traceback (most recent call last):
    File "/cygdrive/d/Documents/scientific_literature/linguistics/Transliteration/g2p/build/lib.cygwin-1.7.32-x86_64-2.7/sequitur.py", line 667, in run
    shouldStop = self.iterate(context)
    File "/cygdrive/d/Documents/scientific_literature/linguistics/Transliteration/g2p/build/lib.cygwin-1.7.32-x86_64-2.7/sequitur.py", line 579, in iterate
    self.shallUseMaximumApproximation)
    File "/cygdrive/d/Documents/scientific_literature/linguistics/Transliteration/g2p/build/lib.cygwin-1.7.32-x86_64-2.7/sequitur.py", line 264, in evidence
    for eg in self.graphs(model):
    File "/cygdrive/d/Documents/scientific_literature/linguistics/Transliteration/g2p/build/lib.cygwin-1.7.32-x86_64-2.7/sequitur.py", line 204, in makeGraphs
    eg = self.builder.create(left, right)
    File "/cygdrive/d/Documents/scientific_literature/linguistics/Transliteration/g2p/build/lib.cygwin-1.7.32-x86_64-2.7/sequitur_.py", line 150, in create
    return sequitur.EstimationGraphBuilder_create(self, *args)
    ValueError: symbol out of range: 65536
    iteration failed.
    failed to estimate or load model

    Please help me fix it.

     
    • Nickolay V. Shmyrev

      In Sequitur, the input format must be:

       word  <symbol1>  <symbol2>  <symbol3>
      

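      The number of distinct output symbols must also be less than 65536 (about 65k). In your training data each whole transliterated word is treated as a single output symbol, so the symbol inventory grows with the size of the lexicon.

      That means you need to split the second word into individual letters for training. You can merge them back together after conversion.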
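
      The split and merge steps can be sketched roughly as follows. This is only an illustration written for Python 3, not part of Sequitur itself: the function names are made up here, and it assumes every letter of the transliteration is a valid output unit on its own (digraphs such as "sh" would need their own handling).

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers, not part of Sequitur G2P.


def to_sequitur_line(line):
    """Turn 'word translit' into 'word t r a n s l i t'."""
    word, translit = line.split(None, 1)
    # Assumption: each letter of the transliteration is one output symbol.
    symbols = list("".join(translit.split()))
    return word + " " + " ".join(symbols)


def merge_symbols(symbol_string):
    """Join space-separated output symbols back into one string."""
    return "".join(symbol_string.split())


def count_output_symbols(lines):
    """Count distinct output symbols (every field after the word)."""
    symbols = set()
    for line in lines:
        symbols.update(line.split()[1:])
    return len(symbols)
```

      With the unsplit lexicon, count_output_symbols grows by one for nearly every entry. After passing each line through to_sequitur_line it is bounded by the size of the transliteration alphabet, which stays far below the 65536 limit.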

      You can also try Phonetisaurus; it is more straightforward to use.

       
  • Mikki

    Mikki - 2015-10-13

    It works. Thanks.

     