#188 GDLDumper - ExtendedUnitState aliasing

sphinx4 (76)
Vassil Panayotov


I believe there is a problem in linguist.util.LinguistDumper.dumpSearchGraph() when dumping a graph that contains linguist.flat.ExtendedUnitState. The code assumes that graph node signatures are globally unique, but the signature produced by ExtendedUnitState is unique only within a single GState.
This can be demonstrated with this simple grammar:

grammar hello;

public <greet> = ( Phil | Phillis | Philip );

The result can be seen in before.svg in the attached file.
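To make the aliasing concrete, here is a minimal, hypothetical sketch (not Sphinx-4 code). `SketchState` and `assignNodeIds` are invented for illustration: a dumper that assigns one node per distinct signature merges two genuinely different states whenever their signatures collide. Here the signature deliberately omits which word the state belongs to, standing in for a signature that is unique only within its GState.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration of signature aliasing in a graph dumper.
 * SketchState is a made-up stand-in, not the real ExtendedUnitState.
 */
public class SignatureAliasingDemo {

    // Minimal stand-in for a search state: a unit name plus the word it belongs to.
    static final class SketchState {
        final String unit;
        final String word;

        SketchState(String unit, String word) {
            this.unit = unit;
            this.word = word;
        }

        // Assumption for the sketch: the signature ignores the word,
        // so states from different words can collide.
        String getSignature() {
            return unit;
        }
    }

    // Assigns one dump node id per distinct signature, as a dumper keyed by signature would.
    static Map<String, Integer> assignNodeIds(SketchState... states) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (SketchState s : states) {
            ids.putIfAbsent(s.getSignature(), ids.size());
        }
        return ids;
    }

    public static void main(String[] args) {
        SketchState a = new SketchState("IH", "Phillis");
        SketchState b = new SketchState("IH", "Philip");
        // Two distinct states, but only one node ends up in the dump.
        System.out.println(assignNodeIds(a, b).size());
    }
}
```

Running `main` prints `1`: both states are drawn as the same node, which is the kind of merged graph the report describes.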

One solution could be to change ExtendedUnitState.getFullName() so that it produces a unique signature without sacrificing the caching (see ExtendedUnitState.java.patch in the attachment). The cost of my solution is a new dependency on edu.cmu.sphinx.linguist.dictionary.Pronunciation.
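The idea behind the patch can be sketched as follows. This is a hypothetical illustration, not the attached patch: `fullName` and its parameters are invented, and a plain word string stands in for the Pronunciation object. Folding the pronunciation and the unit's position into the name makes states from different words distinct, while the same state still maps to the same key, so a cache keyed by that name keeps hitting.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: a full name that includes the pronunciation,
 * so that equal names imply the same state. Names here are invented.
 */
public class PronunciationAwareSignature {

    // Combines the pronunciation's word, the unit's index within it, and the unit name.
    static String fullName(String word, int unitIndex, String unit) {
        return word + "/" + unitIndex + "/" + unit;
    }

    public static void main(String[] args) {
        Set<String> signatures = new HashSet<>();
        signatures.add(fullName("Phillis", 3, "IH"));
        signatures.add(fullName("Philip", 3, "IH"));
        // Re-deriving the name of an already seen state yields the same key,
        // so a signature-keyed cache still gets a hit.
        signatures.add(fullName("Phillis", 3, "IH"));
        System.out.println(signatures.size());
    }
}
```

Running `main` prints `2`: the two words no longer alias, and the repeated state does not create a third entry.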

The attached file also contains a demo and a patch to demo.xml that builds it.

Best regards,


  • Demo + patch

  • Hi Vassil

    Sorry for the slow response. I checked this and now think that the proposed expansion of states is rather dangerous. Shouldn't it still join the states at the beginning of Phil/Philip/Phillis? Of course, losing the ends is bad, but a different signature is probably needed.

  • Hi Nickolay,

    Thank you for your reply!

    I must admit I can't see the danger that you are talking about...
    It is only to be expected since I am new to both speech recognition and Sphinx 4.
    Could you please elaborate a little on the situations in which the old signature works but the new one does not?

  • Hi Vassil

    They both work; the issue here is speed. With a greater number of states, the recognizer has to compute more scores, so it takes longer to recognize an utterance than before.
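A rough count shows why joining the shared beginnings matters. The sketch below uses assumed, simplified phone strings for Phil, Phillis and Philip (a real dictionary may differ) and counts one state per phone, ignoring HMM states inside each unit, word-end markers and cross-word context. It compares giving every word its own copy of every phone against sharing identical prefixes; it is not how the flat linguist actually builds its graph.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StateCountSketch {

    // States needed if every word keeps its own copy of every phone.
    static int expandedCount(List<List<String>> prons) {
        int n = 0;
        for (List<String> p : prons) {
            n += p.size();
        }
        return n;
    }

    // States needed if identical prefixes are shared (prefix-tree style).
    static int sharedPrefixCount(List<List<String>> prons) {
        Set<List<String>> prefixes = new HashSet<>();
        for (List<String> p : prons) {
            for (int i = 1; i <= p.size(); i++) {
                prefixes.add(List.copyOf(p.subList(0, i)));
            }
        }
        return prefixes.size();
    }

    public static void main(String[] args) {
        // Assumed, simplified pronunciations.
        List<List<String>> prons = List.of(
            List.of("F", "IH", "L"),             // Phil
            List.of("F", "IH", "L", "IH", "S"),  // Phillis
            List.of("F", "IH", "L", "IH", "P")   // Philip
        );
        System.out.println(expandedCount(prons) + " vs " + sharedPrefixCount(prons));
    }
}
```

This prints `13 vs 6`: under these assumptions, fully expanding the words roughly doubles the number of states that must be scored in every frame.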

    I haven't decided how to deal with this issue yet though.