I'm using Pocketsphinx inside FreeSwitch and unintentionally ran into a problem with a "pizza ordering" demo from the FreeSwitch wiki.
The "pizza" grammar files and folders that the FreeSwitch wiki pointed to for downloading had a problem. Their "pizza ordering" dialogue ran up to a point, then FreeSwitch crashed. This occurred when the dialogue changed to a folder where it had an "empty" .lm file (i.e. no text in the file). The associated .arpabo file had the content that the .lm file should have had, but this didn't make a difference. On the other hand, folders that had text in their .lm files had nothing in their .arpabo files, but there seemed to be no problems with these parts of the demo's dialogue.
I used LMtool (http://www.speech.cs.cmu.edu/tools/lmtool.html) to generate .lm files from the .corpus files in this demo and it generated the exact same content that was found in either the .lm or .arpabo files.
When I placed the content of the .arpabo files into the associated .lm files to give me a complete set, the demo seemed to work fine.
However, I'm not sure whether the problem was caused by the way I'm dialing into or using FreeSwitch. So I'm wondering what these .arpabo files are, and whether PocketSphinx needs them.
Mark.
ARPA is the format for language model files used in the Sphinx decoders. An ARPA language model can have either an .arpabo or an .lm extension; it doesn't actually matter.
As for the FreeSwitch grammars, they seem to be OK. Each folder has both .lm and .arpabo files and they are identical. They were simply generated with lmtool. So I wonder whether the empty file appeared because of an unpacking issue.
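For anyone who hits the same symptom, here is a minimal sketch of a sanity check that flags model files which are empty or truncated after unpacking. It relies only on the fact that an ARPA file contains a `\data\` header line and a closing `\end\` line. The function names and the idea of scanning a grammar folder are my own assumptions for illustration; nothing here is part of PocketSphinx or FreeSwitch.

```python
from pathlib import Path


def looks_like_arpa(path):
    # An ARPA model has a line reading \data\ near the top and a line
    # reading \end\ at the bottom. An empty file has neither.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\\data\\" in lines and "\\end\\" in lines


def find_suspect_models(root, extensions=(".lm", ".arpabo")):
    # Walk the (hypothetical) grammar folder and collect every model file
    # that does not pass the check above.
    suspects = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in extensions and not looks_like_arpa(path):
            suspects.append(path)
    return suspects
```

Running `find_suspect_models` on the unpacked grammar directory before starting the demo should list any file that came out empty. This is only a structural check; it does not validate the n-gram sections themselves.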
I thought it was unpacking as well.
So I tried a couple of unpackers, IZArc and WinZip 12. They gave me slightly different results. In WinZip 12 the .lm files were all missing, but in IZArc some .lm files were empty. Another difference was that IZArc unpacked files to their respective folders but WinZip 12 didn't, though this may have been a settings issue. Currently they unpack with the right content in the .lm files, but they didn't a few days ago, so a member of the FreeSwitch team must have fixed the problem after I let them know.
Anyway, this all got me curious about these file extensions. Why are there two extensions (.lm and .arpabo) if it doesn't matter? It does seem to matter, since the "pizza demo" only worked once the .lm files were corrected.
Thanks
Mark
It's just a minor issue with the online lmtool, which sends you both files in an archive where the .arpabo file is a link to the .lm file. Not all Windows archivers support that. You can ask the lmtool author to get this fixed.
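To see which entries in an archive are links rather than regular files, something like the following sketch may help. It assumes the archive is a tar file (plain or compressed), which is not confirmed anywhere in this thread, and the function name `list_link_members` is made up for the example. It uses only Python's standard `tarfile` module.

```python
import tarfile


def list_link_members(archive_path):
    # Return (member name, link target) pairs for every symbolic or hard
    # link stored in the archive. An unpacker without link support may
    # write these entries as empty files or skip them entirely.
    links = []
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            if member.issym() or member.islnk():
                links.append((member.name, member.linkname))
    return links
```

If an .arpabo (or .lm) entry shows up in that list, copying the target file's content over it after unpacking gives the same result as the manual fix described earlier in the thread.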
I have no problem with arpabo files not being needed.
Another item I would appreciate clarification on is found in FreeSwitch's Mod pocketsphinx wiki.
http://wiki.freeswitch.org/wiki/Mod_pocketsphinx
It comes up in the section "Building your own grammar files" and it has to do with a Perl script (quick_lm.pl) used to construct the grammar files.
As input, .sent files are used that have delimiters <s> and </s> around each sentence like
<s> THIS IS SENTENCE NUMBER ONE </s>
<s> THIS IS SENTENCE NUMBER TWO </s>
and the script produces a .sent.arpabo file. But .corpus files don't use these delimiters.
The lmtool asks for .corpus files, but will it work with .sent files?
What's the purpose of .sent files if one can use .corpus files?
Thanks.
Mark.
Lmtool doesn't insert <s> and </s>. You need to insert them yourself, with awk for example. Lmtool should work with .sent files.
The .corpus file is a temporary file that is not used at all; there is no sense in including it in the package or generating it with lmtool. Just ignore it.
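The delimiter step mentioned above can be sketched in a few lines. awk was the suggestion; this version uses Python instead, purely as an illustration. The function name `to_sent_lines` is hypothetical, and the sketch assumes one sentence per input line.

```python
def to_sent_lines(corpus_lines):
    # Wrap each non-empty sentence in <s> ... </s>, the delimiters the
    # .sent format uses. Lines that already carry both delimiters are
    # kept as they are (with whitespace normalized); blank lines are dropped.
    out = []
    for line in corpus_lines:
        words = line.split()
        if not words:
            continue
        if words[0] == "<s>" and words[-1] == "</s>":
            out.append(" ".join(words))
        else:
            out.append("<s> " + " ".join(words) + " </s>")
    return out
```

For example, `to_sent_lines(["THIS IS SENTENCE NUMBER ONE"])` returns `["<s> THIS IS SENTENCE NUMBER ONE </s>"]`. Writing the returned lines to a file, one per line, gives input in the shape shown in the question.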
Got it, sentence corpus files are .sent files.
That helps clear things up.
Thanks.