Hi,
I created a dictionary with two languages (English and Persian) in a single file, like this:
بگو B E G U
خزنده KH A Z A N D E
قدت GH A D E T
چنده CH A N D E
قد GH A D
من M A N
شب SH A B
hi H AA Y
hello H E L L O
how H O V
are AA R
you Y U
what V AA T
is I Z
your Y O R
name N E Y M
old O L D
where V E R
from F E R AA M
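A quick way to see which phone symbols a dictionary like this actually uses is to parse it and collect the inventory. The following is only a minimal sketch: it assumes one entry per line, with the word first and whitespace-separated phones after it, and the function names are made up for illustration.

```python
from collections import Counter


def parse_dictionary(text):
    """Parse 'WORD PH1 PH2 ...' lines into a list of (word, phones) pairs."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        entries.append((parts[0], parts[1:]))
    return entries


def phone_inventory(entries):
    """Count how often each phone symbol occurs across all entries."""
    counts = Counter()
    for _word, phones in entries:
        counts.update(phones)
    return counts


# A few entries copied from the dictionary above.
sample = """\
بگو B E G U
شب SH A B
hi H AA Y
you Y U
"""

entries = parse_dictionary(sample)
inventory = phone_inventory(entries)
print(sorted(inventory))
```

Comparing this inventory with the phone list the acoustic model was trained on shows whether every symbol in the dictionary is actually known to the model.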
For example, when I wrote the Persian words in English letters and tried to build a language model, the generated dictionary looked like this:
A AH
A(2) EY
AB AE B
ABI AE B IY
AGHAB AE AH B
AHLE AE L
AIR EH R
AM AE M
AM(2) EY EH M
ANGUR AE N G AH R
ANJIR AE N JH AH R
APPLE AE P AH L
ARE AA R
ARE(2) ER
AROUND ER AW N D
AROUND(2) ER AW N
BABAT B AE B AH T
BACK B AE K
BALA B AA L AH
BEBAR B IH B AA R
BECHARKH B AH CH AA R K
BECOME B IH K AH M
BEGU B IH G Y UW
BENEVIS B IY AH N IY V IY Z
BIRD B ER D
BIYA B AY Y AA
BLACK B L AE K
BLUE B L UW
BORO B ER OW
BRIGHT B R AY T
BROWN B R AW N
CHAND CH AE N D
CHANDE CH AE N D
CHANDOMI CH AE N D AH M IY
CHAPETO CH AE P AH T OW
CHE CH EY
CHETORE CH EH T AH R
CHIE CH AY
CLASE K L EY Z
COLOR K AH L ER
COLOR(2) K AO L ER
COME K AH M
CREEPY K R IY P IY
DADS D AE D Z
DARI D AE R IY
DARK D AA R K
DARYA D AE R AY AH
DASTE D EY S T
DAY D EY
DO D UW
DORE D AO R
DUS D AH Z
EDUCATED EH JH AH K EY T IH D
EDUCATED(2) EH JH Y UW K EY T IH D
ESME EH Z M
ESMET EH Z M AH T
ESMETO EH Z M AH T OW
FATHERS F AA DH ER Z
FAVORITE F EY V ER IH T
FAVORITE(2) F EY V R AH T
FIGS F IH G Z
FOOD F UW D
FORWARD F AO R W ER D
FROM F R AH M
GHADE G EY D
GHADET G AE D AH T
GHAHVEI G AE V EY
GHAZAEI G AE Z IY IY
GHERMEZ G AH R M AH Z
GO G OW
GOLABI G AA L AH B IY
GRADE G R EY D
GRAPE G R EY P
GRAY G R EY
GREEN G R IY N
HALET HH AE L AH T
HAND HH AE N D
HAVA HH AE V AH
HELLO HH AH L OW
HELLO(2) HH EH L OW
HI HH AY
HOME HH OW M
HOW HH AW
HULU HH Y UW L UW
I AY
IN IH N
IS IH Z
JELO JH EH L OW
KHAK K HH AE K
KHAKESTARI K HH AH K EH S T AH R IY
KHAMUSH K HH AE M AH SH
KHAZANDE K HH AE Z AH N D
KHODET K HH AA D AH T
KHUNATUN K HH Y UW N AH T AH N
KOJAEI K AA JH IY IY
KOJAS K AA JH AH Z
KON K AA N
LEFT L EH F T
LOVE L AH V
MADARET M AE D AH R AH T
MAMANET M AE M AH N AH T
MAN M AE N
MANO M AA N OW
ME M IY
MESHKI M EH SH IY
MOALEMET M OW L AH M AH T
MOMS M AA M Z
MOTHERS M AH DH ER Z
NAME N EY M
NARENJI N AH R EH N JH IY
NIGHT N AY T
NOGHREI N AA R IY
OF AH V
OFF AO F
OLD OW L D
ON AA N
ON(2) AO N
OR AO R
OR(2) ER
ORANGE AO R AH N JH
ORANGE(2) AO R IH N JH
PARANDE P AE R AH N D
PEACH P IY CH
PEAR P EH R
PEDARET P EH D AH R AH T
PINK P IH NG K
PORTEGHAL P AO R T AH AH L
PUT P UH T
RANGIE R EY N JH IY
RASTETO R AE S T AH T OW
RED R EH D
RIGHT R AY T
RO R OW
ROSHAN R AA SH AH N
ROSHANE R AH SH EY N
RUZ R AH Z
SABZ S AE B Z
SAFHATO S AE F AH T OW
SALAM S AA L AA M
SALETE S AH L IY T
SAVAD S AE V AH D
SCREEN S K R IY N
SEA S IY
SEFID S EH F AH D
SHAB SH AE B
SHO SH OW
SIB S IH B
SILVER S IH L V ER
SIYAH S AY AY AH
SOIL S OY L
SURATI S UH R AH T IY
TALL T AO L
TARIK T AE R AH K
TEACHERS T IY CH ER Z
TELL T EH L
THE DH AH
THE(2) DH IY
TURN T ER N
TUSI T Y UW S IY
UP AH P
WATER W AO T ER
WHAT W AH T
WHAT(2) HH W AH T
WHERE W EH R
WHERE(2) HH W EH R
WHITE W AY T
WHITE(2) HH W AY T
WRITE R AY T
YA Y AA
YE Y IY
YE(2) Y EH
YELLOW Y EH L OW
YOU Y UW
YOUR Y AO R
YOUR(2) Y UH R
YOURSELF Y ER S EH L F
YOURSELF(2) Y UH R S EH L F
YOURSELF(3) Y AO R S EH L F
ZARD Z AA R D
But it uses incorrect phonemes for the Persian words, because it treats them as English words. So the only way is to correct them by hand, but I have no reference phone list that tells me how to correct the phones of the Persian words.
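Hand correction can be kept manageable by storing the corrected entries separately and merging them over the generated dictionary. The sketch below assumes the generated file has one `WORD PH1 PH2 ...` entry per line. The function name `apply_corrections` is hypothetical, and the two corrected pronunciations are illustrative guesses derived from the custom transcriptions in the first post, not verified Persian transcriptions.

```python
def apply_corrections(generated_text, corrections):
    """Replace pronunciations of corrected words; keep all other entries as they are."""
    out = []
    for line in generated_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        if word in corrections:
            out.append(f"{word} {corrections[word]}")
        else:
            out.append(" ".join(parts))
    return "\n".join(out)


# Three entries copied from the generated dictionary above.
generated = """\
KHAZANDE K HH AE Z AH N D
HELLO HH AH L OW
GHADET G AE D AH T
"""

# Assumed corrections (illustrative only): the final vowel is restored and
# the English-style reductions to AH are undone.
corrections = {
    "KHAZANDE": "K AE Z AE N D EH",
    "GHADET": "G AE D EH T",
}

merged = apply_corrections(generated, corrections)
print(merged)
```

Keeping the corrections in their own table means the dictionary can be regenerated at any time without losing the manual work.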
Hi,
I created a dictionary covering two languages (English and Persian) in a single file, as listed at the top of this thread.
I then built the language model with this online tool: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
But after training my acoustic model with sphinxtrain, the result only responds to Persian!
Is it possible to build a recognition system that responds to two languages at the same time?
You need to provide more information to get help on this problem - data, models, command lines.
Thank you Nikolay. After a long time on another project, I have come back to complete this one. I will post the other information very soon.
Hi dear Nikolay!
I attached my etc folder to my post. I tried to create the language model with the online tool the tutorial suggests, but it only works for Persian and cannot recognize English words at the same time.
Also, the command I use to test is:
pocketsphinx_continuous -inmic yes -hmm /home/m/myrobot2/robot/model_parameters/robot.cd_cont_200 -lm /home/m/myrobot2/robot/etc/robot.lm -dict /home/m/myrobot2/robot/etc/robot.di
It runs, but it only recognizes Persian sentences and words.
This is the etc folder (I am having trouble editing my posts).
I found that the problem is caused by my phoneset file. I used my own phoneset for both English and Persian, as you can see in my question (first post), but it seems the language model only accepts Persian words and doesn't work for English words with this phoneset!
It seems I must use the standard English phoneset (the one pocketsphinx uses for English) for my Persian words as well in order to get a bilingual model. But I don't know where to find a complete list of those phones, or instructions for writing Persian words with that phoneset.
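If one does go the route described here, the custom transcriptions from the first post could be converted mechanically as a starting point. The sketch below is only an illustration under loud assumptions: the table `CUSTOM_TO_CMU` is hypothetical, the vowel choices are rough guesses, and KH and GH have no direct English counterpart, so they are crudely approximated by K and G.

```python
# Hypothetical mapping from the custom phone symbols used in the first post
# to CMUdict-style symbols. The vowel choices and the KH/GH approximations
# are assumptions for illustration, not an established convention.
CUSTOM_TO_CMU = {
    "A": "AE", "AA": "AA", "E": "EH", "I": "IY", "O": "OW", "U": "UW",
    "H": "HH", "KH": "K", "GH": "G",
}


def convert_phones(phones):
    """Map each custom phone to its assumed stand-in; keep other symbols unchanged."""
    return [CUSTOM_TO_CMU.get(p, p) for p in phones]


def convert_entry(line):
    """Convert one 'WORD PH1 PH2 ...' dictionary line."""
    word, *phones = line.split()
    return " ".join([word] + convert_phones(phones))


print(convert_entry("شب SH A B"))
print(convert_entry("خزنده KH A Z A N D E"))
```

The converted entries would still need to be checked by ear, since a purely symbol-by-symbol mapping cannot capture sounds that English lacks.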
You can read https://aclanthology.info/pdf/N/N04/N04-2001.pdf, which describes a similar experience.
Thank you for the paper, Nikolay! But as I said above, it seems I must use the standard phoneset that pocketsphinx uses for English, and I don't know where to find a full list of it. I mean a complete phoneset file of what pocketsphinx uses.
It's kind of sad you do not know how to use Google. It's just the second link: https://en.wikipedia.org/wiki/ARPABET or http://www.speech.cs.cmu.edu/cgi-bin/cmudict
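For reference, the CMU Pronouncing Dictionary linked above uses 39 phone symbols (ignoring stress digits). A small sketch like the following can flag dictionary entries that use symbols outside that set. The function name `unknown_phones` and the sample entries are made up for illustration.

```python
# The 39 phone symbols of the CMU Pronouncing Dictionary (stress digits omitted).
CMU_PHONES = {
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
}


def unknown_phones(dictionary_text, phoneset=CMU_PHONES):
    """Return {word: [phones not in phoneset]} for entries using unknown symbols."""
    problems = {}
    for line in dictionary_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        bad = [p for p in parts[1:] if p not in phoneset]
        if bad:
            problems[parts[0]] = bad
    return problems


# Hypothetical entries: the second and third use symbols CMUdict does not have.
sample = """\
SHAB SH AE B
KHAZANDE KH AE Z AE N D EH
HI H AY
"""

report = unknown_phones(sample)
print(report)
```

Running the same check against the phone list used for training, instead of `CMU_PHONES`, would show whether the dictionary and the acoustic model agree on their symbols.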
For example, when I wrote the Persian words in English letters and tried to build a language model, the generated dictionary looked like the listing shown earlier in this thread.
But it uses incorrect phonemes for the Persian words, because it treats them as English words. So the only way is to correct them by hand, but I have no reference phone list that tells me how to correct the phones of the Persian words.