Hi,
I'm currently working on a speech recognition system using neural networks, and I wanted to use the 4-gram model from LIUM for lattice rescoring. The problem is that I have a DMP model, whereas I need an ARPA one.
I wanted to use sphinx_lm_convert for the conversion, but it gives me n-gram errors, probably because ngram_model doesn't support orders higher than 3?
What would you recommend for the conversion? I looked in the IRSTLM and SRILM toolkits but couldn't find anything.
Thank you in advance. Best Regards,
Florian B.
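Whichever tool ends up doing the conversion, the order of the result can be checked from the `\data\` header of the ARPA file, which lists one `ngram N=count` line per order. A minimal sketch, assuming a plain-text ARPA file; the function names are hypothetical:

```python
import io
import re
from typing import Dict, TextIO

def read_arpa_counts(stream: TextIO) -> Dict[int, int]:
    """Return {order: number of n-grams} from the \\data\\ header of an ARPA LM."""
    counts: Dict[int, int] = {}
    in_header = False
    for line in stream:
        line = line.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue  # skip any free text that precedes \data\
        m = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        if m:
            counts[int(m.group(1))] = int(m.group(2))
        elif line.startswith("\\"):
            break  # the first section marker (e.g. \1-grams:) ends the header
    return counts

def arpa_order(stream: TextIO) -> int:
    """Highest n-gram order declared in the header (0 if none found)."""
    counts = read_arpa_counts(stream)
    return max(counts) if counts else 0
```

For example, `with open("model.arpa") as f: print(arpa_order(f))` should print 4 for a correctly converted 4-gram model (the file name is only an example).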
The LM4G2DMP tool they used for the conversion is available here:
http://www-lium.univ-lemans.fr/en/content/lm4g2dmp
By the way, the LIUM model is not very good; it is heavily targeted at broadcast data. I recommend building your own from subtitles, Wikipedia, books, or other crawls.
Thank you for your quick answer Nickolay!
I already looked at lm4g2dmp, but it seems to only do ARPA-to-DMP conversion. I didn't see any reverse option; am I wrong? The problem is that the only 4-gram model available from LIUM is already a DMP file.
As for the LIUM model, it's the best fit for my project because I use broadcast data (the ESTER 2 corpus)!
Best regards,
Florian B.
Last edit: Florian B. 2016-03-29
Then you have to reverse-engineer their code.
You can also drop Yannick an email; I think he could help.
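Reverse-engineering would mean recovering the 4-gram DMP layout from the LM4G2DMP source, which is not shown here. Once the n-grams have been decoded into memory, though, writing them out in ARPA text format is straightforward. A minimal sketch of that output half only, assuming the decoded data is a hypothetical dict mapping each order to `(words, log10 prob, log10 backoff or None)` tuples; `write_arpa` and the `NGram` type are illustrative names, not part of any existing tool:

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple

# Hypothetical in-memory entry: (words, log10 probability, log10 backoff or None).
NGram = Tuple[Tuple[str, ...], float, Optional[float]]

def write_arpa(ngrams: Dict[int, List[NGram]], out: TextIO) -> None:
    """Write n-grams grouped by order in ARPA text format."""
    orders = sorted(ngrams)
    # Header: one "ngram N=count" line per order.
    out.write("\\data\\\n")
    for n in orders:
        out.write(f"ngram {n}={len(ngrams[n])}\n")
    # One section per order: logprob <TAB> words [<TAB> backoff].
    for n in orders:
        out.write(f"\n\\{n}-grams:\n")
        for words, logprob, backoff in ngrams[n]:
            fields = [f"{logprob:.6f}", " ".join(words)]
            # Entries of the highest order never carry a backoff weight.
            if backoff is not None and n < orders[-1]:
                fields.append(f"{backoff:.6f}")
            out.write("\t".join(fields) + "\n")
    out.write("\n\\end\\\n")
```

The decoding step that fills `ngrams` from the DMP file is the part that has to come from LM4G2DMP's code.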
I'll send him an email, thanks!
Alternatively, I could build the 4-gram ARPA model from the data using the SRILM toolkit, but I think that would take much longer than writing a DMP-to-ARPA converter. I may be wrong, though; I'll ask Yannick.
Thank you for your answers again!
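For the SRILM route, a 4-gram ARPA model is produced with `ngram-count`. A minimal sketch that builds and runs the call, assuming SRILM is installed and on PATH; the file names are hypothetical, and interpolated Kneser-Ney smoothing is an assumed choice, not something settled in this thread:

```python
import subprocess
from typing import List, Optional

def srilm_4gram_command(corpus: str, out_lm: str, vocab: Optional[str] = None) -> List[str]:
    """Build an SRILM ngram-count call for an interpolated Kneser-Ney 4-gram ARPA LM."""
    cmd = ["ngram-count",
           "-order", "4",          # 4-gram model
           "-text", corpus,        # one sentence per line
           "-lm", out_lm,          # output written in ARPA format
           "-kndiscount", "-interpolate"]
    if vocab is not None:
        # Use a fixed word list; -unk keeps an <unk> entry for out-of-vocabulary words.
        cmd += ["-vocab", vocab, "-unk"]
    return cmd

def build_4gram(corpus: str, out_lm: str, vocab: Optional[str] = None) -> None:
    # Raises subprocess.CalledProcessError if ngram-count fails.
    subprocess.run(srilm_4gram_command(corpus, out_lm, vocab), check=True)
```

For example, `build_4gram("corpus.txt", "model.arpa")` would write the model to `model.arpa`; training time grows with the corpus size.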