How to rebuild the corpus from a language model (Reverse Engineering)

Speech Recognition Toolkit

Forum: Speech Recognition Theory

Created: 2014-08-12

Updated: 2014-08-12

Hemu - 2014-08-12

I have a language model (LM), and from it I want to rebuild the corpus file that was used to build the LM. Are there any tools or ways to do this kind of reverse engineering?

- bic-user - 2014-08-12
  
  That's impossible, since a language model only contains information on "how often a given word follows a given word" (n-gram statistics), not the original sentences. You can, however, use the language model as a probabilistic automaton to generate sentences. You can try that with
  
  ngram -lm <your LM file> -gen <number of sentences>
  
  from SRILM. See the ngram man page: http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html
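  Both points can be illustrated with a toy sketch in plain Python (not SRILM; every function and variable name below is hypothetical). It assumes an unsmoothed maximum-likelihood bigram model and ignores the smoothing and backoff found in real ARPA-format LMs. First, two different corpora yield exactly the same bigram counts, so nothing stored in the model can tell you which corpus it came from. Second, the estimated model is treated as a probabilistic automaton: a random walk from <s> to </s> samples new sentences, which is similar in spirit to what ngram -gen does with a real LM.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # sentence-boundary tokens, as in ARPA-style LMs


def bigram_counts(corpus):
    """Count bigrams over sentences padded with <s> and </s>."""
    counts = Counter()
    for sentence in corpus:
        words = [BOS] + sentence.split() + [EOS]
        counts.update(zip(words, words[1:]))
    return counts


def bigram_model(counts):
    """Maximum-likelihood P(next | prev) from bigram counts (no smoothing, an assumption)."""
    totals = Counter()
    for (prev, _), c in counts.items():
        totals[prev] += c
    model = defaultdict(dict)
    for (prev, nxt), c in counts.items():
        model[prev][nxt] = c / totals[prev]
    return dict(model)


def generate(model, rng, max_len=20):
    """Random walk from <s>, sampling each next word from P(next | prev), until </s>."""
    prev, out = BOS, []
    while len(out) < max_len:
        candidates = list(model[prev])
        weights = [model[prev][w] for w in candidates]
        prev = rng.choices(candidates, weights=weights)[0]
        if prev == EOS:
            break
        out.append(prev)
    return " ".join(out)


# Two different (hypothetical) corpora with identical bigram statistics.
corpus_a = ["a b a c a"]
corpus_b = ["a c a b a"]
print(bigram_counts(corpus_a) == bigram_counts(corpus_b))  # prints True

# Treat the estimated model as a probabilistic automaton and sample from it.
model = bigram_model(bigram_counts(corpus_a))
rng = random.Random(0)
for _ in range(3):
    print(generate(model, rng))
```

  The sampled sentences are new word sequences consistent with the model's statistics, not recovered training sentences: for example, "a b a" has nonzero probability under this model but appears in neither corpus.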
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
