I have a language model (LM), and I want to rebuild the corpus file that was used to build it. Are there any tools or methods for this kind of reverse engineering?
That's impossible, since a language model only contains information on how often a certain word follows another word. The original sentences are not stored. However, you can use the language model as a probabilistic automaton to generate sentences. You can try that with
ngram from SRILM. Check its man page: http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html
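To illustrate what "using the LM as a probabilistic automaton" means, here is a minimal Python sketch. It is not SRILM code: the toy bigram table `BIGRAMS` and the function `generate_sentence` are made up for this example, and a real LM file would also carry backoff weights that are ignored here. SRILM's ngram tool has a `-gen` option for the same purpose; check the man page above for its exact usage.

```python
import random

# Hypothetical toy bigram model: P(next word | previous word).
# A real LM (e.g. an ARPA file) also has backoff weights, omitted here.
BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 0.7, "</s>": 0.3},
    "dog": {"sleeps": 0.4, "</s>": 0.6},
    "sleeps": {"</s>": 1.0},
}


def generate_sentence(model, rng, max_len=20):
    """Walk the model from <s> until </s> (or max_len words) is reached."""
    words = []
    prev = "<s>"
    for _ in range(max_len):
        candidates = list(model[prev].keys())
        weights = list(model[prev].values())
        # Sample the next word according to its conditional probability.
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        if nxt == "</s>":
            break
        words.append(nxt)
        prev = nxt
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(generate_sentence(BIGRAMS, rng))
```

The generated sentences follow the same word-to-word statistics as the model, but they are new random samples, not the sentences of the original corpus.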