From: Gabriel M. <gab...@gm...> - 2005-10-21 00:01:21
I built an Infomap model using a very large corpus of newspaper articles (100+ million words). I can use associate to query words, but some words that occur in the corpus and are NOT stopwords are for some reason missing from the model, i.e. I get a response of "no word vector for X."

Is there some frequency threshold set? For example, "falklands" doesn't appear in the model even though it occurs more than 500 times in the corpus. If there is a threshold, can I turn it off?

Thanks,
Gabriel Murray
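
One way to check whether a frequency or vocabulary-size cutoff could explain a missing vector is to see how often the word occurs and where it ranks among all words in the corpus. The following is a minimal sketch and is not part of Infomap: the function name frequency_rank and the lowercase, letters-only tokenization are assumptions made for illustration, and they may not match how Infomap tokenizes the corpus.

```python
import re
from collections import Counter


def frequency_rank(text, word):
    """Return (count, rank) for `word` in `text`; rank 1 is the most frequent word.

    Hypothetical helper: tokenization here is lowercase runs of a-z only,
    which is an assumption and may differ from Infomap's own tokenizer.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    if word not in counts:
        return 0, None
    # Rank = 1 + number of distinct words that are strictly more frequent.
    rank = 1 + sum(1 for c in counts.values() if c > counts[word])
    return counts[word], rank


# Tiny illustrative sample; a real check would read the corpus files instead.
sample = "the falklands war the islands the falklands"
print(frequency_rank(sample, "falklands"))
```

If the word's rank falls beyond whatever vocabulary limit the model was built with, that would be consistent with a cutoff; a count alone (such as 500+) does not show where the word sits relative to the rest of the vocabulary.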