From: Beate D. <do...@im...> - 2005-05-09 09:19:11
Hi Vladimir,

> I want to ask if I have correctly understood the source code: is the
> associate command for word association only reading vectors from the
> 'wordvec.bin' file and returning the top words sorted by greatest cosine
> similarity to the query word vector? And if the query word is a
> non-stopword from the 'dic' file, is the vector for this word the same as
> the corresponding vector stored in the 'wordvec.bin' file?

Yes, that's exactly how it works.

> I assume that the words in 'wordvec.bin' are the normalized vectors from
> the 'left' file, where the vectors are successively associated with the
> words in the 'dic' file. Is this right?

This is right, except that the dictionary also contains the stopwords. So the
vectors from the "left" file are successively associated with the
non-stopwords (value of the third column is 0) in "dic".

To check where things go wrong, you could retrieve the vectors of word1 and
word2 using "associate -q ..." and check whether the retrieved vectors are
consistent with the vectors you obtain. Further, you can compute the cosine
similarity of word1 and word2 using "compare_words.pl <options> word1 word2"
and see if you get the same result with your own technique.

Good luck!

Best wishes,
Beate
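P.S. In case it helps, here is a minimal Python sketch of the cosine-similarity
check described above. It assumes you have the two vectors available as plain
lists of floats (the values below are just placeholders, not real data); since
the stored vectors are normalized, the cosine should reduce to a simple dot
product, but computing the norms explicitly guards against any scaling
differences.

import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Placeholder vectors -- substitute the values you retrieved for
# word1 and word2 (e.g. via "associate -q ...").
vec1 = [0.12, -0.03, 0.08]
vec2 = [0.10, -0.01, 0.07]

print("cosine(word1, word2) = %.4f" % cosine(vec1, vec2))

The result should agree with what "compare_words.pl <options> word1 word2"
reports for the same pair.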