From: <vl...@sp...> - 2005-05-09 11:34:43
Hi Beate,

thank you very much for your advice, everything works fine now. I was
confused because the number of vectors in the 'left' file is the same as
the number of words in the 'dic' file; the 'left' file contains zero
vectors at the end to pad it out to the word count of 'dic', while
'wordvec.bin' contains only the non-stopword vectors. I find it
unnecessary to run the SVD with the dimension of all words; it can be
done with a dimension equal to the non-stopword count.

Best regards,
Vladimir Repisky

On Mon, 09 May 2005 11:18:30 +0200, Beate Dorow <do...@im...> wrote:

> Hi Vladimir,
>
>> I want to ask if I've correctly understood the source code: that the
>> associate command for word association only reads vectors from the
>> 'wordvec.bin' file and returns the top words sorted by greatest cosine
>> similarity to the query word vector? And if the query word is a
>> non-stopword from the 'dic' file, then the vector for this word is the
>> same as the corresponding vector stored in 'wordvec.bin'?
>
> Yes, that's exactly how it works.
>
>> I assume that the words in 'wordvec.bin' are normalized vectors from
>> the 'left' file, where the vectors are successively associated with
>> the words in the 'dic' file. Is this right?
>
> This is right, except that the dictionary also contains the stopwords.
> So the vectors from the "left" file are successively associated with
> the non-stopwords (value of the third column is 0) in "dic".
>
> To check where things go wrong, you could retrieve the vectors of word1
> and word2 using "associate -q ..." and check whether the vectors
> retrieved are consistent with the vectors you obtain.
>
> Further, you can compute the cosine similarity of word1 and word2 using
> "compare_words.pl <options> word1 word2" and see if you get the same
> result using your own technique.
>
> Good luck!
> Best wishes,
> Beate
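
For anyone reproducing this outside the Infomap tools, here is a minimal
Python sketch of the mapping described above: rows of 'left' are assigned
successively to the non-stopwords of 'dic', the trailing zero-padding
rows are never consumed, and each kept vector is unit-normalized. The
function names and the text parsing are hypothetical (the real
'wordvec.bin' is binary, and only the third dic column, 0 = non-stopword,
is taken from the thread); it assumes the 'left' rows are already loaded
as lists of floats.

  import numpy as np

  def load_dic(path):
      """Read 'dic' as whitespace-separated columns; per the thread, a
      third-column value of 0 marks a non-stopword. Any further column
      layout is an assumption."""
      entries = []
      with open(path) as f:
          for line in f:
              fields = line.split()
              entries.append((fields[0], int(fields[2]) != 0))
      return entries

  def build_word_vectors(dic_entries, left_rows):
      """Associate the rows of 'left' successively with the non-stopwords
      of 'dic'. The trailing zero rows (padding up to the dic word count)
      are left unread, and every vector is normalized to unit length,
      matching what ends up in 'wordvec.bin'."""
      vectors = {}
      row = 0
      for word, is_stopword in dic_entries:
          if is_stopword:
              continue
          v = np.asarray(left_rows[row], dtype=float)
          row += 1
          norm = np.linalg.norm(v)
          if norm > 0.0:
              vectors[word] = v / norm
      return vectors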
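Continuing the sketch, the retrieval step that "associate" is described
as performing (rank all words by cosine similarity to the query vector)
can then be approximated as below; the name "associate" here is just a
label for the sketch, not the real program. Because the stored vectors
are unit length, cosine similarity reduces to a plain dot product, which
is presumably why the vectors are normalized up front.

  def associate(query, vectors, top_n=10):
      """Return the top_n words ranked by cosine similarity to the query
      word's vector; with unit vectors, cosine equals the dot product."""
      q = vectors[query]
      scores = [(w, float(np.dot(q, v)))
                for w, v in vectors.items() if w != query]
      scores.sort(key=lambda pair: pair[1], reverse=True)
      return scores[:top_n]

  # Hypothetical usage, mirroring "associate" and "compare_words.pl":
  # vectors = build_word_vectors(load_dic("dic"), left_rows)
  # print(associate("word1", vectors))  # nearest neighbours of word1
  # print(float(np.dot(vectors["word1"], vectors["word2"])))  # pairwise cosine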