From: <vl...@sp...> - 2005-05-09 00:14:54
Hello,

I want to build a word-to-word matrix containing the similarities between words. Running the associate command for every non-stopword in the 'dic' file is very slow, so I tried reading the vectors from the 'wordvec.bin' file and computing the cosine similarity between them myself. I'm not getting the same results as the associate command.

I want to ask whether I have understood the source code correctly. When associating words, does the associate command only read vectors from the 'wordvec.bin' file and return the top words sorted by greatest cosine similarity to the query word vector? And if the query word is a non-stopword from the 'dic' file, is the vector for this word the same as the corresponding vector stored in the 'wordvec.bin' file?

I assume that the entries in 'wordvec.bin' are the normalized vectors from the 'left' file, with the vectors assigned in order to the words in the 'dic' file. Is this right?

Best regards,
Vladimir Repisky
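
A minimal sketch of the computation described above, in plain Python: normalize each vector, build the full cosine-similarity matrix, and rank the neighbors of a query word. It is not taken from the associate source code. The function names (`normalize`, `similarity_matrix`, `top_neighbors`) and the toy words and vectors are hypothetical, and the sketch assumes the vectors are already loaded into memory, since the layout of 'wordvec.bin' is not described here.

```python
import math

def normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def similarity_matrix(vectors):
    """Return the cosine-similarity matrix for a list of equal-length vectors."""
    unit = [normalize(v) for v in vectors]
    # For unit vectors, the dot product equals the cosine similarity.
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]

def top_neighbors(words, sims, query, k=2):
    """Return the k (word, similarity) pairs closest to `query`, excluding itself."""
    i = words.index(query)
    ranked = sorted(
        ((sims[i][j], words[j]) for j in range(len(words)) if j != i),
        reverse=True,
    )
    return [(word, score) for score, word in ranked[:k]]

# Hypothetical toy data standing in for the dictionary words and their vectors.
words = ["cat", "dog", "car"]
vectors = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.1], [0.0, 0.1, 3.0]]
sims = similarity_matrix(vectors)
```

If the stored vectors are already unit length, the normalization step changes nothing and the dot product alone gives the cosine.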