Hi Vladimir,
> I want to ask if I've correctly understood the source code: the
> associate command for word association only reads vectors
> from the 'wordvec.bin' file and returns the top words with the greatest
> cosine similarity to the query word vector, sorted? And if the query word
> is a non-stopword from the 'dic' file, then the vector for this word is
> the same as the corresponding vector stored in the 'wordvec.bin' file?
Yes, that's exactly how it works.
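To make the lookup-and-rank step concrete, here is a minimal sketch in plain Python. The in-memory dict of word vectors and the function name are assumptions for illustration only, not how associate actually stores or names things:

```python
import math

def rank_by_cosine(query_vec, word_vectors, k=10):
    """Return the k words whose vectors have the greatest cosine
    similarity to query_vec, in descending order of similarity.
    word_vectors is a hypothetical dict mapping word -> vector."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    scored = [(w, cosine(query_vec, v)) for w, v in word_vectors.items()]
    scored.sort(key=lambda item: -item[1])  # highest similarity first
    return scored[:k]
```

Note that for unit-normalized vectors the cosine reduces to a plain dot product, so the norms in the helper are only needed if your vectors are not normalized.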
> I assume that the words in 'wordvec.bin' are normalized vectors from the
> 'left' file, where the vectors are successively associated with the words
> in the 'dic' file. Is this right?
This is right, except that the dictionary also contains the stopwords. So
the vectors from the "left" file are successively associated with the
non-stopwords (value of the third column is 0) in "dic".
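For illustration, that pairing could be sketched as follows. The whitespace-separated three-column layout of "dic" and the list-of-vectors representation of "left" are assumptions made for the sketch, not the actual on-disk formats:

```python
def pair_vectors_with_words(dic_lines, vectors):
    """Hypothetical sketch: walk the 'dic' entries in order and hand the
    next vector from the 'left' file to each non-stopword (third column
    == 0), skipping the stopword entries entirely."""
    non_stopwords = []
    for line in dic_lines:
        cols = line.split()
        if cols[2] == "0":  # assumed: third column 0 marks a non-stopword
            non_stopwords.append(cols[0])
    # zip pairs vectors with non-stopwords in order of appearance
    return dict(zip(non_stopwords, vectors))
```

The key point the sketch captures is that stopword lines consume no vector, so the i-th vector in "left" belongs to the i-th *non-stopword* in "dic", not the i-th dictionary line.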
To check where things go wrong, you could retrieve the vectors of word1
and word2 using "associate -q ..." and check whether they match the
vectors you extract yourself.
Further, you can compute the cosine similarity of word1 and word2 using
"compare_words.pl <options> word1 word2" and see if you get the same
result using your own technique.
Good luck!
Best wishes,
Beate