Re: [infomap-nlp-users] Help needed in associate command


Hi there,

To retrieve the documents most similar to a given document, you can simply 
use "associate -d -i d -m <model_path> -c <corpus> <doc_id>".
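If you want to call that query from a script, here is a minimal Python sketch. It only assembles the exact command line quoted above and runs it with the standard subprocess module. The function names are hypothetical, and it returns associate's raw output without parsing it, because the output format is not described here.

```python
import subprocess
from typing import List


def build_associate_cmd(model_path: str, corpus: str, doc_id: str) -> List[str]:
    """Assemble: associate -d -i d -m <model_path> -c <corpus> <doc_id>."""
    return ["associate", "-d", "-i", "d", "-m", model_path, "-c", corpus, doc_id]


def similar_documents(model_path: str, corpus: str, doc_id: str) -> str:
    """Run the document-document query and return associate's raw stdout.

    Assumes the associate binary is on PATH; raises CalledProcessError
    if it exits with a non-zero status.
    """
    result = subprocess.run(
        build_associate_cmd(model_path, corpus, doc_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `similar_documents("/path/to/models", "testSF1", "2")` would run the query for document 2 of the corpus testSF1. The model path and document id shown here are placeholders.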

If you are interested in the pairwise similarities within a set of
documents, the workaround for now is to use "associate -q -i d" to retrieve
the vector representations of the documents you want to compare. You can
then compute the pairwise scalar (dot) products of the resulting document
vectors to obtain doc-doc similarities.
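The pairwise step can be sketched in Python as follows. The sketch assumes you have already parsed the output of "associate -q -i d" into lists of floats keyed by document name. That parsing step is not shown because the output format is not described here, and the function names are hypothetical. It computes a plain scalar product for every unordered pair of documents.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Scalar product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def pairwise_similarities(
    vectors: Dict[str, List[float]],
) -> Dict[Tuple[str, str], float]:
    """Return the scalar product for every unordered pair of documents."""
    return {
        (a, b): dot(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Toy 2-dimensional vectors standing in for real document vectors.
docs = {"Doc1": [1.0, 0.0], "Doc2": [0.6, 0.8], "Doc3": [0.0, 1.0]}
for pair, score in pairwise_similarities(docs).items():
    print(pair, score)
```

The scalar product equals cosine similarity only for unit-length vectors. If the retrieved vectors are not normalized, divide each product by the two vector norms.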

I am attaching a short Perl script that computes the pairwise document
similarities for a given list of document ids (these are the numbers
enclosed in the <f> and </f> tags in the wordlist file). I don't know how
well the script performs when the number of documents to be compared is large.
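To collect those ids, you could use a regular expression. The following Python sketch is not the attached Perl script. It assumes each id appears as plain digits between <f> and </f>, as described above, and returns each distinct id once, in the order it first appears. The function name and the file name "wordlist" in the usage line are hypothetical.

```python
import re
from typing import List

# Assumed layout: a document id written as digits inside <f>...</f>, e.g. "<f>2</f>".
F_TAG = re.compile(r"<f>\s*(\d+)\s*</f>")


def doc_ids_from_wordlist(text: str) -> List[int]:
    """Return the distinct <f>...</f> ids in first-seen order."""
    ids = (int(m.group(1)) for m in F_TAG.finditer(text))
    return list(dict.fromkeys(ids))  # dict keys keep insertion order
```

Usage: `doc_ids_from_wordlist(open("wordlist", encoding="utf-8").read())`. Because a word list typically mentions the same document many times, the duplicates are dropped before the ids are passed to the pairwise comparison.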

Good luck!
Beate

On Tue, 24 Jan 2006, P. Kumsaikaew wrote:

> Dear all
>
> Now I am trying to use your program to identify document similarity. I have
> completed the installation process, and I am able to find the similarity for
> a word, e.g.
> associate -d -c testSF1 suit (to get the doc ids for the word "suit")
> My data file contains three documents. Is there any way I can compare entire
> documents instead of words? For example, I want to get the similarity values
> between Doc1, Doc2 and Doc3.
>
> Thank you
>
>
>
> _______________________________________________
> infomap-nlp-users mailing list
> inf...@li...
> https://lists.sourceforge.net/lists/listinfo/infomap-nlp-users
>