From: Beate D. <do...@im...> - 2006-01-31 14:56:19
Hi there,

To retrieve the documents most similar to a given document, you can simply use "associate -d -i d -m <model_path> -c <corpus> <doc_id>".

If you are interested in the pairwise similarities of a set of documents, for now, you can use "associate -q -i d" to retrieve the vector representations of the documents you want to compare. You can then compute the pairwise scalar products of the resulting document vectors to obtain doc-doc similarities.

I am attaching a short Perl script which computes the pairwise document similarities given a list of document ids (these are the numbers enclosed in the <f> and </f> tags in the wordlist file). I don't know how well the program performs if the number of documents to be compared is large.

Good luck!
Beate

On Tue, 24 Jan 2006, P. Kumsaikaew wrote:
> Dear all
>
> Now, I am try to use your program to identify document similarity. I have done
> all installation process. Also, I am able to find the similarity for word like
> associate -d -c testSF1 suit (get docid of suit word)
> My data file contains three documents. Is any way i can put the entire
> document to compare instead of word. For example, i want to get the similarity
> value between Doc1, Doc2 and Doc 3.
>
> Thank you
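
For readers without the attached script, the pairwise scalar-product step described in the reply can be sketched roughly as follows. This is a minimal Python illustration, not the attached Perl script: it assumes the document vectors returned by "associate -q -i d" have already been parsed into lists of floats keyed by document id (the output format is not described in this message, so no parsing is attempted). The names `dot`, `pairwise_similarities`, and the sample vectors are hypothetical.

```python
from itertools import combinations
import math


def dot(u, v):
    """Scalar (dot) product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def pairwise_similarities(vectors, normalize=False):
    """Return {(id_a, id_b): similarity} for every unordered pair of documents.

    `vectors` maps a document id to its vector (assumed to be parsed already).
    With normalize=True the scalar product is divided by the vector lengths,
    which gives the cosine similarity instead of the raw scalar product.
    """
    sims = {}
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(vectors.items()), 2):
        score = dot(vec_a, vec_b)
        if normalize:
            norm = math.sqrt(dot(vec_a, vec_a)) * math.sqrt(dot(vec_b, vec_b))
            score = score / norm if norm else 0.0
        sims[(id_a, id_b)] = score
    return sims


if __name__ == "__main__":
    # Made-up vectors for three documents, for illustration only.
    doc_vectors = {
        1: [1.0, 0.0, 2.0],
        2: [0.5, 1.0, 0.0],
        3: [1.0, 0.0, 2.0],
    }
    for pair, score in pairwise_similarities(doc_vectors).items():
        print(pair, score)
```

The loop visits every unordered pair, so the number of scalar products grows quadratically with the number of documents, which is one reason a large document set could become slow.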