Hi there,
To retrieve the documents most similar to a given document, you can simply
use "associate -d -i d -m <model_path> -c <corpus> <doc_id>".
If you are interested in the pairwise similarities of a set of
documents, for now, you can use "associate -q -i d" to retrieve the vector
representations of the documents you want to compare. You can then compute
the pairwise scalar products of the resulting document vectors to obtain
doc-doc similarities.
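As a rough illustration of that last step (this is a sketch, not the attached script), the scalar products could be computed as below. It assumes the document vectors have already been read into a Python dict keyed by document id. The names dot and pairwise_similarities and the example vectors are hypothetical, and parsing the output of associate is left out because its exact format is not shown here.

```python
from itertools import combinations


def dot(u, v):
    """Scalar product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def pairwise_similarities(vectors):
    """Map each unordered pair of doc ids to the scalar product of their vectors.

    `vectors` is assumed to be a dict {doc_id: [float, ...]} that was filled
    from the vectors retrieved for each document (loading not shown).
    """
    return {
        (a, b): dot(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


if __name__ == "__main__":
    # Made-up vectors for three documents, for illustration only.
    example = {
        1: [0.6, 0.8, 0.0],
        2: [0.0, 0.6, 0.8],
        3: [0.6, 0.8, 0.0],
    }
    for (a, b), sim in pairwise_similarities(example).items():
        print(f"doc {a} vs doc {b}: {sim:.4f}")
```

The scalar product equals the cosine similarity only when the vectors have unit length; if they do not, divide each product by the product of the two vector norms.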
I am attaching a short Perl script which computes the pairwise document
similarities given a list of document ids (these are the numbers enclosed
in the <f> and </f> tags in the wordlist file). I don't know how well the
program performs if the number of documents to be compared is large.
Good luck!
Beate
On Tue, 24 Jan 2006, P. Kumsaikaew wrote:
> Dear all
>
> I am trying to use your program to identify document similarity. I have
> completed the installation, and I am able to find the similarity for a word, e.g.
> associate -d -c testSF1 suit (gets the doc ids for the word "suit")
> My data file contains three documents. Is there any way I can supply an entire
> document to compare instead of a word? For example, I want to get the similarity
> values between Doc1, Doc2 and Doc3.
>
> Thank you
>
> _______________________________________________
> infomap-nlp-users mailing list
> inf...@li...
> https://lists.sourceforge.net/lists/listinfo/infomap-nlp-users